Experiments with the semantic similarity measure between sentences for the LexRank Text Summarization System

My dear friends,

sorry for the long absence 🙂

Today I want to write about one of my previous experimental projects in the field of automatic extractive text summarization.

Text summarization is the process of automatically creating a compressed version of a given text that provides useful information for the user. I want to focus on generic multi-document text summarization, where the goal is to produce a summary of many documents on the same general topic by choosing a subset of the most relevant sentences. For example, given a set of news articles on the same topic, our system creates a short summary of the most relevant information from these articles.

I re-implemented the existing LexRank approach (graph-based lexical centrality as salience) and replaced the cosine similarity measure with a combination of features from ECNU [3], a new system for semantic similarity between sentences. This similarity approach is an ensemble of 3 machine learning algorithms and 4 deep learning models whose 7 scores are averaged (EN-seven), and it was one of the best approaches for calculating semantic similarity in 2016-2017.
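To make the idea more concrete, here is a minimal sketch of a LexRank-style ranker with a pluggable similarity function. It is a simplified illustration and not the implementation from the project: the function names (`lexrank`, `summarize`, `ensemble_similarity`), the threshold and the damping factor are my assumptions, self-links are left out for brevity, and the two scorers (bag-of-words cosine and Jaccard) are only stand-ins for the seven ECNU models.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """Word-overlap similarity; a stand-in for a second scorer."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def ensemble_similarity(a, b, scorers):
    """Average the scores of several similarity functions."""
    return sum(scorer(a, b) for scorer in scorers) / len(scorers)


def lexrank(sentences, similarity=cosine_similarity,
            threshold=0.1, damping=0.85, iterations=50):
    """Score sentences by their centrality in the similarity graph."""
    n = len(sentences)
    # Connect two different sentences if they are similar enough.
    adj = [[1.0 if i != j
            and similarity(sentences[i], sentences[j]) > threshold
            else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    # Power iteration, as in PageRank.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                adj[j][i] / degree[j] * scores[j]
                for j in range(n) if degree[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2, **kwargs):
    """Return the k most central sentences in their original order."""
    scores = lexrank(sentences, **kwargs)
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

To swap the similarity measure, pass a different function, for example `summarize(docs, k=2, similarity=lambda a, b: ensemble_similarity(a, b, [cosine_similarity, jaccard_similarity]))`. The ranking code stays the same, which is why replacing cosine similarity with an ensemble score is an easy experiment to run.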



AI describing images: Natural Language Generation (NLG) using textual attributes

Data-to-text System using encoder-decoder architecture with attention, BiRNN and LSTMs

“Speak English!” said the Eaglet. “I don’t know the meaning of half those long words, and, what’s more, I don’t believe you do either!” — “Alice in Wonderland”, Chapter 3

Dear friends,

let’s teach computers to speak 😉

Today you will read about Natural Language Generation AI that can describe images given some textual attributes.

Keywords: natural language generation (NLG), data-to-text, natural language processing (NLP), image description, encoder-decoder architecture, sequence-to-sequence architecture, bidirectional recurrent neural networks (BiRNNs), long short-term memory neural networks (LSTMs), Attention mechanism, neural word embeddings, Machine Learning, Deep Learning, structured data

Why do we need such a system?

  • for data-to-text systems which generate textual summaries of databases and data sets. Research has shown that textual summaries can be more effective than graphs and other visuals for decision support
  • produce weather forecasts from weather data
  • summarise financial and business data
  • commercially in automated journalism
  • for chatbots, question-answering (QA) systems
  • generate product descriptions for e-commerce sites
  • summarise medical records
  • enhance accessibility (for example by describing graphs and data sets to blind people)
  • assist human writers and make the writing process more efficient and effective
  • as a basis for video descriptions, for example the very interesting task of generating descriptions during the World Cup 2022 by tracking the players and their actions
  • for creative language generation (joke generation)
  • for expert systems (generate expert answers using structured/unstructured information)
  • automatically generate product reviews

and many other useful things 🙂


Again with you – a small overview of topics/projects

What I did in the last months and what we are going to talk about in future

Dear-dear Friends,

After quite a long pause, I am back with you, with an ocean of very important and interesting information 🙂

This year was a year of inspiration for me, …


Prepare your data. Part 1: Pre-processing

Pre-process unstructured Data

Hello, my dear friends. In the last article we had an overview of some interesting datasets for Natural Language Processing and Machine Learning. Let’s learn how to work with them!


Data preparation is the ABC for every data scientist. Even if you have good data, you cannot access the information in it unless it is processed. Only about 20% of information today is available in structured form. The majority of data is presented in text form, which is highly unstructured in nature.

In order to produce actionable insights from data, you have to prepare it. In this article we’ll learn how to pre-process text information.
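As a small taste of what text pre-processing can look like, here is a minimal sketch using only the Python standard library. The steps (lowercasing, tokenization, stop-word removal), the function name `preprocess` and the tiny stop-word list are assumptions chosen for illustration; a real project would probably use a larger list and handle non-ASCII text.

```python
import re

# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "of", "to", "and"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into tokens and drop stop words."""
    text = text.lower()
    # Keep only runs of ASCII letters and digits; punctuation is dropped.
    tokens = re.findall(r"[a-z0-9]+", text)
    return [token for token in tokens if token not in stop_words]


print(preprocess("The fuel of AI is DATA!"))
```

The result is a clean list of tokens that later steps (counting, vectorization, model training) can work with.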


Find Data for AI

How to find Good Data for AI projects

Dear AI friends,

all you need is … DATA! And love, of course 🙂

Yes, I’m not joking. The fuel for any AI System is Data. No matter how clever the technologies are, they depend on data. More importantly, they depend on “good” data. If you have good Data, you have already solved 50% of your problem. Any AI System is “data-hungry” and can only be as smart as the information you provide it with.

So, before we start with clever ML algorithms, let’s make sure we know how to find Data for your AI 😉


Let’s create a chatbot

Creating a bot in Python with Telegram (an echo-bot, an else-if bot and a bot with Levenshtein Distance)

Hello, my dear friends 🙂

Today, we are going to create our own communication bot.


There are two reasons for it.

  • Reason Number One: Today it’s a TREND and a MUST

First of all, bots have become a real craze of the modern world. It’s about humans talking directly to machines. It’s about science fiction. It’s about the future.
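The bot with Levenshtein Distance mentioned in the subtitle can be sketched roughly like this. It is only an illustration of the matching logic: the Telegram connection is left out completely, and the names `levenshtein`, `reply`, `REPLIES` and the `max_distance` cut-off are my assumptions, not the code from the article.

```python
def levenshtein(a, b):
    """Number of single-character edits needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical phrases the bot knows and its answers to them.
REPLIES = {
    "hello": "Hi there!",
    "how are you": "Fine, thanks!",
    "bye": "See you!",
}


def reply(message, replies=REPLIES, max_distance=3):
    """Answer with the reply of the closest known phrase, if close enough."""
    text = message.lower().strip()
    best = min(replies, key=lambda phrase: levenshtein(text, phrase))
    if levenshtein(text, best) <= max_distance:
        return replies[best]
    return "Sorry, I did not understand that."


print(reply("helo"))
print(reply("how are yu"))
```

Compared to a plain else-if bot, the edit distance lets the bot tolerate typos: "helo" is still close enough to "hello" to get a greeting back.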


AI Blog

About me

I’m Tatjana Chernenko, a writer and computer scientist living in Germany.

I work as a Software Developer at SAP SE in Germany. I studied Computational Linguistics with a focus on Machine Learning and NLP at the University of Heidelberg, have lived in four different countries, speak five languages, and have 14 years of job experience (four years in software development and research, ten years in business, working in big companies and running my own company). My technical interests focus on science, Artificial Intelligence, Machine Learning and Natural Language Processing. I am a supporter of open data, women in technology, and Artificial Intelligence in real life and research.

Every day I gain a lot of interesting experience and valuable knowledge. I found my passion in the world of science and AI. Every single step in a journey as a computer scientist is a kind of magic. I decided to start this blog to share my knowledge with you.


COVID-19 and Big Data

Global crises show the importance of government-run Big Data platforms.

For every country it is extremely important to make data easily accessible and available, so that it can make effective decisions and take the initiative in difficult situations.

Easily available data allows AI-based regulation, technology-based logistics and an immediate response based on real-time data and ML predictions.

Positive example: South Korea has an advanced digital platform for big-data mining and AI-based warning systems, which has already allowed the country to bring the coronavirus situation under control. Mobile push systems immediately notify every person who has been in contact with an infected person. AI data analysis informs the government about possible clusters of the virus. Medical services are mobilizing their initiatives in the areas with the highest risk. The supply and distribution of masks and other items are also regulated by AI.

Result: the number of coronavirus patients in South Korea is coming down.

To support the urgent research on the Coronavirus, the Kaggle platform provides a COVID-19 studies dataset for all AI experts, researchers and volunteers.