Featured

Experiments with the semantic similarity measure between sentences for the LexRank Text Summarization System

My dear friends,

sorry for the long absence 🙂

Today I want to write about one of my previous experimental projects in the field of automatic extractive text summarization.

Text summarization is the process of automatically creating a compressed version of a given text that provides useful information for the user. I want to focus on generic multi-document text summarization, where the goal is to produce a summary of many documents on the same general topic by choosing a subset of the most relevant sentences. For example, given a set of news articles on the same topic, our system creates a short summary of the most relevant information from these articles.

I re-implemented the existing LexRank approach (graph-based lexical centrality as salience) and replaced the cosine similarity measure with a combination of features from ECNU [3], a new system for semantic similarity between sentences. This similarity approach is an ensemble of 3 machine learning algorithms and 4 deep learning models whose 7 scores are averaged (EN-seven), and it was one of the best approaches for calculating semantic similarity in 2016-2017.
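To make the idea more concrete, here is a minimal sketch of a LexRank-style ranking in which the similarity measure is a plug-in parameter. It is a simplified illustration, not the code of my project: the function names, the threshold of 0.1, the damping factor of 0.85 and the plain bag-of-words cosine baseline are all assumptions for this example, and the ECNU features themselves are not implemented here.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Baseline measure: bag-of-words cosine similarity of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, similarity=cosine_similarity, threshold=0.1,
            damping=0.85, iterations=100):
    """Return one centrality score per sentence (simplified LexRank).

    `similarity` is any function (sentence, sentence) -> float; this is
    the place where a semantic similarity measure could be plugged in.
    """
    n = len(sentences)
    if n == 0:
        return []
    # Build an undirected graph: connect sentences that are similar enough.
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(sentences[i], sentences[j]) >= threshold:
                adjacency[i][j] = adjacency[j][i] = 1.0
    # Row-normalise; isolated sentences jump uniformly to any sentence.
    transition = []
    for row in adjacency:
        degree = sum(row)
        transition.append([v / degree for v in row] if degree else [1.0 / n] * n)
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


def summarize(sentences, k=1, **kwargs):
    """Pick the k most central sentences, kept in their original order."""
    scores = lexrank(sentences, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Swapping the measure then only means passing another function, for example `lexrank(sentences, similarity=my_semantic_similarity)`, where `my_semantic_similarity` is a hypothetical function that returns a score for two sentences.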

AI describing images: Natural Language Generation (NLG) using textual attributes

Data-to-text System using encoder-decoder architecture with attention, BiRNN and LSTMs

“Speak English!” said the Eaglet. “I don’t know the meaning of half those long words, and, what’s more, I don’t believe you do either!” — “Alice in Wonderland”, Chapter 3

Dear friends,

let’s teach computers to speak 😉

Today you will read about Natural Language Generation AI that can describe images given some textual attributes.

Keywords: natural language generation (NLG), data-to-text, natural language processing (NLP), image description, encoder-decoder architecture, sequence-to-sequence architecture, bidirectional recurrent neural networks (BiRNNs), long short-term memory neural networks (LSTMs), Attention mechanism, neural word embeddings, Machine Learning, Deep Learning, structured data

Why do we need such a system?

  • for data-to-text systems which generate textual summaries of databases and data sets. Research has shown that textual summaries can be more effective than graphs and other visuals for decision support
  • produce weather forecasts from weather data
  • summarise financial and business data
  • commercially in automated journalism
  • for chatbots, question-answering (QA) systems
  • generate product descriptions for e-commerce sites
  • summarise medical records
  • enhance accessibility (for example by describing graphs and data sets to blind people)
  • assist human writers and make the writing process more efficient and effective
  • as a basis for video descriptions, for example the very interesting task of generating descriptions during the World Cup 2022 by tracking the players and their actions
  • for creative language generation (joke generation)
  • for expert systems (generate expert answers using structured/unstructured information)
  • automatically generate product reviews

and many other useful things 🙂

Again with you – a small overview of topics/projects

What I did in the last months and what we are going to talk about in the future

Dear-dear Friends,

After quite a long pause I am with you again, with an ocean of very important and interesting information 🙂

This year was for me a year of inspiration, …

Prepare your data. Part 1: Pre-processing

Pre-process unstructured Data

Hello, my dear friends. In the last article we had an overview of some interesting datasets for Natural Language Processing and Machine Learning. Let’s learn how to work with them!

WHY?

Data preparation is the alpha and omega for every data scientist. Even if you have good data, you cannot access the information in it unless it is processed. Only about 20% of information today is available in structured form. The majority of data is presented in text form, which is highly unstructured in nature.

In order to produce actionable insights from data, you have to prepare it. In this article we’ll learn how to pre-process text information.
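As a small taste of what pre-processing can look like, here is a minimal sketch. The concrete steps (lowercasing, a simple regular-expression tokenizer and a tiny stop word list) are assumptions chosen for illustration and not necessarily the steps covered in the full article.

```python
import re

# A tiny, illustrative stop word list; real projects use much longer ones.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "and", "to"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("The fuel of any AI system is DATA!"))
```

The result is a clean list of tokens that can be counted, compared or fed into a model.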

Find Data for AI

How to find Good Data for AI projects

Dear AI friends,

all you need is … DATA! And love, of course 🙂

Yes, I’m not joking. The fuel for any AI System is Data. No matter how clever the technologies are, they depend on data. More importantly, they depend on “good” data. If you have good Data, you have already solved 50% of your problem. Any AI System is “data-hungry” and can only be as smart as the information you provide it with.

So, before we start with clever ML algorithms, let’s make sure we know how to find Data for your AI 😉

Let’s create a chatbot

Creating a bot in Python with Telegram (an echo-bot, an else-if bot and a bot with Levenshtein Distance)

Hello, my dear friends 🙂

Today, we are going to create our own communication bot.
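To give you a first idea of the Levenshtein Distance part, here is a minimal sketch without any Telegram code. The `ANSWERS` dictionary, the `reply` function and the `max_distance` value of 3 are hypothetical names and values made up for this example; the bot in the full article may be built differently.

```python
def levenshtein(a, b):
    """Number of single-character edits needed to turn string a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


# Hypothetical known messages and the answers the bot gives to them.
ANSWERS = {
    "hello": "Hi there!",
    "how are you": "Fine, thanks!",
}


def reply(message, max_distance=3):
    """Answer with the closest known message, tolerating small typos."""
    text = message.lower().strip()
    best = min(ANSWERS, key=lambda known: levenshtein(text, known))
    if levenshtein(text, best) <= max_distance:
        return ANSWERS[best]
    return "Sorry, I did not understand that."
```

With this, a message with a typo such as "helo" still gets the answer for "hello", while a completely different message falls back to the default reply.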

Why?

There are two reasons for it.

  • Reason Number One: Today it’s a TREND and a MUST

First of all, bots have become a real craze of the modern world. It’s about humans talking directly to machines. It’s about science fiction. It’s about the future.

AI Blog

About me

I’m Tatjana Chernenko, a writer and computer scientist living in Germany.

I work as a Software Developer at SAP SE in Germany. I studied Computational Linguistics with a focus on Machine Learning and NLP at the University of Heidelberg, have lived in four different countries, speak five languages, and have 14 years of job experience (four years in software development and research, ten years in business, working in big companies and running my own company). My technical interests focus on science, Artificial Intelligence, Machine Learning and Natural Language Processing. I am a supporter of open data, women in technology and Artificial Intelligence in real life and research.

Every day I gain a lot of interesting experience and valuable knowledge. I found my passion in the world of science and AI. Every single step in a journey as a computer scientist is a kind of magic. I decided to start this blog to share my knowledge with you.

Journals & Conference Proceedings for Computational Linguistics

Conference proceedings

If you want to be on the cutting edge, you should first of all look at the conference proceedings, because not all NLP papers are published in journals, and those that are take time to appear.

Main conferences in the field of Natural Language Processing:

ACL: Association for Computational Linguistics
EMNLP: Empirical Methods in Natural Language Processing
NAACL: North American Chapter of the Association for Computational Linguistics
EACL: European Chapter of the Association for Computational Linguistics
COLING: International Conference on Computational Linguistics
CoNLL: Conference on Natural Language Learning

Depending on your personal focus, it also makes sense to have a look at the conference proceedings in your area of special interest (Information Retrieval, Data Mining, Machine Translation, AI, etc.), such as SIGIR, AAAI, etc.

Journals

Journals are another good source of knowledge. Journals’ standards are still high and articles can cover more than conference papers.

Journal of Computational Linguistics (CL) – published quarterly (March, June, September, December). It is the primary archival forum for research on Computational Linguistics and Natural Language Processing. The journal is sponsored by the Association for Computational Linguistics and published by MIT Press; Open Access.

Transactions of the Association for Computational Linguistics (TACL) – an ACL-sponsored journal published by MIT Press that publishes papers in all areas of computational linguistics and natural language processing. TACL publishes conference-length papers, but has a journal-style reviewing process.

International Journal of Computational Linguistics (IJCL) – a peer reviewed open access bi-monthly journal providing a scientific forum where computer scientists, experts in artificial intelligence, mathematicians, logicians, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists can present theoretical research and experimental studies.

More

Journal of Memory and Language

Journal of Information Retrieval

Journal of Machine Learning

SIGIR: Special Interest Group on Information Retrieval

AAAI: Association for the Advancement of Artificial Intelligence

ICML: International Conference on Machine Learning

ICDM: International Conference on Data Mining

Resources you have to skip / ignore

If you are looking for reviewed, reliable sources, you should skip the following resources:

  • Workshops and papers coming from workshops on application areas. They are mainly for people who work in similar subfields. Workshops on NLP subfields are acceptable.
  • Invited talks, keynotes. They are mostly not reviewed.
  • Conferences reviewed by abstract only.
  • Conferences & journals outside the field – the worst NLP papers are found there, and they are a definite source of misleading information.
  • Academic papers about deployable systems (unless they come from NLP-focused research teams such as those at Google, IBM and Microsoft). They are mostly outside NLP skill sets.

COVID-19 and Big Data

Global crises show the importance of government-run Big Data platforms.

For every country it is extremely important to make data easily accessible and available, so that it can make effective decisions and take the initiative in difficult situations.

Easily available data allows AI-based regulation, technology-based logistics and an immediate response based on real-time data and ML predictions.

Positive example: South Korea has an advanced digital platform for big-data mining and AI-based warning systems, which has already allowed the country to bring the coronavirus situation under control. Mobile push systems immediately notify every person who has had contact with an infected person. AI data analysis informs the government about possible clusters of the virus. Medical services are mobilizing their initiatives in the areas at highest risk. Supply and distribution of masks and other items are also regulated by AI.

Result: the number of coronavirus patients in South Korea is coming down.

To support the urgent research on the coronavirus, the Kaggle platform provides a COVID-19 studies dataset for all AI experts, researchers and volunteers.