How to baseline in NLP and where to go from there
04-17, 15:10–15:40 (Europe/Berlin), A03-A04

In this talk, we will explore the build-measure-learn paradigm and the role of baselines in natural language processing (NLP). We will cover the common NLP tasks of classification, clustering, search, and named entity recognition, and describe the baseline approaches that can be used for each task. We will also discuss how to move beyond these baselines through weak learning and transfer learning. By the end of this talk, attendees will have a better understanding of how to establish and improve upon baselines in NLP.


In this talk, we will explore the role of baselines in natural language processing (NLP) and discuss how to move beyond these baselines through weak learning and transfer learning.

First, I will introduce the build-measure-learn paradigm, a framework for iteratively developing and improving products or systems: build a solution, measure its performance, and learn from the results to improve the next iteration. Baselines are essential to this loop because they provide a first working solution and a benchmark that every later iteration has to beat.
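To make this concrete, here is a minimal sketch of the loop using scikit-learn; the toy data and names are illustrative, not taken from the talk. A trivial majority-class model is built first, its score becomes the benchmark, and a stronger model is kept only if it beats that number.

    # A minimal sketch of the build-measure-learn loop, using scikit-learn.
    # The toy data below is purely illustrative.
    from sklearn.dummy import DummyClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    texts = ["great product", "terrible service", "loved it", "awful support"] * 25
    labels = [1, 0, 1, 0] * 25  # 1 = positive, 0 = negative
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=42
    )

    # Build: the simplest possible baseline always predicts the majority class.
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

    # Measure: the baseline score is the number every later model must beat.
    baseline_f1 = f1_score(y_test, baseline.predict(X_test))

    # Learn: try a slightly stronger model and keep it only if it wins.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(X_train, y_train)
    model_f1 = f1_score(y_test, model.predict(X_test))

    print(f"baseline F1: {baseline_f1:.2f}, model F1: {model_f1:.2f}")

The point of the dummy baseline is not to ship it, but to expose how much of any model's score comes from the label distribution alone.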

Next, I will delve into the common NLP tasks of classification, clustering, search, and named entity recognition (NER), and describe baseline approaches for each. These baselines may not be the most advanced or sophisticated solutions, but they are often quick and easy to implement, and they provide a reference point that guides further improvement.
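As one illustration of how simple such a baseline can be (the corpus and query below are made up, and this is just one possible starting point), a search baseline can rank documents by TF-IDF cosine similarity to the query:

    # A minimal search baseline, assuming scikit-learn: rank documents by
    # TF-IDF cosine similarity to the query. Corpus and query are made up.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    corpus = [
        "How to train a text classifier",
        "Clustering news articles with k-means",
        "Named entity recognition with conditional random fields",
        "Evaluating search relevance with nDCG",
    ]

    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(corpus)

    def search(query, top_k=2):
        """Return the top_k documents most similar to the query."""
        query_vector = vectorizer.transform([query])
        scores = cosine_similarity(query_vector, doc_vectors).ravel()
        best = scores.argsort()[::-1][:top_k]
        return [(corpus[i], round(float(scores[i]), 3)) for i in best]

    print(search("text classification"))

The same pattern, a cheap vectorizer plus a simple model on top, also yields first baselines for classification (TF-IDF plus logistic regression) and clustering (TF-IDF plus k-means).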

Finally, I will discuss how to move on from these baselines. One option is to use insights from the baselines to build a weak learning (weak supervision) system, in which human-written rules or patterns generate noisy training labels instead of relying on a large hand-labeled dataset. Another option is to leverage transfer learning, which involves adapting a pre-trained model to a new task or domain by fine-tuning its parameters on a smaller, task-specific dataset.
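A hypothetical sketch of the weak-labeling idea, in plain Python: a few hand-written rules vote on each example, abstaining when they don't apply, and the majority vote becomes a noisy training label. The rules and data here are invented for illustration and are not the speaker's actual system.

    # A hypothetical weak-labeling sketch: hand-written rules vote on each
    # example, abstaining when they don't apply, and the majority vote
    # becomes a noisy training label. Rules and data are invented here.
    from collections import Counter

    POSITIVE_WORDS = {"great", "love", "excellent"}
    NEGATIVE_WORDS = {"terrible", "awful", "hate"}
    ABSTAIN = None  # a rule returns this when it has no opinion

    def rule_positive_words(text):
        return 1 if POSITIVE_WORDS & set(text.lower().split()) else ABSTAIN

    def rule_negative_words(text):
        return 0 if NEGATIVE_WORDS & set(text.lower().split()) else ABSTAIN

    def rule_exclamation(text):
        # A deliberately weak signal: enthusiastic punctuation leans positive.
        return 1 if text.endswith("!!") else ABSTAIN

    RULES = [rule_positive_words, rule_negative_words, rule_exclamation]

    def weak_label(text):
        """Majority vote over all rules that did not abstain."""
        votes = [v for rule in RULES if (v := rule(text)) is not ABSTAIN]
        return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

    unlabeled = ["I love this talk!!", "terrible slides", "just some text"]
    print([(t, weak_label(t)) for t in unlabeled])

Tools like Snorkel formalize this idea; the resulting noisy labels can then train an ordinary classifier, while the transfer learning route instead starts from a pre-trained model and fine-tunes it on the small labeled set.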

In conclusion, this talk will provide a practical guide to establishing baselines in NLP and moving beyond them through weak learning and transfer learning.


Expected audience expertise: Domain

Intermediate

Expected audience expertise: Python

Novice

Abstract as a tweet

Join us for a talk on baselines in NLP! We'll cover common tasks like classification, clustering, search, and NER, and discuss how to establish baselines and improve on them with weak learning and transfer learning. Don't miss out on this opportunity to gain a deeper understanding of NLP baselines!

Tobias Sterbak is a Data Scientist and Software Developer from Berlin. He has been working as a freelancer in the field of Machine Learning and Natural Language Processing since 2018, and he occasionally writes about these topics on his blog, www.depends-on-the-definition.com. In his private life, he is interested in data privacy, open-source software, remote work, and dogs.