2022-06-13, Maschinenhaus
Language models have drawn a lot of attention in NLP in recent years. Despite their relatively short history, they have been applied to all sorts of NLP tasks, such as translation, question answering, information extraction and intelligent search, and have delivered astonishing performance.
However, we should not forget that giant language models are not only data hungry, but also energy hungry. State-of-the-art language models such as BERT, RoBERTa and XLNet have hundreds of millions of parameters, and training and serving them is only possible with the help of dozens of sophisticated and expensive chips. The CO2 emitted in the process is also massive. Such high energy consumption is hard to justify in times of climate change.
For companies to benefit from the performance of state-of-the-art language models without inflating their computing costs, the models used must be reduced to a minimum, and performance should not suffer as a result. One way to achieve this is knowledge distillation, a common model compression technique in which a small student model is trained to reproduce the outputs of a large teacher model. In this presentation, we will show you how you can use knowledge distillation to generate models that match the performance of state-of-the-art language models effectively and in a resource-saving manner.
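To give a rough idea of what knowledge distillation looks like in practice, here is a minimal sketch in PyTorch. It is not the exact setup used in the talk: the teacher/student models, the temperature and the loss weighting are illustrative assumptions. The student is trained on a weighted mix of the usual cross-entropy loss on hard labels and a KL-divergence loss that pulls its temperature-softened outputs toward the teacher's.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with a soft-target KL term."""
    # Soft targets: teacher and student distributions softened by the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The KL term is scaled by T^2 to keep gradient magnitudes comparable.
    kd_loss = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss


def train_step(student, teacher, batch, labels, optimizer):
    """One illustrative training step: teacher is frozen, student is updated."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(batch)
    student_logits = student(batch)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, `teacher` would be a large pretrained classifier (for example a BERT-style model) and `student` a much smaller network; because the teacher only runs in inference mode, the compute and energy savings show up both during training and at serving time.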
Qi Wu works as a Machine Learning Engineer at ontolux, where she translates current research results into usable applications for customers. She works on topics such as training and optimizing models, with a focus on fine-tuning and distillation. During her master's degree in statistics, she worked with Prof. Dr. Alan Akbik on the NLP framework FLAIR and on machine learning for natural language processing, such as information extraction.