Gabriel Martín Blázquez
Gabriel is a Machine Learning Engineer focused on NLP. Having moved from academia to industry, he now works at Argilla, where he has contributed to the Argilla backend and to the design and development of distilabel, a library for generating synthetic data with LLMs.
@gabrielmbmb_
Notable open source projects:
https://github.com/argilla-io/argilla
https://github.com/argilla-io/distilabel
https://github.com/zenml-io/zenml
Session
04-05
14:00
25min
🧼 From GPU-poor to data-rich: data quality practices for LLM fine-tuning
Gabriel Martín Blázquez, David Berenstein
If you are GPU-poor, you need to become data-rich. I will give an overview of what we learned from studying Alpaca, LIMA, Dolly, UltraFeedback and Zephyr, and how we applied those lessons to fine-tune two state-of-the-art open-source LLMs, Notus and Notux, by becoming data-rich.
Data
Room 111