EuroSciPy 2024

Data augmentation with Scikit-LLM
08-28, 11:40–12:00 (Europe/Berlin), Room 7

Scikit-LLM is a Python library that integrates Large Language Models into the Scikit-Learn framework, making it a practical tool for natural language processing (NLP) tasks within the Scikit-Learn pipeline. I'll showcase a data augmentation workflow that builds features using zero-shot text classification and text vectorization.


Scikit-learn is one of the best-known and most widely used open-source Python libraries for machine learning, popular among data scientists for its wide range of models and ease of use. It covers a broad set of tasks, from regression to classification and from clustering to dimensionality reduction, in a single library. Scikit-LLM is a Python library that brings large language models into the scikit-learn framework.
It's a tool for performing natural language processing (NLP) tasks entirely within the Scikit-Learn pipeline.
The features provided by Scikit-LLM are:
- Zero-Shot Text Classification
- Few-Shot Text Classification
- Dynamic Few-Shot Text Classification
- Multi-Label Zero-Shot Text Classification
- Text Vectorization
- Text Translation
- Text Summarization
I will present a use case of data augmentation for flood events from the US Storm Events Database, using zero-shot text classification and embedding techniques.
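
To give a rough idea of what this kind of augmentation looks like, here is a minimal sketch of the data flow: each text record gains a predicted label and a numeric vector that can later be used as features. It does not use Scikit-LLM itself, since the real estimators need an LLM backend and credentials. Instead, `KeywordZeroShotClassifier` and `BagOfWordsVectorizer` are hypothetical stand-ins that only follow the scikit-learn `fit`/`predict` and `fit`/`transform` conventions. The narratives, the label set, and the `predicted_cause` and `text_vector` field names are invented for illustration.

```python
from collections import Counter


class KeywordZeroShotClassifier:
    """Hypothetical stand-in for an LLM zero-shot classifier (not Scikit-LLM)."""

    def fit(self, X, y):
        # Zero-shot: X is ignored; only the candidate labels in y are stored.
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X):
        return [self._best_label(text) for text in X]

    def _best_label(self, text):
        words = set(text.lower().split())
        # Score each label by how many of its words occur in the text.
        return max(self.classes_, key=lambda label: len(words & set(label.split())))


class BagOfWordsVectorizer:
    """Hypothetical stand-in for an embedding model: plain word counts."""

    def fit(self, X, y=None):
        # Learn a fixed vocabulary so every text maps to a vector of equal length.
        self.vocabulary_ = sorted({w for text in X for w in text.lower().split()})
        return self

    def transform(self, X):
        rows = []
        for text in X:
            counts = Counter(text.lower().split())
            rows.append([counts[w] for w in self.vocabulary_])
        return rows


def augment(records, text_key, candidate_labels):
    """Return copies of the records with a predicted label and a text vector added."""
    texts = [r[text_key] for r in records]
    labels = KeywordZeroShotClassifier().fit(None, candidate_labels).predict(texts)
    vectors = BagOfWordsVectorizer().fit(texts).transform(texts)
    return [
        {**r, "predicted_cause": label, "text_vector": vector}
        for r, label, vector in zip(records, labels, vectors)
    ]


# Invented narratives for illustration; not taken from the storm events database.
events = [
    {"id": 1, "narrative": "heavy rain flooded the main road"},
    {"id": 2, "narrative": "rapid snow melt raised the river"},
    {"id": 3, "narrative": "a dam failure released water downstream"},
]
causes = ["heavy rain", "snow melt", "dam failure"]  # hypothetical label set

augmented = augment(events, "narrative", causes)
for row in augmented:
    print(row["id"], row["predicted_cause"], len(row["text_vector"]))
```

In an actual workflow, the two stand-ins would be replaced by LLM-backed estimators exposing the same methods, and the added columns would feed a downstream scikit-learn model.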


Abstract as a tweet

Scikit-LLM: Scikit-Learn API with LLMs Under the Hood

Category [Machine and Deep Learning]

Generative Models

Expected audience expertise: Domain

some

Expected audience expertise: Python

some

Public link to supporting material

https://github.com/claudio1975/Medium-blog/tree/master/Scikit-LLM

I'm an actuary transitioning to freelance data science work.
I was a trainee at several editions of the SDS and SwissText conferences.
I was a speaker at several editions of the Insurance Data Science Conference and at meetups (Zurich & Munich).
I was an assistant professor of Insurance Statistics at the Catholic University of Milan.
I started my data science journey with Kaggle and hackathons.
