2024-08-28, Room 7
Scikit-LLM is a Python library that integrates Large Language Models into the Scikit-Learn framework, making it a practical tool for natural language processing (NLP) tasks within a Scikit-Learn pipeline. I'll showcase a data augmentation workflow that builds features using zero-shot text classification and text vectorization.
Scikit-learn is one of the best-known and most widely used open-source Python libraries for machine learning, popular with data scientists for its wide range of models and its ease of use. A single library covers a broad set of tasks, from regression to classification and from clustering to dimensionality reduction. Scikit-LLM is a Python library that brings large language models into the scikit-learn framework.
It is a tool for performing natural language processing (NLP) tasks entirely within a Scikit-Learn pipeline.
The features provided by Scikit-LLM are:
-Zero-Shot Text Classification
-Few-Shot Text Classification
-Dynamic Few-Shot Text Classification
-Multi-Label Zero-Shot Text Classification
-Text Vectorization
-Text Translation
-Text Summarization
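To give a feel for the first item in the list, here is a minimal offline sketch of a zero-shot text classifier behind a scikit-learn-style fit/predict interface. It does not use Scikit-LLM itself: the class `ToyZeroShotClassifier`, the helper `build_prompt`, and the function `fake_llm` are hypothetical names introduced for illustration, and a simple keyword rule stands in for the real LLM call so that the example runs without an API key.

```python
from typing import Callable, List, Optional, Sequence


def build_prompt(text: str, labels: Sequence[str]) -> str:
    """Compose a zero-shot prompt: candidate labels only, no labeled examples."""
    options = ", ".join(labels)
    return (
        f"Classify the text into exactly one of: {options}.\n"
        f"Text: {text}\n"
        "Answer with the label only."
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call (keyword rule, no network)."""
    # Extract only the line that carries the input text.
    text = prompt.split("Text: ", 1)[1].split("\n", 1)[0].lower()
    if "river" in text or "rain" in text:
        return "flash flood"
    if "tide" in text or "surge" in text:
        return "coastal flood"
    return "unknown"


class ToyZeroShotClassifier:
    """Assumed, simplified interface; not the Scikit-LLM implementation."""

    def __init__(self, llm: Callable[[str], str] = fake_llm) -> None:
        self.llm = llm
        self.classes_: List[str] = []

    def fit(self, X: Optional[Sequence[str]], y: Sequence[str]) -> "ToyZeroShotClassifier":
        # Zero-shot: nothing is trained, fit only records the candidate labels.
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X: Sequence[str]) -> List[str]:
        preds = []
        for text in X:
            answer = self.llm(build_prompt(text, self.classes_)).strip()
            # Fall back to the first label if the answer is not a known label.
            preds.append(answer if answer in self.classes_ else self.classes_[0])
        return preds


clf = ToyZeroShotClassifier().fit(None, ["flash flood", "coastal flood"])
preds = clf.predict([
    "Heavy rain caused the river to overflow.",
    "Storm surge pushed the tide over the seawall.",
])
print(preds)
```

The point of the sketch is the shape of the interface: because no training examples are needed, `fit` only has to see the candidate labels, and `predict` turns each text into a prompt and maps the model's answer back to one of those labels.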
I will present a data augmentation use case for flood events from the US Storm Events Database, using zero-shot text classification and embedding techniques.
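As a rough illustration of what such an augmentation step could look like, the sketch below adds two kinds of derived features to each record: a zero-shot label and a small text vector. Everything in it is an assumption made for the example: the two rows are invented and are not taken from the Storm Events Database, the column names `event_id` and `narrative` are hypothetical, `toy_zero_shot` uses word overlap instead of an LLM, and `toy_embed` is a hashed bag of words instead of a real embedding model.

```python
import hashlib
from typing import Any, Dict, List, Sequence


def toy_embed(text: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    total = sum(vec) or 1.0  # avoid division by zero on empty text
    return [v / total for v in vec]


def toy_zero_shot(text: str, labels: Sequence[str]) -> str:
    """Hypothetical stand-in for an LLM: pick the label sharing most words."""
    words = set(text.lower().split())
    # On ties, max() keeps the first label in the list.
    return max(labels, key=lambda lab: len(words & set(lab.lower().split())))


def augment(
    rows: Sequence[Dict[str, Any]],
    text_col: str,
    labels: Sequence[str],
    dim: int = 4,
) -> List[Dict[str, Any]]:
    """Return copies of the rows with a label column and embedding columns."""
    out = []
    for row in rows:
        new_row = dict(row)  # leave the input row untouched
        text = row[text_col]
        new_row["llm_label"] = toy_zero_shot(text, labels)
        for i, value in enumerate(toy_embed(text, dim)):
            new_row[f"emb_{i}"] = value
        out.append(new_row)
    return out


# Invented example rows, for illustration only.
rows = [
    {"event_id": 1, "narrative": "Heavy rain led to flash flooding of roads"},
    {"event_id": 2, "narrative": "Coastal surge flooded low lying streets"},
]
labels = ["flash flood", "coastal flood"]
augmented = augment(rows, "narrative", labels)
print([r["llm_label"] for r in augmented])
```

The augmented rows keep the original columns and gain `llm_label` plus `emb_0` to `emb_3`, which a downstream scikit-learn model could then consume as ordinary categorical and numeric features.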
Scikit-LLM: Scikit-Learn API with LLMs Under the Hood
Category [Machine and Deep Learning] – Generative Models
Expected audience expertise: Domain – some
Expected audience expertise: Python – some
Public link to supporting material – https://github.com/claudio1975/Medium-blog/tree/master/Scikit-LLM
I'm an actuary transitioning to freelance data science work.
I took part as a trainee in several editions of the SDS and SwissText conferences.
I was a speaker at several editions of the Insurance Data Science Conference and at meetups (Zurich & Munich).
I was an assistant professor for Insurance Statistics at the Catholic University of Milan.
I started my data science journey with Kaggle and hackathons.