2025-10-01, Gaston Berger
Discover how Insee transitioned from fastText to a PyTorch-based model for text classification by developing and open-sourcing the torchFastText package. This presentation will cover the creation, deployment, and practical applications of torchFastText in modernizing automatic coding systems, benefiting Insee and other European National Statistical Institutes (NSIs).
Insee, France's National Institute of Statistics and Economic Studies, has long relied on fastText for automatic coding tasks. To modernize and future-proof this critical functionality, we developed torchFastText, an open-source Python package for training and deploying a PyTorch-based text classification model, paving the way for further innovation in this domain.
This session will delve into the motivations behind replacing fastText, the design and implementation of torchFastText, and its integration into Insee's production environment. We'll discuss the challenges faced during this transition, including model compatibility, performance optimization, and user adoption.
Attendees will gain insights into:
- The rationale for moving from fastText to a PyTorch-based model in production
- Packaging a PyTorch-based model architecture and open-source collaboration
- Key features and architecture of torchFastText
- Deployment strategies within a public administration (MLOps, cloud native tools, security)
- Lessons learned and best practices for similar transitions
This talk is intended for data scientists, machine learning engineers, and practitioners interested in NLP, model deployment, and open-source tool development.
Data Scientist at the Innovation Lab, Insee
Cédric Couralet, Data Scientist at Insee, is an open-source enthusiast, with expertise in software architecture and secure system design.