Dhairya Gandhi
Sessions
Large Language Models (LLMs) have become ubiquitous across many domains. So far, however, the Julia ecosystem has lacked an efficient way to train these models, hindering Julia's adoption by ML practitioners and users. In this talk we demonstrate parallel training of LLMs using Dagger.jl and Flux.jl. We discuss the various components required to write an efficient parallel training routine. Further, we present the scaling performance achievable with this method and discuss future developments.
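As a rough illustration of the data-parallel pattern such a routine might follow, the sketch below combines `Dagger.@spawn` with Flux's explicit-gradient API: per-shard gradients are computed as independent Dagger tasks, then averaged and applied in a single optimiser step. This is a minimal, hypothetical example, not the training routine presented in the talk; the toy `Chain`, the `shard_grad`, `addgrads`, and `scalegrads` helpers, and the fake shards are all stand-ins.

```julia
using Dagger, Flux, Functors

# Toy stand-in for an LLM; the talk targets much larger Flux models.
model = Chain(Dense(16 => 32, relu), Dense(32 => 16))
opt_state = Flux.setup(Adam(1f-3), model)

loss(m, x, y) = Flux.mse(m(x), y)

# Gradient of the loss on one data shard; run as an independent Dagger task.
shard_grad(m, x, y) = Flux.gradient(m_ -> loss(m_, x, y), m)[1]

# Hypothetical helpers: combine gradient trees leaf-by-leaf,
# treating `nothing` (no gradient) as zero.
addgrads(a, b) = Functors.fmap(a, b) do ga, gb
    ga === nothing ? gb : gb === nothing ? ga : ga .+ gb
end
scalegrads(g, n) = Functors.fmap(x -> x === nothing ? nothing : x ./ n, g)

# Fake pre-sharded minibatches, one per available worker/thread.
shards = [(randn(Float32, 16, 64), randn(Float32, 16, 64)) for _ in 1:4]

for epoch in 1:10
    # Launch one gradient computation per shard; Dagger schedules them in parallel.
    tasks = [Dagger.@spawn shard_grad(model, x, y) for (x, y) in shards]
    grads = fetch.(tasks)
    # Average the per-shard gradients and take a single optimiser step.
    avg = scalegrads(reduce(addgrads, grads), length(grads))
    Flux.update!(opt_state, model, avg)
end
```

The same pattern extends from threads on one machine to Distributed workers, since `Dagger.@spawn` leaves scheduling and data movement to Dagger's scheduler.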
Machine learning model development, characterized by iterative experimentation and adjustment, often produces a long series of model iterations that are hard to track and debug. This talk explores the application of CI/CD methodologies to machine learning, using Julia's Pkg ecosystem, Buildkite, GitHub, and MLflow. We showcase a streamlined process for efficient model development and tracking that enables robust experimentation at scale for machine learning workflows.
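By way of illustration, the Pkg side of such a pipeline can reduce to a short Julia script that a Buildkite agent runs on every GitHub commit, with experiment metadata pushed to an MLflow tracking server in a later step. This is a generic sketch under those assumptions, not the specific pipeline shown in the talk.

```julia
# Minimal CI step a Buildkite agent could run on each commit.
using Pkg

Pkg.activate(".")     # activate the project checked out by the CI agent
Pkg.instantiate()     # install the exact dependency versions pinned in Manifest.toml
Pkg.test()            # run the package's test suite, including model sanity checks
```

Pinning the environment through the Manifest is what makes each model iteration reproducible, and therefore trackable, across CI runs.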