JuliaCon 2024

Evaluate LLM-synthesized Julia code
07-12, 15:40–15:50 (Europe/Amsterdam), For Loop (3.2)

HumanEval and MBPP are two of the most frequently used benchmarks for evaluating LLMs' performance on code generation. However, they focus almost exclusively on the Python programming language. In this talk, we analyze the performance of state-of-the-art (SOTA) code LLMs on Julia. Results will be updated continuously at HumanEval.jl.
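As a rough illustration of what a HumanEval-style evaluation of generated Julia code involves (this is a minimal sketch, not the actual HumanEval.jl API), a candidate completion produced by a model can be loaded into a fresh module and checked against a hidden test suite; the task, candidate, and tests below are hypothetical examples.

```julia
# Hypothetical model output for a HumanEval-style task: "add two numbers".
candidate_code = """
add(x, y) = x + y
"""

# Hidden unit tests for the task, expressed as Julia assertions.
test_code = """
@assert add(2, 3) == 5
@assert add(-1, 1) == 0
"""

# Run the candidate and its tests in a fresh anonymous module so that
# failures (syntax errors, wrong answers, thrown exceptions) stay isolated.
function passes(candidate::AbstractString, tests::AbstractString)
    m = Module()
    try
        Base.include_string(m, candidate)
        Base.include_string(m, tests)
        return true
    catch
        return false
    end
end

println(passes(candidate_code, test_code))  # prints true if the completion solves the task
```

Aggregating the pass/fail results over many sampled completions per problem yields pass@k-style metrics of the kind HumanEval and MBPP report.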


See also: GitHub

Software Engineer @01.ai