JuliaCon 2024

Evaluate LLM-synthesized Julia code
07-12, 15:40–15:50 (Europe/Amsterdam), For Loop (3.2)

HumanEval and MBPP are two of the most frequently used benchmarks for evaluating LLMs' performance on code generation. However, they focus almost exclusively on the Python programming language. In this talk, we analyze the performance of state-of-the-art (SOTA) code LLMs on Julia. Results will be updated continuously at HumanEval.jl.
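As a rough illustration of what a HumanEval-style evaluation of generated Julia code involves (this is a minimal sketch, not the actual HumanEval.jl API), a candidate completion produced by a model can be loaded into a fresh module and checked against a hidden test suite; the task, candidate, and tests below are hypothetical examples.

```julia
# Hypothetical model output for a HumanEval-style task: "add two numbers".
candidate_code = """
add(x, y) = x + y
"""

# Hidden unit tests for the task, expressed as Julia assertions.
test_code = """
@assert add(2, 3) == 5
@assert add(-1, 1) == 0
"""

# Run the candidate and its tests in a fresh anonymous module so that
# failures (syntax errors, wrong answers, thrown exceptions) stay isolated.
function passes(candidate::AbstractString, tests::AbstractString)
    m = Module()
    try
        Base.include_string(m, candidate)
        Base.include_string(m, tests)
        return true
    catch
        return false
    end
end

println(passes(candidate_code, test_code))  # prints true if the completion solves the task
```

Aggregating the pass/fail results over many sampled completions per problem yields pass@k-style metrics of the kind HumanEval and MBPP report.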


See also: GitHub

Software Engineer @01.ai