Mercari LLM Benchmark: Building a Practical LLM Benchmark for Your Business
Every new LLM arrives with glowing scores on English-centric benchmarks, which makes it difficult to predict how that performance will translate to business use cases in other languages or specialized domains. At Mercari, Japan's largest C2C marketplace, we faced this exact problem with Japanese. Inspired by the Kagi, Wolfram, and Aider benchmarks, we are building our own continuously updated internal benchmark to evaluate major LLMs on uncontaminated, business-critical tasks absent from their training data. The talk will cover task design, an evaluation pipeline in Python, a comparison of the latest models on accuracy, cost, and latency, and practical lessons for creating your own benchmark tailored to your needs; a minimal sketch of such a pipeline follows.
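To make the "evaluation pipeline in Python" concrete, here is a minimal sketch of what one can look like. Everything in it is an assumption for illustration, not Mercari's actual implementation: the `tasks.jsonl` file of `{"prompt": ..., "answer": ...}` records, the `ask_model` callable that wraps whatever API client you use, exact-match scoring, and the flat `usd_per_task` cost estimate are all hypothetical placeholders.

```python
"""Minimal sketch of an LLM benchmark pipeline (hypothetical task format,
scoring, and pricing; swap in your own model client and cost model)."""

import json
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Result:
    correct: int
    total: int
    total_seconds: float
    est_cost_usd: float


def evaluate(
    tasks_path: Path,
    ask_model: Callable[[str], str],   # wraps whatever LLM client you use
    usd_per_task: float = 0.001,       # assumed flat per-call price
) -> Result:
    """Run every task through the model and score exact-match accuracy."""
    correct = total = 0
    start = time.perf_counter()
    with tasks_path.open(encoding="utf-8") as f:
        for line in f:                          # one JSON object per line
            task = json.loads(line)             # {"prompt": ..., "answer": ...}
            prediction = ask_model(task["prompt"]).strip()
            correct += prediction == task["answer"]
            total += 1
    elapsed = time.perf_counter() - start
    return Result(correct, total, elapsed, total * usd_per_task)


if __name__ == "__main__":
    # Stub model for demonstration; replace with a real API call.
    result = evaluate(Path("tasks.jsonl"), ask_model=lambda prompt: "dummy")
    print(
        f"accuracy={result.correct / result.total:.1%} "
        f"latency/task={result.total_seconds / result.total:.2f}s "
        f"cost~=${result.est_cost_usd:.3f}"
    )
```

Keeping the model behind a plain callable means the same harness can compare models from different providers on accuracy, wall-clock latency, and estimated cost without changing the task loop.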