Prashant Anand
Prashant is a Staff ML Engineer at Mercari, Inc. in Tokyo, Japan, where he has spent more than five years building scalable, high-performance production ML systems. He leads the exploration and application of machine learning, NLP, and LLMs to transform customer support experiences at one of Japan's largest e-commerce platforms.
Prashant holds a B.Tech from IIT Delhi (2019) and specializes in connecting cutting-edge ML research with real-world production systems. He is passionate about sharing knowledge with the Python community and has previously spoken at PyCon JP 2024 and PyCon APAC 2023.
Away from the keyboard, you’ll find him obsessing over specialty coffee and perfecting his hand-drip and French-press brews.
Session
Every new LLM arrives with glowing results on English-centric benchmarks, but those results say little about how the model will perform on business use cases in other languages or specialized domains. At Mercari, Japan's largest C2C marketplace, we faced this exact problem with Japanese. Inspired by the Kagi, Wolfram, and Aider benchmarks, we are building our own continuously updated internal benchmark to evaluate major LLMs on uncontaminated, business-critical tasks that the models have not seen in their training data. The talk will cover task design, an evaluation pipeline in Python, a comparison of the latest models on accuracy, cost, and latency, and practical lessons for creating your own benchmark tailored to your needs.
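To make the idea of an evaluation pipeline concrete, here is a minimal sketch of a loop that scores one model on accuracy, latency, and cost. It is not Mercari's pipeline: the names `Task`, `Result`, `evaluate`, and `stub_model`, the exact-match scoring rule, and the flat per-token price are all assumptions made for illustration. A real model client would replace the stub.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    """One benchmark item: a prompt and the answer we expect (hypothetical schema)."""
    prompt: str
    expected: str


@dataclass
class Result:
    accuracy: float
    mean_latency_s: float
    total_cost_usd: float


def evaluate(
    model: Callable[[str], Tuple[str, int]],
    tasks: List[Task],
    usd_per_1k_tokens: float,
) -> Result:
    """Run every task through `model` and aggregate the three metrics.

    `model` is assumed to return (answer_text, tokens_used).
    Scoring is exact match after stripping whitespace, which is an assumption;
    many real tasks need a more forgiving scorer.
    """
    if not tasks:
        raise ValueError("tasks must not be empty")

    correct = 0
    total_tokens = 0
    latencies = []
    for task in tasks:
        start = time.perf_counter()
        answer, tokens_used = model(task.prompt)
        latencies.append(time.perf_counter() - start)
        total_tokens += tokens_used
        if answer.strip() == task.expected:
            correct += 1

    return Result(
        accuracy=correct / len(tasks),
        mean_latency_s=sum(latencies) / len(latencies),
        total_cost_usd=total_tokens / 1000 * usd_per_1k_tokens,
    )


def stub_model(prompt: str) -> Tuple[str, int]:
    """Stand-in for a real LLM call: always answers 'refund', reports 10 tokens."""
    return "refund", 10


# Hypothetical customer-support classification items in Japanese.
tasks = [
    Task(prompt="返金してほしいです", expected="refund"),
    Task(prompt="商品が届きません", expected="shipping"),
]

result = evaluate(stub_model, tasks, usd_per_1k_tokens=0.5)
print(result)
```

Because the model is passed in as a plain callable, the same task list and scoring can be reused across providers, which is what makes a side-by-side comparison possible.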