Building an Open-source Framework for Generating Embedding Vectors
06-14, 16:00–16:40 (Europe/Berlin), Palais Atelier

The combination of big data and deep learning has fundamentally changed the way we approach search systems, allowing us to index audio, images, video, and other human-generated data based on an embedding vector instead of an auxiliary description. These advancements are backed by new and often increasingly complex machine learning (ML) models, leading to an even wider research-to-industry gap despite the introduction of MLOps platforms and a variety of model hubs. We summarize some of the challenges facing practical machine learning in 2022 and beyond as follows: 1) many ML applications require a combination of multiple models, leading to overly complex and difficult-to-maintain auxiliary code, 2) many engineers are unfamiliar with ML and/or data science, making it difficult for them to train, test, and integrate ML models into existing infrastructure, and 3) constant architectural updates to state-of-the-art (SOTA) deep learning models create significant overhead when deploying those models in production environments.

In this talk, we discuss lessons learned from building an open-source (https://github.com/towhee-io/towhee) and scalable framework for generating embedding vectors, purpose-built to tackle the above challenges. Early on, we communicated with dozens of industry partners to understand their applications and architected our platform around their requirements. This open-source project is currently being used by 3 major corporations ($10B+ market value) and a number of small- and mid-size startups in proof-of-concept and production systems.

The Search track is presented by OpenSource Connections

Frank Liu is an engineering manager and tech lead at Zilliz, where he leads a multinational team of engineers while serving as a maintainer for the Towhee open-source project (https://towhee.io). In less than three months, he and his team turned Towhee into an easy-to-use platform used by 3 major corporations ($10B+ market value) and a number of small- and mid-size startups in proof-of-concept and production systems.

Prior to Zilliz, Frank co-founded an indoor localization startup based in Shanghai and worked as a deep learning engineer at Yahoo in San Francisco. Frank holds MS and BS degrees in Electrical Engineering from Stanford University.