PyConDE & PyData Berlin 2024

FlixBus CitySnap: How we use GenAI and not only to collect captivating images for cities and confirm their locations
2024-04-24 , A1

Have you ever wondered how travel e-commerce companies gather photos of cities? While I can't speak for everyone, I will demonstrate the innovative approach we are using at Flix.

In recent years, text-to-text models like ChatGPT and text-to-image models such as DALL-E 3 have become increasingly integrated into various industries. The main aim of these initiatives is typically to generate text or images. In this talk, we propose a slightly different approach to using these models commercially. Our objective is to gather travel-inspiring images for thousands of cities. We use ChatGPT to tailor prompts to our business requirements, enabling efficient image retrieval through API queries to free stock image services. We then apply image-to-text models to confirm the images' locations. Finally, we adjust the resolution of the images for display across various platforms, such as social media campaigns on Instagram, email marketing, and our website. To achieve this, we use an automated cropping service to get images in the required aspect ratios, followed by Lanczos sampling to downscale them. This integration of cutting-edge models has resulted in an automated, highly flexible process that aligns with varied business needs. Our approach is cost-efficient: processing several hundred cities costs only a few euros, and because we use commonly available services, it is easy for anyone to replicate.


Flix's buses serve over 5,000 cities, and to elevate our customers' experience, we aim to collect captivating photos for each city. The task of collecting city photos is not new, but it was previously handled mostly by hand. Given the size and continued growth of our bus network, manually gathering photos for each city is infeasible and does not scale. In this talk, we will demonstrate how we built a fully automated end-to-end pipeline to achieve this goal. Our pipeline comprises three main steps.

The first step involves collecting city images from free stock image services like Pixabay and Pexels via their APIs. Simple queries by city name yielded poor results, as not every image is enticing enough to inspire a visit to the city. People often travel to see a city's landmarks, which is why we used ChatGPT to build queries for images of each city's prominent landmarks.
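The query-building part of this step could look roughly like the sketch below. It is an illustration only, not the pipeline presented in the talk: the landmark list is a hand-written stand-in for what ChatGPT would return, the helper names are hypothetical, and the endpoints and parameters follow the public Pexels and Pixabay search APIs as we understand them.

```python
from urllib.parse import urlencode

# Public search endpoints of the two stock image services.
PEXELS_SEARCH_URL = "https://api.pexels.com/v1/search"
PIXABAY_SEARCH_URL = "https://pixabay.com/api/"


def build_queries(city, landmarks):
    """Combine each landmark name with the city name into a search phrase."""
    return [f"{landmark} {city}" for landmark in landmarks]


def pexels_url(query, per_page=15):
    """Build a Pexels search URL (the API key goes in the Authorization header)."""
    return f"{PEXELS_SEARCH_URL}?{urlencode({'query': query, 'per_page': per_page})}"


def pixabay_url(query, api_key, per_page=20):
    """Build a Pixabay search URL (the API key is passed as a query parameter)."""
    params = {"key": api_key, "q": query, "image_type": "photo", "per_page": per_page}
    return f"{PIXABAY_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical landmark list standing in for a ChatGPT response.
    landmarks = ["Brandenburg Gate", "Reichstag"]
    for query in build_queries("Berlin", landmarks):
        print(pexels_url(query))
        print(pixabay_url(query, api_key="YOUR_KEY"))
```

Searching for "landmark + city" rather than the city name alone is the design choice the paragraph describes; the actual prompts and query formats used at Flix may differ.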

The second and most complicated step is to verify that the images accurately represent the targeted cities. Initially, we relied on metadata from the stock image services, such as tags added by photographers. However, this information is often not sufficient to validate an image's location. To improve accuracy, we investigated various services. Models like DALL-E from OpenAI can predict image locations but currently lack an API for full automation. We found two services from the Google Cloud Platform with APIs suitable for location validation: the Gemini multimodal model and the landmark detection service.
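One way to turn a landmark detection result into a yes/no location check is to compare coordinates. The sketch below assumes the detection service returns a latitude/longitude pair for the recognized landmark and that reference coordinates are available for each city; the function names and the 25 km threshold are hypothetical choices, not details from the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_in_city(landmark_latlng, city_latlng, max_km=25.0):
    """Accept an image if its detected landmark lies within max_km of the city."""
    return haversine_km(*landmark_latlng, *city_latlng) <= max_km


if __name__ == "__main__":
    brandenburg_gate = (52.5163, 13.3777)  # approximate coordinates
    berlin = (52.5200, 13.4050)
    munich = (48.1374, 11.5755)
    print(is_in_city(brandenburg_gate, berlin))
    print(is_in_city(brandenburg_gate, munich))
```

A distance threshold only works for images in which a known landmark is detected; images without one would need a different check, such as asking a multimodal model about the depicted location.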

The third and final step of our pipeline involves adjusting the images to various resolutions for display across different platforms, such as social media campaigns on Instagram, email marketing, and our website. This is achieved by cropping images to the desired aspect ratios using Google Cloud Vision API's smart cropping service, followed by Lanczos sampling for image downscaling, which is available in various open-source Python libraries.
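To illustrate what Lanczos sampling does during downscaling, here is a minimal pure-Python sketch on a single row of pixel values. It is a teaching example under our own assumptions (window size a = 3, kernel widened by the scale factor, weights normalized), not the code used in the pipeline; a real image would be processed along both axes by an imaging library.

```python
import math


def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def downscale_row(row, new_len, a=3):
    """Downscale a 1-D list of pixel values to new_len samples."""
    if new_len < 1 or new_len > len(row):
        raise ValueError("new_len must be between 1 and len(row)")
    scale = len(row) / new_len   # source pixels per output pixel
    support = a * scale          # kernel is widened when shrinking
    out = []
    for i in range(new_len):
        # Centre of output pixel i, expressed in source coordinates.
        center = (i + 0.5) * scale - 0.5
        lo = max(0, math.ceil(center - support))
        hi = min(len(row) - 1, math.floor(center + support))
        idx = range(lo, hi + 1)
        weights = [lanczos_kernel((j - center) / scale, a) for j in idx]
        total = sum(weights)     # normalize so flat areas stay flat
        out.append(sum(w * row[j] for w, j in zip(weights, idx)) / total)
    return out


if __name__ == "__main__":
    print(downscale_row([0, 0, 0, 0, 255, 255, 255, 255], 4))
```

In practice there is no need to hand-roll this: Pillow, for example, exposes the same filter through `Image.resize(size, resample=Image.Resampling.LANCZOS)`.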

Our pipeline is a cost-efficient approach built on widely available services, which makes it easy to replicate. During this presentation, we will share our results across several countries, discuss the most challenging problems we encountered, and offer insights into how this pipeline could be improved with the release of upcoming cutting-edge models. We believe that our case shows how the industry can use Generative AI not only to create new content, but also to find, analyze, and filter publicly available information for different business needs.


Expected audience expertise: Domain:

Novice

Expected audience expertise: Python:

Intermediate

Abstract as a tweet (X) or toot (Mastodon):

Unlocking City Charisma: Leveraging Generative AI for Automated Image Collection and Elevated Customer Experience 🌟 Dive into Flix's innovative approach

Career:
Since 2022, I have continued my career as a data scientist at FlixBus.
From 2018 to 2022, I worked as a Data Scientist in banking.

Education:
From 2021 to 2022, I received a micro master's degree in Finance.
From 2019 to 2021, I received a master's degree in computer science.
From 2015 to 2019, I received a bachelor's degree in applied math.