PyLadiesCon 2024

Designing data tool-kit: Your Open-Source Atlas
2024-12-07 , Main Stream
Language: English

Navigating the vast ocean of data tools can be overwhelming, but with the right approach, it becomes manageable and even empowering. In this session, we'll explore how to create a tailored data toolkit using open-source solutions. I’ll share practical steps and guiding questions to help you choose suitable tools, ensure reproducibility, and build flexible workflows that grow with your needs.


It's easy to get lost in the ocean of tools and technologies in the world of data. To stay on course, I came up with a simple mental model to help create an effective tech toolkit. These guiding principles inspired the creation of DataJourney.

Questions I ask to get started

Start with the Why: What Are Your Goals?
Ask: What’s the main purpose of this workflow? What problem are we solving?

Identify the Pain Points: Where Are the Gaps?
Ask: What challenges or bottlenecks are we facing? What’s missing in our current setup?

Simplify the Options: What Tools Fit Our Needs?
Ask: Which tools address our pain points without adding unnecessary complexity?

Focus on Reproducibility: How Do We Ensure Consistency?
Ask: How can we make this workflow easy to repeat and share with others?

Build with Flexibility: How Can We Adapt Over Time?
Ask: Are the chosen tools scalable?

Test and Learn: How Does It Work in Practice?
Ask: How can we test this setup quickly? What feedback can we gather to improve it?

Iterate and Improve: What’s the Next Step?
Ask: What worked well, and what needs tweaking? How can we optimize further?

The outcome is a toolkit that is customised and scalable, all powered by open source 💜

I'm a data generalist, strategist, and community builder with a passion for Open Science. I thrive on creativity and innovation, bringing new ideas to life. From exploring the depths of the ocean to scaling mountains, I love venturing into uncharted territories.