EuroSciPy 2025

Adam Staniszewski

I am a fifth-year computer science student at AGH University of Science and Technology (AGH UST) in Kraków, Poland, where I conduct research in Natural Language Processing (NLP) and chemoinformatics. My academic work focuses on computational techniques for analyzing language data and chemical information. Alongside my studies, I work professionally as a backend engineer, developing and maintaining server-side applications with an emphasis on scalability, efficiency, and reliability. This combination of research and industry experience lets me follow current developments in both fields while applying practical solutions to real-world problems.


Pronouns

He/Him

Affiliation

AGH UST/AGH Chemoinformatics and Machine Learning Lab


Session

08-21
16:00
20min
How To Accelerate Molecular Insights - Efficient Distance Calculations In Python
Adam Staniszewski

In chemo- and bioinformatics, the efficient computation of molecular distances plays a crucial role in applications such as drug discovery, molecular clustering, and structure-activity relationship modeling. Measuring molecular similarity accurately and quickly is essential for tasks ranging from virtual screening to predictive modeling. As molecular datasets grow in size and complexity, scalable and computationally efficient distance metrics become increasingly necessary for large-scale analysis.

In this work, we explore how Python’s numerical computing capabilities can be leveraged to implement a diverse range of molecular distance metrics. We focus on optimizing computations for vectorized molecular representations, ensuring that performance remains competitive with highly optimized C++-based solutions. By utilizing efficient numerical libraries, we demonstrate that Python can achieve substantial execution speed while maintaining the flexibility and ease of implementation that make it a preferred choice for many researchers.
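As a rough illustration of what a vectorized distance computation on molecular fingerprints can look like, the sketch below computes a pairwise Tanimoto (Jaccard) distance matrix for binary fingerprints with NumPy. It is not the implementation presented in the talk: the abstract does not name specific metrics or libraries, so the choice of Tanimoto distance, the function name `tanimoto_distance_matrix`, and the convention that two all-zero fingerprints have distance 0 are all assumptions made for this example.

```python
import numpy as np


def tanimoto_distance_matrix(X, Y):
    """Pairwise Tanimoto (Jaccard) distance between rows of two binary matrices.

    Hypothetical helper for illustration; X has shape (n, d), Y has shape (m, d),
    and both are expected to contain only 0s and 1s.
    """
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)

    # For binary vectors, the dot product counts the bits set in both vectors.
    intersection = X @ Y.T

    # Number of set bits per fingerprint, shaped for broadcasting to (n, m).
    x_counts = X.sum(axis=1)[:, None]
    y_counts = Y.sum(axis=1)[None, :]
    union = x_counts + y_counts - intersection

    # Assumed convention: two all-zero fingerprints count as identical,
    # so similarity defaults to 1 wherever the union is empty.
    similarity = np.divide(
        intersection,
        union,
        out=np.ones_like(intersection),
        where=union > 0,
    )
    return 1.0 - similarity


# Usage on random 2048-bit fingerprints (synthetic data, not real molecules).
rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(5, 2048))
distances = tanimoto_distance_matrix(fingerprints, fingerprints)
print(distances.shape)
```

Expressing the intersection as a single matrix product pushes the inner loops into NumPy's compiled routines, which is the general idea behind keeping pure-Python code competitive with lower-level implementations.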

Beyond implementation, we conduct a comprehensive performance evaluation by comparing our Python-based methods against state-of-the-art libraries written in C++. Our benchmarking includes assessments of computational efficiency, memory usage, and scalability on large molecular datasets. The results illustrate that, with appropriate optimizations, Python-based approaches can serve as

Life Sciences and Biomedicine
Small room