JuliaCon 2026

juTarget: A Julia-powered Pipeline built with a Hybrid Machine Learning method for M. tuberculosis Drug Resistance Prediction
2026-08-14 , Room 2

juTarget is a browser-based Linux application for analysing M. tuberculosis targeted next-generation sequencing (tNGS) data, applying a hybrid ML method to predict drug resistance. Industry-standard tools are used to identify variants, which are then compared against the WHO catalogue. The resistance profile of novel variants is predicted by a Random Forest model using a feature set of 30 biophysical properties of amino acids. A Molecular Drug Susceptibility Report is generated, which can be used as a clinical decision-making tool.


The talk is structured as follows:

Background: (2 min; Will brief on the clinical challenges and the limitations of existing tools)
The emergence and persistence of drug-resistant Mycobacterium tuberculosis has driven researchers to develop modern rapid diagnostic methods such as tNGS. However, extracting useful information from the data requires methods that are either cumbersome or costly, so we developed juTarget, a dockerized, browser-based Linux application.

juTarget Pipeline: (3 min; Will detail the application architecture)
We integrated industry-standard tools into a Julia pipeline: minimap2 for long-read alignment, samtools for BAM manipulation, and bcftools for variant calling. A custom Julia-native module then standardizes chromosome names and applies quality filters. Finally, SnpEff annotates the variants to identify deleterious mutations.
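
The name-standardization and quality-filtering step can be illustrated with a minimal sketch. It is written in Python purely for illustration; juTarget's own module is written in Julia and is not reproduced here. The alias table, the target name "Chromosome", and the QUAL and depth thresholds are all assumptions, not values taken from juTarget, and the example records are made up.

```python
# Illustrative sketch only: alias table and thresholds are hypothetical.
CHROM_ALIASES = {
    "NC_000962.3": "Chromosome",  # RefSeq accession of M. tuberculosis H37Rv
    "NC_000962": "Chromosome",
}
MIN_QUAL = 20.0   # assumed minimum variant quality
MIN_DEPTH = 10    # assumed minimum read depth (INFO/DP)


def parse_info(info):
    """Turn 'DP=35;MQ=60' into {'DP': '35', 'MQ': '60'}; bare flags map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def standardize_and_filter(vcf_lines):
    """Rename chromosomes and keep records passing the QUAL and DP thresholds.

    Header lines (starting with '#') are passed through unchanged; this
    sketch does not rewrite '##contig' header entries.
    """
    kept = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.split("\t")
        cols[0] = CHROM_ALIASES.get(cols[0], cols[0])
        qual = float(cols[5]) if cols[5] != "." else 0.0
        depth = int(parse_info(cols[7]).get("DP", "0"))
        if qual >= MIN_QUAL and depth >= MIN_DEPTH:
            kept.append("\t".join(cols))
    return kept


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "NC_000962.3\t761155\t.\tC\tT\t228.0\t.\tDP=35;MQ=60",
        "NC_000962.3\t1000\t.\tA\tG\t5.0\t.\tDP=40",
    ]
    for record in standardize_and_filter(example):
        print(record)
```

Standardizing names before annotation matters because the aligner, the variant caller, and the annotation database must all refer to the reference sequence by the same identifier.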

The Hybrid ML Method: (4 min; Will describe our novel two-tier classification method)
The heart of this application is a hybrid machine learning method for classifying drug resistance.

  • Catalogue-based Identification: SnpEff-annotated variants are first compared against the WHO catalogue, an extensively curated database of known resistance mutations, so that clinically and biologically established evidence takes priority.

  • Predictive Classification: For novel, uncatalogued variants, a Random Forest model classifies the mutations using a feature set of 30 biophysical properties of amino acids.
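
The two-tier logic above can be sketched as a catalogue lookup with a model fallback. This is a minimal Python illustration under stated assumptions, not juTarget's Julia implementation: the `Variant` type, the two-property feature table (standing in for the 30 biophysical properties), the single catalogue entry, and the `stub_model` function are all hypothetical. In particular, `stub_model` is a placeholder so the example runs; a trained Random Forest would take its place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    ref_aa: str
    position: int
    alt_aa: str

    @property
    def change(self):
        return f"{self.ref_aa}{self.position}{self.alt_aa}"


# Hypothetical two-property table: (Kyte-Doolittle hydropathy, charge at neutral pH).
# The real feature set has 30 properties; only a few residues are listed here.
AA_PROPERTIES = {
    "S": (-0.8, 0),
    "L": (3.8, 0),
    "D": (-3.5, -1),
    "G": (-0.4, 0),
}

# Illustrative catalogue with a single entry; the real WHO catalogue is far larger.
CATALOGUE = {("rpoB", "S450L"): "resistant"}


def featurize(variant):
    """Describe a substitution by the change in each property (alt minus ref)."""
    ref = AA_PROPERTIES[variant.ref_aa]
    alt = AA_PROPERTIES[variant.alt_aa]
    return tuple(a - r for r, a in zip(ref, alt))


def stub_model(features):
    """Placeholder for a trained classifier. NOT a real model."""
    return "resistant" if abs(features[0]) > 2.0 else "susceptible"


def classify(variant, catalogue=CATALOGUE, predict=stub_model):
    """Tier 1: catalogue lookup. Tier 2: model prediction for uncatalogued variants."""
    key = (variant.gene, variant.change)
    if key in catalogue:
        return catalogue[key], "catalogue"
    return predict(featurize(variant)), "model"


if __name__ == "__main__":
    print(classify(Variant("rpoB", "S", 450, "L")))
    print(classify(Variant("geneX", "D", 10, "G")))
```

Returning the evidence source alongside the label keeps catalogue-backed calls distinguishable from model predictions in a downstream report.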

Validation and Conclusion: (3 min; Will conclude by presenting the validation results)
The application has been validated by experts using clinical samples at the National Institute for Research in Tuberculosis (NIRT), Chennai. Accuracy was assessed on a shared dataset, where the drug resistance profiles generated by juTarget showed high concordance with those from camspred, a locally developed pipeline. This project demonstrates Julia’s applicability to developing complex, reproducible bioinformatics pipelines that integrate industry-standard tools and custom scripts with ML models, providing a scientifically validated platform for Mycobacterium tuberculosis tNGS data analysis.

Technical Details: The application runs on a local server and is dockerized for portability. It is built with the Genie.jl framework and provides a user-friendly GUI, so users do not need to be expert Linux users. Parallel processing is implemented with the host computer’s RAM capacity taken into account. Status bars showing the progress of each process have been incorporated for easy monitoring. Results are stored locally and can be retrieved through the application interface at any time. The code is available at https://github.com/drbenedictpaul/jutarget.
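
One way RAM-aware parallelism of this kind could work is to cap the number of concurrent jobs by both CPU count and available memory. The Python sketch below is an assumption about the general approach, not juTarget's actual scheduling code; the function names, the per-job memory estimate, and the 20% reserve are hypothetical.

```python
import os

GIB = 1024 ** 3


def detect_total_ram_bytes():
    """Total physical memory on Linux, via POSIX sysconf values."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def max_parallel_jobs(total_ram_bytes, cpu_count, ram_per_job_bytes,
                      reserve_fraction=0.2):
    """Number of jobs to run at once without exceeding usable RAM.

    A fraction of RAM is held back for the OS and the web server
    (reserve_fraction is an assumed value). At least one job is always
    allowed so that processing can proceed on small machines.
    """
    usable = int(total_ram_bytes * (1 - reserve_fraction))
    by_ram = usable // ram_per_job_bytes
    return max(1, min(cpu_count, by_ram))


if __name__ == "__main__":
    jobs = max_parallel_jobs(
        total_ram_bytes=detect_total_ram_bytes(),
        cpu_count=os.cpu_count() or 1,
        ram_per_job_bytes=4 * GIB,  # assumed per-sample memory footprint
    )
    print(f"Running up to {jobs} samples in parallel")
```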

Dr. Paul is an Assistant Professor of Biotechnology and a Biomedical Data Scientist specializing in the integration of computational biology with experimental research. He holds a PhD in Computational Biology from VIT University, Vellore, with a research focus on breast cancer and aromatase inhibitors. With an academic foundation spanning Medical Laboratory Technology (CMC Vellore) and Biotechnology, his work bridges the gap between clinical context and molecular analysis. Dr. Paul is a strong advocate for the Julia programming language in scientific application development, having developed jSeqTB, a machine-learning integrated GUI application for Next-Generation Sequencing (NGS) data analysis of Mycobacterium tuberculosis. His lab focuses on developing high-performance tools like juProt and exploring tNGS-based diagnostics, aiming to address complex biomedical challenges through data science.