JuliaCon 2022 (Times are UTC)

TintiNet.jl: a language model for protein 1D property estimation
2022-07-27 , Green

AI has inaugurated a new era in Bioinformatics, to the point where contemporary language models can extract structural information from processing single protein sequences. Contributing to this field, we built TintiNet.jl, a 100% open-source, open-data and Julia-based portable language model to predict 1D protein structural properties. Our model achieves top performance - computational and predictive -, when compared to other modern algorithms, with only a fraction of the parameter count.


The objective of TintiNet.jl is to improve the current state of single-sequence-based prediction of 1D protein structural properties by drastically reducing the size of the models employed while preserving or improving their raw predictive power.

Our main design principles were to avoid intra-serialized processing layers (such as recurrent neural networks) and to employ encoding layers that could grow deeper without a steep increase in computational complexity. Our solution was to develop a hybrid convolutional-transformer architecture, employing the Julia Language, The Flux.jl framework and the Transformers.jl contributed layers to Flux.jl, as well as some BioJulia packages (BioSequences.jl, BioStructures.jl, BioAlignments.jl and FASTX.jl). The project is 100% open-source and open-data, and scripts and procedures to implement the methodology presented are available at https://github.com/hugemiler/TintiNet.jl.

By training and evaluating our model in an extensive collection of over 30000 protein sequences, we demonstrate that this architecture can achieve a similar degree of merit (classification accuracy and regression error) when compared to the three most modern, state-of-the-art models. Since it has a much reduced number of parameters compared to its alternatives, it occupies much less memory and generates predictions up to 10 times faster.

Computational Chemist (Ph.D. Candidate) @ University of Campinas, Brazil; incoming Postdoc Researcher @ Wellesley College, USA. Chemistry teacher, Julia developer, Scientific Computing and Machine Learning enthusiast, casual MMORPG gamer, bartender and home cook. Programming makes Science even cooler! (He/Him/His)