ProtoSyn: a Julia-based platform for molecular modelling

ProtoSyn aims to be a simple and intuitive package for molecular manipulation, with an emphasis on peptide design and simulation. Its main goal is to serve as a basis on which new tools and protocols can be tested and prototyped. Taking advantage of Julia's environment, ProtoSyn has been built with emerging technologies in mind, such as distributed computing, GPU and SIMD acceleration, and the use of machine learning models. Version 1.0 is scheduled for release at the end of 2021.

The placement of amino acids in a sequence determines the 3D structure of the peptide once folded, which in turn dictates its interactions with the environment and therefore its function. Being able to design new peptides for specific functions would unlock the potential of conformations not yet explored by nature, with applications in medicine, agriculture, bioremediation and enzymatic synthesis, among others.

This design space was traditionally explored by random, blind mutagenesis, an expensive and time-consuming experimental practice. With the growth of computational power over the last couple of decades, the computational design of small proteins has become the focus of scientific breakthroughs. Computer simulations save time and money by focusing experimental efforts on simulated prototypes that have shown promising results. As such, multiple software solutions have been proposed over the years. One example is Rosetta (and its Python wrapper, PyRosetta), which has been invaluable as a platform for molecular manipulation and peptide design.
However, as happens with many scientific software packages, Rosetta has fallen into the two-language problem: the core of the simulation code is written in C++, with a more user-friendly Python wrapper that exposes only some of the functionality. This has gravely impaired the community's ability to upgrade and modify the package, and imposes a steep learning curve on non-specialist would-be users.

ProtoSyn, built on the Julia language ecosystem, aims to put forward a simple and easy-to-use platform for molecular manipulation and peptide design. A Julia-based solution to this challenge naturally benefits from the native features of the language, such as easy parallelization and distributed computing, GPU acceleration and machine learning tools, among others. Current features of ProtoSyn include the removal, addition and mutation of an arbitrary number of amino acids, rigid body docking of ligands, Monte Carlo and steepest descent drivers, and rotamer library search, among others. As a whole, this list of features constitutes the core of ProtoSyn: a playground for the development and prototyping of new protein sequences and structures. A set of extra features has been identified and initial support for them has been prepared. These include branched structures, a user interface and even molecular dynamics drivers, but further development of these modules is needed.
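
To illustrate what a steepest descent driver does, the following is a minimal sketch in Python, not ProtoSyn's actual API. The function names (`steepest_descent`, `toy_energy`, `toy_gradient`), the quadratic toy energy and the simple adaptive step-size rule are all assumptions made for this example.

```python
def steepest_descent(x0, energy, gradient, step=0.1, max_steps=1000, tol=1e-8):
    """Follow the negative gradient until the largest gradient component is below tol.

    The step size is adapted with a simple heuristic (an assumption of this
    sketch): grow it after an accepted step, halve it after a rejected one.
    """
    x = list(x0)
    e = energy(x)
    for _ in range(max_steps):
        g = gradient(x)
        if max(abs(gi) for gi in g) < tol:
            break  # converged: the gradient is numerically zero
        trial = [xi - step * gi for xi, gi in zip(x, g)]
        e_trial = energy(trial)
        if e_trial < e:
            x, e = trial, e_trial  # downhill move: accept it
            step *= 1.05
        else:
            step *= 0.5  # overshoot: retry with a smaller step
    return x, e


# Hypothetical toy energy with a single minimum at x_i = 2.
def toy_energy(x):
    return sum((xi - 2.0) ** 2 for xi in x)


def toy_gradient(x):
    return [2.0 * (xi - 2.0) for xi in x]


x_min, e_min = steepest_descent([0.0, 5.0], toy_energy, toy_gradient)
```

In a real molecular setting, `x` would hold atomic coordinates or dihedral angles, and the gradient would come from the energy function's forces.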

A critical component of any molecular manipulation software, and a requirement for both the steepest descent and Monte Carlo drivers, is the energy function: a fitness function that measures how realistic a given molecular structure is. TorchANI is a machine learning model trained on DFT results that provides realistic results with a speedup of up to 10e6 in computation time; it has been developed in Python by a team of researchers from Florida. Using the PyCall package, this external Python package is loaded and used in Julia as the core piece of ProtoSyn's energy function. Other components include the Caterpillar solvation energy and a contact map restraint.
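
The way an energy function built from several components drives a Monte Carlo simulation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not ProtoSyn's implementation: the component functions are toy stand-ins (not TorchANI, Caterpillar or a contact map), and the weights, names and parameters are hypothetical. Acceptance uses the standard Metropolis criterion.

```python
import math
import random


def total_energy(x, components):
    """Weighted sum of energy components; each entry is (weight, function)."""
    return sum(weight * func(x) for weight, func in components)


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill moves, and accept
    uphill moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def monte_carlo(x0, components, n_steps=2000, step_size=0.5,
                temperature=0.1, seed=0):
    """Randomly perturb one coordinate per step; keep the best state seen."""
    rng = random.Random(seed)
    x = list(x0)
    e = total_energy(x, components)
    best_x, best_e = list(x), e
    for _ in range(n_steps):
        trial = list(x)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        e_trial = total_energy(trial, components)
        if metropolis_accept(e_trial - e, temperature, rng):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e


# Hypothetical toy components standing in for a potential and a restraint.
def toy_potential(x):
    return sum((xi - 1.0) ** 2 for xi in x)


def toy_restraint(x):
    return sum(xi ** 2 for xi in x)


components = [(1.0, toy_potential), (0.5, toy_restraint)]
start = [3.0, -2.0, 0.5]
start_energy = total_energy(start, components)
best_state, best_energy = monte_carlo(start, components)
```

Expressing the energy function as a list of weighted components keeps the driver independent of any particular term, so a machine learning potential or a restraint can be swapped in without changing the sampling loop.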

Overall, ProtoSyn provides an easy-to-use, intuitive and robust experience in molecular manipulation, giving even non-specialist users access to basic simulation tools that can greatly reduce prototyping costs and experimental time. ProtoSyn is also fast: it is accelerated by both GPU and SIMD technologies, and simulations can be sped up even further by employing distributed computing protocols. In conclusion, the first development cycle of ProtoSyn is coming to a close, with final tests, documentation, examples and tutorials being prepared. The open-source code is finally ready for the community to check and improve upon.