PRS.jl: Fast Polygenic Risk Scores
2021-07-29, Purple

Determining one’s risk of developing various diseases throughout one’s lifetime is important for pursuing good health. An emerging method for performing this calculation is the Polygenic Risk Score, or PRS. A PRS method allows one to construct a model of the risk of acquiring a certain disease given one’s own genome, and provides a simple numerical result representing that risk. We will describe how we ported a widely used PRS program to Julia and the performance and usability gains that resulted.
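
To make the "simple numerical result" concrete, here is a minimal sketch of the textbook form of a polygenic score: a weighted sum of a person's allele counts, with one weight per genetic variant. It is written in Python for illustration only and is not the PRS.jl or PRS-CS API; the function name, the weights, and the genotypes are all hypothetical. In practice, estimating good weights is the hard part, and that is the step a tool such as PRS-CS addresses.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Return the weighted sum of allele dosages for one person.

    dosages: copies of the effect allele at each variant (0, 1, or 2).
    effect_sizes: per-variant weights (hypothetical values here).
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(d * w for d, w in zip(dosages, effect_sizes))


# Hypothetical toy data: three variants, three people.
effect_sizes = [0.5, -0.25, 1.0]
genotypes = {
    "person_a": [0, 1, 2],
    "person_b": [2, 2, 0],
    "person_c": [1, 0, 1],
}

# One score per person; a higher score means higher modeled genetic risk.
scores = {
    name: polygenic_risk_score(dosages, effect_sizes)
    for name, dosages in genotypes.items()
}
```

Real analyses involve hundreds of thousands of variants or more, so the sum itself is cheap compared with fitting the weights.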


The PRS-CS Python library calculates the relationship between genetic features and traits, eventually producing a single numerical result representing a person’s genetic susceptibility to a given disease. It does this using a novel Markov Chain Monte Carlo approach, allowing it to capture information from more genetic features than previous approaches.
As the collection and storage of genetic data increase globally, more diseases are studied at once. However, calculating these scores for many diseases while maintaining high accuracy becomes increasingly computationally expensive. Because PRS-CS cannot deliver its high accuracy quickly enough at this scale, we developed PRS.jl.

PRS.jl started as a direct port of PRS-CS and, without any special treatment, produces results with the same accuracy in a fraction of the time (or, depending on the configuration, better accuracy in the same amount of time). Today, PRS.jl offers additional features and improved usability over PRS-CS while cutting the average compute time per trait (across 9 tested traits) from 80 hours for PRS-CS to just 15 hours for PRS.jl.

In this talk, I will introduce the concept of polygenic risk scores and describe how they are used in biology and medicine. Next, I'll demonstrate how the program works and which aspects we improved upon. Finally, I will show areas where users can contribute improvements to the package.

I am a doctoral candidate in Vanderbilt’s human genetics PhD training program.