Unlocking Genetic Data: The Risks of Polygenic Risk Scores
Genetic data can be analyzed to estimate the risk of developing specific health conditions. Science Photo Library / Alamy
Polygenic risk scores (PRS) summarize an individual’s likelihood of developing particular health conditions, but mathematical analysis can work backwards from a score to reveal the underlying DNA. Health insurance companies could potentially use this approach to reconstruct genetic data from summary genomic reports, uncovering health risks that patients have not disclosed. Individuals who share their scores anonymously could also be identified if the extracted genetic data is used to query public genealogy databases.
Understanding Polygenic Risk Scores
Polygenic risk scores measure the combined impact of variations at tens to thousands of specific letters in the genome, known as single nucleotide polymorphisms (SNPs). Researchers and DNA testing companies like 23andMe use these scores to summarize potential health risks, and individuals sometimes post their own scores publicly when seeking advice on how to interpret them.
Solving a polygenic risk score for the underlying SNPs is akin to deducing a phone number when you know only what its digits add up to, a mathematical challenge known as the knapsack problem. Because of this difficulty, PRS have generally been considered a low privacy risk.
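However, each SNP value in the score is multiplied by a highly precise weight (up to 16 orders of magnitude) reflecting its contribution to overall disease risk. With weights this precise, far fewer combinations of SNP values can add up to any given score, which leaves even models regarded as low-risk vulnerable to data attacks.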
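To make the role of weight precision concrete, here is a minimal Python sketch. It assumes a score is a plain weighted sum of allele counts (0, 1 or 2 copies per SNP). The four weights and the helper names `prs` and `count_distinct_scores` are invented for illustration and do not come from any published model.

```python
from itertools import product


def prs(genotype, weights):
    """Weighted sum of allele counts (0, 1 or 2 copies per SNP)."""
    return sum(g * w for g, w in zip(genotype, weights))


def count_distinct_scores(weights, decimals=12):
    """Count how many different scores all possible genotypes produce."""
    scores = {
        round(prs(genotype, weights), decimals)
        for genotype in product((0, 1, 2), repeat=len(weights))
    }
    return len(scores)


# Hypothetical weights, made up for this example.
precise_weights = [0.1374926153, 0.0829471635, -0.2213840957, 0.0457193826]
# The same weights rounded to one decimal place: [0.1, 0.1, -0.2, 0.0]
coarse_weights = [round(w, 1) for w in precise_weights]

total_genotypes = 3 ** len(precise_weights)           # 81 possible genotypes
print(count_distinct_scores(precise_weights))         # 81: every genotype has its own score
print(count_distinct_scores(coarse_weights))          # 9: many genotypes share a score
```

With the rounded weights, a score is compatible with several genotypes, which is the phone-number ambiguity. With the precise weights in this toy example, each score points to exactly one genotype.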
Research Findings on Genetic Risk Scores
According to Gamze Gyursoy at Columbia University, “The final polygenic risk score can be estimated with a high degree of accuracy because it is constrained by the finite methodology used to reach that figure and the statistically probable arrangement of the underlying SNPs.” Gyursoy, alongside Kiril Nikitin, also from Columbia, conducted experiments using 298 polygenic risk models based on data from 2,353 individuals. They worked backwards to calculate all possible genomes that could generate each score while excluding those with numerous rare mutations.
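The idea of working backwards from a score can be sketched as a brute-force search. This is not the researchers' actual algorithm, which the article does not detail. It is a toy version that assumes a score is a weighted sum of allele counts, uses made-up weights and allele frequencies, and ranks candidates by their probability under Hardy-Weinberg proportions as a stand-in for excluding genomes with many rare mutations. The function names are hypothetical.

```python
from itertools import product
from math import log


def prs(genotype, weights):
    """Weighted sum of allele counts (0, 1 or 2 copies per SNP)."""
    return sum(g * w for g, w in zip(genotype, weights))


def genotype_log_prob(genotype, freqs):
    """Log-probability of a genotype, assuming Hardy-Weinberg proportions
    and independent SNPs (both simplifying assumptions)."""
    total = 0.0
    for copies, p in zip(genotype, freqs):
        q = 1.0 - p
        total += log((q * q, 2 * p * q, p * p)[copies])
    return total


def reconstruct(score, weights, freqs, tol=1e-9):
    """Return every genotype whose score matches, most plausible first."""
    candidates = [
        genotype
        for genotype in product((0, 1, 2), repeat=len(weights))
        if abs(prs(genotype, weights) - score) <= tol
    ]
    return sorted(candidates, key=lambda g: genotype_log_prob(g, freqs), reverse=True)


# Hypothetical model: weights and effect-allele frequencies are invented.
weights = [0.1374926153, 0.0829471635, -0.2213840957, 0.0457193826]
freqs = [0.30, 0.05, 0.45, 0.20]

true_genotype = (1, 0, 2, 1)
shared_score = prs(true_genotype, weights)

recovered = reconstruct(shared_score, weights, freqs)
print(recovered)  # [(1, 0, 2, 1)]

# With the weights rounded to one decimal place, several genotypes fit,
# and the frequency ranking is what orders them.
coarse = [round(w, 1) for w in weights]
ambiguous = reconstruct(prs(true_genotype, coarse), coarse, freqs)
print(len(ambiguous))
```

Exhaustive search like this only works for a handful of SNPs, since the number of candidates triples with each SNP added. Models with many SNPs would need a smarter search, which this sketch does not attempt.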
As a result, they were able to reconstruct donor genotypes with an impressive 94.6% accuracy and accurately predicted 2,450 SNPs per person. Testing revealed that just 27 SNPs were sufficient to identify an individual from a pool of 500,000 samples, with up to 90% accuracy in predicting family relationships. Interestingly, individuals of African and East Asian descent were easier to identify, largely due to underrepresentation in available genetic databases.
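A rough back-of-the-envelope calculation suggests why so few SNPs can single out one person. The sketch below assumes each SNP can take three values (0, 1 or 2 copies of a variant) and simply counts the possible combinations. Real genotypes are far from evenly distributed, so this is only an upper bound on how distinctive 27 SNPs can be, not a reproduction of the study's analysis.

```python
def possible_profiles(n_snps: int) -> int:
    """Number of distinct genotype combinations across n_snps SNPs,
    with three possible values per SNP."""
    return 3 ** n_snps


pool_size = 500_000
profiles = possible_profiles(27)

print(profiles)               # 7625597484987
print(profiles // pool_size)  # 15251194 possible profiles per person in the pool
```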
Mitigating Risks and Ethical Considerations
Gyursoy highlights that 447 small, high-precision models in the public database of polygenic scores are susceptible to such attacks. “I wanted to emphasize that the risk is low; however, [certain conditions] still present the potential for data leakage, which must be considered in study planning, especially when involving vulnerable populations,” Gyursoy states.
Researchers at Massachusetts General Hospital believe existing data protection methods and computational barriers limit the potential misuse of polygenic risk scores. “These findings serve as a crucial reminder that small models should be treated as sensitive data in clinical reporting and informed consent discussions,” they add.
Source: www.newscientist.com