Scientists have discovered a way of identifying people contained in distinct, theoretically anonymous databases from just 13 tiny snippets of DNA.
Being able to infer so much from so little information raises privacy concerns. Scientists from Stanford University found that, starting with just 13 genetic markers, they could infer hundreds of thousands more and potentially reveal a wealth of genetic information.
The scientists started with 13 genetic markers because, until recently, that was the number the FBI required for its Combined DNA Index System (CODIS) database.
In 2013, the Supreme Court ruled in the case of Maryland v. King that the state of Maryland could retain DNA from anyone who’d been arrested there.
Since the genetic markers used to compile the nationwide CODIS database could not be used to infer private health data or other traits, the state argued the benefits of recording DNA from anyone even suspected of a crime outweighed those suspects’ privacy concerns.
The discovery, to be published in Proceedings of the National Academy of Sciences, contradicts that argument. The scientists said that when the same person is included in more than one genetic database, it may be possible to infer genetic traits from CODIS data or to find matches across different sets of DNA markers.
Scientists reached their conclusion by analysing two sets of genetic data from 872 human genomes.
The first set comprised just 13 markers, while the second contained a much broader dataset of 642,563 genetic markers that did not overlap with the first set.
Noah Rosenberg, a professor of biology and the paper’s senior author, and his team found there were strong enough patterns in the DNA that they could match upward of 90% of the records.
The team also noted that if they added 17 more forensic markers, bringing the total to 30, they could match more than 99% of the records in the two datasets. With the right combination of databases, it may therefore be possible to infer a wealth of genetic information from a very small set of markers.
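To give a rough feel for how records might be linked across datasets that share no markers, here is a minimal Python sketch. It is not the study's method. It assumes, purely for illustration, that each forensic marker is statistically correlated with one nearby marker in the broader dataset, and it treats each person as carrying a single allele per marker. The LINKAGE table, the sample records and the function names are all hypothetical.

```python
import math
from itertools import permutations

# Hypothetical conditional probabilities P(forensic allele | nearby marker allele),
# one table per forensic marker. All values are made up for illustration.
LINKAGE = [
    {"A": {10: 0.7, 11: 0.2, 12: 0.1}, "G": {10: 0.1, 11: 0.3, 12: 0.6}},
    {"C": {7: 0.8, 8: 0.2}, "T": {7: 0.3, 8: 0.7}},
]


def log_match_score(forensic_profile, broad_profile):
    """Log-likelihood that the two profiles come from the same person."""
    total = 0.0
    for marker, (forensic_allele, broad_allele) in enumerate(
        zip(forensic_profile, broad_profile)
    ):
        total += math.log(LINKAGE[marker][broad_allele][forensic_allele])
    return total


def best_assignment(forensic_records, broad_records):
    """Try every one-to-one pairing and keep the one with the highest total score.

    Brute force is only workable for a handful of records.
    """
    n = len(forensic_records)

    def total_score(perm):
        return sum(
            log_match_score(forensic_records[i], broad_records[perm[i]])
            for i in range(n)
        )

    return list(max(permutations(range(n)), key=total_score))


# Three hypothetical people, stored in a different order in each dataset.
forensic_records = [(10, 7), (12, 8), (10, 8)]
broad_records = [("G", "T"), ("A", "T"), ("A", "C")]

# pairing[i] is the index of the broad record matched to forensic record i.
pairing = best_assignment(forensic_records, broad_records)
```

In this toy setup the pairing is driven entirely by the made-up correlations in LINKAGE. With real data the correlations would have to be estimated, and checking every possible pairing would be far too slow for hundreds of records.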
However, the team also said that the discovery could aid the police by allowing them to improve the forensic marker system while maintaining backward compatibility.
With just 13 or even 20 genetic markers, authorities face a substantial risk of false positive matches.
Using larger and more modern marker sets would reduce false positive rates, but would introduce another problem: it might not be possible to check for matches against decades of profiles collected with the 13 markers that have been used to date.
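The arithmetic behind that trade-off can be sketched in a few lines of Python. The figures below are hypothetical and do not come from the study. The sketch assumes that markers match by chance independently of one another, that each marker has the same chance-match probability, and that the database holds unrelated profiles. Real forensic calculations are more involved.

```python
import math


def random_match_probability(per_marker_match_prob, n_markers):
    """Chance that an unrelated profile matches at every marker,
    assuming the markers are independent."""
    return per_marker_match_prob ** n_markers


def prob_at_least_one_false_match(rmp, database_size):
    """Chance of one or more coincidental matches in a database of unrelated
    profiles: 1 - (1 - rmp) ** database_size, computed in a numerically
    stable way for very small rmp."""
    return -math.expm1(database_size * math.log1p(-rmp))


# Hypothetical inputs, chosen only to show the trend.
PER_MARKER_MATCH_PROB = 0.3
DATABASE_SIZE = 10_000_000

risk_by_marker_count = {
    n: prob_at_least_one_false_match(
        random_match_probability(PER_MARKER_MATCH_PROB, n), DATABASE_SIZE
    )
    for n in (13, 20, 30)
}
```

Under these made-up inputs, the chance of a coincidental match falls steeply as markers are added, which is the general effect described above.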
The new results, Rosenberg said, give a proof of principle that it may be possible to develop a forensic genetic system with new marker sets and still be able to test for matches against databases assembled with the earlier 13 CODIS markers.