- Hollis Johnson/Business Insider
- It may be getting easier to link your private and anonymized DNA data to your identity.
- That means the genetic data you share with a testing company – which may include sensitive health information like your risk of cancer – could one day be matched with your name by an unintended party.
- While some at-home DNA tests like 23andMe have privacy protocols to protect against this, they’re not a guarantee, experts say. Other companies have fewer safeguards.
- One key issue is the ability of users to upload their private DNA data to publicly accessible genetic databases like the one used in the Golden State Killer case.
The data you shared with a genetic testing startup like 23andMe is private – for now.
But maintaining that privacy, which rests on your data being kept anonymous and secure, is getting harder, according to privacy experts, bioethicists, and entrepreneurs.
Your DNA data contains highly sensitive information about your health and identity. Everything from your ancestry to your risk of cancer to information about allergies and predisposition to Alzheimer’s is often included in a genetic test report. Whether it’s a political figure claiming indigenous heritage or a CEO with a genetic risk for mental illness, any one of these factors could be used against someone if it got into the wrong hands.
The most prolific genetic testing companies take thorough steps to protect your privacy, such as stripping personal identifiers like your name from your genetic data before they sell that data to researchers or drug companies. They also typically store your personal information and your genetic data in separate environments to protect against a potential hack.
But those protocols do not protect against several key vulnerabilities, experts say.
One involves what can happen to the data outside of the tough-to-define walls of a DNA testing service. While genetic testing companies can and frequently do share anonymized genetic data with researchers and drug companies, individual users can also upload their private, non-anonymous DNA reports to public databases like GEDmatch. That service, which was used to home in on the Golden State Killer suspect, allows for the identification of relatives who haven’t even taken a genetic test.
Even large pools of anonymized genetic data can theoretically be tied to an individual. For at least the past decade, researchers have demonstrated that by cross-referencing anonymous DNA data with datasets that include personal information, such as voter or census rolls, they can correctly “re-identify” significant portions of participants.
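To make the cross-referencing idea concrete, here is a minimal Python sketch of a linkage-style match. It is an illustration only, not the method used in any study mentioned here: the records are made up, and the choice of ZIP code, birth year, and sex as the shared fields is an assumption for the example. The function name `reidentify` and the field names are hypothetical.

```python
from collections import defaultdict


def reidentify(anonymous_records, public_records, keys):
    """Map each anonymous sample ID to a name when its shared fields
    match exactly one person in the public dataset."""
    # Index the public dataset by the shared fields.
    index = defaultdict(list)
    for person in public_records:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for record in anonymous_records:
        candidates = index.get(tuple(record[k] for k in keys), [])
        # Only a unique match counts; ambiguous or missing matches are skipped.
        if len(candidates) == 1:
            matches[record["sample_id"]] = candidates[0]
    return matches


# Made-up "anonymized" records: no names, but some demographic fields remain.
ANON = [
    {"sample_id": "S1", "zip": "94110", "birth_year": 1980, "sex": "F"},
    {"sample_id": "S2", "zip": "94110", "birth_year": 1975, "sex": "M"},
    {"sample_id": "S3", "zip": "10001", "birth_year": 1990, "sex": "F"},
]

# Made-up public roll (standing in for a voter or census list).
PUBLIC = [
    {"name": "Alice Example", "zip": "94110", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "zip": "94110", "birth_year": 1975, "sex": "M"},
    {"name": "Carl Example", "zip": "94110", "birth_year": 1975, "sex": "M"},
    {"name": "Dana Example", "zip": "60601", "birth_year": 1990, "sex": "F"},
]

KEYS = ("zip", "birth_year", "sex")  # assumed shared fields
RESULT = reidentify(ANON, PUBLIC, KEYS)
```

In this toy data, only sample S1 is tied to a single name. S2 matches two people and S3 matches no one, which is why the share of records that can be re-identified depends on how distinctive the shared fields are.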
Plus, most of the leading genetic testing services allow customers to download their raw genetic data – the As, Gs, Ts, and Cs that make up their genetic code – using their email and profile login.
Privacy experts and bioethicists say all of these issues make the current landscape of genetic testing ripe for potential calamity.
“This is not video games that can be downloaded and shared without your permission, or even bank information,” Matt Mitchell, the director of digital safety and privacy for advocacy organization Tactical Tech, told Business Insider.
“You can cancel your credit card. You can’t change your DNA,” he added.
The case of the Golden State Killer: how private and protected DNA data can be exploited in public databases
When you mail your saliva sample to a company like 23andMe, Ancestry, Helix, or any one of a handful of current DNA testing startups, they run an analysis of the genetic data it contains. That DNA data includes your unique genetic code and it also includes your ancestry data, which can point to relatives.
To protect your privacy, most of these companies make that data anonymous: they remove your personal information, such as your name, from the data, and they store the DNA data separately from your personal information.
Spokespeople from Ancestry, 23andMe, and Helix all told Business Insider that their privacy policies are designed to protect people’s data within the walls of their platforms. But what happens outside of their domains is up to the individual customer.
In the case of the Golden State Killer, law enforcement agents uploaded their suspect’s DNA to the open personal genomics and genealogy database GEDmatch using a sample from a crime scene. Then, with the help of a team of experts, they were able to comb through and compare several sets of data until they found their suspect, Joseph James DeAngelo. Key to their discovery was the fact that 24 of DeAngelo’s relatives had participated in GEDmatch.
You share a lot of your DNA with your parents and siblings, and less with more distant relatives. But by comparing an anonymous DNA sample with identified ones, researchers can zero in on a person’s relatives and then identify the person themselves.
None of the leading genetic testing companies allow users to upload raw DNA samples like GEDmatch does. But you can download your Ancestry or 23andMe genetic data and share it with GEDmatch or another public genealogy database.
“Today when you have a de-identified dataset and a complementary resource you can compare that data with – such as something like GEDmatch – you can begin to identify individuals from that,” James Hazel, a biomedical researcher at Vanderbilt University who recently reviewed the privacy policies of several genetic testing companies, told Business Insider.
‘Data is data – once it’s out there, it’s very hard to control’
Until very recently, researchers considered the risk of re-identification – when someone correctly matches your anonymous DNA data with your personal information – to be extremely low. But as more people participate in genetic testing and as data analysis tools become faster and easier to use, this risk is on the rise, they say.
Hazel said the current risk of re-identification is “significant.”
Dawn Barry, the president and cofounder of genetic research startup LunaDNA and a 12-year veteran of biotech giant Illumina, agreed.
“We need to prepare for a future in which re-identification is possible,” she told Business Insider during a meeting on the sidelines of a health conference organized by the Wall Street Journal.
Since roughly 2009, researchers have demonstrated that by comparing large sets of supposedly anonymous DNA data with public datasets from censuses or voter lists, they can correctly identify between 40% and 60% of all genetic testing participants.
DNA databases have grown significantly since that 2009 experiment.
As of last fall, more than 19 million people had taken a private Ancestry or 23andMe test. On the heels of their growth, participation in public databases like Promethease and GEDmatch has ballooned as well.
“Data is data – once it’s out there, it’s very hard to control,” Hazel said.
David Koepsell, a Yale bioethicist and the cofounder and CEO of blockchain-enabled genomics company EncrypGen, agreed.
“Re-identification is a real concern and people have done it with public databases. It’s not science fiction,” he told Business Insider.
Last November, Yaniv Erlich, a geneticist and the chief science officer of ancestry company MyHeritage, led a study published in the journal Science in which he looked at DNA data from GEDmatch and MyHeritage. Erlich concluded that with a genetic database of 1.3 million US residents, roughly 60% of all white Americans could be traced to a third cousin. This finding was independent of whether people had themselves participated in a genetic test.
“In the near future,” Erlich wrote in the paper, “the technique could implicate nearly any US individual of European descent.”
Spokespeople from Ancestry, 23andMe, and Helix all outlined comprehensive privacy policies that are designed to protect people’s data when their data remains within the platforms.
“To protect against re-identification, we strip customers’ personally identifiable information from their genetic information, storing the two sets of data in separate, walled-off computing environments,” a 23andMe representative told Business Insider via email.
Helix and Ancestry spokespeople shared similar policies.
‘It could go wrong’: Experts warn against downloading your personal DNA data
But Ancestry, 23andMe, and Helix all allow users to download their raw DNA data. The download is free from Ancestry and 23andMe but costs $499 with Helix. A Helix spokesperson said the fee was because Helix provides a more comprehensive genetic dataset than the other platforms.
In most cases, to download their DNA data, a user must log in to the platform and select “download my raw DNA.” They then receive an email asking them to confirm the download. After they click confirm, a text file download begins.
Once a customer downloads their genetic data, however, it is no longer protected by any of the company’s security measures.
“What you do with your data is your responsibility, whether that means sharing your login name and password with others, sharing through 23andMe, downloading your data or anything else,” 23andMe’s website reads.
Experts say this setup does not adequately protect users. At minimum, they say the platforms should encrypt the genetic data from the time it is sent to the time it is received. They also pointed out that a person’s login information may be the same as their email, another potential security weakness.
“This is Privacy 101,” Mitchell told Business Insider. “These companies need to have the highest level of security and they don’t.”
Mitchell and Hazel both said they believed genetic testing companies should use two-factor or multi-factor authentication, a security step enforced by many banks and data companies. It requires users to give two or more pieces of evidence (such as their phone number and a PIN) before allowing access to sensitive data.
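For readers curious what a second factor can look like in practice, the Python sketch below shows one common form: a time-based one-time code of the kind generated by authenticator apps, following the published TOTP standard (RFC 6238). This is an assumption chosen for illustration; the experts quoted here did not specify a particular method, and nothing below describes how any testing company actually works. The function names `totp` and `login_allowed` are hypothetical.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time code (RFC 6238, HMAC-SHA1 variant)."""
    counter = unix_time // step  # number of 30-second windows since 1970
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def login_allowed(password_ok: bool, submitted_code: str,
                  secret: bytes, unix_time: int) -> bool:
    """Grant access only when BOTH factors check out:
    the password and the current one-time code."""
    expected = totp(secret, unix_time)
    # compare_digest avoids leaking information through comparison timing.
    return password_ok and hmac.compare_digest(submitted_code, expected)
```

The point of the design is that a stolen password alone is not enough: an attacker would also need the secret stored on the user's phone to produce the current code.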
“This is something a lot of companies do,” said Mitchell. “If someone really cares about your data they’re going to handle it with the utmost caution. Downloading raw data is dangerous and it could go wrong.”
Hazel thinks more users should be aware of these vulnerabilities, as well as the various ways their data may be used that go beyond their initial intentions.
“It comes down to the trade-off,” he said. “How comfortable are you with how the data might be shared and used?”