UK Biobank Health Data Leaked Online Dozens of Times, Investigation Reveals
Confidential health data from the UK Biobank, a flagship medical research project storing genetic and medical information from 500,000 British volunteers, has been exposed online on numerous occasions, according to an exclusive Guardian investigation. This revelation raises significant questions about the safeguarding of patient records in one of the world's most comprehensive health data repositories, which has been instrumental in driving breakthroughs in cancer, dementia, and diabetes research.
Inadvertent Data Exposures by Researchers
The leaks appear to stem from scientists approved to access Biobank's sensitive data who have sometimes been careless with security protocols. Although the exposed files do not include names or addresses, they still pose privacy risks. For instance, one dataset discovered by the Guardian contained millions of hospital diagnoses and their dates for more than 400,000 participants, along with details such as sex and month and year of birth.
To assess the risk of re-identification, the Guardian worked with Biobank volunteers. In one case, using only a volunteer's month and year of birth and details of a major surgery, an external data scientist was able to pinpoint her record and its extensive history of hospital diagnoses. The volunteer, a woman in her 70s, was surprised: "Effectively you were rehearsing the main parts of my medical history to me without me having given you any information at all." She also voiced concern that Biobank had broken its agreement to hold data securely, although she still believes the project is important.
Scale and Persistence of the Problem
Data experts have described the scale of the leaks as "shocking", particularly in an era where AI and social media facilitate easy cross-referencing of information online. The issue has emerged because journals and funders increasingly require researchers to publish analysis code, leading some to accidentally upload partial or entire Biobank datasets to platforms like GitHub. UK Biobank prohibits such sharing and has introduced further training for researchers.
Between July and December 2025, Biobank issued 80 legal notices to GitHub, resulting in the removal of data from about 500 repositories. However, many files remain accessible online. A Biobank spokesperson defended the project, saying that researchers were not given any identifying data and that re-identification risks arise only if participants have shared personal health information publicly. The spokesperson pointed to proactive measures, including searching GitHub and issuing takedown notices.
Expert Criticisms and Ethical Tensions
Privacy experts argue that Biobank's position is unrealistic, because many people reasonably share health information online and AI can easily cross-reference it. Prof Felix Ritchie of the University of the West of England asked: "Are these people aware that the internet exists?" Dr Luc Rocher of the Oxford Internet Institute noted that removing identifiers often fails to guarantee anonymity: details such as birthdays and the dates of medical events can be enough to pinpoint a record, potentially revealing sensitive information such as psychiatric diagnoses or HIV test results.
Prof Niels Peek of the University of Cambridge acknowledged Biobank's efforts but highlighted the persistent nature of the leaks, stating, "The scale and persistence with which this has happened demonstrates that there are huge tensions between the ambition to drive health research with data at scale and the legal and ethical imperative to protect people's privacy." Experts also doubt whether Biobank can fully regain control of the leaked data, as many files remained available on code archive sites until recently.
Founded in 2003 by the Department of Health and medical research charities, UK Biobank holds genome sequences, scans, blood samples, and lifestyle data. Last month, the government extended access to volunteers' GP records, underscoring the project's ongoing significance despite these security challenges.
