
Does Genomics Have a Security Problem?

Jason Armstrong | May 12, 2025

Mapping cyber threats in sequencing, analysis, and reporting.

Next-generation sequencing (NGS) is now routine in research and the clinic, yet its digital architecture leaves it vulnerable to new forms of attack. A recent study (1) set out to map every cyber-biosecurity risk in the NGS workflow. The authors present “the first structured cyber-biosecurity threat taxonomy for NGS.” Their analysis shows that vulnerabilities exist from DNA extraction to clinical reporting, some of which are unlike anything conventional IT frameworks cover. Here, we break down the study, its implications, and what you can do to mitigate the risks. At the end of this article, you’ll find a full breakdown of the risks identified in the paper.

Method Snapshot

The global team screened 3332 publications using PRISMA guidelines, selecting 22 core studies and adding real-world incident reports. They then modelled threats across four NGS stages: raw data generation, quality control, bioinformatics, and interpretation. Using this information, Anjum et al. produced a taxonomy that links specific tools, files, and hardware to the attack techniques most likely to hit them.

High-impact Threats & Mitigations

DNA-encoded Malware

Synthetic oligonucleotides (oligos) can be engineered to exploit vulnerabilities in sequencing software. In a proof-of-concept described in the study, DNA strands were designed so that, once sequenced, the resulting data triggered code execution and gave the attacker remote access.

Mitigation: Strict wet-lab barcode hygiene, vendor screening of custom oligos, signed firmware, and runtime sandboxing of basecalling tools.

Firmware Tampering & Hardware Backdoors

Sequencer cameras and embedded controllers often lack secure boot or code signing. An attacker who flashes rogue firmware can corrupt images, render instruments inoperable, or leak raw reads.

Mitigation: Vendor-supplied firmware signing, chain-of-custody logs for updates, and physical access controls that match ISO 27001 hardware clauses.
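
One way to make a chain-of-custody log for firmware updates tamper-evident is to have each entry commit to the hash of the entry before it. The Python sketch below illustrates the idea under stated assumptions: the field names (`instrument`, `firmware_sha256`, `operator`) and the functions `append_entry` and `verify_chain` are hypothetical, not part of any vendor tool or of the study.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, instrument: str, firmware_sha256: str, operator: str) -> dict:
    """Append an update record that commits to the previous record's hash."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    body = {
        "instrument": instrument,            # hypothetical field names
        "firmware_sha256": firmware_sha256,
        "operator": operator,
        "prev_hash": prev_hash,
    }
    entry = dict(body, entry_hash=_digest(body))
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return True only if every entry matches its hash and links to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Editing or reordering an earlier record breaks every later link, so `verify_chain` returns `False`. The sketch does not detect entries removed from the end of the log; catching that would require storing the latest `entry_hash` somewhere the attacker cannot reach.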

Supply-chain Compromise in Open-source Tools

Popular tools such as bcl2fastq, FastQC, and Trimmomatic are rarely subject to the same scrutiny applied to clinical software. A single compromised library can provide remote code execution or silent data alterations.

Mitigation: Confirm that the software version in use matches the official release, maintain clear records of third-party software components (SBOMs), and review code before deployment. Laboratories operating under CLIA or ISO 15189 can fold these checks into their existing validation requirements.
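
Confirming that an installed tool matches its official release usually comes down to comparing a cryptographic digest of the downloaded file with the value the maintainer publishes. The following is a minimal Python sketch of that check; the function names `sha256_of_file` and `matches_release` are illustrative, and it assumes the project publishes a SHA-256 checksum over a channel you trust.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(path: Path, expected_sha256: str) -> bool:
    """Compare the local file's digest with the published checksum."""
    actual = sha256_of_file(path)
    # Normalise whitespace and case, since checksum files vary in formatting.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A checksum only proves the file matches what was published. If the release channel itself may be compromised, a signature check against the maintainer's key is the stronger control.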

Adversarial AI Reads & Model Poisoning

Deep-learning variant callers can be fooled by crafted inputs or by tainted training data. The authors link this to wider concerns about generative AI tools that lower the barrier to designing such attacks.

Mitigation: Train variant-calling models to resist adversarial inputs, track changes in their behaviour over time, and prepare for compliance with upcoming AI regulations such as the EU AI Act.

Large-scale Re-identification Attacks

Even low-coverage data or trimmed reads can be statistically imputed against public panels to reveal hidden genotypes, threatening GDPR and HIPAA compliance.

Mitigation: Differential-privacy filters on shared datasets, query-rate limiting, and federated analysis that keeps raw reads behind each institution’s firewall.
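
To make the differential-privacy idea concrete, here is a minimal Python sketch of the Laplace mechanism applied to a single count query, for example "how many individuals in this cohort carry variant X". It is an illustration rather than the study's method: the function names are hypothetical, it assumes each person changes the count by at most one, and it uses plain floating-point sampling, which a vetted differential-privacy library would handle more carefully.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float,
                rng: Optional[random.Random] = None) -> float:
    """Release a carrier count with Laplace noise calibrated to epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.SystemRandom()
    sensitivity = 1.0  # assumption: one person changes the count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

Smaller values of `epsilon` add more noise and give stronger privacy. Repeated queries consume the privacy budget, which is one reason the mitigation pairs noise with query-rate limiting.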

Real-world Examples

2017 Merck NotPetya outage - ransomware halted biologics production.
2024 Synnovis breach - diagnostic blood data leaked; similar pathways could expose genomic files.
DNA-encoded exploit proof-of-concept - active malware delivered via a library preparation tube.
Multiple documented vulnerabilities in widely used bioinformatics packages - evidence that the supply-chain risk is not theoretical.

Many of the attacks outlined are technically feasible but not yet widespread. That does not make them irrelevant. Some threats, like ransomware and cloud breaches, are already affecting healthcare infrastructure. Others, such as DNA-encoded malware or adversarial AI inputs, remain largely theoretical but not out of reach. Motivations range from financial disruption to the misuse of sensitive data. However, the presence of a vulnerability is often enough incentive for experienced hackers. Clinical and research systems are high-value, time-sensitive, and often under-protected. Even if the motivation is not always clear, the cost of waiting until an attack becomes real is high. Recognising these risks early allows the genomics community to build resilience before it is forced to.

How to Secure Your Genomic Data

Audit your pipeline.
Map every software component, note its maintainer, and check for signed releases.
Harden the sequencer.
Enable vendor secure-boot options, disable unused network services, and log firmware updates.
Screen custom DNA orders.
Adopt supplier screening and barcode collision tests before samples enter the lab.
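
A barcode collision test can be as simple as checking that every pair of index sequences on a sample sheet differs in enough positions that a few sequencing errors, or a deliberately similar barcode, cannot turn one sample into another. The Python sketch below shows one way to do this with Hamming distance; the function names and the default threshold of 3 mismatches are assumptions for illustration, not values taken from the study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def colliding_pairs(barcodes: dict, min_distance: int = 3) -> list:
    """Return (sample_a, sample_b, distance) for pairs closer than min_distance."""
    clashes = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(barcodes.items(), 2):
        distance = hamming(seq_a.upper(), seq_b.upper())
        if distance < min_distance:
            clashes.append((name_a, name_b, distance))
    return clashes


# Hypothetical sample sheet: S1 and S2 differ at only one position.
sheet = {"S1": "ACGTACGT", "S2": "ACGTACGA", "S3": "TTGCATGC"}
print(colliding_pairs(sheet))  # [('S1', 'S2', 1)]
```

Running a check like this on incoming custom oligo orders as well as on in-house sample sheets flags barcodes that sit too close to ones already in use.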

Conclusions

NGS will only grow more critical to diagnostics and discovery. So will the incentives to attack it. The taxonomy presented in this study offers a practical lens through which scientists, clinicians, and IT teams can identify risks across biological, computational, and organizational domains.

While some threats, like DNA-borne malware, appear speculative, the controls are largely within reach: signed code, zero-trust networking, rigorous quality control, and informed policy. By acting on these findings now, the community can continue to drive genomic innovation without allowing security to become its Achilles’ heel.

Threat Tables

Use these tables to help you prioritize your mitigations at each step of your pipeline.

Raw Sequencing Data Generation

Sub-step	Main threats	Notes
DNA extraction	Physical threat or contamination of samples. Insider manipulation of protocols. Re-identification via short-tandem-repeat profiling.	Attackers can recover a donor's identity even from small sample volumes if reference databases are available.
Library preparation	Synthetic DNA carrying executable payloads (DNA Malware). Malicious barcodes that bleed into other multiplexed samples. Compromised liquid-handling robots.	DNA code executes when downstream software parses the sequence file.
Cluster generation/imaging	Firmware backdoors or forced downgrades that corrupt camera/laser operation. Physical sabotage or mis-calibration of optics. Insider-modified amplification parameters.	Firmware signing and secure boot are often missing on older sequencers.
Base-calling	Ransomware on control PCs halting runs. Supply chain insertion in basecalling utilities. Cloud account takeover of vendor platforms. Early re-identification attacks on raw FASTQ or BAM files.	A single compromised library can skew an entire run’s quality metrics.

Quality Control & Processing

Threat class	Impact
Manipulated QC metrics	Bad data passes filtering, downstream calls become unreliable.
Back-doored open-source tools (FastQC, Trimmomatic, Cutadapt, QIIME, etc.)	Remote-code execution or silent data alteration inside common QC pipelines.
Deliberate sequence corruption	Injected errors propagate to variant calling and interpretation.

Bioinformatics Analysis

Threat class	Impact
Remote-code execution vulnerabilities in aligners and variant callers	Full system compromise; attacker can alter, exfiltrate, or delete genomic data.
Adversarial input reads targeting ML-based callers	Misclassification of variants; false negatives/positives in clinical pipelines.
Data poisoning during model training	Systematic bias embedded in future analyses.
Genetic-imputation attacks	Recovery of masked genotypes or sensitive traits from partial data.
Time-of-check / time-of-use (TOCTOU) races	Parameters or reference files swapped after validation, before execution.
Compromised reference databases or metadata	Alignment errors, misleading annotations, and cross-site scripting in web portals.
Weak access controls on public NGS repositories	Bulk theft or mass re-identification of uploaded datasets.

Interpretation & Reporting

Threat class	Impact
Tampering with final reports (clinical or research)	Misdiagnosis, flawed scientific conclusions, and regulatory exposure.
Unauthorised disclosure of VCFs or variant tables	Re-identification, genetic discrimination, privacy breaches.
Latent corruption propagating from earlier stages	Results appear plausible but are scientifically incorrect; hard to detect without full provenance checks.