Long Reads, Close Relatives.

Jason Armstrong

Non-B DNA structures revealed across gapless genomes of humans and our closest relatives - Smeds et al., 2025.

Sometimes, DNA departs from its classic right-handed double helix, known as B-DNA. These non-canonical (non-B) DNA structures are thought to influence gene regulation, replication, and genome stability, and have been implicated in cancer and genetic disorders. However, many of the sequences that can form them lie in repetitive regions that were difficult to assemble with short-read sequencing.

Using telomere-to-telomere (T2T) assemblies generated with long-read sequencing, researchers at Penn State University and the Czech Academy of Sciences (1) charted eight classes of non-B DNA structures (e.g., G-quadruplexes, Z-DNA, hairpins) across complete genomes from humans and other apes.
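These structures were predicted from sequence rather than observed directly. To illustrate what sequence-based prediction looks like, the Python sketch below scans for two motif classes using common textbook rules: a G-quadruplex pattern of four runs of at least three guanines separated by 1-7 nt loops, and Z-DNA-prone runs of alternating purines and pyrimidines. These rules, the 10 nt minimum run length, and the function names are simplifying assumptions for illustration; they are not the tools or thresholds used by Smeds et al.

```python
import re

# Assumed textbook G4 rule: four G-runs (>= 3 G) separated by loops of 1-7 nt.
# A G4 on the reverse strand shows up as four C-runs on the forward strand.
G4_FORWARD = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
G4_REVERSE = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")

PURINES = set("AG")
PYRIMIDINES = set("CT")


def find_g4(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, strand) G4 motif hits, 0-based half-open.

    Hits are non-overlapping within each strand; an N in a loop breaks a match.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", G4_FORWARD), ("-", G4_REVERSE)):
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.end(), strand))
    return sorted(hits)


def find_alternating_pu_py(seq: str, min_len: int = 10) -> list[tuple[int, int]]:
    """Return maximal runs alternating purine/pyrimidine (a rough Z-DNA proxy)."""
    seq = seq.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # The run continues only if seq[i-1] and seq[i] are one purine and one pyrimidine.
        continues = i < len(seq) and (
            (seq[i - 1] in PURINES and seq[i] in PYRIMIDINES)
            or (seq[i - 1] in PYRIMIDINES and seq[i] in PURINES)
        )
        if not continues:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


if __name__ == "__main__":
    telomeric = "AAGGGTTAGGGTTAGGGTTAGGGAA"  # vertebrate telomere repeat TTAGGG
    print(find_g4(telomeric))  # [(2, 23, '+')]
    zdna_like = "TTTT" + "CG" * 6 + "AAAA"
    print(find_alternating_pu_py(zdna_like))  # [(4, 16)]
```

Real predictors score candidates more carefully (for example, weighting CG dinucleotides above AT for Z-DNA), so a regex scan like this over-calls some motifs and misses others.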

They show that 9-15% of autosomal sequence, and up to 38% of the Y chromosome, can potentially adopt these alternative conformations. Long-read technology exposed centromeres, telomeres, and acrocentric short arms that short-read methods had left unresolved, revealing structure-rich regions tied to gene control and chromosomal instability. Although these structures were computationally predicted, the findings provide a strong basis for functional studies and cellular assays to confirm them in vivo.

How Long-read Technology Helped

Read Length Matters

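PacBio HiFi reads are highly accurate and typically 15-25 kb long, and Oxford Nanopore's ultra-long reads can exceed 100 kb. Reads this long span many copies of a repeat and anchor them to unique flanking sequence, enabling gapless assembly of highly repetitive centromeres and telomeres that were inaccessible to short reads of a few hundred bases.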

Gap-free Consensus

The T2T pipeline assembles long reads into complete, single-contig chromosomes, closing the gaps left in earlier reference assemblies. Base-level polishing then ensures high accuracy, so repetitive and previously unresolved regions can be annotated with confidence.

New Sequence, New Insights

The authors show that non-B DNA motifs are disproportionately abundant in the regions newly resolved by T2T assembly, especially in centromeric satellite DNA.

A Little Help From Our Friends

Smeds et al. extended the analysis to T2T genomes of all other living great apes (chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans) and a lesser ape, the siamang. This enabled annotation of non-B DNA motifs across fully resolved chromosomes in each species and direct comparison of their distribution across apes, revealing both conserved and lineage-specific patterns, particularly in previously inaccessible, repetitive regions.

Key Findings

Widespread Alternative DNA

  • Non-B DNA motifs cover 9-15% of autosomes, 9-11% of chromosome X, and 12-38% of chromosome Y. 
  • Newly assembled centromeres and telomeres show the highest motif density.
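Coverage figures such as "9-15% of autosomes" count bases, and because motif classes can overlap (a G-rich short tandem repeat can also match a G-quadruplex pattern), each base should be counted only once. The sketch below shows one straightforward way to do that, assuming motif calls are 0-based, half-open (BED-style) intervals on a single chromosome. It is an illustration, not the authors' pipeline, and the function names are hypothetical.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or abutting 0-based half-open [start, end) intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it instead of double-counting.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def motif_coverage(intervals: list[tuple[int, int]], chrom_length: int) -> float:
    """Fraction of a chromosome's bases covered by at least one motif call."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / chrom_length


if __name__ == "__main__":
    # Toy calls from two motif classes that overlap at bases 5-9.
    calls = [(0, 10), (5, 20), (50, 60)]
    print(merge_intervals(calls))      # [(0, 20), (50, 60)]
    print(motif_coverage(calls, 100))  # 0.3
```

Summing raw motif lengths instead would report 35% for the toy example, because the five overlapping bases would be counted twice.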

Repeat-driven Enrichment 

  • Satellite DNA is the main source of motifs, explaining inter-species differences.
  • The gorilla genome, the most repeat-rich in the study, carries the heaviest motif load.

Functional Clustering

  • G-quadruplexes and Z-DNA are enriched at promoters, enhancers, and replication origins. This distribution suggests that local changes in DNA shape may regulate access to these regions, adding a structural layer of control on top of the DNA sequence itself.
  • Of the active centromeres analyzed across great ape genomes, 60% showed enrichment in at least one non-B DNA motif. These motifs were prevalent in alpha satellite regions, known to associate with kinetochore proteins, while inactive centromeres showed little or no enrichment.

Instability Hotspots

  • Z-DNA clusters are disproportionately concentrated at genomic breakpoints. One locus on chromosome 21, implicated in Down syndrome translocations, shows a 97-fold enrichment. 
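Fold enrichment compares how densely a motif covers a region with how densely it covers the genome overall. The background model Smeds et al. used is not described in this summary, so the sketch below assumes the simplest choice, genome-wide motif density, and uses a hypothetical function name. Under that definition, a 97-fold enrichment would mean Z-DNA motifs cover the chromosome 21 locus about 97 times more densely than the genome average.

```python
def fold_enrichment(region_covered: int, region_length: int,
                    genome_covered: int, genome_length: int) -> float:
    """Motif density in a region divided by genome-wide motif density.

    Equivalent to (region_covered / region_length) / (genome_covered / genome_length),
    rearranged so the integer products are exact before the single division.
    """
    if region_length <= 0 or genome_covered <= 0:
        raise ValueError("region_length and genome_covered must be positive")
    return (region_covered * genome_length) / (region_length * genome_covered)


if __name__ == "__main__":
    # Toy numbers, not data from the study: 2,000 motif bases in a 10 kb window,
    # against 1 Mb of motif bases in a 100 Mb genome (1% background density).
    print(fold_enrichment(2_000, 10_000, 1_000_000, 100_000_000))  # 20.0
```

A fold change alone says nothing about significance; a fuller analysis would also compare the region against randomly placed windows of the same size.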

Implications for Genomic Medicine

Diagnostics: Maps built from long-read assemblies could sharpen breakpoint prediction in clinical karyotyping and refine interpretation of structural variants.

Therapeutics: Small molecules that stabilise or disrupt G-quadruplexes or Z-DNA could, in principle, be directed at specific regulatory or repetitive loci.

Cancer Research: Many rearrangements arise at secondary structure hotspots; knowing their exact coordinates supports risk modelling and targeted sequencing. 

Limitations & Next Steps

  • The motifs in this study were computationally predicted; large-scale cellular assays are needed to confirm structure formation in vivo. 
  • Functional studies are required to test whether altering these structures changes gene expression or replication timing. 
  • Extending long-read sequencing to diverse human populations could reveal polymorphic gains or losses of non-B motifs with potential phenotypic effects.

Conclusions

By annotating T2T genome assemblies built from long-read sequencing, this study presents a comprehensive map of predicted non-B DNA structures across the genomes of humans, five other great apes, and a lesser ape, the siamang. These structures show marked enrichment in regions previously unresolved by short-read methods, particularly centromeres, telomeres, and satellite DNA.

The analysis demonstrates that non-B motifs are widespread and often coincide with functionally significant elements such as promoters, replication origins, and genomic breakpoints. These findings point to potential roles for DNA secondary structures in gene regulation, genome organization, and structural instability.

Although the motifs were predicted computationally, they provide a valuable basis for experimental validation. Future work combining functional assays with long-read sequencing in diverse human populations will help clarify the biological relevance and variability of these structures.

References

1. Smeds L, Kamali K, Kejnovská I, Kejnovský E, Chiaromonte F, Makova KD. Non-canonical DNA in human and other ape telomere-to-telomere genomes. Nucleic Acids Res. 2025;53(7). doi:10.1093/nar/gkaf298

Share this post