Long-Read Sequencing Brings Hidden Autism Variants into View
Autism spectrum disorder (ASD) has a strong genetic component, with rare variants and de novo mutations playing a central role in risk. Large sequencing studies over the past decade have identified hundreds of genes associated with neurodevelopmental disorders. Yet a substantial proportion of the genetic architecture of ASD remains unresolved.
Part of this gap may reflect technical limits in variant detection. Most genomic studies rely on short-read sequencing (SRS), which performs well for single-nucleotide variants (SNVs) but has limited ability to resolve structural variants (SVs) and tandem repeats. These forms of variation can disrupt genes, alter gene regulation, or change repeat length in regulatory regions, but they are often difficult to detect in repetitive or structurally complex parts of the genome.
Long-read whole-genome sequencing (LR-WGS) offers a way to address these limitations by reading much longer fragments of DNA. This allows complex rearrangements and repeat expansions to be resolved directly, rather than inferred from short fragments. In a new study published in Cell Genomics [1], researchers applied LR-WGS to families affected by ASD to assess how much additional genetic variation can be detected and whether these variants contribute to the unexplained heritability of the disorder.
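To make the contrast concrete, here is a toy Python sketch of why read length matters when sizing a tandem repeat. It is not a real genotyping method and is not taken from the study: the flanking sequences, the 40-unit CGG tract, and the function name `count_repeat_units` are all invented for illustration. Real callers work from alignments and tolerate sequencing errors.

```python
from typing import Optional


def count_repeat_units(read: str, left_flank: str, right_flank: str,
                       unit: str) -> Optional[int]:
    """Count repeat units between two flanks, or return None if the
    read does not span the whole repeat (toy model, exact matching only)."""
    start = read.find(left_flank)
    if start == -1:
        return None  # read does not contain the left anchor
    tract_start = start + len(left_flank)
    end = read.find(right_flank, tract_start)
    if end == -1:
        return None  # read ends before reaching the right anchor
    tract = read[tract_start:end]
    n_units, remainder = divmod(len(tract), len(unit))
    if remainder != 0 or tract != unit * n_units:
        return None  # tract is not a perfect run of the repeat unit
    return n_units


# Hypothetical locus: 8 bp flanks around 40 copies of CGG (120 bp tract).
LEFT, RIGHT, UNIT = "ACGTTGCA", "TTAGGCAT", "CGG"
locus = LEFT + UNIT * 40 + RIGHT

long_read = locus        # a read spanning the entire locus
short_read = locus[:50]  # a 50 bp read that ends inside the repeat

print(count_repeat_units(long_read, LEFT, RIGHT, UNIT))   # 40
print(count_repeat_units(short_read, LEFT, RIGHT, UNIT))  # None
```

The short read contains only one anchor, so the repeat length cannot be read off directly and would have to be inferred by other means. The spanning read gives the unit count in a single observation.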
Methods & Findings
Mortazavi et al. (2026) used LR-WGS to sequence 267 individuals from 63 families affected by ASD. Of these, 243 individuals belonged to complete parent-offspring trios: 126 parents and 117 offspring. The offspring comprised 76 cases and 41 unaffected controls (74 males and 43 females). Both PacBio HiFi and Oxford Nanopore Technologies sequencing platforms were used, allowing the detection of SVs, tandem repeats, and DNA methylation patterns from the sequencing signal.
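The cohort figures overlap in several ways, so a quick tally helps confirm how they fit together. The short Python check below uses only the counts reported above. The variable names are arbitrary, and the final figure is simply the difference between the two reported totals.

```python
# Counts as reported for the cohort.
total_sequenced = 267
in_complete_trios = 243
parents = 126
cases, controls = 76, 41
males, females = 74, 43

# The offspring total can be reached by affection status or by sex.
offspring_by_status = cases + controls
offspring_by_sex = males + females

# Individuals sequenced but not part of a complete trio.
outside_trios = total_sequenced - in_complete_trios

print(offspring_by_status, offspring_by_sex)  # 117 117
print(offspring_by_status + parents)          # 243
print(outside_trios)                          # 24
```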
Comparisons with short-read WGS data showed that LR-WGS increased the discovery of several variant classes that are difficult to resolve using conventional approaches. Detection of gene-disrupting SVs increased by 33%, while identification of tandem repeat variants increased by 38%.
The study also identified previously unresolved structural rearrangements, including recurrent nested duplication-deletion (DUP-DEL) events. In one case, a maternally inherited DUP-DEL event caused loss-of-function of CDC42 Binding Protein Kinase Alpha (CDC42BPA), a gene expressed in the brain that regulates cytoskeletal dynamics. This finding is of particular interest given that haploinsufficiency of its paralog, CDC42BPB, has been associated with neurodevelopmental phenotypes, including ASD. LR-WGS allowed the researchers to resolve the complete structure of variants that had been fragmented or missed entirely in short-read data.
Because long-read sequencing (LRS) provides information about DNA methylation, the researchers also examined how structural and repeat variants influence gene regulation. They observed deletions affecting imprinted regions, including ADNP2. However, the individual carrying the ADNP2 deletion also had XYY syndrome, a known contributor to ASD, and a separate large dataset provided no evidence implicating ADNP2 in ASD. The authors note that this variant could at most represent a potential genetic modifier.
They also found that variation in the CGG repeat region of the FMR1 promoter correlated with changes in promoter methylation independent of X chromosome inactivation. However, RNA sequencing in a subset of individuals found no significant difference in FMR1 allelic expression between gray-zone carriers and controls, and neither CGG repeat length nor methylation ratio was significantly associated with ASD case status. The functional and clinical significance of this epigenetic finding, therefore, remains unclear.
To estimate the contribution of these variants to ASD risk, the researchers modelled genetic effects across variant classes. Rare variants collectively explained 11.7% of the variance in ASD case status, equivalent to 7.4% of heritability on the liability scale (95% CI: 2.7%–17%). When polygenic scores were included, the total variance explained increased to 13.8%, corresponding to 8.9% of heritability. These results indicate that SVs and repeat expansions account for a measurable component of ASD risk but remain incompletely characterised in existing datasets.
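The gap between variance explained in case status and variance explained on the liability scale arises because the sample contains far more cases than the general population does. The sketch below shows one commonly used linear transformation for ascertained case-control samples. It is not presented as the authors' model, and it is not expected to reproduce their estimates. The 1% prevalence is a purely illustrative assumption, the case fraction is borrowed from the trio offspring counts (76 of 117), and the function name is hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(r2_observed: float, prevalence: float,
                          case_fraction: float) -> float:
    """Rescale variance explained on the observed (0/1) scale to the
    liability scale, correcting for case enrichment in the sample.

    scale = [K(1-K)]^2 / (z^2 * P(1-P)), where K is the population
    prevalence, P the fraction of cases in the sample, and z the standard
    normal density at the liability threshold.
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < case_fraction < 1.0):
        raise ValueError("prevalence and case_fraction must lie in (0, 1)")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    scale = (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))
    return r2_observed * scale


# Illustrative inputs only: assumed prevalence, case fraction from the trios.
assumed_prevalence = 0.01
sample_case_fraction = 76 / 117

liability_r2 = observed_to_liability(0.117, assumed_prevalence,
                                     sample_case_fraction)
print(f"{liability_r2:.3f}")
```

Under these inputs the scale factor is below one, so the liability-scale figure is smaller than the observed-scale one. Different prevalence assumptions shift the result noticeably, which is one reason such estimates carry wide confidence intervals in small cohorts.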
Interpretation
These results show that part of the unresolved genetic architecture of ASD lies in variant classes that are difficult to detect with SRS. SVs and tandem repeats can alter gene dosage, disrupt coding regions, or affect regulatory elements, but their complexity often places them outside the reach of standard analysis pipelines.
The results suggest that current short-read pipelines systematically under-detect SVs and tandem repeats in complex genomic regions. Short-read WGS nevertheless retains advantages for detecting some large coding SVs, where higher read depth supports more sensitive coverage-based signals. The two technologies are therefore best understood as complementary.
LR-WGS also revealed additional variation missed in previous short-read WGS analyses of the same cohort. While most coding SVs were detectable with short-read WGS, long reads captured smaller and more complex SVs, identified substantially more tandem repeat variants, and uncovered de novo SVs not detected previously.
The discovery of DUP-DEL structural rearrangements illustrates how complex genomic events can disrupt genes in ways that remain invisible when genomes are reconstructed from short fragments. In this case, long reads allowed the full structure of the rearrangement to be resolved and linked to the disruption of a gene expressed in the brain.
The ability to measure DNA methylation alongside sequence variation adds another layer of interpretation. SVs and repeat expansions can alter the regulatory environment of genes, and LRS makes it possible to observe these effects in the same dataset. The association between repeat length and methylation at the FMR1 promoter illustrates how structural variation and epigenetic regulation can interact.
Taken together, these findings suggest that SVs and tandem repeats contribute to ASD risk but remain underrepresented in existing genetic studies. Here, LR-WGS does not resolve the missing heritability of ASD, but it expands the range of detectable variation and provides additional information about how those variants affect gene regulation.
Outlook
LRS is still less commonly used than short-read approaches in human genetics, largely due to cost, data storage requirements, and analytical complexity. These barriers are gradually decreasing as throughput improves and analysis tools mature, but SRS remains more practical for very large cohorts. In the near term, many studies are likely to adopt hybrid strategies, using LRS to resolve complex regions or follow up variants that are difficult to interpret with short-read data alone.
Most large-scale ASD studies to date have relied on SRS or exome sequencing, which remain more practical for cohorts involving tens of thousands of individuals. As a result, SVs and repeat expansions are likely to be underrepresented in existing datasets. The cohort analysed in this study remains small compared with the sample sizes typically required for gene discovery. Larger studies will therefore be needed to determine how much of ASD risk is attributable to these variant classes and whether specific genes or pathways can be identified.
As sequencing technologies continue to scale, long-read approaches may become more feasible for population-level studies. Integrating SVs, tandem repeats, SNVs, and polygenic risk in the same datasets could help provide a more complete picture of the genetic architecture of ASD. For now, the results highlight both the promise and current limits of LRS for explaining the remaining heritability of ASD.