Next-generation sequencing (NGS) is increasingly being adopted by the clinical community as a primary tool for diagnosing and monitoring many diseases, and it has uncovered millions of previously unknown variants. However, the sheer quantity of NGS data presents challenges, especially in interpreting the clinical significance of genetic variation, and this can have serious implications for treatment decisions and medical outcomes. Thus, innovative analytical approaches are critical for scaling up the adoption and diagnostic yield of NGS-based methodology in clinical settings.
In our previous article, VarSome’s Big Data, we explained how VarSome provides access to over 30 public genomics-related data sets through its purpose-built data storage system, MolecularDB. To assure the highest possible data quality, MolecularDB runs comprehensive data integrity checks daily and ensures that genomics data are meticulously integrated and cross-referenced, and that insertions and deletions are matched consistently across all the data resources available on VarSome. You can also rest assured that the data on VarSome are always up to date. However, that’s not all.
VarSome is also a thriving global Human Genomics Community of healthcare professionals and researchers who share knowledge in the form of variant classifications, publication links, and discussions, further enriching VarSome’s aggregated knowledge base.
One of the benefits of such a massive aggregated and harmonized database is that it can be applied in further downstream processes, such as automated variant classification according to the guidelines of the American College of Medical Genetics and Genomics (ACMG).
In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published updated standards and guidelines for the clinical interpretation of sequence variants with respect to human diseases, based on 28 criteria. However, interpretations can vary considerably between individual interpreters, for reasons such as differing understandings of these guidelines and the lack of standard algorithms for implementing them. To address these problems, VarSome has implemented 21 ACMG criteria for automated interpretation of the clinical significance of sequence variants, followed by a manual adjustment step.
To this end, the main aim of VarSome’s ACMG implementation is to correctly present the most salient data available and to help users quickly identify those variants that require additional clinical scrutiny. The ACMG classification is provided for research and educational purposes only, as indeed are the ACMG guidelines themselves.
VarSome’s implementation of ACMG guidelines consists of two major steps:
- Automated scoring on each of the implemented criteria.
- Manual review and adjustment on specific criteria to arrive at a final interpretation.
During the first step, VarSome's proprietary database, which draws on more than 30 different databases, supplies the annotation information needed to interpret the pathogenicity of a given genomic variant. In this way, VarSome gathers and presents all relevant evidence for subsequent manual review. Automated scoring is based on default parameters, and users are advised to examine the detailed evidence and use prior knowledge of ethnicity and/or disease to make manual adjustments. In certain cases, we have taken into consideration expert opinions from VarSome’s Scientific Advisory Board and VarSome’s global community.
During the second step, the user can manually adjust each of the criteria on the basis of prior information (such as a variant’s de novo status) or his or her own domain knowledge to reach a final interpretation.
Manual ACMG Classification
We have striven to make the best use of the available data and to implement as many of the rules as possible. Many of the rules require judgment calls; we have codified as many of these as possible, but a human still needs to review the final verdict. Some rules require information for which we can find no publicly available data sources, or require patient and family-member data, which we do not currently support.
Once you make the judgment you may want to store the final verdict to avoid the need to go through the same process should you encounter the same variant again. Such a feature is available in VarSome Clinical, our clinical tool for processing and interpretation of NGS data, starting from FASTQ or VCF, which allows you to privately reclassify the default ACMG verdict or set up completely custom classifications and comments for variants.
Once you annotate your samples in VarSome Clinical, the data presented is a snapshot of what was known at the time you annotated the variants and will not change over time, whilst the ACMG classification displayed in the free VarSome.com always uses the most recent data available. You may, therefore, see different data in VarSome.com than VarSome Clinical, particularly if the analysis was performed 6 months or a year ago. In VarSome Clinical you may even filter out the variants where the ACMG verdict has changed with respect to the latest classifications available.
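As a rough illustration of comparing a stored snapshot against the latest classifications, the sketch below selects the variants whose verdict differs. The function name, the dictionary layout and the variant identifiers are assumptions made for this example; they do not describe how VarSome Clinical stores or compares its data.

```python
def changed_verdicts(snapshot, latest):
    """Return {variant_id: (old_verdict, new_verdict)} for verdicts that differ.

    Variants missing from `latest` are skipped rather than reported.
    """
    return {
        variant_id: (old, latest[variant_id])
        for variant_id, old in snapshot.items()
        if variant_id in latest and latest[variant_id] != old
    }


# Hypothetical identifiers and verdicts, for illustration only.
snapshot = {"variant-A": "Uncertain Significance", "variant-B": "Likely Benign"}
latest = {"variant-A": "Likely Pathogenic", "variant-B": "Likely Benign"}
print(changed_verdicts(snapshot, latest))
```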
VarSome's MolecularDB underpins the ACMG classification, allowing for extremely efficient look-up operations against more than 30 different databases. More specifically:
- Transcripts: RefSeq & Ensembl, from which we deduce coding & functional impacts.
- Frequencies: gnomAD exome & genome frequencies & coverage.
- Pathogenicity: UniProt, ClinVar and VarSome user classifications.
- Proteins: UniProt regions.
- Splice Site Prediction: dbscSNV via dbNSFP (limited to SNVs only).
- Conservation: GERP++.
- Genes: CGD for the mode of inheritance & links to diseases, ExAC for probabilities of tolerance or loss-of-function.
- Computational Predictions: DANN, GERP, Cosmic, FATHMM, and other databases via dbNSFP (SNVs only): LRT, MetaLR, MetaSVM, MutationAssessor, MutationTaster, PROVEAN, FATHMM-MKL & SIFT.
Please note that VarSome’s ACMG classification works best with hg19 as the reference genome. For hg38, gnomAD provides no coverage data, which renders the ACMG guidelines less useful. gnomAD has promised to produce native data from the same samples aligned to hg38 in the future. We will also soon add the BRAVO dataset, which contains data from more whole genomes than gnomAD.
Transparency & Clarity
Each ACMG rule implemented in VarSome provides a detailed explanation of why it has been triggered or not. This makes the workings of the system clear and transparent, whilst also ensuring that the explanations are always fully consistent with the coded logic.
All thresholds used are explicitly visible in the explanation, and the annotation itself uses only the data available in VarSome’s aggregated database. This not only guarantees consistency but also makes it possible for the user to verify the classifications by looking up the corresponding data.
VarSome’s ACMG annotation methodology is constantly under review following feedback from its global community and from our Scientific Advisory Board.
The ACMG guidelines were intended for human interpretation rather than machines, and whilst there are cases with strictly defined thresholds (variant population frequencies for example) many of the rules are really up to human judgment and experience.
VarSome’s ACMG implementation uses approximately 30 internal constants for these sorts of heuristics. The values for these thresholds have been established by looking at well-known variants and by taking direction from our Scientific Advisory Board and reputable members of VarSome’s global community.
VarSome maintains a database of "Known Variants" that is used for rules that require statistical heuristics (hotspots, protein functional domains, gene spectra of variation, etc.). This database will be used to deduce, for example, that "most synonymous variants in gene BRAF are benign", or that "missense variants in gene IDS are most likely pathogenic".
The database is constructed from all the variants that appear in ClinVar or UniProt, or that have been manually classified by VarSome users. We discard ClinVar entries that are "literature only" in order to improve the data quality (but we do not use review stars here).
VarSome assigns a unique coding impact (exon deletion, splice junction loss, nonsense, frameshift, stop loss, start loss, missense, in-frame indel, synonymous, non-coding) to each variant. Where there are multiple transcripts with differing coding impacts, VarSome picks the single "most serious" coding impact, so each variant is only counted once, thus avoiding any duplication.
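The selection of a single coding impact per variant can be sketched as follows. This is a minimal illustration, not VarSome's code: it assumes that the order in which the impacts are listed above is also the severity ranking, and the function name is hypothetical.

```python
# Assumption: the listing order above doubles as the severity ranking,
# from most serious (first) to least serious (last).
IMPACT_SEVERITY = [
    "exon deletion",
    "splice junction loss",
    "nonsense",
    "frameshift",
    "stop loss",
    "start loss",
    "missense",
    "in-frame indel",
    "synonymous",
    "non-coding",
]
_RANK = {impact: rank for rank, impact in enumerate(IMPACT_SEVERITY)}


def most_serious_impact(transcript_impacts):
    """Return the single most serious impact among a variant's transcripts."""
    if not transcript_impacts:
        raise ValueError("at least one transcript impact is required")
    # The lowest rank is the most serious impact; unknown labels raise KeyError.
    return min(transcript_impacts, key=_RANK.__getitem__)


# A variant that is synonymous on one transcript and missense on another
# is counted once, as missense.
print(most_serious_impact(["synonymous", "missense", "non-coding"]))
```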
When counting variants within a gene, protein domain or region, VarSome groups Pathogenic with Likely Pathogenic variants, and Benign with Likely Benign variants. Uncertain Significance variants are ignored.
This database of known variants is then used to compute statistics within a region or a gene as follows:
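The grouping and counting described here, together with the "literature only" filter on ClinVar entries, might look like the sketch below. The record fields (`source`, `literature_only`, `classification`) are hypothetical names chosen for illustration and are not VarSome's data model.

```python
from collections import Counter

PATHOGENIC = {"Pathogenic", "Likely Pathogenic"}
BENIGN = {"Benign", "Likely Benign"}


def group_classification(label):
    """Map a classification label to 'pathogenic', 'benign', or None (ignored)."""
    if label in PATHOGENIC:
        return "pathogenic"
    if label in BENIGN:
        return "benign"
    return None  # e.g. Uncertain Significance


def count_known_variants(records):
    """Count pathogenic and benign known variants in an iterable of records."""
    counts = Counter()
    for record in records:
        # Discard ClinVar entries flagged as "literature only".
        if record.get("source") == "ClinVar" and record.get("literature_only"):
            continue
        group = group_classification(record["classification"])
        if group is not None:
            counts[group] += 1
    return counts


# Made-up records, for illustration only.
records = [
    {"source": "ClinVar", "classification": "Pathogenic"},
    {"source": "ClinVar", "classification": "Likely Pathogenic", "literature_only": True},
    {"source": "UniProt", "classification": "Likely Pathogenic"},
    {"source": "VarSome users", "classification": "Benign"},
    {"source": "ClinVar", "classification": "Uncertain Significance"},
]
print(count_known_variants(records))
```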
- Gene Statistics: Many genes have a defined spectrum of pathogenic and benign variation. VarSome displays a table with the numbers of all known benign/pathogenic variants for a given gene, grouped by their coding impact. These are used in rules PP2, BP1 & PVS1.
- Protein Domains: If a variant falls within a known functional domain (per UniProt), we count the pathogenic and benign variants within the domain. If there are at least 10 known variants and two-thirds of them (66.7%) are pathogenic, rule PM1 is triggered.
- Hotspots: To determine whether a variant is in a mutational hotspot, we count all the known variants within 18 base-pairs (6 codons) of the variant, effectively scanning a region of 36 base-pairs centered on the variant. These counts are then used for PM1 as above.
- Rule BP3 also uses this database of known variants to verify that there are no known pathogenic variants within (or near) a repeat region.
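The PM1 threshold and the hotspot window described in the list above can be sketched as follows. The constants come from the text; everything else is an assumption made for illustration: the function names, reading "two-thirds" as "at least two-thirds", treating the 18 base-pair distance as inclusive, and representing known variants as (position, group) pairs on the same chromosome.

```python
MIN_KNOWN_VARIANTS = 10       # from the text: at least 10 known variants
HOTSPOT_HALF_WINDOW_BP = 18   # from the text: 18 base-pairs (6 codons) either side


def pm1_threshold_met(n_pathogenic, n_benign):
    """True if there are enough known variants and at least 2/3 are pathogenic."""
    total = n_pathogenic + n_benign
    if total < MIN_KNOWN_VARIANTS:
        return False
    # Integer form of n_pathogenic / total >= 2/3, avoiding float rounding.
    return 3 * n_pathogenic >= 2 * total


def hotspot_counts(position, known_variants):
    """Count known variants within the window around `position`.

    `known_variants` is an iterable of (position, group) pairs, where group is
    'pathogenic' or 'benign'. Returns (n_pathogenic, n_benign).
    """
    n_pathogenic = n_benign = 0
    for pos, group in known_variants:
        if abs(pos - position) > HOTSPOT_HALF_WINDOW_BP:
            continue
        if group == "pathogenic":
            n_pathogenic += 1
        elif group == "benign":
            n_benign += 1
    return n_pathogenic, n_benign


# Made-up positions: two fall inside the window, two just outside it.
nearby = [(82, "pathogenic"), (118, "benign"), (119, "pathogenic"), (81, "pathogenic")]
print(hotspot_counts(100, nearby))
print(pm1_threshold_met(8, 4))
```

The counts returned by `hotspot_counts` can be passed straight to `pm1_threshold_met`, mirroring the statement that the hotspot counts are used for PM1 in the same way as the domain counts.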
Currently, there are limited configurable options: users can only instruct VarSome to filter by ClinVar stars or disable databases such as UniProt or ClinVar.
In the future, VarSome may allow users to adjust the thresholds used internally and store common configurations to be used in their organization or laboratory group.