Filtering and ranking
Moon generates its result by sequentially filtering and then ranking variants.
First, low-quality variants, variants that are common in the general population, and known benign variants listed in ClinVar or the lab's knowledge base (KB) are excluded from the analysis. In a family analysis, segregation of the variants is also taken into account, although some non-segregating variants may still be retained. The remaining variants are then ranked using annotations from both public and proprietary data sources, including information on predicted variant effect and phenotype overlap.
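The filter-then-rank order described above can be sketched as follows. This is a minimal illustration, not Moon's implementation. The `Variant` record, its field names, the precomputed `passes_quality` flag, the single `score` used for ranking and the `max_af` cut-off for "common" are all hypothetical, because the source does not state a population-frequency threshold or how the ranking score is computed.

```python
from dataclasses import dataclass
from typing import Optional

BENIGN = {"B", "LB"}  # benign / likely benign classifications


@dataclass
class Variant:
    # Hypothetical, simplified record; field names are illustrative only.
    id: str
    passes_quality: bool          # result of the quality criteria
    gnomad_af: Optional[float]    # population allele frequency, None if absent
    clinvar_class: Optional[str]  # e.g. "B", "LB", "VUS", "LP", "P"
    kb_class: Optional[str]       # classification in the lab's KB, if any
    score: float                  # hypothetical relevance score used for ranking


def filter_then_rank(variants: list[Variant], max_af: float) -> list[Variant]:
    """Apply the exclusion filters first, then sort survivors by descending score."""
    kept = [
        v for v in variants
        if v.passes_quality
        and (v.gnomad_af is None or v.gnomad_af <= max_af)  # drop common variants
        and v.clinvar_class not in BENIGN                    # drop ClinVar benign
        and v.kb_class not in BENIGN                         # drop KB benign
    ]
    return sorted(kept, key=lambda v: v.score, reverse=True)
```

Keeping every exclusion step ahead of the sort mirrors the text: ranking only ever sees variants that have already survived filtering.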
With regard to variant quality, the following criteria are applied for filtering:
- sequencing depth (DP in VCF) > 3 if QUAL > 30, or > 8 if QUAL <= 30
- allele depth (AD in VCF) > 2 per allele if QUAL > 30, or > 4 per allele if QUAL <= 30
- allele ratio (AD of allele 1 divided by total AD) between 0.2 and 0.8
- genotype quality (GQ in VCF) >= 20
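The quality criteria listed above can be expressed as a single predicate. This sketch assumes a biallelic heterozygous call, because a per-allele AD minimum and a 0.2 to 0.8 allele ratio only make sense when both alleles are called. It also treats the ratio bounds as inclusive, which the source does not specify. The `HetCall` container is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HetCall:
    """Hypothetical container for one biallelic heterozygous genotype call."""
    qual: float          # QUAL column of the VCF record
    dp: int              # DP: total sequencing depth
    ad: tuple[int, int]  # AD: read counts supporting allele 1 and allele 2
    gq: int              # GQ: genotype quality


def passes_quality(call: HetCall) -> bool:
    high_qual = call.qual > 30
    # Depth thresholds are stricter when QUAL <= 30.
    min_dp, min_ad = (3, 2) if high_qual else (8, 4)
    if call.dp <= min_dp:
        return False
    # Every allele must be supported by more than min_ad reads.
    if any(depth <= min_ad for depth in call.ad):
        return False
    # Allele ratio: AD of allele 1 over total AD (total > 0 after the check above).
    ratio = call.ad[0] / sum(call.ad)
    if not 0.2 <= ratio <= 0.8:
        return False
    return call.gq >= 20
```

For example, `HetCall(qual=50, dp=30, ad=(25, 5), gq=40)` clears the depth checks but fails on allele ratio (25/30 ≈ 0.83).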
Moon uses annotations from public data sources including:
- gnomAD: frequency of variants within the general population
- dbNSFP: protein-level functional prediction scores and conservation scores
- dbscSNV: splice-site prediction scores
- ClinVar: variant-level clinical classification
A proprietary disorder model, Apollo, matches the patient's input phenotype to the symptoms of known human diseases with Mendelian inheritance. In-house artificial intelligence algorithms filter and rank the variants according to their relevance to the patient’s phenotype.
The results are presented as Primary findings, which consist of variants that fulfil all applied filter criteria, and as Primary carrier findings. Carrier findings meet all filter criteria except zygosity: they are heterozygous variants in genes associated with recessive conditions.
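The split between the two result lists can be sketched as a small classifier. The `Candidate` fields are hypothetical. In particular, `passes_zygosity` stands in for whatever inheritance-aware zygosity check Moon applies, which the source does not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical record; field names are illustrative only.
    passes_other_filters: bool  # all filter criteria except zygosity
    passes_zygosity: bool       # genotype fits the gene's inheritance mode
    zygosity: str               # e.g. "het", "hom", "hemi"
    gene_recessive: bool        # gene associated with a recessive condition


def finding_category(c: Candidate) -> Optional[str]:
    """Return "primary", "carrier", or None if the variant is not reported."""
    if not c.passes_other_filters:
        return None
    if c.passes_zygosity:
        return "primary"
    # Fails only on zygosity: a single heterozygous hit in a recessive gene.
    if c.zygosity == "het" and c.gene_recessive:
        return "carrier"
    return None
```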
As in the SNV analysis, Moon generates a list of provisional diagnoses by sequentially filtering and ranking the structural variants (SVs) in the uploaded VCF.
First, known benign CNVs from the Database of Genomic Variants (DGV) are filtered out, as these are unlikely to contribute to a disease phenotype. A CNV is excluded if it is completely contained within a known benign CNV. For this filter step, Moon checks overlap with the stringent set of CNVs in the CNV map presented by Zarrei et al. (2015) (listed as ‘Nat. Rev. Gen. CNV Map - Stringent’ on DGV).
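The containment rule above can be sketched as an interval check. This sketch assumes 1-based inclusive coordinates and ignores CNV type (gain or loss), since the source does not say whether the type must match. The linear scan is for clarity only; a real implementation over a genome-wide benign set would more likely use an interval index. `Interval` and both functions are hypothetical names.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def fully_contained(cnv: Interval, benign: Interval) -> bool:
    """True if cnv lies entirely within the benign interval."""
    return cnv.chrom == benign.chrom and benign.start <= cnv.start and cnv.end <= benign.end


def remove_benign_cnvs(cnvs: list[Interval], benign_cnvs: list[Interval]) -> list[Interval]:
    # Partial overlap is not enough: only fully contained CNVs are dropped.
    return [c for c in cnvs if not any(fully_contained(c, b) for b in benign_cnvs)]
```

A CNV that extends past the edge of a benign region is therefore kept, even if most of it overlaps that region.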
Of the remaining variants, only SVs involving known disease genes (according to Moon's disorder model, Apollo) are retained. In a family analysis, segregation of the variants is taken into account, and SVs that overlap by at least 80% are considered identical. Non-Mendelian inheritance patterns (e.g. incomplete penetrance, imprinting) are also supported in SV analysis.
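The 80% identity rule for family analyses can be sketched as an overlap test. The source does not say whether the 80% is measured against one SV or both. This sketch assumes *reciprocal* overlap, so each SV must be at least 80% covered by the other. It also ignores SV type (for example deletion versus duplication), which the source does not address. The function and type names are hypothetical.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlap_length(a: SV, b: SV) -> int:
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def same_sv(a: SV, b: SV, min_fraction: float = 0.8) -> bool:
    """Assumed reciprocal rule: the overlap must cover >= min_fraction of BOTH SVs."""
    ov = overlap_length(a, b)
    len_a = a.end - a.start + 1
    len_b = b.end - b.start + 1
    return ov >= min_fraction * len_a and ov >= min_fraction * len_b
```

Under the reciprocal assumption, a 100 bp SV lying inside a 200 bp SV is *not* considered identical, because the overlap covers only half of the larger one.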
Finally, Moon filters on phenotype overlap between the input HPO terms and known disease presentations (Apollo): at least one HPO term of a known disease must match the input phenotype.
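The "at least one matching HPO term" requirement can be sketched as a set intersection. This assumes exact term matching and a hypothetical lookup from gene to the HPO terms of its associated diseases. Apollo's actual matching may account for the HPO hierarchy, which the source does not describe. Gene names in the usage example are placeholders.

```python
def phenotype_filter(svs: dict[str, set[str]],
                     gene_disease_terms: dict[str, set[str]],
                     patient_terms: set[str]) -> list[str]:
    """Keep SV ids where some affected gene shares >= 1 HPO term with the patient."""
    kept = []
    for sv_id, genes in svs.items():
        # A non-empty intersection for any gene counts as a phenotype match.
        if any(patient_terms & gene_disease_terms.get(g, set()) for g in genes):
            kept.append(sv_id)
    return kept


patient = {"HP:0001250", "HP:0001263"}  # Seizure, Global developmental delay
gene_terms = {"GENE_A": {"HP:0001250", "HP:0000252"}, "GENE_B": {"HP:0000365"}}
print(phenotype_filter({"sv1": {"GENE_A"}, "sv2": {"GENE_B"}}, gene_terms, patient))
```

Here only `sv1` is kept, because `GENE_A` shares the Seizure term with the patient.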
For mitochondrial variants, only variants that meet the quality criteria for DP and GQ are retained; AD and allelic imbalance are not taken into account.
Next, variants classified as LB/B in ClinVar or the KB are filtered out. In addition, variants listed as a polymorphism on Mitomap that have an allele frequency > 0.2% in GenBank are filtered out. Remaining variants with known disease associations (LP/P in ClinVar or the KB, or 'disease' variants in Mitomap, either confirmed or reported) are excluded if their GenBank frequency exceeds 2%. Note that although the overall GenBank frequency of a variant may be low, its frequency in a particular population may be high enough to exclude pathogenicity. Haplogroup-specific frequencies can be reviewed manually on Mitomap via a link provided by Moon.
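The mitochondrial frequency rules above can be sketched as one predicate, applied in the order the text gives them. The `MitoVariant` record and its `mitomap_status` values are hypothetical encodings. GenBank frequencies are stored as fractions, so 0.2% becomes 0.002 and 2% becomes 0.02. Per the text, a variant that is neither a Mitomap polymorphism nor a known disease variant is not removed by these frequency rules.

```python
from dataclasses import dataclass
from typing import Optional

BENIGN = {"B", "LB"}
PATHOGENIC = {"LP", "P"}
MITOMAP_DISEASE = {"disease_confirmed", "disease_reported"}  # hypothetical labels


@dataclass
class MitoVariant:
    # Hypothetical record; field names are illustrative only.
    clinvar_class: Optional[str]
    kb_class: Optional[str]
    mitomap_status: Optional[str]  # "polymorphism", a MITOMAP_DISEASE label, or None
    genbank_af: float              # fraction: 0.002 == 0.2%


def keep_mito_variant(v: MitoVariant) -> bool:
    classes = {v.clinvar_class, v.kb_class}
    if classes & BENIGN:  # LB/B in ClinVar or the KB
        return False
    if v.mitomap_status == "polymorphism" and v.genbank_af > 0.002:
        return False
    known_disease = bool(classes & PATHOGENIC) or v.mitomap_status in MITOMAP_DISEASE
    if known_disease and v.genbank_af > 0.02:
        return False
    return True
```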
In a subsequent step, only non-synonymous coding variants in protein-coding genes and variants of any type in non-protein-coding genes are retained, together with known pathogenic (LP/P) variants in ClinVar or the KB. Finally, variants are filtered with respect to the patient's input phenotype and ranked by overall relevance.
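The consequence step can be sketched as follows. The set of Sequence Ontology consequence terms treated as "non-synonymous coding" is a hypothetical choice, because the source does not enumerate them. The function name and its arguments are also illustrative.

```python
# Hypothetical choice of Sequence Ontology consequence terms counted as
# "non-synonymous coding"; the source does not list them.
NON_SYNONYMOUS = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
}


def keep_by_consequence(protein_coding_gene: bool, consequence: str,
                        known_pathogenic: bool) -> bool:
    """Return True if a mitochondrial variant survives the consequence filter."""
    if known_pathogenic:         # LP/P in ClinVar or the KB: always retained
        return True
    if not protein_coding_gene:  # any variant in a non-protein-coding gene
        return True
    return consequence in NON_SYNONYMOUS
```

Checking `known_pathogenic` first means a known pathogenic synonymous variant is still retained, which matches the exception stated in the text.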