Filtering and ranking
Moon generates its result by sequentially filtering and then ranking variants.
First, low-quality variants, variants that are common in the general population, and known benign variants listed in ClinVar or the lab's knowledge base (KB) are excluded from the analysis. In a family analysis, segregation of the variants is also taken into account, although some non-segregating variants may still be retained. These initial filtering steps are followed by ranking of the remaining variants based on annotations from both public and proprietary data sources, including information on predicted variant effect and phenotype overlap.
With regard to variant quality, the following criteria are applied for filtering:
- sequencing depth (DP in VCF) > 3 (if QUAL > 30) or > 8 (if QUAL <= 30)
- allele depth (AD in VCF) > 2 per allele (if QUAL > 30) or > 4 per allele (if QUAL <= 30)
- allele ratio (AD of allele 1 divided by total AD) must be between 0.2 and 0.8
- genotype quality (GQ) >= 20
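The quality criteria above can be read as a single pass/fail check per variant call. The following is a minimal sketch, not Moon's actual implementation: the `Call` class and the `passes_quality` function are hypothetical names, the allele ratio bounds are assumed to be inclusive, and the criteria are applied literally to the alleles listed in AD (how homozygous calls are treated is not stated in this description).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Call:
    """Hypothetical container for the VCF fields used by the quality filter."""
    qual: float          # QUAL column of the VCF record
    dp: int              # DP: sequencing depth
    ad: Sequence[int]    # AD: depth per called allele
    gq: int              # GQ: genotype quality


def passes_quality(call: Call) -> bool:
    """Apply the documented thresholds; stricter depth limits when QUAL <= 30."""
    high_qual = call.qual > 30
    min_dp = 3 if high_qual else 8   # DP must be strictly greater than this
    min_ad = 2 if high_qual else 4   # each AD must be strictly greater than this

    if call.dp <= min_dp:
        return False
    if any(depth <= min_ad for depth in call.ad):
        return False

    total_ad = sum(call.ad)
    if total_ad == 0:
        return False
    # Allele ratio: AD of allele 1 divided by total AD (bounds assumed inclusive).
    ratio = call.ad[0] / total_ad
    if not (0.2 <= ratio <= 0.8):
        return False

    return call.gq >= 20
```

For example, a call with QUAL 50, DP 10, AD 5 and 5, and GQ 30 passes, whereas a call with QUAL 20 and DP 8 fails because the stricter depth threshold (> 8) applies.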
Moon uses annotations from public data sources including:
- gnomAD: frequency of variants within the general population
- dbNSFP: annotations regarding protein prediction scores and conservation scores
- dbscSNV: splice site prediction annotation
- ClinVar: variant-level clinical classification
A proprietary disorder model (called Apollo) is used to match the input phenotype of the patient with symptoms of known human diseases with Mendelian inheritance. In-house developed artificial intelligence algorithms provide efficient filtering and ranking of the variants with respect to the patient’s phenotype.
The results are presented as Primary findings, consisting of variants that fulfil all applied filter criteria, and as Primary carrier findings. The list of carrier findings consists of variants that meet all filter criteria except zygosity: these are heterozygous variants located in genes that are associated with recessive conditions.
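The split between the two result lists can be illustrated with a small decision function. This is only a sketch of the rule as described above; the function name, its boolean parameters, and the returned labels are hypothetical and do not reflect Moon's internal data model.

```python
from typing import Optional


def classify_finding(
    passes_other_filters: bool,  # all filter criteria except zygosity
    zygosity_fits: bool,         # zygosity matches the gene's inheritance mode
    is_heterozygous: bool,
    gene_is_recessive: bool,     # gene is associated with a recessive condition
) -> Optional[str]:
    """Return 'primary', 'carrier', or None if the variant is not reported."""
    if not passes_other_filters:
        return None
    if zygosity_fits:
        return "primary"
    # Fails only on zygosity: heterozygous variant in a recessive gene.
    if is_heterozygous and gene_is_recessive:
        return "carrier"
    return None
```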
As in the SNV analysis, Moon generates a list of provisional diagnoses by sequentially filtering and ranking the structural variants in the uploaded VCF.
First, known benign CNVs from the Database of Genomic Variants (DGV) are filtered out, as these are unlikely to contribute to a disease phenotype. A CNV is excluded if it is completely contained within a known benign CNV. For this filter step, Moon checks overlap with the stringent set of CNVs in the CNV map presented by Zarrei et al. (2015) (denoted as ‘Nat. Rev. Gen. CNV Map - Stringent’ on DGV).
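The containment rule for benign CNVs can be sketched as follows. The `Interval` type and both function names are hypothetical, coordinates are assumed to be on the same reference build, and the sketch ignores CNV type (gain or loss), because this description does not say whether the type has to match.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval; start and end use one consistent convention."""
    chrom: str
    start: int
    end: int


def is_contained(cnv: Interval, benign: Interval) -> bool:
    """True if `cnv` lies entirely within `benign` on the same chromosome."""
    return (
        cnv.chrom == benign.chrom
        and benign.start <= cnv.start
        and cnv.end <= benign.end
    )


def drop_benign(cnvs: Iterable[Interval], benign_regions: Iterable[Interval]) -> List[Interval]:
    """Keep only CNVs that are not fully covered by any single benign CNV."""
    benign = list(benign_regions)
    return [c for c in cnvs if not any(is_contained(c, b) for b in benign)]
```

Note that a CNV that only partially overlaps a benign region, or that extends beyond it on either side, is retained under this rule.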
From the remaining variants, only those SVs involving known disease genes are retained (based on Moon's disorder model Apollo). If a family analysis is performed, segregation of the variants is taken into account. In family analyses, SVs with at least 80% overlap are considered to be identical. Non-Mendelian inheritance patterns (e.g. incomplete penetrance, imprinting) are also supported in SV analysis.
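To make the 80% rule concrete, the sketch below treats two SVs as identical when their reciprocal overlap is at least 0.8. Reciprocal overlap (the shared length divided by the length of each SV, taking the smaller fraction) is an assumption; the description above does not specify how the overlap percentage is computed. The `SV` type and function names are hypothetical, and half-open coordinates are assumed.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical structural variant interval (half-open: start inclusive, end exclusive)."""
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the SVs do not overlap."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def same_sv(a: SV, b: SV, threshold: float = 0.8) -> bool:
    """Treat two SVs from different family members as the same event."""
    return reciprocal_overlap(a, b) >= threshold
```

Under this assumption, a proband SV spanning positions 100 to 200 and a parental SV spanning 110 to 200 share 90% of the longer interval and would be treated as the same variant when assessing segregation.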
Finally, Moon filters on phenotype overlap between the input HPO terms and known disease presentations (Apollo). At least one HPO term must match between a known disease and the input phenotype.
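The minimum-overlap requirement can be sketched as a set intersection on HPO term identifiers. This is a simplified illustration: the function names and the disease labels are hypothetical, and matching is done on exact term IDs only, whereas the actual Apollo model may also take the ontology hierarchy into account.

```python
from typing import Dict, Iterable, List


def shares_hpo_term(
    patient_terms: Iterable[str],
    disease_terms: Iterable[str],
    min_matches: int = 1,
) -> bool:
    """True if at least `min_matches` HPO IDs occur in both term sets."""
    return len(set(patient_terms) & set(disease_terms)) >= min_matches


def matching_diseases(
    patient_terms: Iterable[str],
    diseases: Dict[str, List[str]],
) -> List[str]:
    """Return the names of diseases that share at least one HPO term with the patient."""
    patient = set(patient_terms)
    return [name for name, terms in diseases.items() if shares_hpo_term(patient, terms)]


# Illustrative input only; the disease labels are placeholders.
patient = ["HP:0001250", "HP:0001263"]
diseases = {
    "disorder_A": ["HP:0001250", "HP:0000252"],
    "disorder_B": ["HP:0000252"],
}
kept = matching_diseases(patient, diseases)
```

Here only `disorder_A` is kept, because it shares the term HP:0001250 with the input phenotype.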