The commoditised genotyping array market has attracted growing interest, with several large direct-to-consumer (DTC) genotyping companies now operating in it, including 23andMe, AncestryDNA, DNAFit and MyHeritage. Compared with whole-genome sequencing, genotyping arrays are cheaper and provide the economy of scale needed for mass adoption at a consumer level. In 2018, it was estimated that up to 10M genotypes were generated through this technology. As such, personal genetic information is becoming one of the largest resources for biomedical research to uncover genetic associations in human disease.
However, as genotyping only covers about 0.2% of the genome and misses a lot of valuable information, DTC genetic testing companies use a process called imputation to ‘fill in the blanks’.
What is imputation?
Imputation is the process of statistically inferring unobserved genotypes from known haplotypes in a population and is usually performed on Single Nucleotide Polymorphisms (SNPs). When you impute, you assume that if one particular SNP is present then a certain number of other SNPs should be present as well. Effectively, imputation replaces missing genotypes by basing results on existing information using predictive and statistical algorithms.
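The idea can be illustrated with a toy sketch. It assumes a tiny, made-up reference panel (`REFERENCE_HAPLOTYPES`) and a hypothetical helper `impute` that fills each missing allele with the most common allele among the reference haplotypes that best match the observed positions. Production tools such as Beagle use probabilistic haplotype models rather than this simple vote, so treat it only as an illustration of the principle.

```python
from collections import Counter
from typing import Sequence

# Hypothetical reference panel: each string is one haplotype typed at 5 SNPs.
REFERENCE_HAPLOTYPES = [
    "ACGTA",
    "ACGTA",
    "ACGTC",
    "GTGCA",
    "GTACA",
]


def impute(observed: str, reference: Sequence[str], missing: str = "?") -> str:
    """Fill missing alleles using the best-matching reference haplotypes."""
    # Positions where the array actually measured an allele.
    typed = [i for i, allele in enumerate(observed) if allele != missing]

    def matches(haplotype: str) -> int:
        # Number of typed positions at which this haplotype agrees.
        return sum(haplotype[i] == observed[i] for i in typed)

    best = max(matches(h) for h in reference)
    closest = [h for h in reference if matches(h) == best]

    result = []
    for i, allele in enumerate(observed):
        if allele != missing:
            result.append(allele)  # keep the measured allele
        else:
            # Vote: most common allele at this position among closest haplotypes.
            votes = Counter(h[i] for h in closest)
            result.append(votes.most_common(1)[0][0])
    return "".join(result)


if __name__ == "__main__":
    # Two of the five SNPs were not on the array.
    print(impute("AC?T?", REFERENCE_HAPLOTYPES))
```

Here the three haplotypes starting with `AC` agree with every typed position, so the missing alleles are taken from them.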
The most common downstream analyses requiring imputation
Genome-Wide Association Studies
Genome-Wide Association Studies (GWAS) allow scientists to pinpoint genes involved in human disease by identifying the genetic variants that are more common in patient cohorts than in healthy individuals. GWAS are crucial to developing better diagnosis, treatment and prevention methods for human diseases.
GWAS rely on indirect associations between marker SNPs at different locations in the genome. Importantly, the more SNPs available across the genome, the more statistical power the analysis has to detect associations. Researchers need to impute genomes before performing GWAS as it increases the number of SNPs identified in genotyping studies, which are often designed using different platforms and genetic markers.
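As a rough illustration of a single-SNP association test, the sketch below runs an allelic chi-square test on a 2x2 table of allele counts in cases and controls. The function names and the counts are hypothetical, and the sketch assumes every row and column total is non-zero. Real GWAS pipelines repeat a test of this kind (often a regression with covariates) for every SNP and correct for multiple testing.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were independent of disease status.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Made-up counts: the alternate allele is enriched in cases.
    stat = allelic_chi_square(case_alt=60, case_ref=40,
                              control_alt=40, control_ref=60)
    print(stat, p_value_1df(stat))
```

Because each additional SNP is another chance to run this test near a causal variant, imputing untyped SNPs widens the set of positions that can be tested.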
Ancestry
Genetic ancestry testing is a way for individuals to learn more about their genealogy through their DNA, as certain patterns of genetic variation are often shared among people of similar backgrounds. Three types of ancestry test are commonly used:
- Y-chromosome testing to explore ancestry in the direct male line,
- Mitochondrial DNA (mtDNA) testing to provide genetic information about the direct female ancestral line, and
- SNP testing which evaluates a large number of variations across a person’s genome.
To improve the accuracy and resolution of an individual’s ancestry estimate, a larger number of variants (SNPs) needs to be taken into account. As such, performing imputation prior to running these ancestry tests is critical.
Reporting on genetic risks & traits
DTC genetic testing companies are going beyond disease risks (GWAS) and ancestry by offering specific lifestyle recommendations based on genetic risks and traits. Recommendations can range from diet changes to steps for preventing certain diseases. For companies to provide informed recommendations, imputation has to be performed on the genotyping data to increase the number of SNPs used during analysis.
The evolution & challenges of imputation methods
Over the years, imputation methods have been refined to accommodate panels containing thousands of individuals and to improve statistical accuracy by taking more population studies into consideration.
Yet the quality of imputed datasets is heavily reliant on the software used, as well as on the reference panel and the size of the population chosen. Furthermore, the time and costs of running such analysis can be prohibitive for the economies of scale to take effect. Lastly, the high level of technical expertise required to perform such analysis hinders a wider offering from DTC companies, especially beyond the genomics market.
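One common way to quantify imputation quality is to mask genotypes that were actually measured, impute them, and compare the imputed dosages with the true values. The sketch below computes the squared Pearson correlation between the two. The function name `dosage_r2` and the example numbers are assumptions for illustration, not output from any particular tool.

```python
import math
from typing import Sequence


def dosage_r2(true_genotypes: Sequence[float],
              imputed_dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes (0, 1, 2)
    and imputed dosages (expected alternate-allele counts in [0, 2])."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n

    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)

    # Correlation is undefined when either series has no variation.
    if var_t == 0 or var_d == 0:
        return math.nan
    return cov * cov / (var_t * var_d)


if __name__ == "__main__":
    truth = [0, 1, 2, 1]              # masked genotypes for one SNP, four samples
    imputed = [0.2, 0.9, 1.6, 1.3]    # made-up imputed dosages
    print(round(dosage_r2(truth, imputed), 3))
```

Running the same comparison with different software, reference panels or panel sizes gives a like-for-like view of how those choices affect accuracy.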
There is an urgent need for a solution that can provide accurate imputation in a streamlined, large-scale and cost- and time-effective manner. Such a solution also has to efficiently provide the computational and storage resources and infrastructure required to tackle an ever-increasing amount of genotyping data, along with routine and updated analysis. Currently, there is no platform or software in the market that provides such services to third parties… until now.
Lifebit’s CloudOS Platform provides the world’s first enterprise-level, robust & scalable solution to imputation
At Lifebit, we solve the challenges of imputation by evaluating the latest developments in the field (e.g. Beagle 5.0) and implementing them in a new method that is:
- Federated & ultra-secure: able to run over distributed data in your own secure cloud of choice
- Ultra-rapid: >35% faster than the leading industry standard (Michigan Imputation Server)
- Scalable: run millions of samples in parallel
- Easy to run: only three clicks of the mouse
- Cost-effective: $2 per sample
This allows anyone to simply run their analysis by connecting their cloud and data to the platform, without requiring further engineering and bioinformatics expertise. Effectively, Lifebit provides an out-of-the-box solution for any imputation needs DTC genetic testing companies may have.
We would like to know what you think! Please fill out the following form or contact us at firstname.lastname@example.org. We welcome your comments and suggestions!