How to enable a genomic revolution without risking a data privacy catastrophe
Data security and compliance are recurring themes in my many conversations with researchers and organisations in the life sciences.
The democratisation of Next Generation Sequencing (NGS) technologies has made omics data more available than ever before. Bioinformaticians now face an NGS data deluge from which they must deliver actionable insights, and turning raw data into those insights means processing it with bioinformatics pipelines. To date, however, the bioinformatics community has not widely established any validated standards for these pipelines.
Over the last few years, open source has increasingly become the norm, even in bioinformatics. A growing number of high-quality applications are freely available on GitHub and other Git hosting providers, such as the pipelines the Broad Institute uses in production.
In December 2017, the Google Brain team joined the effort by releasing DeepVariant, an open-source, deep-learning-based variant caller. DeepVariant is more accurate than competing variant callers; it even won an accuracy award in the precisionFDA Truth Challenge.