I want you to hold your hand up in front of your face if you know someone who has had cancer.
Now, I want you to keep it there if the person you knew underwent the pain of chemotherapy.
Study your hand, your fingerprints, the uniqueness that makes it different from anyone else’s on the planet. Now, what if I told you that the key to stopping cancer was not outside our bodies, but hidden deep inside them, in our individual genomes?
By taking a patient’s tumour DNA, the secret biological code that defines the cancer, we can make sense of that code through computational analysis and ultimately reveal which mutations drove the cancer. Oncologists can then use these insights to select targeted drugs for treatment. And once a genome is sequenced, the data can be reused to detect genetic abnormalities in other disease contexts. Although personalised treatment is far superior to traditional approaches, which are time-consuming and imprecise, it is still in its infancy and requires further refinement.
Sequencing a genome is a straightforward process: a blood sample or tumour biopsy is taken, and the DNA it contains is extracted biochemically. In the final sequencing step, the DNA is fed to powerful machines that read out the code as four letters: A, T, C, and G, each representing one of the DNA bases. In today’s market, sequencing costs about as much as an iPhone and takes less than a day, and both figures are expected to fall thanks to advances made by Illumina and Oxford Nanopore. This raises the question: if sequencing is so cheap and fast, why doesn’t every patient get their genome sequenced?
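To make the four-letter code concrete, here is a toy Python sketch (not real sequencing software, and the read below is invented) that tallies the base composition of a short sequenced read, along with its GC content, a common summary metric:

```python
from collections import Counter

def base_composition(read: str) -> dict:
    """Count how often each DNA base (A, T, C, G) appears in a read."""
    read = read.upper()
    assert set(read) <= {"A", "T", "C", "G"}, "unexpected character in read"
    return dict(Counter(read))

def gc_content(read: str) -> float:
    """Fraction of G and C bases in the read."""
    counts = base_composition(read)
    return (counts.get("G", 0) + counts.get("C", 0)) / len(read)

read = "ATCGGCTA"  # a made-up 8-base read, for illustration only
print(base_composition(read))  # {'A': 2, 'T': 2, 'C': 2, 'G': 2}
print(gc_content(read))        # 0.5
```

Real sequencers emit millions of such reads per run, which is exactly why the downstream analysis, not the sequencing itself, becomes the bottleneck.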
Although the issues associated with sequencing have mostly been resolved, the raw data it generates is useless until you can give it meaning. That’s where the unsung superheroes come in: bioinformaticians. Their job is to take the raw sequencing data and decipher the intricate code hidden within the tumour’s DNA. This analysis yields actionable insights, such as the identification of mutations that can cause cancer or other genetic diseases. It is an extremely time-consuming process that requires highly skilled bioinformaticians, who have both an understanding of the biological underpinnings of genomics and the computational know-how. Unfortunately, the rate at which sequencing data is generated today far outpaces the capacity to analyse it, and there is a chronic shortage of bioinformaticians.
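Real mutation identification involves read alignment, quality scores, and sophisticated statistics, but the core idea, comparing a patient’s sequence against a reference genome to spot the differences, can be sketched in a few lines. All sequences below are invented and the model is deliberately simplistic:

```python
def find_point_mutations(reference: str, sample: str):
    """Return (position, reference_base, sample_base) for each mismatch.

    Toy model: assumes the two sequences are already aligned and of
    equal length; real pipelines first align millions of short reads.
    """
    if len(reference) != len(sample):
        raise ValueError("toy model requires pre-aligned, equal-length sequences")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]

reference = "ATCGGCTAAC"  # invented reference fragment
sample    = "ATCGACTAGC"  # invented patient fragment
print(find_point_mutations(reference, sample))  # [(4, 'G', 'A'), (8, 'A', 'G')]
```

Each reported position is a candidate mutation; in practice, deciding which of thousands of such candidates are actually disease-driving is where the expertise and the time go.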
Even if the shortage were somehow resolved, the process of analysing genomic data is so drawn-out that insights would no longer be actionable in a clinical setting as the disease would have already progressed, requiring up-to-date sequencing data. Further compounding this problem is that most bioinformaticians do not work within clinical environments, but rather for pharmaceutical/biotech companies and research organisations.
Regardless, all bioinformaticians waste precious time engineering efficient and reproducible solutions that let them run and scale analyses over HPC and cloud infrastructure. What’s more, they are often disconnected from collaborators because their computational environments are rarely identical, which complicates reproducing each other’s analyses. By working in silos, bioinformaticians are less efficient, which considerably slows down the whole process.
What if there were a way to make bioinformaticians 100x more efficient, while also keeping every bioinformatician in the ecosystem fully in sync?
Bioinformaticians to the rescue: the story behind Lifebit
Being bioinformaticians ourselves, my co-founder, Pablo, and I set out on a journey to address these issues, which we dealt with on a daily basis, eventually leading to the creation of Lifebit.
Our first objective was to empower scientists with limited computational skills and/or access to computational resources, by facilitating access to scalable computational power on-demand. Simply put, we wanted to enable individuals to utilize the cloud anytime, anywhere in order to efficiently complete their analytical tasks.
In parallel, we wanted to bridge the gap between the three essential components of every biodata analysis: data, workflows/pipelines, and computational resources. Data is normally stored on local storage systems, in public databases, or in cloud storage. Workflows, which allow bioinformaticians to systematically process and analyse data, live within private or public Git repositories (GitHub, BitBucket, GitLab) or container hubs (DockerHub). Computational resources, the workhorses that power these complex and massive analyses, are distributed among on-premise clusters, high-performance computing systems, and the cloud.
Every time an analysis is run, a bioinformatician needs to access computational resources, then make sure those resources can access the data, either by mounting the storage volumes or by moving the data to the compute. Only then can they pull the workflow from the Git platform or container hub, configure and provision the compute environment, and finally run the analysis. This is unsustainable in the long run, even for superheroes.
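The manual routine above can be sketched as a dry run. Every command, bucket name, and repository URL below is a hypothetical placeholder, but the sequence of steps mirrors what a bioinformatician typically stitches together by hand for each analysis:

```python
def manual_analysis_plan(data_uri: str, workflow_repo: str, out_dir: str):
    """Compose (but do not execute) the shell commands a bioinformatician
    would typically run by hand: fetch data, fetch workflow, provision, run."""
    return [
        f"aws s3 cp {data_uri} ./data/ --recursive",     # 1. move data to compute
        f"git clone {workflow_repo} workflow",           # 2. pull the workflow
        "conda env create -f workflow/environment.yml",  # 3. provision environment
        f"bash workflow/run.sh ./data {out_dir}",        # 4. finally, run analysis
    ]

# Hypothetical inputs, for illustration only.
plan = manual_analysis_plan(
    data_uri="s3://example-bucket/project-x/raw",
    workflow_repo="https://github.com/example/rnaseq-pipeline",
    out_dir="./results",
)
for step in plan:
    print(step)
```

Note that only the last step is the actual analysis; everything before it is orchestration overhead, repeated for every project and every environment.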
Several platforms exist that offer gold-standard datasets, out-of-the-box applications, and rather complicated ways of migrating workflows. Although useful in some cases, these systems suffer from a lack of flexibility and don’t integrate with the way most bioinformaticians work. In effect, they operate like black boxes, failing to give the user full control and transparency. What’s more, costs can be high and vendor lock-in is common, barring users from accessing past work if they decide to leave the platform. Lastly, the engineering and design principles behind these platforms promote working in silos rather than empowering bioinformaticians to interact more broadly with the complete ecosystem.
A new superpower: Introducing Deploit
To provide a definitive solution, we launched Deploit – an intelligent data-analysis management system. Deploit operates as an interface layer that allows users to bring and manage data, workflows/pipelines and the necessary computational resources all in one place, so that running an analysis becomes an optimised task within a centralized ecosystem, rather than a laborious disconnected undertaking.
With zero onboarding and zero configuration needed, Deploit allows for a quick start where the only step required is to link your cloud account(s) – AWS, Azure, and Google Cloud (coming soon). Next, users can access data by uploading it directly from their computer, or by syncing it from private/public cloud storage or any publicly available database. Workflows are then imported by simply copy-pasting a GitHub, BitBucket or DockerHub URL. In subsequent analyses, existing workflows can be reused and managed to obtain consistent and reproducible results. Deploit provides seamless management of collaborative projects by facilitating workflow and configuration sharing among users of the same team, while also eliminating the cost of data duplication and transfer. And while the user focuses solely on the analysis and collaboration logic, Deploit does the heavy lifting of cloud configuration, elastic cluster deployment and provisioning, container orchestration, monitoring, audit trails, security and so much more.
We designed Deploit in the hopes of it becoming an essential tool that accelerates bioinformaticians’ work, thereby minimizing the time it takes to make sense of unique genomes. With this new superpower, bioinformaticians will be able to usher in a new wave of genomic discovery, translating into optimised personalised treatments in oncology and other disease areas. This is our best chance to win the fight against cancer and other diseases, and in turn, significantly reduce the number of hands that go up.
We are still only beginning and we have big plans for the future! Don’t miss a thing and let us keep you updated on our upcoming posts by signing up to our Newsletter or by visiting us on Twitter, LinkedIn, and Facebook.
We’re also actively looking for great engineers and bioinformaticians who want to help us shape Lifebit – if this is something you’d be interested in, we’d love to hear from you: firstname.lastname@example.org.