Genomics at Imagia: How AI can help unlock the clinical power of genomic data

Gabrielle Bertier, DHDP Partnership Manager

 

The power of genomics

From my uncontrollable distaste for cilantro to my family's increased risk of developing breast cancer, a lot of information is encoded in my genome. This sequence of 3.2 billion letters (A, T, C and G), contained in nearly every cell of my body, not only affects how I taste food and what health risks I face; it is also completely unique to me, and can therefore be used to identify me. It is also partly shared with my family, with the ethnic groups I belong to, and with the entire human population. Because this information is so unique and so powerful, it was once thought that accessing it could allow us to eradicate disease altogether. So how is genetic data used in healthcare today?

Genetic testing is available in a variety of clinical scenarios. From prenatal testing and newborn screening to hereditary cancer screening, rare disease diagnosis, and predicting how likely a patient is to respond to a specific treatment, genetics has permeated many domains of medicine. In my family, for instance, where many women were affected by breast and other cancers, doctors decided to investigate whether genetics played a role in our family's health. They first identified a BRCA2 mutation in my great aunt, which increased her risk of developing breast or ovarian cancer. They recommended that all women in the family get tested, and those who tested positive for the mutation were offered frequent follow-ups, and even preventive surgeries to minimize their risk of developing the disease. The predictive power of this genetic information is just one of the ways in which genomics can impact cancer care. Indeed, all cancer cells harbor genetic alterations that, if identified and properly understood, can help us detect cancer early, predict how a specific tumor will respond to a treatment, and match a patient with a specific drug.

Genetic sequencing technologies are most commonly used in oncology, cardiology and immunology, and they are continuously improving. From testing for a change in a single letter (a "single nucleotide variant") at a precise location in the genome, to analyzing all 3.2 billion letters, or "base pairs", that make up the entire human genome, technologies have improved dramatically, and the cost (in time and money) of producing this data has dropped at a remarkable rate. To illustrate this, we geneticists like to compare what it took to sequence the first human genome between 1990 and 2003 (over 2 billion dollars, an international team of hundreds of scientists, and a total of 13 years [1]) with what it takes today (a whole genome can be sequenced on a single machine in a couple of days, for less than 2,000 dollars).

However, there is arguably still a lot of progress to be made. We are far from having solved all major health issues, and the prognosis for most patients diagnosed with cancer today is still grim. Of course, genetics can't solve everything, and many other factors, such as our environment, diet and lifestyle, play a major role in our likelihood of developing disease. Still, we have virtually no understanding of what most of our genome actually does (all 20,000 genes together represent only around 2% of our full genome sequence!), and we have only scratched the surface of how our genes interact with each other and with other elements in our bodies. So, let's recap. Each of the 30 trillion cells in a human body contains a 3.2 billion letter code written with 4 letters, within which 20,000 genes take up only about 64 million letters, leaving more than 3 billion letters outside of genes… Every second, each cell activates a specific combination of genes to perform its core functions. When errors accumulate in cells, they can produce tumors and lead to cancer… In the end, understanding genomics looks like a "big data" problem. So, could AI help move the needle?


Source: https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data

Current limitations, and how we at Imagia are addressing them

First, even though it is becoming more common in Canada, clinical-grade genomic data is still expensive to generate, is not part of routine clinical care for most patients, and is available only in large research hospitals. The data produced is so large (several GB per patient for a whole genome sequence) that it is not stored in hospital Electronic Health Records (EHRs), and is therefore not readily available for research. The genetic information that does appear in patients' health records is often a text report from a clinical geneticist describing the presence or absence of genetic mutations and interpreting how they affect patient care.

To address this gap, we have launched research projects at Imagia that explore ways to infer genetic status by analyzing standard-of-care clinical images. For instance, we are analyzing Computed Tomography (CT) and Positron Emission Tomography (PET) scans of lung cancer patients who have had a genetic test (RNA-seq, the sequencing of RNA, which is produced by active genes). In lung cancer, which is the most common cancer among adults in Canada [2], genetic tests are used in the clinic to determine the most appropriate treatment. However, these tests require a biopsy of the tumor, which is an invasive procedure, and results can take a long time to obtain. If our machine learning algorithms can find markers in the images that predict genetic test results, patients could be matched with the best treatment faster and more efficiently. Being able to generate genetic insights without running a full range of expensive genetic tests could increase access to personalized medicine in Canada.
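
To make the idea of "finding markers in the images that predict genetic test results" more concrete, here is a minimal sketch, not Imagia's actual model: a logistic regression classifier trained to predict a binary genetic status from a handful of image-derived features. The three features, the synthetic cohort produced by the hypothetical `make_synthetic_cohort`, and all function names are illustrative assumptions; a real project would more likely extract features directly from CT and PET scans (for example with deep neural networks) and validate predictions against actual RNA-seq results.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_synthetic_cohort(n: int = 200, seed: int = 0):
    """Hypothetical cohort: 3 standardized image features per patient and a
    0/1 label standing in for a genetic test result (e.g. mutation present)."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        shift = 1.0 if label else -1.0  # label-dependent signal in features 0 and 2
        features.append([rng.gauss(shift, 1.0),
                         rng.gauss(0.0, 1.0),        # pure noise feature
                         rng.gauss(0.5 * shift, 1.0)])
        labels.append(label)
    return features, labels

def predict_proba(w, b, x) -> float:
    """Predicted probability that a patient has label 1, given features x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_logistic(features, labels, lr: float = 0.1, epochs: int = 500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(w, b, x) - y  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, features, labels) -> float:
    """Fraction of patients whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, b, x) >= 0.5) == bool(y)
               for x, y in zip(features, labels))
    return hits / len(labels)

if __name__ == "__main__":
    X, y = make_synthetic_cohort()
    X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
    w, b = train_logistic(X_train, y_train)
    print("held-out accuracy:", round(accuracy(w, b, X_test, y_test), 2))
```

Evaluating on patients held out from training matters here: a model that only scores well on the patients it was fitted to would not be useful for matching new patients to treatments.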

Second, clinically generated genomic data that is accessible for research is not only scarce, but also critically lacks diversity. Most genomic data generated to date comes from people of European ancestry [3], and this heavily limits our ability to interpret genetic mutations, whose frequency and mechanism of action sometimes differ across populations [4].

Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5089703/

As in the genomics community, bias is heavily discussed in the Artificial Intelligence community, and researchers around the world are grappling with this problem [5]. One way to address it is to share data: more powerful, reproducible and generalizable results can be achieved if more, and more diverse, data are produced and shared across institutions and across provinces.

However, two issues arise when you want to share patient data: competition is fierce, and there are privacy and other legal concerns. Sharing patient data broadly can be perceived as risky: complex federal and provincial regulations protect patient privacy, especially when data contains personal information or is considered "identifiable", as a whole genome sequence is. As data custodians, healthcare institutions are responsible for ensuring this data is secure and appropriately protected, which sometimes makes them reluctant to share. For companies, this data may also contain information and knowledge protected by intellectual property provisions. And for researchers, who rely on scarce and extremely competitive funding to produce data and generate publishable results, sharing data can mean losing their competitive advantage.

To address these problems, Imagia has developed a technological solution. Our EVIDENS platform is based on federated learning, in which raw patient data always remains within the institution where it was produced, and only insights derived from the data are shared. Clinicians and researchers can collaborate across multiple institutions without ever sharing raw data, which allows us to overcome limited diversity and sample size while alleviating major privacy concerns (for more details, see our previous blog post).
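
The following minimal sketch illustrates the general idea behind federated learning using federated averaging (FedAvg), one common federated learning algorithm. It is not a description of how EVIDENS is implemented, which this post does not detail. Three hypothetical hospitals each fit a tiny linear model y ≈ w*x + b on their own synthetic data; only the model parameters and a sample count leave each site, and a coordinator averages them, weighting each site by its number of patients. All function names, site sizes and data are illustrative assumptions.

```python
import random

def local_update(weights, data, lr=0.2, epochs=5):
    """Run a few epochs of gradient descent on one site's data for y ~ w*x + b.
    The raw (x, y) pairs stay inside this function; only the parameters and
    the sample count are returned."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b), n

def federated_average(updates):
    """FedAvg aggregation: average site parameters weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)

def run_federation(sites, rounds=100):
    """Alternate local training at every site with central aggregation."""
    global_params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_params, data) for data in sites]
        global_params = federated_average(updates)
    return global_params

def make_site(n, x_low, x_high, rng):
    """Hypothetical site data following y = 2x + 1 plus small noise; each site
    covers a different x range, mimicking differing patient populations."""
    data = []
    for _ in range(n):
        x = rng.uniform(x_low, x_high)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)))
    return data

if __name__ == "__main__":
    rng = random.Random(42)
    sites = [make_site(30, 0.0, 1.0, rng),
             make_site(50, 0.5, 1.5, rng),
             make_site(20, 1.0, 2.0, rng)]
    w, b = run_federation(sites)
    print(f"federated model: y = {w:.2f}x + {b:.2f}")  # close to y = 2x + 1
```

Weighting by sample count means a site with more patients has proportionally more influence on the shared model, while no site ever sees another site's raw records.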

This innovative approach is also used in the new Digital Health and Discovery Platform (DHDP), a federally funded, pan-Canadian initiative co-led by Imagia and the Terry Fox Research Institute. The DHDP aims to accelerate precision medicine by bringing together leaders in Artificial Intelligence and healthcare. DHDP partners are also developing ways to engage public and private partners in mutually beneficial projects that stimulate innovation and the commercialization of clinical products.

Third, there is a lack of standardization in genomic data generation, analysis and interpretation. Although the great majority of genomic data produced to date comes from machines built by the global industry leader Illumina, the way clinicians and researchers go from sequencer output to clinical interpretation varies greatly. This is not to say that there are no standardization efforts in progress; the most notable is led by the Global Alliance for Genomics and Health (GA4GH). Imagia is actively participating in this effort, notably by working directly with Illumina on a project to build standard genomic pipelines and test their efficiency (see our press release). The problem with standards is that even when they exist, it is challenging to get large groups to adopt them. To encourage the community to use a standardized approach, state-of-the-art technological and software tools that we and others are developing will be built directly into our DHDP platform, and we will help fund projects that use them, giving Canadian researchers a strong incentive to include them in their research practices.

Finally, genomic data is most useful when interpreted in the context of a patient's clinical journey. Genetic data alone is often not enough to fully understand a patient's condition; that requires combining multiple sources of data, such as patient records, clinician reports, medical test results (e.g., laboratory blood tests) and imaging data.

Source: https://www.healthcatalyst.com/insights/social-determinants-health-todays-data-imperative

In response to this challenge, our EVIDENS platform supports ingesting multiple sources of data, and we have developed advanced Artificial Intelligence methods to combine these rich datasets efficiently and reliably. For instance, we are working on a project to develop a machine learning (ML) algorithm that processes a combination of clinical data, pathology reports, genomic data and clinical imaging data from lung cancer patients. Combining these sources allows us to build more powerful models and increases our potential for discoveries.
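
One simple way to combine data sources is early (feature-level) fusion: each source is first turned into a numeric feature vector, and the vectors are concatenated before being passed to a single model. The sketch below is not a description of Imagia's method; the modality names, feature sizes and the `fuse_features` helper are hypothetical. Because many patients have never had a genetic test, it zero-fills a missing modality and appends a 0/1 indicator so that a downstream model can tell "not measured" apart from a genuine zero.

```python
from typing import Dict, List, Optional

# Hypothetical feature sizes per modality; a real pipeline would get these
# from its own encoders (tabular clinical data, report text, genomics, images).
MODALITY_DIMS: Dict[str, int] = {
    "clinical": 3,
    "pathology": 2,
    "genomic": 4,
    "imaging": 3,
}

def fuse_features(patient: Dict[str, Optional[List[float]]]) -> List[float]:
    """Concatenate modality vectors in a fixed order, each followed by a
    missing-modality flag (1.0 if absent, 0.0 if present)."""
    fused: List[float] = []
    for name, dim in MODALITY_DIMS.items():  # dicts preserve insertion order
        values = patient.get(name)
        if values is None:
            fused.extend([0.0] * dim)  # placeholder values for the absent modality
            fused.append(1.0)
        else:
            if len(values) != dim:
                raise ValueError(f"{name}: expected {dim} features, got {len(values)}")
            fused.extend(float(v) for v in values)
            fused.append(0.0)
    return fused

if __name__ == "__main__":
    # A hypothetical lung cancer patient without a genetic test.
    patient = {
        "clinical": [63.0, 1.0, 0.2],
        "pathology": [1.0, 0.0],
        "genomic": None,
        "imaging": [0.4, 1.2, -0.3],
    }
    vector = fuse_features(patient)
    print(len(vector))  # 16 = 12 features + 4 missing-modality flags
```

Late fusion, where a separate model is trained for each data source and their predictions are combined, is a common alternative when sources differ greatly in size or availability.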

Our hope for the future

Our vision at Imagia is that genomic data, combined with other clinical data and analyzed with cutting-edge AI/ML technologies, has the potential to help more patients affected by high-burden diseases in Canada. To take on this challenge, we are partnering with Canadian and global leaders in genomics. Because patients are at the center of everything we do, our team has developed technological solutions to ensure that patient data is always secure and protected, and that patient privacy is respected throughout our pipeline. Imagia is actively developing methods to generate discoveries that will translate into better diagnosis and treatment for all patients, even those who have not had a genetic test. There is still a long way to go before whole genome sequencing becomes routine clinical practice, but in the meantime, we believe that AI/ML methods can help unlock the clinical potential of genomic data.

 


 

[1] National Human Genome Research Institute. The Human Genome Project. Available at: https://www.genome.gov/human-genome-project

[2] Canadian Cancer Statistics Advisory Committee. Canadian Cancer Statistics: A 2020 special report on lung cancer. Toronto, ON: Canadian Cancer Society; 2020. Available at: cancer.ca/Canadian-Cancer-Statistics-2020-EN (accessed March 26, 2021).

[3] Abul-Husn NS, Kenny EE. Personalized Medicine and the Power of Electronic Health Records. Cell. 2019 Mar 21;177(1):58-69. doi: 10.1016/j.cell.2019.02.039. PMID: 30901549; PMCID: PMC6921466.

[4] Bien SA, Wojcik GL, Hodonsky CJ, Gignoux CR, Cheng I, Matise TC, Peters U, Kenny EE, North KE. The Future of Genomic Studies Must Be Globally Representative: Perspectives from PAGE. Annu Rev Genomics Hum Genet. 2019 Aug 31;20:181-200.

[5] McDermott MBA, Wang S, Marinsek N, Ranganath R, Foschini L, Ghassemi M. Reproducibility in machine learning for health research: Still a ways to go. Sci Transl Med. 2021 Mar 24.
