An engine tailored to healthcare research is the crucial next step in bringing the benefits of AI to our patients

Author:

  • Lisa Di Jorio, Director of AI Research & Strategy, Imagia 
  • Becks Simpson, ML Technical Lead, Imagia
  • Cecile Low-Lam, Biostatistics Researcher, Imagia

When Imagia was founded in 2015, we had a vision of improving patients’ lives through the latest, most remarkable breakthroughs in artificial intelligence, in particular those afforded by deep learning. And indeed, recent advancements in deep learning have confirmed and expanded the possibilities of AI in the medical field, for instance in early cancer detection, or therapies tailored to individuals.

However, as a highly regulated, multi-stakeholder domain, medicine makes the realization of the full potential of AI a complex matter.

In order to reach our vision, we first needed to create a tool that would fluidly bring AI to healthcare, and healthcare to AI. This is why Imagia built EVIDENS™. EVIDENS is a complete, functional platform that answers the urgent need to apply artificial intelligence to healthcare and clinical research, operating directly on live clinical data.

Still, to produce this continuous digital pipeline – one that takes users from live clinical data all the way through to discovery – we had to take a hard look at how Imagia had been delivering AI projects over time. This allowed us to identify challenges, opportunities and requirements throughout our processes, leading to a unified AI workflow, and therefore an AI engine, that makes it easier for AI experts and non-experts alike to produce insights from their clinical data.

 

Imagia’s technological genesis: a series of hard-earned lessons

Machine learning is at the core of all Imagia’s projects: from delivering a real-time polyp malignancy prediction embedded in a medical device, to building a radiomics pipeline for immuno-oncology treatment response prediction, or proposing a tool for correctly de-identifying radiology reports. At one point or another, all our teams rely on AI to deliver cutting-edge technologies, allowing them to constantly push the boundaries of knowledge and contribute valuable insights to the research community.

But in the early days of Imagia, deep-learning frameworks were not as mature as they are today; there was no single, front-running technology that presented itself as an obvious choice to all teams. So every team chose a different technology to deliver its projects.

The result: peer-reviewed research, promising prototypes, and industrial-grade products.

But these came combined with another, more unfortunate result: three distinct machine-learning pipelines, each with its own unavoidable satellite tools, such as data pre-processing and customized transformation pipelines – not to mention any number of personalized mini-frameworks designed and written by our research science interns for their own projects.

And so, while we did deliver great projects, these wins came at a high price. Collaboration between projects proved to be very challenging. Project knowledge was scattered among a few individuals, which required constantly repeating time-consuming knowledge transfers. Porting our research prototypes into our product was disproportionately difficult.

These are the first-hand, hard-earned experiences that informed our decision to create a unified AI system.

 

We broke down our workflow to develop Imagia’s unified AI system

To better understand our standardization problem, we mapped out the life cycle of our AI-based projects, allowing us to detect and address underlying issues at each step.

This analysis allowed us to draw a common journey for our successful projects, and to identify how AI projects in healthcare research distinguish themselves from other AI projects in specific yet crucial ways.

We broke down the workflow into the five phases shown in the image, and you can read more about our extensive analysis here.

 

 

We stayed focused on answering the needs and challenges of AI experts and non-experts alike

After breaking down and reconstructing our AI project workflow, we integrated it into EVIDENS, a significant step in ensuring that our solution could easily be adopted by internal and external users alike.

In fact, ensuring a great experience across these diverse users became our focus. In standardizing and optimizing certain processes of the workflow, we took great care to account for several key success factors, notably flexibility, robustness and reproducibility.

“The target we set ourselves was to bring AI to the varied user profiles in the medical fields, from non-AI experts to seasoned data scientists.

We wanted to propose a unique system that was easy to use, whether you want to visualize results or hack a model or more.”

 

User Need: The flexibility to adopt the tool regardless of the level of AI expertise

One of our greatest challenges was to propose a tool with the flexibility to answer different user needs: a biostatistician does not interact with a system the same way an applied research scientist or a PhD student does. In particular, we recognized that some users need to operate the engine easily, without any AI knowledge, in order to train selected models, while others need to develop new models and want to control and change implementation details to tailor the system to a specific task. Meanwhile, our students need to be able to run as many experiments as possible to easily debug or try out new ideas, and to keep track of the large amount of experiment data typically required for a publication.

We solved this challenge by proposing a standardized and configurable workflow that enables users in the company to build and operate machine learning at scale. To ensure constant integrability and a fast way to productionize any model deemed ready, our AI Developer team followed the design principles of the other modules in EVIDENS.

 

User Need: The robustness to reach medical-research levels of reproducibility

Additionally, particular attention was paid to robustness and reproducibility. Some machine-learning systems, and more specifically deep-learning systems, are known to be non-deterministic, which is to say that repeating the exact same experiment on the exact same data can lead to slightly different results. This can be due to a variety of factors, from hardware instability to using randomization in the code.

Such variability is unacceptable in the healthcare field. Ultimately, our AI methods are designed to be applied to patient data – we are aiming to have an impact on human lives, so fluctuating results can undermine trust in our AI. This is why our teams have a very low tolerance for non-reproducible systems, and why we have implemented an appropriate testing framework.
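The internals of our testing framework are not detailed here, but the first line of defence against non-determinism is pinning every source of randomness before a run. As a minimal sketch (all names illustrative, using only Python's standard library):

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Pin the sources of randomness we control so a run can be replayed."""
    random.seed(seed)
    # In a real deep-learning pipeline you would also pin the numerical
    # libraries' RNGs, e.g. numpy.random.seed(seed) and
    # torch.manual_seed(seed), and enable deterministic kernels where
    # the framework supports them.


def noisy_training_step(n: int = 5) -> list:
    """Stand-in for one stochastic step (e.g. shuffling or sampling a batch)."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]


# Re-seeding before each run makes the "experiment" bit-for-bit repeatable.
seed_everything(123)
run_a = noisy_training_step()
seed_everything(123)
run_b = noisy_training_step()
assert run_a == run_b  # identical seeds, identical results
```

Seed pinning alone does not remove hardware-level instability, which is why it has to be backed by tests that compare repeated runs.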

The truth is, there are a lot of great tools out there, but none were created to operate within the medical field, so none were complete enough to fulfill all of our needs. We drew our inspiration from the best, however, from Uber's Michelangelo to Google's Kubeflow, not to mention MLflow.

Here is what we came up with.

 

Imagia’s AI Engine integrates our healthcare-specific key learnings and best practices

After these extensive analyses, our list of challenges, opportunities, requirements and best practices is long – but our vision remains unchanged: creating a tool that makes it easier for AI experts and non-experts alike to produce insights from their clinical data.

Here are a few ways we brought all the pieces of the puzzle together into a single solution.

“The right solution for bringing AI to healthcare has to balance fast, flexible development, with reproducible, traceable experimentation.”

 

Conventionalizing our code: the Imagia Research Template

Our first step was to standardize our code structure. Before our AI Engine, each team had their own way of organizing code. Model definitions were never found in the same place, nor were training loops nor data transformation pipelines. Some teams were relying on configuration files with varying formats, while others adopted an all-code approach.

We unified all these approaches into a single, highly standardized and documented code structure, referred to internally as our “Research Template”. The Research Template provides us with the underlying plumbing to connect models, data transforms and any other pieces a researcher might need, making all AI-based code readable, maintainable and flexible. Users who want to run experiments with existing models, dataset readers or transforms can now use a simple configuration file written in human-readable terms, which vastly simplifies their research workflow.

Our students were also migrated to this structure, so team members are now able to quickly dive into their code to provide support – knowledge transfers are quicker and easier than ever. Some of our alumni were so enthused by the system that they used the Research Template in projects initiated outside Imagia. This has inspired us to work on an open-source version of our latest iteration of the template, which we hope to share soon.
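The Research Template itself has not been open-sourced yet, so the snippet below is an assumption about its shape rather than its actual code. It sketches the general pattern behind "plumbing that connects models and transforms from a human-readable config": registries that map the names in a configuration file to implementations (all class and key names here are hypothetical):

```python
# Registries map the human-readable names a user writes in a config file
# to concrete implementations; new pieces register themselves on import.
MODEL_REGISTRY = {}
TRANSFORM_REGISTRY = {}


def register(registry, name):
    def decorator(cls):
        registry[name] = cls
        return cls
    return decorator


@register(TRANSFORM_REGISTRY, "normalize")
class Normalize:
    """Rescale a list of values to the [0, 1] range."""
    def __call__(self, x):
        lo, hi = min(x), max(x)
        return [(v - lo) / ((hi - lo) or 1.0) for v in x]


@register(MODEL_REGISTRY, "mean_predictor")
class MeanPredictor:
    """Toy stand-in for a real model class."""
    def predict(self, x):
        return sum(x) / len(x)


def build_experiment(config):
    """Instantiate the pieces an experiment needs from a parsed config."""
    transforms = [TRANSFORM_REGISTRY[name]() for name in config["transforms"]]
    model = MODEL_REGISTRY[config["model"]]()
    return transforms, model


# In practice this dict would be parsed from a YAML/JSON file next to the code.
config = {"model": "mean_predictor", "transforms": ["normalize"]}
transforms, model = build_experiment(config)
data = [2.0, 4.0, 6.0]
for t in transforms:
    data = t(data)
prediction = model.predict(data)  # mean of [0.0, 0.5, 1.0] -> 0.5
```

The payoff of this pattern is exactly what the paragraph above describes: a non-expert only ever touches the config file, while an expert adds a class and a one-line registration.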

One last (but not least) huge benefit of having this template at the core of our AI Engine is the possibility of rapidly integrating AI-based features into EVIDENS. This means we can go from developing a research prototype to validating a model on real-world data in an instant.

This virtuous feedback loop is exciting and motivating – our researchers get to quickly witness the real-world impact of their ideas, and adjust their models if the results are not what they expected.

 

Designing our architecture with an eye on sharing, versatility and balance

As we developed the blueprint for our AI Engine, we were guided by our intentions to (i) support major machine-learning libraries such as TensorFlow, PyTorch and scikit-learn, (ii) be provider-agnostic, as our AI Engine is deployed in the cloud as well as on premises in hospitals, and (iii) ensure a balance between good software principles (such as containerization for every change) and allowing the research team to move fast. As such, our AI Engine consists of a mix of open-source and in-house components.

A best practice in software architecture is to encapsulate core functionality used by everyone into shared packages, and to keep custom code in separate per-project instances. This is the design we settled on – easy to say, but much harder to put into practice. Our Research Template has been key to our success, serving as the repository for custom code packages and containers accessible to projects.

The main benefit of this setup is increased reuse of existing components, with flexibility to add new components. This means that the container-building process happens less frequently – a real timesaver for our researchers. The other benefit of our setup is that it allows flexible integration of those core packages into different usages depending on the researcher profile. Whether you are a pure ML developer working with a CLI tool or a biostatistician exploring data with Jupyter, you can use these packages.

Our researchers can develop their new code in a deployed container and migrate their changes back to the repository once they’ve been proven on real data. We have the perfect balance between fast, flexible development and reproducible, traceable experimentation.

We also favoured open-source tools for saving models and tracking experiments. These tools were tested extensively, and approved by different users across teams. For example, we have adopted MLflow since its launch in 2018, and we explored multiple MLOps tools until we settled on Polyaxon for its distributed-training capabilities and compatibility with our technologies. Our containerization strategy combined with our standardized Research Template framework also means that our project experiments can be run as standalone pods on a cluster (like a microservice) or run as jobs by our chosen scheduler, Polyaxon.
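What an experiment tracker such as MLflow buys a team is a durable record of each run: its parameters, its metric history, and the artifacts it produced. The sketch below is not MLflow's API, just a pure-Python stand-in illustrating the kind of record involved (the run name and artifact path are hypothetical):

```python
import json
import time


class RunTracker:
    """Tiny stand-in for an experiment tracker such as MLflow: it records
    the parameters, metrics and artifacts of one run so the experiment can
    be audited and reproduced later."""

    def __init__(self, run_name):
        self.record = {
            "run": run_name,
            "started": time.time(),
            "params": {},
            "metrics": [],
            "artifacts": [],
        }

    def log_param(self, key, value):
        self.record["params"][key] = value

    def log_metric(self, key, value, step):
        self.record["metrics"].append({"key": key, "value": value, "step": step})

    def log_artifact(self, path):
        self.record["artifacts"].append(path)

    def to_json(self):
        return json.dumps(self.record, indent=2)


tracker = RunTracker("polyp-classifier-v1")        # hypothetical run name
tracker.log_param("learning_rate", 1e-3)
for step, loss in enumerate([0.9, 0.6, 0.4]):
    tracker.log_metric("train_loss", loss, step)
tracker.log_artifact("models/checkpoint_final.pt")  # hypothetical path
```

A real tracker adds a server, a UI, and model-registry features on top of this record, which is why we adopted an existing open-source tool rather than building one.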

A real-world use case of Imagia’s AI Engine

 

Imagia’s AI Engine was recently released to our biostatistics team, which aims to discover candidate image-based biomarkers in a multicentric study involving more than 500 data points and multiple outcomes. The team’s goal is to create a model that will predict clinical outcomes from a patient image.

Before

In previous iterations of the project, model development relied heavily on Jupyter notebooks. Jupyter notebooks are a useful tool to explore models interactively, but they can suffer from performance and reproducibility issues.

In this case, AI scientists had to manually run multiple notebook cells in the correct order, copy subsets of code to explore parameters, and checkpoint models and notebooks (causing potential reproducibility issues). They also had to combine data with code in the same notebook (introducing readability issues), and they frequently encountered memory overflow (efficiency and time issues).

These issues were exacerbated by the time-sensitive nature of the project, pushing a lot of responsibility onto individual users, which created a lot of headaches when transferring knowledge later on.

Now

The bulk of model exploration is now done with the AI Engine. Iterative variations of modelling steps are handled by dedicated sections in configuration files, allowing the team to both run and save multiple pipelines in parallel.
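The exact configuration schema is internal to EVIDENS, so the following is a sketch under assumptions: dedicated config sections listing candidate modelling choices can be expanded into one concrete run per combination, and each run submitted as an independent job (the section names and values are hypothetical):

```python
from itertools import product

# Hypothetical config: each section lists the variants to explore.
config = {
    "feature_set": ["radiomics", "deep_features"],
    "model": ["logistic_regression", "random_forest"],
    "n_folds": [5],
}


def expand_runs(config):
    """Expand the variation sections into one concrete run config each."""
    keys = list(config)
    return [dict(zip(keys, combo)) for combo in product(*config.values())]


runs = expand_runs(config)
# Each run config could now be submitted as an independent job (e.g. a
# Polyaxon pod), so all variants train, and are versioned, in parallel.
assert len(runs) == 4  # 2 feature sets x 2 models x 1 fold setting
```

Because every run is fully described by its own small config, the results can be compared, discarded or reproduced without re-reading any code.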

The resulting models and associated artefacts and outputs are automatically versioned and can be customized by the user to show the most relevant information.

Time-wise, the largest impact was observed during preliminary model exploration. Custom feature engineering – the process of summarizing image information into inputs suited for an AI model – went from a one-day operation requiring engineering support to a one-hour step performed independently by the researcher. Statistical modeling is now three times as fast. AI scientists can quickly discard the least interesting models and inputs, affording them time to focus on the most promising models and explore them interactively using their favourite tool.

 

Stay tuned for the next advancements

We are proud of the AI Engine at the heart of Imagia’s most exciting research. It was developed as a single tool across tasks, with clear arguments, options, and syntax for the user, transparent error logging, and the ability to customize model artefacts and outputs. It has become a central piece in the delivery of research at Imagia, and can be operated directly within hospitals or on the cloud.

Our work to date covers the AI system up to the prediction phase, and we are actively working on adding monitoring capabilities in our next iterations.

If you’d like to be the first to know about the progress of these functionalities, and when the open-source version of our Research Template goes live, subscribe to our monthly newsletter!
