AI projects in healthcare distinguish themselves from other AI projects – down to the workflow


  • Lisa Di Jorio, Director of AI Research & Strategy, Imagia 
  • Rebecca Simpson, ML Technical Lead, Imagia
  • Cecile Low-Lam, Biostatistics Researcher, Imagia

When we set out to create EVIDENS™, our AI-driven discovery platform that transforms real-world healthcare data into clinically actionable insights, we had to dig deep to analyze and derive value from everything we were doing right – and everything we could improve.

On that latter point, we knew a key factor in our success would be our ability to formalize and standardize our processes. While some existing AI tools and processes were interesting, they never quite fit the distinct realities of research in healthcare. We’d managed to deliver many successful projects, but with teams and individuals using different processes, tools, engines, frameworks, etc., at every turn.

And so to better understand our standardization problem, we mapped out the life cycle of our AI-based projects, allowing us to detect and address underlying issues at each step.

The result is Imagia’s unified AI system, a workflow that has helped structure our AI engine and EVIDENS.

Here’s where we landed.

An analysis of how we manage our AI / healthcare research projects

As with any AI research or product project, our project journey begins once we both (i) have clearly identified the problem we are trying to solve and (ii) have access to the data in its raw form from the healthcare organization.


PHASE I: Explore the data.

Data: the foundation of machine learning and the launching point for our AI-project journey. Still, we generally can’t work with just any data; it must meet certain quality criteria to ensure it suits the problem we are trying to solve. This can be especially significant for healthcare given the difficulties of accessing healthcare data.

It’s during this phase that our biostatistics experts conduct an initial analysis that ascertains the data’s main characteristics, identifies trends and outliers, and validates hypotheses. They also split the data to reserve a subset for future AI-model validation. This data exploration can be performed independently of the AI Engine, using any kind of tool.
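As a rough illustration of reserving a validation subset, here is a minimal Python sketch. It is not Imagia's actual tooling. The function name `split_holdout`, the `patient_id` field and the 20% fraction are all assumptions made for the example, as is the choice to split by patient rather than by row so that one patient's records never land on both sides.

```python
import random


def split_holdout(records, holdout_fraction=0.2, seed=0, group_key="patient_id"):
    """Reserve a fraction of patients (not rows) for later model validation."""
    # Work on unique patient identifiers so one patient stays on one side only.
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(groups)
    n_holdout = max(1, round(len(groups) * holdout_fraction))
    holdout_groups = set(groups[:n_holdout])
    working = [r for r in records if r[group_key] not in holdout_groups]
    holdout = [r for r in records if r[group_key] in holdout_groups]
    return working, holdout


# Hypothetical toy data: 10 patients with 3 scans each.
records = [{"patient_id": f"p{i % 10}", "scan": i} for i in range(30)]
working, holdout = split_holdout(records)
```

The held-out records would then be set aside until a trained model is ready to be validated.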


PHASE II: Produce AI-ready datasets.

AI frameworks are particular; they can only interact with data if it is presented in specific formats.

Medical data comes in many formats, but will only be useful in the AI-project journey once it is converted into a raw form. (For instance, images must be converted into raw values of pixels; text data should be in a raw form, which means proprietary formats such as PDFs are inadmissible.)

But to obtain accurate DL or ML results, data must often undergo a second set of transformations. We might want to normalize CT scans across manufacturers, for example. Or in the case of histopathology data, we might want to segment specific parts of the tissue.

These data transformations make this phase of the AI project particularly complex – with wide-ranging repercussions.

First, data transformations are extremely sensitive, which engenders risk. A bad transformation can lead to wrong results without triggering any warning signs in this or subsequent phases. Second, these transformations require a pipeline able to handle heavy computation while remaining flexible enough for users to easily explore different types of data pre-processing and transformations. With this in mind, we identified two types of data transformations:

  • Offline transformations. Some transformations are computationally expensive – they can add hours to the transformation process – and cannot realistically be run every time a model needs access to a dataset. Offline transformations should be performed a single time, with the results stored where the model can access them at any time in subsequent steps.
  • Online / On-the-Fly transformations. The opposite of offline transformations. Here, the transform is cheap enough to run repeatedly during training or prediction without hurting completion time, and its parameters may need to vary between experiments. For example, cropping a chest CT scan to centre on identified nodules is a quick transform that can be performed on the fly.

These transformations differ in their implementation, but if they share a strictly defined format and structure – our main challenge at this phase – the distinction becomes invisible to the end user. This standardization then frees our data scientists to focus on writing the content of the transform, the “meat” of the transformation as it were.
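One way to picture a shared structure for both kinds of transform is the Python sketch below. It is an assumption-laden illustration, not the interface used in Imagia's AI engine. The class names (`Transform`, `OfflineTransform`, `OnlineTransform`, `Rescale`, `Crop`) and the helper `run_pipeline` are hypothetical, an in-memory dictionary stands in for persistent storage, and plain lists of numbers stand in for images.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    """Common shape: authors only write apply(); callers only use __call__."""

    @abstractmethod
    def apply(self, sample):
        ...

    def __call__(self, sample_id, sample):
        return self.apply(sample)


class OnlineTransform(Transform):
    """Cheap transform, recomputed on every call."""


class OfflineTransform(Transform):
    """Expensive transform, computed once per sample and then reused."""

    def __init__(self):
        self._cache = {}  # stand-in for results stored on disk

    def __call__(self, sample_id, sample):
        if sample_id not in self._cache:
            self._cache[sample_id] = self.apply(sample)
        return self._cache[sample_id]


class Rescale(OfflineTransform):
    """Stand-in for a costly step: rescale values to the range [0, 1]."""

    def __init__(self):
        super().__init__()
        self.runs = 0  # counts how often the costly step really executes

    def apply(self, sample):
        self.runs += 1
        lo, hi = min(sample), max(sample)
        span = (hi - lo) or 1
        return [(v - lo) / span for v in sample]


class Crop(OnlineTransform):
    """Keep `width` values starting half a window before `centre`."""

    def __init__(self, centre, width):
        self.centre, self.width = centre, width

    def apply(self, sample):
        start = max(0, self.centre - self.width // 2)
        return sample[start:start + self.width]


def run_pipeline(transforms, sample_id, sample):
    for transform in transforms:
        sample = transform(sample_id, sample)
    return sample


rescale = Rescale()
pipeline = [rescale, Crop(centre=2, width=3)]
first = run_pipeline(pipeline, "scan-1", [0, 5, 10, 15, 20])
second = run_pipeline(pipeline, "scan-1", [0, 5, 10, 15, 20])
```

In this sketch the pipeline calls every transform the same way, so the caller never needs to know which steps are cached. After the two runs above, `rescale.runs` is 1, because the second run reuses the stored result.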

PHASE III: Train models on the datasets.

With suitable datasets in hand, we’re ready to train our models. The team now shifts its focus to identifying the most appropriate model for the problem we are trying to solve.

Training models consists of feeding a dataset to a selected model, which gradually learns how to perform a given task with increasing accuracy. This is often where the value of a great AI organization lies: in its ability to find or create the best model efficiently and consistently. In the case of Imagia, our multidisciplinary team of experts in medicine, machine learning and statistics has been optimally developed to build predictive models aligned with medical knowledge.

Still, there is no universal recipe for choosing the “best” model. The development of a predictive AI model is an iterative process in which multiple combinations of choices are tested (resampling, pre-processing, variable selection and hyperparameter tuning).

This multitude of tests represents, by far, one of our biggest challenges. It covers a broad range of required features. We must be able to optimally track the sheer amount of data, metrics and results being generated in order to make educated decisions. We want to be able to deploy our systems on various technologies and in various environments. We want to be able to log good models for posterity, so our scientists can reuse them or train them again.

Essentially, this all translates to a large number of candidate models that are created, trained, evaluated, compared and saved. The ultimate decision is based on multiple model artefacts and outputs that can be metrics, graphs or summaries.
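To make the bookkeeping concrete, here is a minimal Python sketch of recording candidate models and comparing them on one metric. It is a hedged illustration only. The `Run` and `ExperimentLog` names are hypothetical, the model names and parameters are invented, and the metric values are placeholder numbers, not real results.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)


class ExperimentLog:
    """Keeps every candidate so later decisions can be traced back."""

    def __init__(self):
        self.runs = []

    def record(self, name, params, metrics):
        run = Run(name, dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def best(self, metric, higher_is_better=True):
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run reports metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r.metrics[metric])


log = ExperimentLog()
# Placeholder values for illustration only.
log.record("candidate-a", {"learning_rate": 0.1}, {"auc": 0.1, "loss": 0.9})
log.record("candidate-b", {"learning_rate": 0.01}, {"auc": 0.3, "loss": 0.7})
log.record("candidate-c", {"learning_rate": 0.001}, {"auc": 0.2, "loss": 0.8})
best_by_auc = log.best("auc")
best_by_loss = log.best("loss", higher_is_better=False)
```

A real setup would also persist artefacts such as graphs and summaries alongside each run. The sketch keeps only parameters and scalar metrics.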

During this phase, our goal is to free our data science team as much as possible from the burden of operationalizing training so they can focus on actually building and creating models.


PHASE IV: Perform predictions in the field.

We’re now ready to send our trained model into the field, to ship it into production. It is robust enough to make reliable predictions on unseen data, as long as that data comes from an environment comparable to the one the model was trained in.

For example, in our polyp malignancy prediction use case, we trained the model on the same endoscopy towers that it was then deployed into, to deliver real-time predictions on each video frame.


PHASE V: Monitor the model’s behaviour.

In the medical field, it is particularly important to keep verifying that the model we designed behaves as we intended, because its predictions can have an impact on patients’ lives.

During this final and continuing phase, the plan is to use every metric, log, event and trace to detect any unexpected change of behaviour and react accordingly. Such changes can be triggered by any number of events, with causes ranging from demographics (for example, the model is being used on a target population that differs from the one it was trained on), to usage (an end user is applying the model in a way that was not intended).
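As one simple illustration of detecting a demographic shift, the Python sketch below compares the mean of a single input feature in recent production data against the training data. It is an assumed example, not a description of Imagia's monitoring. The function name `drift_alert`, the choice of patient age as the feature, the threshold of 3 and the sample values are all hypothetical, and a production system would likely combine many such signals.

```python
from statistics import mean, stdev


def drift_alert(reference, recent, threshold=3.0):
    """Flag a shift when the recent mean sits far from the reference mean.

    The distance is measured in standard errors, using the reference
    spread and the size of the recent sample.
    """
    ref_mean = mean(reference)
    ref_sd = stdev(reference)  # needs at least two reference values
    if ref_sd == 0:
        return mean(recent) != ref_mean
    standard_error = ref_sd / len(recent) ** 0.5
    z_score = abs(mean(recent) - ref_mean) / standard_error
    return z_score > threshold


# Hypothetical patient ages: training population versus two recent batches.
training_ages = [50, 55, 60, 65, 70]
similar_batch = [58, 62, 61, 59]
younger_batch = [30, 32, 35, 31]

similar_flagged = drift_alert(training_ages, similar_batch)
younger_flagged = drift_alert(training_ages, younger_batch)
```

Here the younger batch is flagged and the similar batch is not. A flag like this only signals that a review is needed; it says nothing about whether the model's predictions have actually degraded.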


The result: Imagia’s unified AI system dedicated to healthcare research

By standardizing our AI-project workflow into a unified AI system, we managed to build an AI engine, one of EVIDENS’s major components, in a way that caters to varied user profiles in healthcare, regardless of their level of AI expertise. This brings us one step closer to bringing the benefits of AI to patients.

You can read more about our AI engine and how it was designed here, and sign up for our newsletter for more exciting developments from Imagia about healthcare and AI research.
