How HDMI supports groundbreaking possibilities in healthcare

Hypothesis Disparity Regularized Mutual Information Maximization, or HDMI

The process of labeling data is costly, tedious, and largely manual. Current machine learning algorithms require large amounts of labeled data to reach meaningful performance levels, and once a model is trained on a specific domain, it is difficult to transfer that learning to another domain. This is why the world needs unsupervised domain adaptation. Hypothesis Disparity Regularized Mutual Information Maximization, or HDMI, is one realization of this goal that brings some groundbreaking possibilities to healthcare.

What is HDMI and what challenge does it solve?

Hypothesis Disparity Regularized Mutual Information Maximization, or HDMI, was developed for situations where there is a domain shift between the training data and the test data (here the data can be images, and a domain is defined by the set of all possible images taken from a particular camera). Figures 1 and 2 show examples of domain shift in computer vision and medical datasets, respectively.

Figure 1. Examples of domain shift in 4 computer vision datasets (source).

Figure 2. Examples of domain shift in brain MRI from 4 centers (source).

A major drawback of machine learning algorithms is that they assume the data used to test a model comes from the same domain as the data used to train it. To simplify the concept, suppose you have two cameras: an iPhone and a DSLR camera (think of a Canon or Minolta). The statistical distribution of images taken by the iPhone is inherently different from that of images taken by the DSLR, a difference known as "domain shift". This means that if you train a machine learning algorithm on images from your iPhone and then test the model on images from the DSLR camera, you will not get good results. By training, we mean using the image data and their corresponding annotations to learn an image recognition model (i.e., supervised image classification). In this example, what makes the domains of the two cameras differ is the hardware inside them: sensor size, lens specifications, exposure time, and so on.

In general, machine learning models have trouble dealing with domain shift and often perform poorly when such distribution shifts exist.
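
To make the effect concrete, here is a minimal sketch of that failure mode using synthetic, hypothetical data (not real camera images or the paper's experiments): a classifier fit on a "source" domain loses accuracy on a "target" domain whose feature statistics are shifted.

```python
# Minimal sketch of domain shift on synthetic data (hypothetical example).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_domain(n, shift, scale):
    # Two classes whose features are shifted/scaled per domain,
    # mimicking different camera or scanner characteristics.
    X0 = rng.normal(loc=0.0 + shift, scale=scale, size=(n, 2))
    X1 = rng.normal(loc=2.0 + shift, scale=scale, size=(n, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    return X, y

X_src, y_src = make_domain(500, shift=0.0, scale=1.0)   # e.g. "iPhone" images
X_tgt, y_tgt = make_domain(500, shift=1.5, scale=2.0)   # e.g. "DSLR" images

clf = LogisticRegression().fit(X_src, y_src)
print("source accuracy:", clf.score(X_src, y_src))
print("target accuracy:", clf.score(X_tgt, y_tgt))      # typically noticeably lower
```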

Although the human visual system is very robust to domain shift (the semantics of an image do not change between an iPhone photo and a DSLR photo), machine learning methods interpret these images very differently and thus cannot transfer the knowledge gained in one domain to another.

This matters in healthcare when we work with multiple data centers (i.e., hospitals). For example, the training data comes from hospital A, which uses a particular scanner (e.g., an MRI machine). Having trained the AI model on data from hospital A, we would like to deploy (test) it at hospital B, which uses different scanners. As in the earlier case (iPhone vs. DSLR), differences in scanner hardware mean that data from hospitals A and B belong to different domains, and the model would fail to perform well at hospital B. HDMI aims to solve this problem.

In particular, HDMI builds on previous work¹ and uses a two-step strategy to solve this problem:

  1. Instead of training a single model on the training data, we train multiple models (i.e., multiple hypotheses). With this strategy, each model focuses on slightly different characteristics (i.e., features) of the training data; in technical terms, each model is likely to capture different modes of the training data.
  2. As with other machine learning algorithms, the trained models are confident only on the training data they have seen. We therefore need to adapt each model to data from the new domain. We do this by self-training: we apply the model to data from the new domain and use its class predictions as labels to train the model, which increases each model's confidence on the new data. In technical terms, we maximize the mutual information between the new-domain data and each model's predictions (see the sketch after this list).
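
As a rough illustration of step 2, the sketch below (PyTorch, with hypothetical tensor shapes; not the authors' code) shows one common way to estimate the mutual information between inputs and a hypothesis's predictions on unlabeled target-domain data: the entropy of the average prediction (diversity) minus the average per-sample prediction entropy (confidence).

```python
# A minimal sketch of the mutual-information objective used during
# self-training on unlabeled target-domain data (hypothetical shapes).
import torch
import torch.nn.functional as F

def mutual_information(logits, eps=1e-8):
    probs = F.softmax(logits, dim=1)                  # (batch, num_classes)
    marginal = probs.mean(dim=0)                      # predicted class marginal
    h_marginal = -(marginal * (marginal + eps).log()).sum()           # diversity
    h_conditional = -(probs * (probs + eps).log()).sum(dim=1).mean()  # confidence
    return h_marginal - h_conditional

# During adaptation, each hypothesis h_k is updated to maximize this quantity
# on target batches, i.e. loss_k = -mutual_information(h_k(x_target)).
```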

Self-training, however, can make the models over-confident in an undesirable way. To prevent this, we enforce agreement among the models (i.e., we regularize the models with respect to each other). Because each model has a slightly different understanding of the data, the models can be viewed as slightly different experts, and encouraging them to agree helps them correct each other's mistakes.
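
As a rough sketch of this regularization (again hypothetical, and not necessarily the exact disparity measure used in the HDMI paper), one can penalize how far each hypothesis's predictions drift from the consensus prediction and add that penalty to the mutual-information objective above.

```python
# A minimal sketch of regularizing the disparity between hypotheses,
# building on mutual_information() above (hypothetical formulation).
import torch
import torch.nn.functional as F

def hypothesis_disparity(logits_list, eps=1e-8):
    # logits_list: one (batch, num_classes) tensor per hypothesis
    probs = [F.softmax(l, dim=1) for l in logits_list]
    mean_p = torch.stack(probs).mean(dim=0)           # consensus prediction
    # average KL(consensus || hypothesis k) over hypotheses and samples
    kls = [(mean_p * ((mean_p + eps).log() - (p + eps).log())).sum(dim=1).mean()
           for p in probs]
    return torch.stack(kls).mean()

# A combined adaptation loss for a target batch x_target could then look like:
#   logits = [h_k(x_target) for h_k in hypotheses]
#   loss = -sum(mutual_information(l) for l in logits) \
#          + beta * hypothesis_disparity(logits)       # beta: trade-off weight
```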

Why did we invest in HDMI and what is the game-changer? 

Because HDMI on its own makes AI for healthcare more scalable: models trained at one site can be adapted to new data sources without collecting new labels, which broadens the scope of discovery for healthcare applications. It is also a game-changer because it has the potential to be integrated into a federated learning environment.

Most federated learning³ methods assume all participants have similar domains, which is often not the case. Using HDMI, we can potentially design a more realistic federated learning framework that accounts for domain disparity between participants.

How does it help or improve experts' day-to-day work?

HDMI is not a new application but rather a method that improves how AI is applied to healthcare data. It helps address the challenges of working with real-world data. Experts can add HDMI to their toolbox as one more way to enable AI solutions in the clinical world.

What is next?

Applying HDMI to healthcare data and augmenting federated learning solutions for medical imaging and healthcare data.

 
