Labeling data is costly and tedious, and it remains a largely manual task performed by humans. Current machine learning algorithms need large amounts of labeled data to reach meaningful levels of performance, and once a model is trained on a specific domain, it is difficult to transfer what it has learned to another domain. This is why the world needs unsupervised domain adaptation. Hypothesis Disparity Regularized Mutual Information Maximization (HDMI) is one realization of this goal, and it opens up new possibilities in healthcare.
What is HDMI and what challenge does it solve?
Hypothesis Disparity Regularized Mutual Information Maximization (HDMI) was developed for situations where there is a domain shift between the training data and the test data. Here, the data can be images, and a domain can be defined as the set of all possible images taken with a particular camera. Figures 1 and 2 show examples of domain shift in computer vision and medical imaging datasets, respectively.
Figure 1. Examples of domain shift in four computer vision datasets (source).
Figure 2. Examples of domain shift in brain MRI scans from four centers (source).
A major drawback of machine learning algorithms is that they assume the data used to test a model come from the same domain as the data used to train it. To illustrate, suppose you have two cameras: an iPhone and a DSLR (think of a Canon or Minolta camera). The statistical distribution of images taken with the iPhone is inherently different from that of images taken with the DSLR; this difference is called a distribution shift, or domain shift. As a result, if you train a machine learning model on images from your iPhone and then test it on images from the DSLR, you will not achieve good results. By training, we mean using the images and their corresponding annotations to learn an image recognition model (i.e., supervised image classification). In this example, the two cameras' domains differ because of their hardware specifications, such as sensor size, lens characteristics, and exposure time.
In general, machine learning models struggle with domain shift and often perform poorly when it is present.
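A toy simulation can make this effect concrete. The sketch below is purely illustrative and rests on assumptions that are not in the original: each "image" is reduced to a single synthetic feature, and the difference between the two cameras is modeled as a constant offset added by the second camera. A simple classifier is trained on the first camera's data and then tested on both cameras.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1000  # samples per class


def make_domain(offset):
    """Two classes of 1-D synthetic 'image features'.

    Assumption: the camera-specific shift is modeled as a constant offset.
    """
    x0 = rng.normal(loc=0.0 + offset, scale=1.0, size=n)
    x1 = rng.normal(loc=4.0 + offset, scale=1.0, size=n)
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return x, y


# "Camera A" (source domain) supplies the labeled training data;
# "camera B" (target domain) shifts every feature by +3.
x_src, y_src = make_domain(offset=0.0)
x_tgt, y_tgt = make_domain(offset=3.0)

# Supervised training on the source domain: place the decision threshold
# halfway between the two class means.
threshold = (x_src[y_src == 0].mean() + x_src[y_src == 1].mean()) / 2


def accuracy(x, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    return np.mean((x > threshold).astype(int) == y)


src_acc = accuracy(*make_domain(offset=0.0))  # fresh test set from camera A
tgt_acc = accuracy(x_tgt, y_tgt)              # test set from camera B
print(f"source-domain accuracy: {src_acc:.2f}")
print(f"target-domain accuracy: {tgt_acc:.2f}")
```

With these settings the source-domain accuracy comes out close to 0.98, while the target-domain accuracy drops to roughly 0.6: the semantics of the classes did not change, only the sensor-induced offset did.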
As humans, our cognitive system is very robust to domain shift: the semantics of an image do not change whether it was taken with an iPhone or a DSLR. Machine learning methods, however, interpret these images very differently and therefore cannot transfer the knowledge gained in one domain to another.
This is important in healthcare when we work with multiple data centers (i.e., hospitals). For example, the training data may come from hospital A, which uses a particular scanner (e.g., an MRI machine). Having trained the AI model on data from hospital A, we would like to deploy (test) it in hospital B, which uses different scanners. As in the iPhone and DSLR example, differences in scanner hardware mean that data from hospitals A and B belong to different domains, and the model would fail to perform well in hospital B. HDMI aims to solve this problem.
In particular, HDMI builds on previous work [1] and uses a two-step strategy to solve this problem:
1. Mutual information maximization: several models trained on the source domain are adapted to the unlabeled target data through self-training, which maximizes the mutual information between the target inputs and each model's predictions.
2. Hypothesis disparity regularization: self-training may make the models falsely over-confident. To prevent this, we enforce uniformity among the models (i.e., we regularize the models with respect to each other). Because each model has a slightly different understanding of the data, the models act as slightly different experts, and enforcing uniformity encourages them to correct each other's mistakes.
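The two ideas described above, maximizing mutual information (named in HDMI itself) and regularizing several models toward agreement, can be sketched as a single loss in NumPy. This is not the authors' implementation: the function names, the use of average pairwise KL divergence as the disparity measure, and the weight `lam` are assumptions made for illustration. The mutual information between inputs X and predictions Y is estimated as H(Y) - H(Y|X): low per-sample entropy (confident predictions) combined with high entropy of the batch-averaged prediction (no collapse onto a single class).

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mutual_information(probs, eps=1e-12):
    """Estimate I(X; Y) = H(Y) - H(Y|X) from a batch of class probabilities.

    probs: array of shape (batch, classes) whose rows sum to 1.
    """
    # H(Y|X): average entropy of individual predictions (low = confident).
    cond_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    # H(Y): entropy of the batch-averaged prediction (high = diverse classes).
    marginal = probs.mean(axis=0)
    marg_entropy = -np.sum(marginal * np.log(marginal + eps))
    return marg_entropy - cond_entropy


def hypothesis_disparity(prob_list, eps=1e-12):
    """Average pairwise KL divergence between the predictions of K models.

    Assumption: pairwise KL is used here as an illustrative disagreement measure.
    """
    k = len(prob_list)
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(k):
            if i != j:
                p, q = prob_list[i], prob_list[j]
                kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
                total += kl.mean()
                pairs += 1
    return total / pairs


def hdmi_style_loss(logits_list, lam=0.1):
    """Loss to minimize: negative MI summed over models plus lam * disparity."""
    prob_list = [softmax(logits) for logits in logits_list]
    neg_mi = -sum(mutual_information(p) for p in prob_list)
    return neg_mi + lam * hypothesis_disparity(prob_list)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical models scoring a batch of 8 unlabeled target images, 3 classes.
    logits_a = rng.normal(size=(8, 3))
    logits_b = logits_a + 0.1 * rng.normal(size=(8, 3))
    print("loss:", hdmi_style_loss([logits_a, logits_b], lam=0.1))
```

Minimizing the first term alone corresponds to the self-training step and can drive each model toward over-confident predictions; the second term penalizes models that disagree, which is the role the regularization plays in the description above.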
Why did we invest in HDMI and what is the game-changer?
HDMI on its own makes discoveries more scalable across healthcare applications, because a model trained at one site can be adapted to a new site without collecting new labels. It is also a game-changer because it has the potential to be integrated into a federated learning environment.
Most federated learning methods [3] assume that all participants have similar data domains, which is often not the case in practice. Using HDMI, we could potentially design a more realistic federated learning framework that accounts for domain disparity between participants.
How does it help experts in their day-to-day work?
HDMI is not a new application but rather a method for improving how AI is applied to healthcare data. It helps address the challenges of working with real-world data. Experts can use HDMI as one more tool in their toolbox to bring AI solutions into the clinical world.
What is next?
Applying HDMI to healthcare data and augmenting federated learning solutions for medical imaging and healthcare data.
Publications Mar 3, 2021
Lao, Qicheng, Xiang Jiang, Mohammad Havaei, and Yoshua Bengio. IEEE Transactions on Neural Networks and Learning Systems (2021).