How we developed an AI-driven discovery platform that transforms real-world healthcare data into clinically actionable insights

Author:

  • Florent Chandelier, CTO

It seems self-evident that healthcare research could gain tremendous value from using artificial intelligence to derive insights and solutions from real-world evidence. These discoveries could prove invaluable for optimizing clinical best practices and for the overall efficiency of our healthcare systems, not to mention for advancing both population-based and personalized care strategies.

At Imagia, we have written at length on the topic, from improving patient outcomes to gaining a better understanding of the human genome, for instance. But the fact remains that, in practice, no one in the healthcare market seems to have managed to produce consistent and affordable AI-driven discoveries. The challenges at every step of the discovery process are considerable and make it unwieldy.

Led by our belief in the tremendous potential of AI in healthcare, our expert teams have analyzed the overall healthcare research process, from live clinical data all the way through to discovery, to break it down into phases in a single continuous digital pipeline. This has allowed us to identify, and then mitigate or resolve each of the various points of friction to ease the transitions to clinical validation and use.

Over the past five years, our cross-disciplinary, hands-on research and design work has led to an end-to-end digital discovery platform that is unique in the AI healthcare market. We have a complete, functional solution to the urgent need to apply artificial intelligence in healthcare and clinical research, operating directly on live clinical data.

Here’s what we discovered, and what we built.

Breaking down the AI-driven clinical discovery process

At a high level, we assume that all stakeholders have the same goal: to consistently develop AI-driven insights and solutions from real-world data.

We then break the research pipeline down into four phases that exist within a cybersecurity framework that continually preserves patient confidentiality, protects intellectual property, and traces data use.

Phase I: Data ingestion. There is no machine learning without data. The good news is that real-world and clinical evidence exists in vast quantities within healthcare organizations (HCOs). However, accessing it, storing it, and processing it—all while respecting GDPR, HIPAA and other privacy governance—is no small feat.

Phase II: Data management. These data and clinical events come in many forms and must undergo careful, substantial, and time-consuming standardization and optimization to be AI ready.

Phase III: AI training and experimentation. Once these datasets are ready, they need to exist in a framework that allows scientists and researchers to harness the power of machine learning to extract knowledge and learnings.

Phase IV: Result, model, or product. Eventually, a successful AI-driven discovery process should lead to a statistically reliable and reproducible conclusion, and hopefully a solution that answers a specific clinical need.

It rapidly becomes clear that each phase has its own technologies and stakeholders, with their own differing expertise, challenges, and objectives, which can often seem to be competing with one another. With technologies, there’s the issue of “But it works on my computer with that data.” And with the people—well, have you ever brought together an engineer, a clinician, and a machine-learning scientist to solve a task? There is a bit of a language barrier. This highlights an additional problem: the interoperability and transfer of data, technologies, and results has a significant impact on activities from one phase and stakeholder to the next.

What we’ve built: Imagia EVIDENS™, an end-to-end platform that accelerates the healthcare discovery process

The EVIDENS™ platform is a unique, microservice-oriented, end-to-end consolidation of all the tools necessary for AI-driven digital healthcare discovery. Organizations and industry partners using the platform can perform data analytics on heterogeneous and distributed datasets, as well as scale promising discoveries through machine learning, all while preserving data ownership, patient privacy, and IP lineage.

EVIDENS operates directly on live clinical data across all phases, while simultaneously facilitating collaboration between three groups of stakeholders: (1) the data engineers, DevOps and IT teams at the healthcare organizations who own the data, (2) clinical research teams and industry R&D groups, and (3) Imagia’s own biostatisticians and machine-learning scientists.

Over the past five years, EVIDENS has undergone three major architecture evolutions, and has been serving academic and industry use cases for the past four years. It has become Imagia’s de facto platform for all data-related activities. This is why we can state that EVIDENS allows stakeholders to design solutions that work in their respective production environments, that scale, and that reliably manage the complexities of healthcare organizations.

Let’s dive into what we built.

1. EVIDENS integrates real-world evidence and clinical data safely and securely
Before anything else, there’s data… but before data, there are privacy and confidentiality concerns.

Accordingly, from the outset, EVIDENS addresses the maintenance, security, and privacy parameters established by the IT groups and privacy offices of the different healthcare organizations (HCOs) of the Imagia ecosystem that contribute to our data hub. This allows us to develop a hub of real-world data systems that brings together electronic medical records (EMR), electronic health records (EHR), and picture archiving and communication systems (PACS).

Once these siloed data are safely in the EVIDENS data hub, its mission is to reconcile them, systematically denormalize real-world data to a common format, provide more contextual structure across data entities, and permit stakeholders to rapidly access the precise information they need without manipulating it in its raw form.
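To make this concrete, here is a minimal, hypothetical sketch of what mapping records from siloed systems onto one common format can look like. The field names (`pid`, `PatientID`, `Modality`, and so on) and the `to_common_record` helper are illustrative assumptions, not EVIDENS’s actual schema or code:

```python
# Illustrative only: sources and field names are hypothetical,
# not EVIDENS's actual schema.
def to_common_record(source, raw):
    """Map a source-specific record onto one shared internal format."""
    if source == "emr":
        return {
            "patient_id": raw["pid"],
            "event_type": raw["visit_type"],
            "event_date": raw["date"],
        }
    if source == "pacs":
        return {
            "patient_id": raw["PatientID"],
            "event_type": "imaging:" + raw["Modality"],
            "event_date": raw["StudyDate"],
        }
    raise ValueError(f"unknown source: {source}")

# Records from two silos now share one schema and can be reconciled
# on patient_id without manipulating the raw source data.
records = [
    to_common_record("emr", {"pid": "P001", "visit_type": "oncology", "date": "2021-03-02"}),
    to_common_record("pacs", {"PatientID": "P001", "Modality": "CT", "StudyDate": "2021-03-05"}),
]
```

Once every silo speaks the same schema, downstream consumers can query one consistent structure instead of each system’s raw format.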

Because we are dealing with tremendous amounts of data that are growing by the moment, the Imagia DevOps team continually renews deployment strategies to automatically provision storage from physical and virtual resources, to enforce firewall-friendly, outbound-only communication, and to leverage infrastructure-as-code to automate all aspects of operating the EVIDENS data hub infrastructure at the HCO premises.

This foundational phase of the EVIDENS end-to-end process is the segment we iterate on the most, for instance as new project-based needs arise. We manage these iterations by using well-factored internal endpoints or APIs (a challenge in themselves), by hiding both the technical and the engineering complexity of data manipulation, and by allowing Imagia’s team to quickly move its solutions from experimentation to testing in production. This allows us to continually improve the underlying engineering of how we structure information without affecting what’s actually done with that information. In other words, these operations are compatible with any downstream activities.

Thanks to this focus on iteration, our data-modelling strategy has reached a reasonable maturity level, empowering all our major stakeholders to access and use the real-world clinical data managed by EVIDENS.

2. EVIDENS’s data analytics pipeline creates AI-ready datasets quickly and affordably
At this point in the end-to-end process, a major change of stakeholders takes place, from IT and engineering to clinicians. This introduces a new understanding of the nature of data: from an object to be retrieved, to an object to be investigated. This phase also shifts the pipeline’s focus to capturing associations and linkages across information, formed and enriched by clinical expertise.

The challenge for clinical investigators, then, is to search efficiently through the ever-increasing volumes of unstandardized, diverse data structures generated from real-world data. For instance, of particular interest in clinical R&D is the task of identifying groups of patients relevant to specific clinical hypotheses, an overwhelming and tedious chore that can represent up to 80% of traditional research activities.

Thanks to Imagia’s strong ecosystem of partner HCOs, we have access to and can leverage clinical expertise that has allowed us to further develop EVIDENS in line with healthcare professionals’ needs—for instance, in terms of analyzing clinical facts and structuring patient cohorts according to familiar inclusion/exclusion criteria traditionally used in clinical-trial design.

It’s during this part of the process that a data-first description of the clinical problem is generated, which allows the problem to be transferred from clinical teams to the non-clinical teams contributing to the project. Specifically, an AI-friendly dataset is generated for biostatisticians to assess the quality and representativeness of the data. The objective is to establish a primary statistical hypothesis and prepare the associated data splits for analysis; in machine learning, datasets must be split into training, validation, and test sets, a critical step that ensures data errors are not propagated through the analysis and learning processes.
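A split of this kind can be sketched in a few lines. This is a generic, illustrative approach, not EVIDENS’s implementation; the key idea is that splitting by patient rather than by record prevents the same patient from leaking into both the training and the test sets:

```python
import random

def split_by_patient(patient_ids, seed=13, fractions=(0.70, 0.15, 0.15)):
    """Split at the patient level so that no patient appears in more
    than one of the train/validation/test sets (prevents leakage)."""
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)          # fixed seed => reproducible splits
    rng.shuffle(patients)
    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))
    train = set(patients[:n_train])
    val = set(patients[n_train:n_train + n_val])
    test = set(patients[n_train + n_val:])
    return train, val, test

# 100 hypothetical patient identifiers
train, val, test = split_by_patient([f"P{i:03d}" for i in range(100)])
assert train.isdisjoint(val) and train.isdisjoint(test) and val.isdisjoint(test)
```

The fixed seed also makes the split reproducible, which matters when an analysis has to be re-run and audited later.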

In essence, it’s during this phase that EVIDENS applies expert clinical knowledge to the datasets to make them accessible and ready for any type of machine learning (ML) experimentation, as well as to generate key metadata about the datasets. It’s this kind of “information about information” that improves efficiency in discovery applications, as we’ll see in the next phase.

3. EVIDENS provides researchers and scientists with a framework where they can conduct experiments and train their models on AI-ready datasets
Machine learning (ML) extracts knowledge from a given dataset to solve a task, with the objective of generalizing that solution to out-of-sample, real-world data. Some examples of useful ML tasks in healthcare R&D include clustering (finding and labelling natural groupings of data to draw inferences, which is useful in cohorting), classification (identifying groupings of data that fit a category, for instance whether a patient has a disease or not), and prediction (forecasting likely outcomes).
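As a toy illustration of the clustering task, here is a minimal one-dimensional k-means in plain Python. It is deliberately simplistic (real clinical clustering operates on high-dimensional features), but it shows the basic mechanics of grouping values around centroids:

```python
def kmeans_1d(values, k=2, iters=20):
    """Toy 1-D k-means: group values around k centroids."""
    # Spread the initial centroids across the sorted values.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups: values near 1.0 and values near 10.0.
centroids, clusters = kmeans_1d([0.9, 1.0, 1.1, 9.8, 10.0, 10.2], k=2)
```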

The machine learning part of the digital healthcare discovery process is the domain of data and ML engineering, ML computer science, and statistics, which respectively (1) design data pipelines that reliably produce input datasets of sufficient quality from raw data, (2) extract task-specific knowledge for a given ML model (architecture and algorithm), (3) follow a statistically robust study design that ensures generalizability.

Whereas the first two phases of EVIDENS ensure that we have standardized every piece of information in such a way that AI can understand and learn from it quickly, the EVIDENS AI engine provides the coherent workflow that writes, executes, and tests ML experiments on these AI-ready datasets.

What’s more, we chose to build this custom Extract-Transform-Load (ETL) workflow ourselves, as opposed to using Apache Beam for example, to ensure that it is decoupled from underlying technologies. We want to guarantee that any given AI transform can operate independently of the others, so that we can quickly experiment across ML model architectures and reliably compare results.
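The idea of decoupled, independently swappable transforms can be sketched as a simple composable pipeline. This is an illustrative pattern, not Imagia’s actual ETL code; `normalize_units` and `add_label` are hypothetical transforms:

```python
def run_pipeline(records, transforms):
    """Apply each independent transform in sequence to every record."""
    out = []
    for record in records:
        for transform in transforms:
            record = transform(record)
        out.append(record)
    return out

def normalize_units(record):
    """Hypothetical transform: convert pounds to kilograms."""
    record = dict(record)
    record["weight_kg"] = record.pop("weight_lb") * 0.453592
    return record

def add_label(record):
    """Hypothetical transform: derive a simple label for learning."""
    record = dict(record)
    record["overweight"] = record["weight_kg"] > 80
    return record

# Transforms are plain functions, so any one of them can be swapped
# or reordered without touching the others.
rows = run_pipeline([{"weight_lb": 200}], [normalize_units, add_label])
```

Because each step only consumes and produces records, swapping in a different model-specific transform never forces changes elsewhere in the pipeline, which is the decoupling property described above.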

It’s this notion that allows EVIDENS to capitalize on one of Imagia’s most innovative, proprietary projects, its self-evolving learning framework (SELF). Through SELF, the EVIDENS platform engages in purely data-driven, automatic AI architecture design—it performs a proprietary neural architecture search to discover and train new models automatically. In essence, it trades ML expertise for processing time, making it possible for “AI non-specialists” to reap the benefits of ML without needing to understand it.
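Since SELF itself is proprietary, here is only a generic, hedged sketch of the broader idea behind architecture search: sample candidate configurations from a search space, score each one, and keep the best. The search space, the `evaluate` stand-in, and the use of plain random search are all illustrative assumptions, not SELF’s actual method:

```python
import random

# Illustrative only: a plain random search over hypothetical
# architecture configurations.
SEARCH_SPACE = {
    "depth": [2, 4, 8],
    "width": [32, 64, 128],
    "activation": ["relu", "gelu"],
}

def sample_architecture(rng):
    """Draw one candidate configuration from the search space."""
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for 'train the candidate and measure validation score'."""
    return 1.0 / (arch["depth"] * arch["width"])  # dummy scoring rule

def search(trials=20, seed=1):
    """Sample `trials` candidates and keep the best-scoring one."""
    rng = random.Random(seed)
    return max((sample_architecture(rng) for _ in range(trials)), key=evaluate)

best = search()
```

This is the sense in which such a system trades ML expertise for processing time: the loop spends compute evaluating candidates instead of requiring a specialist to hand-design the model.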

4. EVIDENS yields reproducible, sound, insightful results
While our preclinical models, derived from tailored statistical analysis, planning, and design, run on carefully curated datasets issued from real-world data, the challenges we are solving are always rooted in specific clinical needs defined by clinicians, and our processes always ensure future applicability on real-world data.

By adhering to strict privacy requirements first, then carefully listening to the various expert industry needs and preferences both in terms of technologies and results, we have managed to create a data and AI pipeline with its own tech-agnostic lingua franca that allows it, in a sense, to operate as an ambassador and activator between these stakeholders.

EVIDENS is a mature, foundational step in our pursuit of AI-driven insights in healthcare
Success is shaped by the experience of users, particularly as it relates to adapting to their everyday environments. Imagia’s digital platform strategy has proven responsive to these particular constraints and delivers tangible value. EVIDENS removes friction for Imagia’s engineering and research teams, providing high-quality, self-service access to a standardized ecosystem of foundational technologies deployed across healthcare organizations. It is a delivery infrastructure, ready to execute on various research and industry projects, enabling an operating model focused on generating actionable insights from routine clinical data that can reliably be applied in real-world scenarios.

There’s certainly a lot of work ahead of us to support more collaborations across domains, but we are confident in the results we are already witnessing. To stay updated on our progress, subscribe to our monthly newsletter!
