Missing analytical and regulatory tools for medical AI
Devices which use AI based on machine learning, or simpler rule-based algorithms such as SortED, to perform tasks like triage, or which make recommendations for investigations, fall under the remit of the MHRA (Medicines and Healthcare products Regulatory Agency).
The Medicines part of the regulatory framework has well-established analytical methods and tools: the classical double-blind clinical trial, and a peer-review system well versed in the relevant techniques. The same is true for simple medical devices.
However, these well-worn pathways do not yet exist for devices which triage, investigate and even diagnose real patients.
Babylon, whose phone app powers ‘GP at Hand’, tested its system using patient vignettes: brief case histories giving symptoms, signs and other details about a patient. Babylon also used old exam question scenarios from the Royal College of GPs.
We have also used vignettes: a colleague patiently assembled 500 ED (emergency department) vignettes for triage bench-tests. However, not only did we find a flaw in one standard statistical method of analysis (relating specifically to triage), but we also found vignettes too artificial to provide a good test of the system.
We have since moved on to ‘simulated clinical use’: experiments designed to put the device, and the nurse using it, in a real clinical setting, gathering data to determine what the system’s outputs would have been in real use. This tests the system rigorously, but without putting patients at risk. Running such tests side by side with the current clinical process allows for a simple analytical framework.
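To make that framework concrete, here is a minimal sketch of the kind of side-by-side summary such a test allows. Everything in it is an illustrative assumption, not our actual protocol or results: the paired data are invented, and the five-point scale and the over-/under-triage definitions are one plausible convention.

```python
# A minimal sketch of a side-by-side triage comparison. All data below are
# made up for illustration; the five-point scale and the over-/under-triage
# definitions are assumptions, not the actual protocol.
from collections import Counter

# Illustrative five-point triage scale, 1 = most urgent.
CATEGORIES = [1, 2, 3, 4, 5]

# Hypothetical paired observations from one simulated-clinical-use session:
# (category the nurse assigned, category the device would have assigned).
pairs = [(2, 2), (3, 2), (3, 3), (4, 5), (1, 1), (2, 3), (5, 5), (3, 3)]

total = len(pairs)
exact = sum(1 for nurse, device in pairs if device == nurse)
# Over-triage: the device assigns a more urgent category than the nurse.
over = sum(1 for nurse, device in pairs if device < nurse)
# Under-triage: the device assigns a less urgent category; for triage this
# is the safety-critical direction of error.
under = sum(1 for nurse, device in pairs if device > nurse)

print(f"exact agreement: {exact}/{total} ({exact / total:.0%})")
print(f"over-triage:     {over}/{total}")
print(f"under-triage:    {under}/{total}")

# Full confusion matrix: rows = nurse's category, columns = device's.
counts = Counter(pairs)
print("nurse \\ device " + " ".join(f"{d:>3}" for d in CATEGORIES))
for n in CATEGORIES:
    print(f"{n:>14} " + " ".join(f"{counts[(n, d)]:>3}" for d in CATEGORIES))
```

Because each record is a matched pair on the same patient, the device and the current clinical process are compared under identical case mix, which is what keeps the analysis simple.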
In the fast-moving field of medical applications using AI, both regulators and journals need to update their tools and their approach.
Gillie Francis – May 2018