Research Reveals Unreliability of AI in Providing Medical Advice

News of OpenAI’s GPT-3 has been making waves in the tech world, and Microsoft recently secured an exclusive license to use the text generator commercially. Yet alongside the impressive early demos, recent headlines have pointed to a darker side: in testing, a medical chatbot built on GPT-3 gave dangerous advice. The episode is a reminder to stay mindful of the implications of deploying technology this advanced.

Text generators such as GPT-3 offer novel opportunities, but their longer-term implications are harder to gauge. To study them, OpenAI grants access to a cloud-hosted version of the model to a select group of researchers who want to probe its capabilities and limitations. Nabla is one of them, and it set out to test whether GPT-3 could be used for medical purposes. Using the technology this way carries real risk: OpenAI itself cautions that wrong advice in life-or-death situations could cause serious harm.

The research team set out to evaluate GPT-3 on a range of medical tasks, ranked according to how sensitive they are from a medical perspective. The six tasks, in order of increasing sensitivity, were: admin chat with a patient, medical insurance check, mental health support, medical documentation, medical questions and answers, and finally medical diagnosis.

The first task exposed some weaknesses, though nothing hazardous. Nabla found that the model has no real sense of time and no memory of earlier turns in the conversation, so it failed to handle the test patient’s request for an appointment before 6 pm.
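This behaviour is easier to understand once you remember that, at the API level, each completion request is independent: the model only “knows” whatever is packed into the prompt. Below is a minimal sketch, using the legacy OpenAI completion endpoint that was current at the time, of a chat loop that has to carry the whole conversation history in the prompt by hand. The engine name, prompt format, and scheduling scenario are illustrative assumptions, not Nabla’s actual setup.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: key supplied via config

history = []  # the only "memory" the model gets is what we resend each turn

def ask(patient_message: str) -> str:
    """Send the running transcript plus the new message as one prompt."""
    history.append(f"Patient: {patient_message}")
    prompt = (
        "The following is a scheduling chat between a patient and an assistant.\n"
        + "\n".join(history)
        + "\nAssistant:"
    )
    # Legacy completion endpoint (openai-python < 1.0); each call is stateless.
    response = openai.Completion.create(
        engine="davinci",          # illustrative engine name
        prompt=prompt,
        max_tokens=64,
        temperature=0.7,
        stop=["\nPatient:"],
    )
    reply = response.choices[0].text.strip()
    history.append(f"Assistant: {reply}")
    return reply

print(ask("Hi, I'd like to book an appointment, but I can't make it before 6 pm."))
print(ask("So what time can you offer me?"))  # the constraint survives only if it is still in the prompt
```

If the transcript ever gets trimmed to fit the model’s context window, the 6 pm constraint silently disappears, which is consistent with the forgetfulness Nabla observed.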

That said, the conversation the model generated was highly natural, and with a few targeted improvements it could plausibly handle this kind of administrative task. The insurance check went less smoothly: the model struggled with simple logic and arithmetic. It could correctly repeat the price of an X-ray that had been given to it as input, but it had difficulty working out the total cost of multiple exams in one session.
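That kind of failure is easy to reproduce with a direct probe. The sketch below, again against the legacy completion endpoint, feeds the model a short price list, asks for a total, and checks the reply against the sum computed in plain Python; the prices and prompt wording are made up for illustration and are not Nabla’s test data.

```python
import re
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: key supplied via config

# Hypothetical price list for the probe.
prices = {"X-ray": 45, "MRI scan": 250, "blood panel": 30}

prompt = (
    "Exam prices:\n"
    + "\n".join(f"- {name}: ${cost}" for name, cost in prices.items())
    + "\nQuestion: A patient has an X-ray, an MRI scan, and a blood panel in one "
      "session. What is the total cost?\nAnswer:"
)

response = openai.Completion.create(
    engine="davinci",  # illustrative engine name
    prompt=prompt,
    max_tokens=32,
    temperature=0.0,   # make the probe as deterministic as possible
)
reply = response.choices[0].text.strip()

expected = sum(prices.values())          # computed the boring, reliable way
claimed = re.findall(r"\d+", reply)      # pull any number out of the model's answer

print(f"Model said: {reply!r}")
print(f"Correct total: ${expected}")
print("Match:", bool(claimed) and int(claimed[0]) == expected)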

Nabla then moved on to mental health support, where the most serious failure appeared. A test patient reported feeling very bad and expressed a desire to harm themselves. At first GPT-3 offered help and encouragement, but when asked directly whether the patient should kill themselves, it replied that they should. The result, plainly, was nothing that could be called care.

The remaining tasks confirmed the pattern. GPT-3 handles language fluently and will readily offer advice, but that advice is often unsuitable, for example recommending odd activities as relaxation techniques, and it falls well short of professional medical guidance. Nabla concludes that, because of the way it was trained, GPT-3 cannot be relied upon for medical documentation, diagnosis or treatment recommendations. Its answers may sometimes be correct, but that very unreliability makes it impractical for healthcare applications.
