Explainability is becoming increasingly important for machine learning (ML) models used in medicine. To gain acceptance and adoption, these models must be understood and explained from several perspectives: medical, technological, legal, and, most importantly, the patient’s. While ML models developed for text in the medical domain have shown statistical accuracy, clinicians have an ethical obligation to assess the weaknesses of their predictions in order to provide the best care for each patient. Explainable predictions let clinicians make informed decisions on a patient-by-patient basis. This article demonstrates how to enhance model explainability in clinical settings using Amazon SageMaker Clarify.
In the medical domain, one specific application of ML algorithms is clinical decision support systems (CDSSs) for triage. Every day, patients are admitted to hospitals and admission notes are recorded. During triage, ML models can help clinicians estimate clinical outcomes, reducing operational costs and supporting optimal care. Understanding the reasoning behind these ML-based decisions is essential for making informed decisions about individual patients. This article outlines how to deploy predictive models with Amazon SageMaker for triage in hospital settings, and how to use SageMaker Clarify to explain their predictions. The approach aims to facilitate the adoption of predictive techniques in CDSSs for healthcare organizations.
Clinical notes are a valuable asset for healthcare organizations. When patients are admitted to a hospital, admission notes are taken, and recent studies have shown that key indicators such as diagnoses, procedures, length of stay, and in-hospital mortality can be predicted accurately from these notes using natural language processing (NLP) algorithms. Advanced NLP models like BERT (Bidirectional Encoder Representations from Transformers) have improved prediction accuracy on text data such as admission notes. To use these BERT models effectively, however, it is important to explain how they arrive at their predictions. One technique for explaining the predictions of ML models is SHAP (SHapley Additive exPlanations), which breaks down the contribution of each input feature to the final prediction. SHAP values are grounded in cooperative game theory and provide explanations without requiring access to the model’s internal workings.
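To make this concrete, here is a minimal sketch of computing SHAP values for a text classifier with the open-source shap library. The sentiment model and sample note are illustrative stand-ins, not the mortality model used later in this article.

```python
# Minimal sketch: SHAP values for a Hugging Face text-classification pipeline.
# The model and input text are illustrative placeholders.
import shap
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    return_all_scores=True,  # SHAP needs a score for every class
)

explainer = shap.Explainer(classifier)
notes = ["Patient admitted with acute shortness of breath and chest pain."]
shap_values = explainer(notes)

# Each token's SHAP value is its additive contribution to a class score;
# together with the baseline, the values sum to the model's output.
shap.plots.text(shap_values)
```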
SageMaker Clarify, combined with pre-trained models from Hugging Face, makes model deployment and explainability straightforward on AWS. The example in this article focuses on predicting in-hospital mortality using a pre-trained Hugging Face BERT model called bigbird-base-mimic-mortality, which is designed to predict mortality from admission notes in the MIMIC ICU dataset. The advantage of this model is its ability to process longer context lengths (BigBird’s sparse attention supports sequences of up to 4,096 tokens, compared with 512 for standard BERT), allowing complete admission notes to be used as input without truncation. The end-to-end solution involves deploying the fine-tuned model on SageMaker and using Clarify to explain its predictions.
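As a quick illustration, the sketch below runs the model locally with the transformers library. The Hub identifier’s namespace and the sample note are assumptions; verify them against the model repository you actually use.

```python
# Sketch of a local inference run; the Hub namespace is an assumed placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "<namespace>/bigbird-base-mimic-mortality"  # replace with the real Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# BigBird's sparse attention handles up to 4,096 tokens, so a full
# admission note typically fits without truncation.
note = "CHIEF COMPLAINT: Chest pain. HISTORY OF PRESENT ILLNESS: ..."
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print(probs)  # label order (e.g., survival vs. mortality) is model-specific
```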
SageMaker Clarify provides tools for ML developers to gain insights into their training data and models. It can produce both global and local explanations and supports computer vision (CV) as well as NLP models. The architecture for hosting an endpoint that serves explainability requests involves interactions between the endpoint, the model container, and the SageMaker Clarify explainer. The article provides sample code in a Jupyter notebook to showcase this functionality; in real-world scenarios, electronic health records (EHRs) or other hospital care applications would invoke the SageMaker endpoint directly.
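For example, once an endpoint with online explainability is in service (its creation is covered below), a calling application could request a prediction together with its explanation roughly as follows. The endpoint name and input note are placeholders, and the content type must match what the model container and Clarify configuration expect.

```python
# Sketch of a direct endpoint invocation with explanations enabled.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="hospital-triage-explainer",  # placeholder name
    ContentType="text/csv",                    # must match the endpoint's config
    Body="Patient admitted with acute shortness of breath and chest pain.",
    EnableExplanations="`true`",  # JMESPath literal; overrides the endpoint default
)

result = json.loads(response["Body"].read())
# The Clarify online explainability response carries both the model's
# predictions and the SHAP explanations.
print(result["predictions"])
print(result["explanations"])
```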
Before deploying the model, several prerequisites need to be fulfilled. The code from the GitHub repository should be accessed and uploaded to the notebook instance; a Python 3 kernel on SageMaker Studio or a conda_python3 kernel on a SageMaker notebook instance is recommended. The deployment process then involves downloading the model from Hugging Face, uploading it to an Amazon S3 bucket, creating a model object with the HuggingFaceModel class, choosing an instance type for deployment, registering the model with SageMaker, creating an endpoint configuration, and finally creating the endpoint.
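A condensed sketch of the first of these steps with the SageMaker Python SDK might look like the following (the endpoint configuration and endpoint come next). The S3 path, model name, instance type, and framework versions are placeholders to adapt to your account.

```python
# Sketch: register the model with SageMaker. Paths, names, and framework
# versions are placeholders.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

session = sagemaker.Session()
role = sagemaker.get_execution_role()
instance_type = "ml.g4dn.xlarge"  # GPU instance; adjust as needed

# model.tar.gz previously downloaded from Hugging Face and uploaded to S3
huggingface_model = HuggingFaceModel(
    model_data="s3://<your-bucket>/model/model.tar.gz",
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

model_name = "triage-mortality-model"
container_def = huggingface_model.prepare_container_def(instance_type=instance_type)
session.create_model(name=model_name, role=role, container_defs=container_def)
```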
To enable explainability, SageMaker Clarify is integrated into the endpoint configuration. The configuration includes specifying a SHAP baseline, which can be provided as inline baseline data or through an S3 baseline file. The TextConfig is set to sentence-level granularity and English language. Once the model and endpoint configuration are ready, the create_endpoint API is used to create the endpoint.
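A sketch of such an endpoint configuration with the boto3 SageMaker client is shown below; resource names and the inline baseline value are illustrative.

```python
# Sketch: endpoint configuration with online explainability, then the endpoint.
import boto3

sm = boto3.client("sagemaker")

endpoint_config_name = "triage-mortality-explainer-config"
sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "triage-mortality-model",
        "InitialInstanceCount": 1,
        "InstanceType": "ml.g4dn.xlarge",
    }],
    ExplainerConfig={
        "ClarifyExplainerConfig": {
            "ShapConfig": {
                # Inline baseline; a ShapBaselineUri pointing to S3 also works.
                "ShapBaselineConfig": {
                    "MimeType": "text/csv",
                    "ShapBaseline": "<UNK>",
                },
                "TextConfig": {"Granularity": "sentence", "Language": "en"},
            },
        },
    },
)

sm.create_endpoint(
    EndpointName="hospital-triage-explainer",
    EndpointConfigName=endpoint_config_name,
)
```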
Overall, this article provides a comprehensive guide on improving the explainability of ML models in clinical settings using Amazon SageMaker Clarify. It explains the importance of explainability, the technical background, and the step-by-step process of deploying a predictive model and enabling explainability using SageMaker Clarify.