Validation of the myocardial-ischaemic-injury-index machine learning algorithm to guide the diagnosis of myocardial infarction in a heterogenous population: a prespecified exploratory analysis

Summary Background Diagnostic pathways for myocardial infarction rely on fixed troponin thresholds, which do not recognise that troponin varies by age, sex, and time within individuals. To overcome this limitation, we recently introduced a machine learning algorithm that predicts the likelihood of myocardial infarction. Our aim was to evaluate whether this algorithm performs well in routine clinical practice and predicts subsequent events. Methods The myocardial-ischaemic-injury-index (MI3) algorithm was validated in a prespecified exploratory analysis using data from a multi-centre randomised trial done in Scotland, UK that included consecutive patients with suspected acute coronary syndrome undergoing serial high-sensitivity cardiac troponin I measurement. Patients with ST-segment elevation myocardial infarction were excluded. MI3 incorporates age, sex, and two troponin measurements to compute a value (0–100) reflecting an individual's likelihood of myocardial infarction during the index visit and estimates diagnostic performance metrics (including area under the receiver-operating-characteristic curve, and the sensitivity, specificity, negative predictive value, and positive predictive value) at the computed score. Model performance for an index diagnosis of myocardial infarction (type 1 or type 4b), and for subsequent myocardial infarction or cardiovascular death at 1 year was determined using the previously defined low-probability threshold (1·6) and high-probability MI3 threshold (49·7). The trial is registered with ClinicalTrials.gov, NCT01852123. Findings In total, 20 761 patients (64 years [SD 16], 9597 [46%] women) enrolled between June 10, 2013, and March 3, 2016, were included from the High-STEACS trial cohort, of whom 3272 (15·8%) had myocardial infarction. 
MI3 had an area under the receiver-operating-characteristic curve of 0·949 (95% CI 0·946–0·952), identifying 12 983 (62·5%) patients as low-probability for myocardial infarction at the pre-specified threshold (MI3 score <1·6; sensitivity 99·3% [95% CI 99·0–99·6], negative predictive value 99·8% [99·8–99·9]), and 2961 (14·3%) as high-probability at the pre-specified threshold (MI3 score ≥49·7; specificity 95·0% [94·6–95·3], positive predictive value 70·4% [68·7–72·0]). At 1 year, subsequent myocardial infarction or cardiovascular death occurred more often in high-probability patients than low-probability patients (520 [17·6%] of 2961 vs 197 [1·5%] of 12 983, p<0·0001). Interpretation In consecutive patients undergoing serial cardiac troponin measurement for suspected acute coronary syndrome, the MI3 algorithm accurately estimated the likelihood of myocardial infarction and predicted subsequent adverse cardiovascular events. By providing individual probabilities the MI3 algorithm could improve the diagnosis and assessment of risk in patients with suspected acute coronary syndrome. Funding Medical Research Council, British Heart Foundation, National Institute for Health Research, and NHSX.


Introduction
Myocardial infarction is a condition characterised by myocardial necrosis secondary to acute myocardial ischaemia, and is the most common cause of death worldwide. 1 In recognition of this, clinical guidelines emphasise the importance of early diagnosis and treatment to reduce mortality, and clinicians have a low threshold for referring patients for further investigation. 2 However, although patients with suspected myocardial infarction account for one in 20 attendances in the emergency department, 3 the diagnosis is ultimately ruled out in 80% to 90% of patients. 4,5 Accelerated diagnostic pathways aim to promote earlier discharge in patients considered low risk and improve the targeting of treatment to patients at high risk. [6][7][8][9] However, these pathways have some limitations. First, they use fixed cardiac troponin thresholds for all patients, which do not account for age or comorbidities that are known to influence troponin concentrations. 9,10 Second, they are based on fixed timepoints for serial testing, which can be challenging in a busy emergency department, and such pathways might not be generalisable to all healthcare systems. Third, up to one third of patients are neither ruled out nor ruled in using these pathways, and questions often remain for these individuals. For example, how probable was it that the patients' symptoms were due to a heart attack, and would they benefit from further testing?
www.thelancet.com/digital-health Vol 4 May 2022
The myocardial-ischaemic-injury-index (MI³) is an algorithm developed using the machine learning technique, gradient boosting, to compute an individualised probability of myocardial infarction on a scale of 0–100 for patients with suspected acute coronary syndrome. 11 The MI³ score is computed using age, sex, cardiac troponin concentration, and the rate of change in troponin concentration when remeasured at a second flexible time point. Although the algorithm performed well when validated in data pooled from seven diagnostic cohort studies of patients with suspected acute coronary syndrome, 11 it has not been evaluated in a more heterogeneous patient population, in which a greater burden of comorbid conditions might affect performance. Furthermore, it is not clear whether the algorithm provides information about cardiovascular risk beyond the initial diagnosis of myocardial infarction.
In consecutive patients with suspected acute coronary syndrome, we evaluate whether MI³ can predict the index diagnosis of myocardial infarction and risk of subsequent myocardial infarction or cardiovascular death at 1 year.

Participants
High-sensitivity troponin in the evaluation of patients with suspected acute coronary syndrome (High-STEACS) is a stepped-wedge cluster randomised controlled trial that evaluated the implementation of a high-sensitivity cardiac troponin I (hs-cTnI) assay in consecutive patients with suspected acute coronary syndrome, across ten secondary and tertiary care hospitals in Scotland, UK. 12 All adult patients (age >18 years) with suspected acute coronary syndrome attending the emergency department were identified by the attending clinician at the time troponin was requested, using an electronic form integrated into the clinical care pathway. For this prespecified exploratory analysis of the trial, patients were eligible for inclusion if they presented with suspected acute coronary syndrome and had at least two serial cardiac troponin measurements. Patients were included from both the assay validation and implementation phases of the trial. Patients were excluded if there was insufficient clinical information to adjudicate the diagnosis, or if they presented with ST-segment elevation myocardial infarction, because patients with this presentation were not included in the original development of the algorithm.
The High-STEACS trial was approved by the Scotland A Research Ethics Committee, the Public Benefit and Privacy Panel for Health and Social Care, and by the National Health Service (NHS) Health Board for each hospital. As randomisation was at the hospital level, consent was not sought from individual patients. All data were collected prospectively from the electronic patient record, de-identified, and linked to regional and national registries in a data repository within a secure NHS Safe Haven (DataLoch, Edinburgh, UK). Data describing patient demographics, presenting symptoms, previous medical conditions and revascularisation, medication at presentation, investigations, and laboratory measurements were extracted. This exploratory analysis was prespecified in the trial protocol; however, because of its observational nature, the statistical analysis plan was not reviewed by the trial steering committee.

Research in context
Evidence before this study Patients with suspected myocardial infarction account for approximately one in 20 attendances in the emergency department. The myocardial-ischaemic-injury-index (MI³) is a machine learning algorithm that predicts the likelihood of myocardial infarction in patients with suspected acute coronary syndrome. We systematically searched PubMed for studies published up to Jan 18, 2022, using the following keywords: "machine learning", "myocardial infarction", and "troponin" with no language restrictions. Three machine learning algorithms were identified from this search but none that had used high-sensitivity cardiac troponin to predict the likelihood of myocardial infarction.

Added value of this study
This is the largest study evaluating the diagnostic performance of a machine learning algorithm for the diagnosis of myocardial infarction and the first to be performed in a consecutive patient population that reflects clinical practice. The MI³ algorithm had excellent overall discrimination. We observed no heterogeneity in our subgroup analysis for the low-probability threshold, but performance was heterogeneous across subgroups for the high-probability threshold. Moreover, we report for the first time that patients identified by the algorithm as high-probability for myocardial infarction at the index visit also had a ten-times higher rate of subsequent myocardial infarction or cardiovascular death at 1 year than patients who were classified as low-probability.

Implications of all the available evidence
Our findings have potentially important implications for the use and interpretation of this algorithm in clinical practice. MI³ could improve the diagnostic pathways for myocardial infarction by accurately identifying patients at high risk of myocardial infarction to be targeted for prompt individualised treatment, and by allowing early discharge in patients at low risk.

Procedures
In the High-STEACS trial, cardiac troponin testing was performed at presentation and was repeated 6 h or 12 h after the onset of symptoms at the discretion of the attending physician and in accordance with national guidelines. 13 All patients had troponin measured using the investigational high-sensitivity assay (ARCHITECT STAT high-sensitive troponin I assay; Abbott Laboratories, Abbott Park, IL, USA) throughout the trial, but this was only used to guide clinical decisions during the implementation phase. Attending clinicians were masked to the results of the high-sensitivity assay during the validation phase when a contemporary assay was used to guide care. This assay has an inter-assay coefficient of variation of less than 10% at 4·7 ng/L, 14 and a 99th centile upper reference limit of 34 ng/L in men and 16 ng/L in women. 14 In the High-STEACS trial, all deaths and all diagnoses in patients with hs-cTnI concentrations above the 99th centile were adjudicated and diagnoses classified according to the third universal definition of myocardial infarction as previously described. 12 In brief, two physicians independently reviewed all clinical information with discordant diagnoses resolved by a third reviewer. Type 1 myocardial infarction was defined as myocardial necrosis (any hs-cTnI concentration above the sex-specific 99th centile with a rise or fall in hs-cTnI concentration when serial testing was performed) in the context of a presentation with suspected acute coronary syndrome with symptoms or signs of myocardial ischaemia on the electrocardiogram. Patients with myocardial necrosis, symptoms or signs of myocardial ischaemia, and evidence of increased myocardial oxygen demand or decreased supply secondary to an alternative condition without evidence of acute atherothrombosis were defined as type 2 myocardial infarction. 
Type 4b myocardial infarction was defined where myocardial ischaemia and myocardial necrosis were associated with stent thrombosis documented at angiography. We used regional and national registries to ensure complete follow-up for the trial population. 12 The primary outcome of this analysis was myocardial infarction (type 1 or type 4b) during the index visit. The key secondary outcomes were subsequent myocardial infarction (type 1 or type 4b) or cardiovascular death at 1 year, and all-cause death at 1 year.
For the current study, we derived the MI³ score using the high-sensitivity cardiac troponin I assay results. MI³ is an algorithm derived using the machine learning technique, gradient boosting. It computes a value of 0 to 100 for each patient using their age, sex, serial cardiac troponin concentrations, and the time interval between sampling, which corresponds to an individualised estimate of the likelihood of a diagnosis of type 1 or type 4b myocardial infarction. 11

Statistical analysis
Model discrimination was assessed by calculating the area under the receiver-operating-characteristic curve (AUROC) and model calibration was assessed by visual inspection of the calibration and precision-recall curves. Diagnostic performance was evaluated using the previously defined low-probability threshold (MI³ score of 1·6) and high-probability threshold (MI³ score of 49·7). 11 These thresholds were defined in the cohort used to train the algorithm based on prespecified performance criteria (sensitivity ≥99·0% and negative predictive value [NPV] ≥99·5% for low-probability, specificity ≥90·0% and positive predictive value [PPV] ≥75·0% for high-probability). We report the sensitivity, specificity, NPV, and PPV for these thresholds, along with 95% CIs calculated using 1000 bootstrapped samples. Survival free from subsequent myocardial infarction or cardiovascular death at 1 year, or death from any cause at 1 year, was determined in patients grouped according to their MI³ score (low-probability <1·6; intermediate-probability 1·6–49·6; high-probability ≥49·7). The event rates were compared using a χ² test and a log-rank test. Subgroup analysis was performed by age (<65 years or ≥65 years), sex (male or female), primary symptom of chest pain, previous ischaemic heart disease, myocardial infarction, diabetes, and cerebrovascular disease and stratified by renal function (estimated glomerular filtration rate [eGFR] <60 ml/min or ≥60 ml/min) and the time from symptom onset to presentation (<3 h, 3–6 h, and >6 h). MI³ performance was also validated based on the time interval between blood sampling (<3 h, 3–6 h, and >6 h). In a sensitivity analysis, we evaluated diagnostic performance for a composite endpoint of type 1, type 4b, or type 2 myocardial infarction during the index hospital admission. All analyses were conducted using R (version 3.6.1).
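To make the threshold-based evaluation concrete, the following minimal sketch shows how scores might be grouped at the two thresholds and how sensitivity, specificity, NPV, and PPV with bootstrap 95% CIs could be computed. It is illustrative only and is not the authors' code: the analysis was conducted in R, whereas this sketch uses Python. The function names (`probability_group`, `diagnostic_metrics`, `bootstrap_ci`) are hypothetical, the toy scores and diagnoses are invented, and the percentile method for the bootstrap interval is an assumption, because the text specifies only that 1000 bootstrapped samples were used.

```python
import math
import random

LOW_THRESHOLD = 1.6    # low-probability group: score < 1.6
HIGH_THRESHOLD = 49.7  # high-probability group: score >= 49.7


def probability_group(score):
    """Assign a score (0-100) to the low, intermediate, or high group."""
    if score < LOW_THRESHOLD:
        return "low"
    if score >= HIGH_THRESHOLD:
        return "high"
    return "intermediate"


def diagnostic_metrics(scores, has_mi, threshold):
    """Treat score >= threshold as test-positive and return the four metrics."""
    tp = fp = tn = fn = 0
    for score, mi in zip(scores, has_mi):
        positive = score >= threshold
        if positive and mi:
            tp += 1
        elif positive:
            fp += 1
        elif mi:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined metrics (empty denominator) are returned as NaN.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
        "ppv": ratio(tp, tp + fp),
    }


def bootstrap_ci(scores, has_mi, threshold, metric, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI (assumed method) for one metric."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping score and diagnosis paired.
        idx = [rng.randrange(n) for _ in range(n)]
        value = diagnostic_metrics(
            [scores[i] for i in idx], [has_mi[i] for i in idx], threshold
        )[metric]
        if not math.isnan(value):  # skip resamples where the metric is undefined
            estimates.append(value)
    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return lower, upper


if __name__ == "__main__":
    # Invented toy data: ten scores and adjudicated diagnoses (True = infarction).
    scores = [0.4, 0.9, 1.2, 1.5, 3.0, 12.0, 30.0, 55.0, 70.0, 92.0]
    has_mi = [False, False, False, False, False, False, True, False, True, True]
    print(diagnostic_metrics(scores, has_mi, LOW_THRESHOLD))
    print(bootstrap_ci(scores, has_mi, HIGH_THRESHOLD, "ppv"))
```

A single metrics function serves both thresholds: sensitivity and NPV are read at the low-probability threshold (rule-out), and specificity and PPV at the high-probability threshold (rule-in).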

Role of the funding source
The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.

Results
Between June 10, 2013, and March 3, 2016, all patients with suspected acute coronary syndrome who met the  1). There were no differences (assessed visually between the full trial population and the analysis population) in sex distribution, presenting complaint, or laboratory markers, including cardiac troponin concentrations, but the analysis population was on average 3 years older than the trial population, and patients were more likely to have a previous history of ischaemic heart disease and to be established on preventative medication (
Data are mean (SD), median (IQR), n/N (%), or n (%). ACE=angiotensin converting enzyme. ARB=angiotensin receptor blockers. MI³=myocardial-ischaemic-injury-index. *A presenting symptom was missing in 5615 (12%) from all participants (n=48 282) and 2264 (11%) from the analysis population (n=20 761), hence the difference in the proportions. †Two medications from aspirin, clopidogrel, prasugrel, or ticagrelor. ‡Includes warfarin or novel oral anticoagulants.
figure 3; appendix p 11). The AUROC differed when stratifying patients by subgroups and was higher in those aged <65 years, in males, those presenting with a primary symptom of chest pain, eGFR ≥60 ml/min, no previous ischaemic heart disease, myocardial infarction, diabetes, and cerebrovascular disease, but there was no difference when stratifying by time from symptom onset to presentation (appendix p 3). Among subgroups there was no heterogeneity (overlap of 95% CIs) in sensitivity or NPV for the low-probability threshold (appendix pp 4–5), and in some subgroups, there was significant heterogeneity in the specificity and PPV for the high-probability threshold (appendix pp 6–7). In particular, the PPV for the high-probability threshold was higher in patients with a primary presenting symptom of chest pain than patients with other presenting symptoms (
In the analysis population, 1300 (6·3%) patients had either a subsequent myocardial infarction or cardiovascular death at 1 year. Patients identified by MI³ as high-probability of index myocardial infarction were more likely to have a subsequent myocardial infarction

Discussion
We validated the MI³ machine learning algorithm for the diagnosis of myocardial infarction in a large cohort of consecutive patients undergoing serial cardiac troponin measurement for suspected acute coronary syndrome in a multicentre randomised trial. We make several observations that could inform its application in clinical practice. First, MI³ discriminated for type 1 or type 4b myocardial infarction in a patient population that reflects routine clinical practice. However, calibration was not good in patients with intermediate MI³ scores. Second, at the prespecified score thresholds, sensitivity and NPV were consistent across patient subgroups; however, specificity and PPV varied substantially. Third, MI³ provided insights beyond the index presentation, identifying patients at risk of future adverse cardiovascular events. Patients identified as high-probability by the algorithm had a ten-times higher rate of subsequent myocardial infarction or cardiovascular death at 1 year than patients who were classified as low-probability (17·6% vs 1·5%). Our study population had several characteristics that enabled a robust evaluation of the MI³ algorithm. Compared with the pooled cohort used in the initial validation of MI³, 11 our external validation population is almost three times larger and consists of consecutive patients, improving the generalisability of our findings. The mean age in our study population is more than 5 years older, with a more balanced sex distribution and a higher prevalence of comorbidities than the populations used to train and test the model. Although the prevalence of type 1 myocardial infarction in the model training set was 13·4% and 10·6% in the testing set, in the external validation set the prevalence was 15·8%. Although the prevalence was slightly higher in the external validation set, the distribution of MI³ scores across the low-risk, intermediate-risk, and high-risk groups was similar. 
These features are likely to have resulted in a patient population in which there is more diagnostic complexity that is more reflective of clinical practice. Although the rule-in performance was more heterogeneous across patient subgroups, MI³ had an excellent rule-out performance across the study population (PPV vs NPV). This observation is perhaps unsurprising given that high-sensitivity cardiac troponin, a key variable in this algorithm, is integral to the diagnosis of myocardial infarction and is known to be influenced by both age and sex. 15,16 Given that the probability of type 1 myocardial infarction and cardiac troponin concentrations can differ substantially in different patient subgroups 17 it is perhaps intuitive that a diagnostic algorithm that combines cardiac troponin and clinical parameters has a good diagnostic performance. Indeed, there have been numerous statistical models developed to aid in the diagnosis and prognostication of acute cardiovascular conditions, including type 1 myocardial infarction. [18][19][20][21] However, very few have been successfully implemented into clinical practice due to barriers such as the number and complexity of the variables that are required in the models and the lack of adequate validation to have sufficient confidence in the diagnostic performance. [22][23][24] We have not compared gradient boosting with other models or forms of statistical modelling, nor have we evaluated whether discrimination or calibration can be improved by including additional parameters. The objectivity and simplicity of the variables used by MI³ are perhaps the algorithm's most important strength. The three variables used in this algorithm (age, sex, and troponin) are objective and consistently obtained, with high reproducibility and accuracy, in a busy clinical setting. 
Furthermore, the initial validation of this algorithm was performed in an international multicentre patient population, and its diagnostic performance has now been validated in a large consecutive patient population that reflects clinical practice.
Our data further support the potential clinical application of decision support tools that incorporate key patient factors in the interpretation of cardiac biomarkers. High-sensitivity cardiac troponin is well known to vary substantially according to various patient factors such as age and sex; however, it is difficult to account for the complex relationships between these variables using a threshold-based approach. Moreover, the data demonstrate that no single threshold provides optimal sensitivity and specificity, and therefore we propose the use of separate MI³ thresholds: one that optimises sensitivity or NPV to identify patients at low risk of myocardial infarction, and one that optimises specificity or PPV to identify patients at high risk of myocardial infarction. Many institutions worldwide have not yet implemented the sex-specific 99th centile thresholds recommended by the universal definition of myocardial infarction. 25,26 MI³ could help as it enables more accurate and individualised clinical decisions by accounting for the patient's age and sex in a manner that can be easily interpreted. Furthermore, the ability to include serial troponin concentrations at flexible time points reduces the potential of misinterpretation compared with an approach that recommends the use of fixed absolute changes in cardiac troponin at specific timepoints. 27,28 In our cohort, MI³ was able to rule out myocardial infarction in the majority of patients with a high NPV irrespective of when testing was performed, while identifying 14·3% of patients with a high probability of myocardial infarction. The application of this algorithm in practice would represent a substantial change in the approach to the assessment of patients with suspected acute coronary syndrome. Our current practice is based exclusively on the use of single or multiple cardiac troponin thresholds with serial testing performed at fixed timepoints. By using cardiac troponin as a continuous measure and incorporating rate of change rather than an absolute change in troponin concentration, MI³ might be more flexible and easier to implement in busy emergency departments. To our knowledge no similar algorithms are available and none report the likelihood of myocardial infarction for individual patients or associated diagnostic metrics to guide clinical decision making. Although we have validated the performance of the algorithm in triaging patients as low, intermediate, or high risk, in practice we would anticipate that clinical decisions are guided by individualised estimates of the diagnostic likelihood. 
Further studies are required to evaluate whether care guided by these estimates, and the provision of diagnostic metrics, changes clinical decision making, or the use of subsequent cardiac testing in practice.
Although the training and testing of this algorithm has been published previously, 11 this is the first time that MI³ has been validated in a consecutive patient population that reflects the way it could be applied in clinical practice. This is an essential step in understanding how the algorithm will perform in practice whereby troponin testing is guided by clinical need rather than by a research protocol. The lack of external validation and evaluations of algorithm performance in routine care is one of the main reasons that few machine learning algorithms are used in practice today. Furthermore, in addition to validating the diagnostic performance of MI³, we provide, for the first time, data on outcomes following discharge from hospital. The association with adverse cardiovascular outcomes at 1 year is reassuring and suggests that the algorithm is appropriately risk stratifying patients who are likely to benefit most from further diagnostic testing and treatment beyond the index visit.
Although MI³ had a good overall diagnostic performance in our cohort, there are several limitations and aspects that can potentially be improved. First, we observed that MI³ was not well calibrated in patients with intermediate scores. This group of patients are the most challenging to diagnose in clinical practice because they often have small elevations in cardiac troponin, which might be due to conditions other than myocardial infarction. One of the advantages of using a machine learning algorithm over other pathways is that further training is possible, which might be required to improve calibration for this group in different healthcare settings. Alternatively, the use of additional features to refine the estimates of probability in this group could be explored. Second, although performance of the low-probability threshold was consistent across important patient subgroups, we observed heterogeneity in the PPV of the high-probability threshold, particularly when stratified according to the primary presenting symptom. This finding is consistent with our previous research 29,30 and probably reflects the greater prevalence of non-ischaemic myocardial injury and type 2 myocardial infarction in our consecutive patient population as compared with the cohorts used to train the algorithm whereby some patient selection was inevitable. It is possible that an algorithm that incorporates other clinical features might perform more consistently across these subgroups when identifying patients at high probability of type 1 myocardial infarction. Third, we used serial cardiac troponin measurements for both the rule-in and rule-out of myocardial infarction. Algorithms that can risk stratify patients using only cardiac troponin concentrations at presentation could be developed and might further improve efficiency. Finally, although MI³ had good performance for the prediction of type 1 or type 2 myocardial infarction, it was not developed to distinguish between the two. 
Future algorithms to diagnose and differentiate between type 1 and type 2 myocardial infarction would be useful given the diagnostic challenge of doing so in clinical practice and that the treatment for these conditions differs.
We also acknowledge several limitations in our study design. In most patients in our cohort, serial cardiac troponin measurements were performed 6 h apart, which is longer than recommended by current international guidelines. 27 However, in our subgroup analysis stratified by time of serial sampling, the diagnostic performance of MI³ remained good regardless of the time interval between serial troponin measurements. Although MI³ includes sex as a parameter in the model, discrimination was not as good in women compared with men. This finding probably reflects differences in the use of sex-specific and uniform thresholds to diagnose myocardial infarction between the data sets used to train and to validate the algorithm. 11 In the High-STEACS trial, sex-specific thresholds were used in practice and to adjudicate the diagnosis of myocardial infarction in line with the recommendations of the universal definition of myocardial infarction. 16 These recommendations were not consistently applied in the populations used to train the algorithm. Performance could be improved with additional training of the algorithm in healthcare settings that use sex-specific diagnostic thresholds in practice. A further limitation is that we did not have access to data on ethnicity to evaluate whether diagnostic performance also varied across ethnic groups. Finally, although our analysis demonstrated that the low number of adverse cardiovascular outcomes at 1 year in patients classified as low-probability by MI³ was reassuring, future studies evaluating outcomes after MI³ is implemented are needed to confirm the safety of this algorithm in clinical practice.
In consecutive patients undergoing serial cardiac troponin measurement for suspected acute coronary syndrome, the MI³ machine learning algorithm can accurately estimate the likelihood of myocardial infarction and predict subsequent adverse cardiovascular events. The model could improve the diagnostic pathways for myocardial infarction by accurately identifying patients at high risk to be targeted for prompt individualised treatment, and by allowing early discharge in patients at low risk.