Sepsis prediction, early detection, and identification using clinical text for machine learning: a systematic review

Abstract Objective To determine the effects of using unstructured clinical text in machine learning (ML) for prediction, early detection, and identification of sepsis. Materials and methods PubMed, Scopus, ACM DL, dblp, and IEEE Xplore databases were searched. Articles utilizing clinical text for ML or natural language processing (NLP) to detect, identify, recognize, diagnose, or predict the onset, development, progress, or prognosis of systemic inflammatory response syndrome, sepsis, severe sepsis, or septic shock were included. Sepsis definition, dataset, types of data, ML models, NLP techniques, and evaluation metrics were extracted. Results The clinical text used in the models includes narrative notes written by nurses, physicians, and specialists in varying situations, often combined with common structured data such as demographics, vital signs, laboratory data, and medications. Area under the receiver operating characteristic curve (AUC) comparison of ML methods showed that utilizing both text and structured data predicts sepsis earlier and more accurately than structured data alone. No meta-analysis was performed because of incomparable measurements among the 9 included studies. Discussion Studies focused on sepsis identification or early detection before onset; no studies used patient histories beyond the current episode of care to predict sepsis. Sepsis definition affects reporting methods, outcomes, and results. Many methods rely on continuous vital sign measurements in intensive care, making them not easily transferable to general ward units. Conclusions Approaches were heterogeneous, but studies showed that utilizing both unstructured text and structured data in ML can improve identification and early detection of sepsis.


INTRODUCTION
Sepsis is a life-threatening illness caused by the body's immune response to an infection that leads to multi-organ failure. 1 Annually, an estimated 31.5 million sepsis cases, 19.4 million severe sepsis cases, and 5.3 million sepsis deaths occur in high-income countries. 2 Studies have shown that early identification of sepsis followed by rapid initiation of antibiotic treatment improves patient outcomes, 3 and 6 h of treatment delay has been shown to increase the mortality risk by 7.6%. 4 Unfortunately, sepsis is commonly misdiagnosed and mistreated because deterioration with organ failure is also common in associated infections. Existing reviews mention sepsis, [43][44][45] but to the best of our knowledge, no reviews focus on the effect of utilizing unstructured clinical text for sepsis prediction, early detection, or identification; this makes it challenging to assess and utilize text in future ML and NLP sepsis research.

OBJECTIVE
This review aims to provide an overview of studies utilizing clinical text in ML for sepsis prediction, early detection, or identification.

MATERIALS AND METHODS
This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 46

Search strategy
Relevant articles were identified from 2 clinical databases (PubMed and Scopus) and 3 computer science databases (ACM DL, dblp, and IEEE Xplore) using defined search terms. The 3 sets of search terms included: (1) "sepsis," "septic shock," or "systemic inflammatory response syndrome"; (2) "natural language processing," "machine learning," "artificial intelligence," "unstructured data," "unstructured text," "clinical note," "clinical notes," "clinical text," "free-text," "free text," "record text," "narrative," or "narratives"; and (3) detect, identify, recognize, diagnosis, predict, prognosis, progress, develop, or onset. Searches on clinical databases were performed using all 3 sets of search terms and excluded animal-related terms. In contrast, searches on computer science databases used only the first set of search terms. No additional search restrictions, such as date, language, or publication status, were applied. Additional articles were identified from relevant review articles or backward reference and forward citation searches of eligible articles. Complete search strategies are in Supplementary Table S1.
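As a rough illustration of how the three search-term sets combine, the sketch below assembles them into a generic boolean query string. The exact database-specific syntax is an assumption here; the published queries are in Supplementary Table S1.

```python
# Sketch: assembling the review's three search-term sets into a generic
# boolean query. Database-specific syntax (field tags, truncation) is
# deliberately omitted and assumed to differ per database.

CONDITION_TERMS = ["sepsis", "septic shock", "systemic inflammatory response syndrome"]
METHOD_TERMS = ["natural language processing", "machine learning", "artificial intelligence",
                "unstructured data", "unstructured text", "clinical note", "clinical notes",
                "clinical text", "free-text", "free text", "record text",
                "narrative", "narratives"]
TASK_TERMS = ["detect", "identify", "recognize", "diagnosis", "predict",
              "prognosis", "progress", "develop", "onset"]

def or_group(terms):
    """Join one term set into a parenthesized OR group, quoting phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def build_query(*term_sets):
    """AND together one OR group per term set."""
    return " AND ".join(or_group(ts) for ts in term_sets)

# Clinical databases combined all three sets; computer science databases
# used only the first (condition) set.
clinical_query = build_query(CONDITION_TERMS, METHOD_TERMS, TASK_TERMS)
cs_query = build_query(CONDITION_TERMS)
print(cs_query)
```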
The search was initially conducted using only computer science databases on December 10, 2019 and was updated to include clinical databases on December 14, 2020. The first search found that 4 of 454 articles met inclusion criteria, [47][48][49][50] and the second search uncovered 2 more articles that met inclusion criteria (6 of 1335 articles). 51,52 Those 2 searches did not contain the search terms: "systemic inflammatory response syndrome," "artificial intelligence," identify, recognize, diagnosis, prognosis, progress, develop, and onset. Hence, a search on May 15, 2021, including those terms, found 2 additional articles. 53,54 To ensure inclusion of other relevant articles, a broader search was conducted on September 3, 2021 to include the following terms: "unstructured data," "unstructured text," "clinical note," "clinical notes," "clinical text," "free-text," "free text," "record text," "narrative," or "narratives." This resulted in 1 additional article. 55

Study selection
Titles, abstracts, and keywords were screened using Zotero v5.0.96.3 (Corporation for Digital Scholarship, Vienna, VA) and Paperpile (Paperpile LLC, Cambridge, MA). Screening removed duplicates and articles that did not contain the following terms: (1) text, (2) notes, or (3) unstructured. Full-text articles were evaluated to determine if the study used unstructured clinical text for the identification, early detection, or prediction of sepsis onset in ML. Thus, selected articles had to rely on methods that automatically improve based on what they learn and not rely solely on human-curated rules. Additionally, articles solely focusing on predicting sepsis mortality were excluded as these articles are based on already established sepsis cases. Reviews, abstract-only articles, and presentations were removed. Additionally, a backward and forward search was performed on eligible full-text articles.
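The screening rule described above (duplicate removal, then a keyword filter on "text," "notes," or "unstructured") can be sketched as follows. The record fields and the simple substring matching are illustrative assumptions, not the reviewers' actual tooling.

```python
# Sketch of the screening step: drop duplicate titles, then keep records
# whose title/abstract/keywords mention "text", "notes", or "unstructured".
# Record structure is hypothetical; real screening used Zotero/Paperpile.

SCREEN_TERMS = ("text", "notes", "unstructured")

def screen(records):
    seen_titles = set()
    kept = []
    for rec in records:
        title = rec.get("title", "").strip().lower()
        if title in seen_titles:  # duplicate removal
            continue
        seen_titles.add(title)
        blob = " ".join([rec.get("title", ""), rec.get("abstract", ""),
                         " ".join(rec.get("keywords", []))]).lower()
        # crude substring screen, as a stand-in for manual screening
        if any(term in blob for term in SCREEN_TERMS):
            kept.append(rec)
    return kept

records = [
    {"title": "Sepsis prediction from clinical text", "abstract": "", "keywords": []},
    {"title": "Sepsis prediction from clinical text", "abstract": "", "keywords": []},  # duplicate
    {"title": "Sepsis biomarkers", "abstract": "Structured labs only.", "keywords": []},
]
print([r["title"] for r in screen(records)])  # only the first record survives
```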

Data extraction
One author independently extracted data, which a second author verified. Any discrepancies were resolved either through discussion with the third author by assessing and comparing data to evidence from the studies or by directly communicating with authors from included articles. The following information was extracted: (1) general study information including authors and publication year, (2) data source, (3) sample size, (4) clinical setting, (5) sepsis infection definition, (6) task and objective, (7) characteristics of structured and unstructured data, (8) underlying ML and NLP techniques, and (9) evaluation metrics.

Selection process
The initial search identified 2268 articles from 5 databases and 5 additional articles 56-60 from 2 relevant review articles ( Figure 1). 43,44 From the 1817 unique articles, 1620 articles were excluded based on eligibility criteria described in the methods. After assessing the remaining 197 articles, most studies (189 of 197, ie, 96%) were excluded because they had not used or attempted to use unstructured clinical text in their ML models to identify, detect, or predict sepsis onset. For instance, there were sepsis-related studies that used text but for other purposes such as mortality prediction, 61-65 phenotyping, 66 visualization, 67 exploratory data analysis, 68 and manual chart review. [69][70][71] Additionally, 6 articles about infection detection, 60 central venous catheter adverse events, 58 postoperative sepsis adverse events, [72][73][74] and septic shock identification 75 were excluded because they used manually human-curated rules instead of ML methods that automatically learn from data. The remaining 8 eligible articles were used to perform backward and forward searches, [47][48][49][50][52][53][54][55] which led to the inclusion of 1 additional article. 51 This resulted in 9 articles for synthesis.

Clinical text used in models
The 9 studies utilized narrative notes written by nurses, [47][48][49][50][53][54][55] physicians, [49][50][51][52][53]55 or specialists [49][50][51]54,55 to document symptoms, signs, diagnoses, treatment plans, care provided, laboratory test results, or reports. EHRs contain various types of clinical notes. A note covers an implicit time period or activity and describes events, hypotheses, interventions, and observations within the health care provider's responsibilities. The note's form depends on its function: an order, a plan, a prescription, an investigation or analysis report, a narrative or log of events, information for the next shifts, or a requirement for legal, medical, or administrative purposes. An episode of care begins when a patient is admitted to the hospital and ends when the patient is discharged. Throughout a patient's hospital stay, documentation can include chief complaints, history-and-physical notes, progress notes, reports, descriptions of various laboratory tests, procedures, or treatments, and a discharge summary. Chief complaints are the symptoms or complaints provided by a patient for why they are seeking care. 82 History-and-physical notes can include history about the current illness, medical history, social history, family history, a physical examination, a chief complaint, probable diagnosis, and a treatment plan. 83 Progress notes document care provided and a description of the patient's condition to convey events to other clinicians. 84 Free-text reports can include interpretations of echocardiograms, electrocardiograms (ECGs), or imaging results such as X-rays, computerized tomography scans, magnetic resonance imaging scans, and ultrasounds. At discharge, health care personnel write a discharge summary note comprising patient details, hospital admittance reason, diagnosis, conditions, history, progress, interventions, prescribed medications, and follow-up plans.
[85][86][87] The discharge summary letter is a formal document used to transfer patient care to another provider for further treatment and follow-up care. [88][89][90] Studies have shown that nursing documentation differs from physician documentation. 91,92 Nurses document more about a patient's functional abilities than physicians, 91 and the information from notes used and the frequency of viewing and documenting differs between health care personnel. 92 Additionally, documentation varies between hospitals, 93,94 hospitals have different resources and practices, [95][96][97] and communicative behavior differs among professions in different wards. 98 Hence, the type of notes used, who wrote the notes, and purpose of the note will play a role in how the documentation is interpreted. 99 Table 2 provides information regarding documentation types, author of the note, time content of the data, time latency between documentation and availability in records, and the documentation frequency. In Figure 2, the relationship between hospital events and longitudinal data used to train models is shown. As sepsis develops in a patient over time, it shows there are typically delays between a patient's actual state, clinical observations, and recorded documentation, such as ICU vital signs, narrative notes, and ICD codes.
The included studies utilized the following types of notes: 6 studies used unstructured nursing-related documentation, 47,48,50,53-55 4 used physician notes, 50,52,53,55 3 used radiology reports, 50,54,55 3 used respiratory therapist progress notes, 50,54,55 2 used ED chief complaints, 47,51 2 used ECG interpretations, 50,54 2 used pharmacy reports, 50,54 2 used consultation notes, 50,52 1 used discharge summaries, 50 1 included mostly progress notes and history-and-physical notes, 49 and 3 used additional unspecified notes. 49,50,54 Not all notes used are listed. Liu et al 50 used all MIMIC-III notes to build a vocabulary of unique words, and discharge summaries were likely not used in predictions because they are unlikely to occur before observations. Additionally, Hammoud et al 54 used all MIMIC-II notes except discharge summaries.
These 9 studies utilized clinical notes differently. For the unit of analysis, 6 studies used a single note, 47,48,50,52-54 1 used a set of many notes from a patient encounter, 49 1 used a set of many notes within a specific hour of consideration, 55 and 1 used keywords from notes. 51 As shown in Figure 3, studies can use windows differently, such as a window with the duration of the whole encounter, a window with a duration of hours before onset, non-overlapping sliding windows with a fixed duration until onset, or overlapping sliding windows with a fixed duration until onset. Table 3 shows the type of text and unit of analysis used. Additional details about variables and specific notes used, including the types of notes and usage for Liu et al, 50 are listed in Supplementary Table S3. As shown in Figure 4, single notes or a set of many notes are preprocessed and represented to extract features, whereas keywords are used as is. Then structured data can be added, and the data are used to train ML models.
As shown in Figures 3 and 4 and listed in Tables 1 and 3 and  Supplementary Tables S2 and S3, although all studies are related to sepsis, there are varying sample sizes, data types, inclusion criteria, and objectives. This heterogeneity makes it challenging to compare results for a meta-analysis.
Natural language processing and machine learning study outcomes
To utilize text in ML, it must be transformed into a representation understandable by computers. For this, Bag-of-words (BoW), 100 n-gram, term frequency-inverse document frequency (tf-idf), and paragraph vector (PV) 101 representations can be used. These representations can be improved using additional NLP techniques, such as stop word removal, lemmatization, and stemming. In addition, other useful features can be extracted from text using part-of-speech (POS) tagging, named entity recognition, or Latent Dirichlet Allocation (LDA) topic modeling. 102 In recent years, neural networks (NNs) have shown high predictive performance. As a result, many state-of-the-art results have been achieved using NNs to learn a suitable representation of texts, often known as embeddings. 103 Embedding techniques include Global Vectors for Word Representation (GloVe), 104 Word2Vec as a continuous bag-of-words (CBOW) model or skip-gram model, 105 Bidirectional Encoder Representations from Transformers (BERT), 106 and ClinicalBERT. 107 The advantage of using embeddings is that they retain the sequential information lost in a BoW representation and perform feature extraction automatically. 103 Utilized text processing operations are in Table 3. One study used keyword extraction instead of text processing operations. 51 Six studies used tokenization of words for word-level representation, 47-50,52,54 1 also tried PV for document-level representation, 48 and another used the first 40 tokens in a sentence to get sentence-level representation and averaged sentence-level representations to provide document-level representation.
Figure 2. Overview of data from a patient timeline used to create models. The proximity of events toward a patient's actual state and the actual documentation recorded in the electronic health records typically has delays. Green represents patient states as sepsis develops in a patient. Yellow are observations made by clinicians. Documentation includes ICU vital signs (a) in pink, narrative notes in blue, and ICD codes in orange. ICU vital sign (a) documentation can be instantaneous, narrative notes can be written after observations are made, and ICD codes are typically registered after a patient is discharged. PIVC: peripheral intravenous catheter. (a) Vital signs include temperature, pulse, blood pressure, respiratory rate, oxygen saturation, and level of consciousness and awareness.
Figure 3. Different types of windows were used to obtain longitudinal data. Each gray box represents a single window, which can vary in duration (length of time) depending on the study. One window with the whole encounter means the study used a single window containing data with a duration of the whole encounter from admittance until discharge. One window before onset signifies data from a window with a duration of time before sepsis, severe sepsis, or septic shock onset. Sliding windows are consecutive windows until before sepsis, severe sepsis, or septic shock onset; this includes non-overlapping and overlapping sliding windows. Non-overlapping sliding windows indicate that data within one window of a fixed duration does not contain data in the next window. In contrast, overlapping sliding windows indicate windows of a fixed duration overlap, and data within one window will be partially in the next window.
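The window schemes described for Figure 3 can be sketched in a few lines. Hours relative to admission, durations, and step sizes below are illustrative assumptions, not values from any included study.

```python
# Sketch of the windowing schemes in Figure 3, using hours since admission.

def one_window_whole_encounter(admit, discharge):
    """A single window spanning admission to discharge."""
    return [(admit, discharge)]

def one_window_before_onset(onset, duration):
    """A single fixed-duration window ending at onset."""
    return [(max(0, onset - duration), onset)]

def sliding_windows(onset, duration, step):
    """Consecutive fixed-duration windows up to onset.
    step == duration -> non-overlapping; step < duration -> overlapping."""
    windows, start = [], 0
    while start + duration <= onset:
        windows.append((start, start + duration))
        start += step
    return windows

onset = 12  # hypothetical sepsis onset 12 h after admission
print(sliding_windows(onset, duration=4, step=4))  # non-overlapping
print(sliding_windows(onset, duration=4, step=2))  # overlapping
```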
53 The most common technique for improving representation was token removal, such as removing rare tokens, 47,50,52-54 frequent tokens, 48,50,53,54 punctuation or special characters, 47,48,50,52,53 and stop words. 52,53 The most frequently used representation was tf-idf, 48,52-55 followed by BoW, 47,48,50,54 LDA, 47,52 GloVe, 49,50 ClinicalBERT, 53,55 bi-gram, 47 CBOW, 48 and PV. 48 Three studies created a vocabulary of unique terms using BoW, 50 CBOW, 48 and tf-idf. 53 Apostolova and Velez 48 found that using structured data was inadequate for identifying infection in nursing notes, so they used antibiotic usage and word embeddings to create a labeled dataset of notes with infection, suspected infection, and no infection.
Figure 4. The unit of analysis used to train machine learning models for the included studies was either (1) a single note, (2) a set of many notes, or (3) keywords. In general, text was preprocessed and represented as features interpretable by a computer, then structured data were added, and the data were used to fit machine learning models.
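A minimal pure-Python sketch of the BoW and tf-idf representations discussed above, with simple stop-word removal. The tokenizer, stop-word list, and idf weighting (log N/df) are illustrative assumptions, not the choices of any included study.

```python
import math
from collections import Counter

# Minimal BoW and tf-idf sketch over two fabricated nursing-note snippets.

STOP_WORDS = {"the", "is", "and", "of", "on"}

def tokenize(note):
    tokens = [t.strip(".,").lower() for t in note.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

def bow(notes):
    """Bag-of-words: raw term counts per note."""
    return [Counter(tokenize(n)) for n in notes]

def tfidf(notes):
    """tf-idf with idf = log(N / df); terms in every note get weight 0."""
    counts = bow(notes)
    n = len(notes)
    df = Counter(t for c in counts for t in c)
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]

notes = ["Pt febrile, redness and swelling on left leg.",
         "Afebrile, leg swelling improved."]
vectors = tfidf(notes)
# "swelling" appears in both notes, so its idf (hence tf-idf weight) is 0,
# while note-specific terms such as "febrile" keep positive weight.
```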
Examples of informative topics and terms reported by the studies are: (1) for sepsis, severe sepsis, or septic shock, Goh et al 52 classified the top 100 topics into 7 categories: clinical condition or diagnosis, communication between staff, laboratory test order or results, nonclinical condition updates, social relationship information, symptoms, and treatments or medication. (2) Liu et al's 50 most predictive NLP terms for the pre-shock versus non-shock state include "tube," "crrt," "ards," "vasopressin," "portable," "failure," "shock," "sepsis," and "dl." (3) Horng et al's 47 most predictive terms or topics for having an infection in the ED include "cellulitis," "sore_throat," "abscess," "uti," "dysuria," "pneumonia," "redness_swelling," "erythema," "swelling," "redness, celluititis, left, leg, swelling, area, rle, arm, lle, increased, erythema," "abcess, buttock, area, drainage, axilla, groin, painful, thigh, left, hx, abcesses, red, boil," and "cellulitis, abx, pt, iv, infection, po, keflex, antibiotics, leg, treated, started, yesterday." The least predictive terms or topics for not having an infection include "motor vehicle crash," "laceration," "epistaxis," "pancreatitis," "etoh" (ethanol, for drunkenness), "etoh, found, vomiting, apparently, drunk, drinking, denies, friends, trauma_neg, triage," and "watching, tv, sitting, sudden_onset, movie, television, smoked, couch, pt, pot, 5pm, theater." ML methods for detecting sepsis using clinical text included: ridge regression, 49 lasso regression, 54 logistic regression, 47,48,52 Naïve Bayes (NB), 47 support vector machines (SVMs), 47,48 K-nearest neighbors (KNNs), 48 random forest (RF), 47,52 gradient boosted trees (GBTs), [50][51][52]55 gated recurrent unit (GRU), 50 and long short-term memory (LSTM). 53 Although the methods are listed separately, 2 studies combined different ML methods 48,52 (see Supplementary Table S4 for details). Ridge and lasso regression are linear regression methods that constrain the model parameters.
A linear regression model is represented as $\hat{y} = b_1 x + b_0$, where $\hat{y}$ is the predicted value, $x$ is the input variable, and $b_1$ and $b_0$ are model parameters. Model parameters are estimated by minimizing $\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$, where $y_i$ is the label and $N$ is the number of training samples. Ridge and lasso regression instead minimize $\sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda f(b_1)$, where $\lambda$ is a hyperparameter that trades off between fitting the data and model complexity, and $f(z) = z^2$ for ridge regression or $f(z) = |z|$ for lasso regression. Logistic regression is a classification method that models $P(y \mid x)$, the probability of a class $y$ given the feature $x$; the logistic regression model is defined as $P(y = 1 \mid x) = 1 / (1 + e^{-(b_1 x + b_0)})$. NB is a Bayesian network that eases computation by assuming all input variables are independent given the outcome. 108 SVM is an extension of a support vector classifier that separates training data points into 2 class regions using a linear decision boundary and classifies new data points based on which region they belong to. To accommodate for non-linearity in the data, SVM enlarges the feature space by applying kernels. 109 KNNs assume similar data points are close together and use similarity measures to classify new data based on "proximity" to points in the training data. 110 RF and GBT are ensemble models that use a collection of decision trees to improve the predictive performance of the models. RF classification takes the majority vote of a collection of trees to reduce the decision tree variance. 111 GBT trains decision trees sequentially so that each tree trains based on information from previously trained trees. 112,113 To avoid overfitting, each tree is scaled by a hyperparameter $\lambda$, often known as the shrinkage parameter or learning rate, which controls the rate at which the model learns. Recurrent neural networks (RNNs) are a type of NN with recurrent connections and assume that the input data have an ordering, for example, words in a sentence. [114][115][116] RNN can be seen as a feed-forward NN with a connection from output to input.
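The penalized least-squares objectives above can be checked numerically. The sketch below evaluates the ridge and lasso objectives for a single input variable, penalizing only the slope $b_1$; the data points and $\lambda = 0.5$ are illustrative assumptions.

```python
# Numeric sketch of the ridge/lasso objectives:
#   sum_i (y_i - yhat_i)^2 + lam * f(b1),
# with f(z) = z^2 (ridge) or f(z) = |z| (lasso).

def predict(x, b1, b0):
    return b1 * x + b0

def penalized_loss(xs, ys, b1, b0, lam, penalty):
    sse = sum((y - predict(x, b1, b0)) ** 2 for x, y in zip(xs, ys))
    f = b1 * b1 if penalty == "ridge" else abs(b1)
    return sse + lam * f

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]
# b1 = 1, b0 = 0 fits these points exactly, so only the penalty term remains.
print(penalized_loss(xs, ys, b1=1.0, b0=0.0, lam=0.5, penalty="ridge"))  # 0.5
print(penalized_loss(xs, ys, b1=1.0, b0=0.0, lam=0.5, penalty="lasso"))  # 0.5
```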
115 GRU 117 and LSTM 118 are improved variations of RNN with gating mechanisms to combat the vanishing gradient problem. The improvements help the models to better model long-term temporal dependencies. To tune hyperparameters, grid-search and Bayesian optimization were used in the studies. 47,48,50,53,54 The grid-search method iterates exhaustively through all hyperparameter values within a pre-defined set of values to find the optimal hyperparameter with respect to a validation set. In contrast, the Bayesian optimization method makes informed choices on which values to evaluate using the Bayes formula. The goal of using Bayesian optimization for hyperparameter tuning is to minimize the number of values to evaluate.
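The exhaustive grid-search procedure described above can be sketched as follows. The candidate grid and the validation-error function are stand-ins: a real study would fit a model per candidate and score it on a held-out validation set.

```python
# Sketch of grid search: evaluate every candidate hyperparameter value
# against a validation metric and keep the best. The error function is a
# stand-in with a minimum at lam = 0.3.

def validation_error(lam):
    # placeholder for "fit model with this hyperparameter, score on
    # the validation set"
    return (lam - 0.3) ** 2

def grid_search(candidates, score):
    best = min(candidates, key=score)
    return best, score(best)

grid = [0.01, 0.1, 0.3, 1.0, 10.0]
best_lam, best_err = grid_search(grid, validation_error)
print(best_lam)  # 0.3
```

Bayesian optimization differs only in how candidates are proposed: instead of exhaustively scoring a fixed grid, it uses the scores seen so far to choose which value to evaluate next, aiming to minimize the number of evaluations.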
All studies reported evaluation results for different algorithms or data types, and almost all reported area under the receiver operating characteristic curve (AUC) values except 1. 48 Figure 5 shows differences in AUC values for infection (Figure 5A), sepsis (Figure 5B), septic shock (Figure 5C), and severe sepsis (Figure 5E) when using structured data only, text data only, or a combination of structured and text data. Studies that compared their methods for different hours prior to onset are also included (Figure 5D and F); the lines connecting the points are to visually separate the methods and do not indicate changing AUC values over time. This figure compares data type usage and model performance within an individual study; it should not be used to compare AUC values between subfigures and studies because the studies used different cohorts, sepsis definitions, and hours before onset. Additionally, sepsis, severe sepsis, and septic shock have different manifestations. 119,120 The most frequently used ML method was GBTs, 50-52,55 followed by logistic regression, 47,48,52 SVMs, 47,48 RF, 47,52 ridge regression, 49 lasso regression, 54 NB, 47 KNNs, 48 GRU, 50 and LSTM. 53 For hyperparameter tuning, 3 studies used the grid-search method 47,48,54 and 2 used the Bayesian optimization method 50,53 (hyperparameter tuning details were provided by personal communication with Ran Liu on September 7, 2021 and Fatemeh Amrollahi on September 7, 2021). Although results are difficult to compare directly because of study heterogeneity, most results suggest that utilizing both structured data and text generally results in better performance for sepsis identification and early detection.
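For reference, AUC can be computed directly as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties counting one half). The sketch below compares two hypothetical score sets; the labels and scores are fabricated for illustration only and do not reproduce any study's results.

```python
# Rank-based AUC (Mann-Whitney formulation): fraction of positive/negative
# pairs where the positive case scores higher, with ties counted as 0.5.

def auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
structured_only = [0.8, 0.4, 0.6, 0.5, 0.3, 0.7]      # hypothetical scores
text_and_structured = [0.9, 0.6, 0.7, 0.5, 0.3, 0.4]  # hypothetical scores

print(auc(labels, structured_only))
print(auc(labels, text_and_structured))
```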

Identification, early detection, prediction, and method transferability
Nine studies utilized clinical text for sepsis identification, early detection, or prediction. As all identified studies focus on the identification or early detection of sepsis within a fixed time frame, much work is still needed before sepsis prediction can use text from complete patient histories. Studies from this review focus mainly on the ICU and ED, and their reliance on continuous vital sign measurements limits generalizability to general ward units. However, Culliton et al 49 successfully detected sepsis early utilizing only the text from EHR clinical notes, which is a promising approach for all inpatients. Additionally, Horng et al 47 showed that their ML model performed well on subsets of specific patient cohorts, such as pneumonia or urinary tract infection. The different ML methods and NLP techniques from each study may be applicable to different retrospective cohort or case-control studies. Though the studies have varying sepsis definitions, cohorts, ML methods, and NLP techniques, overall they show that using clinical text and structured data can improve sepsis identification and early detection. Unstructured clinical text predicts sepsis 48 to 12 h before onset, whereas structured data predicts sepsis closer to onset (<12 h before).

Sepsis definition impact
In ML, many studies rely heavily on sepsis definitions and ICD codes to identify patient cohort datasets for sepsis studies. 9,11,13 Although a consensus sepsis definition exists, 1 not all definition elements will be present in a sepsis patient because sepsis is a very heterogeneous syndrome 127 and the infection site is difficult to identify correctly. 128 Many patients with sepsis are often misdiagnosed with other diseases such as respiratory failure 129 and pneumonia. 129,130 In practice, hospitals also have varying sepsis coding methods. [131][132][133][134][135] As sepsis definitions change, studies also tend to use the most current definition. A recent study that used different sepsis definitions to generate patient cohorts found significantly heterogeneous characteristics and clinical outcomes between cohorts. 136 Similarly, previous work by Liu et al 137 demonstrated that using different infection criteria resulted in a different number of patients and slightly different outcomes. Just as changes in the definition and varying coding methods can affect sepsis mortality outcomes, 138 the sepsis definition and codes used in ML studies will likely change the outcome, results, and reporting methods. Thus, future studies should acknowledge that sepsis is a syndrome and clearly characterize each sign of sepsis to reflect the heterogeneity in the definition.
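The point that code-based definitions select different cohorts can be made concrete. The ICD-9-CM codes below are real (995.91 sepsis, 995.92 severe sepsis, 785.52 septic shock); the patient records and the two candidate definitions are fabricated for illustration.

```python
# Sketch: two code-based "sepsis definitions" applied to the same records
# select different cohorts, illustrating definition sensitivity.

EXPLICIT_SEPSIS = {"995.91", "995.92", "785.52"}  # any explicit sepsis code
SEVERE_ONLY = {"995.92", "785.52"}                # severe sepsis / septic shock only

patients = [
    {"id": 1, "codes": {"995.91", "486"}},     # sepsis + pneumonia
    {"id": 2, "codes": {"995.92"}},            # severe sepsis
    {"id": 3, "codes": {"785.52", "995.92"}},  # septic shock
    {"id": 4, "codes": {"486"}},               # pneumonia only
]

def cohort(patients, definition):
    """IDs of patients with at least one code from the definition."""
    return [p["id"] for p in patients if p["codes"] & definition]

print(cohort(patients, EXPLICIT_SEPSIS))  # [1, 2, 3]
print(cohort(patients, SEVERE_ONLY))      # [2, 3] -- a different cohort
```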

Suggestions for future studies
Predicting sepsis earlier than 12 h prior to onset can reduce treatment delays and improve patient outcomes. 3,4 Because predictions 48 to 12 h before sepsis onset appear to rely more on clinical text than structured data, additional NLP techniques should be considered for future ML studies. Additionally, since the sepsis definition used will change the cohort, there are opportunities to expand the cohort. Like Apostolova and Velez, 48 who determined their cohort by finding notes describing the use of antibiotics, it should be possible to determine cohorts by using notes describing infection signs (eg, fever, hypotension, or deterioration in mental status), indicators of diseases that sepsis is misdiagnosed with (eg, pulmonary embolism, adrenal insufficiency, diabetic ketoacidosis, pancreatitis, anaphylaxis, bowel obstruction, hypovolemia, colitis, or vasculitis), or medication effects and toxin ingestion, overdose, or withdrawal. 139 NLP methods from infectious diseases known to trigger sepsis can be incorporated to extract infection signs and symptoms from text for determining potential sepsis signs, patient groups, and risk factors. For instance, many sepsis patients are admitted with pneumonia, and there are several studies about identifying pneumonia from radiology reports using NLP. 23,140,141 Additionally, heterogeneous sepsis signs or symptoms might be identified by utilizing NLP features for detecting healthcare-associated infection risk patterns 59 or infectious symptoms. 142 Information from other NLP-related reviews about using clinical notes can also be applied, such as: challenges to consider, 16 clinical information extraction tools and methods, 18 methods to overcome the need for annotated data, 22 different embedding techniques, 143,144 sources of labeled corpora, 143 transferability of methods, 145 and processing and analyzing symptoms.
146 Moreover, heterogeneous or infectious diseases with overlapping signs and symptoms of other diseases can utilize similar sepsis ML and NLP methods to improve detection. The identified studies did not utilize complete patient history data. Thus, future research utilizing complete patient history data can study whether sepsis risk can be predicted earlier than 48 h by incorporating sepsis risk factors, such as comorbidities, 7 chronic diseases, 147 patient trajectories, 148 or prior infection incidents. 149

Limitations
This review has several limitations. The narrow scope of including only studies about utilizing clinical text for sepsis detection or prediction could have missed studies that use other types of text for sepsis detection or prediction. For example, search terms did not include "early warning system," "feature extraction," and "topic modeling." Additionally, search terms did not include possible sources of infection for sepsis, such as bloodstream infection, catheter-associated infection, pneumonia, and postoperative surgical complications. Further, the sensitivity to detect sepsis in text, structured data, or the combination of these will depend on the timestamps these data recordings have in the EHR. These timestamps may vary depending on the data used to inform the study or the different systems implemented at different hospitals. The articles identified in this review had a homogeneous choice of structured data (ie, demographics, vital signs, and laboratory measurements). Of those, laboratory test results have the largest time lag, around 1-2 h to obtain blood test results. 150 Thus, the good performance of text for detecting sepsis in these articles is unlikely to be explained fully by the time lag between measurement and recording of the structured data. This review thus shows that it is possible to detect sepsis early using text, with or without the addition of structured data.
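The cohort-expansion idea suggested above, flagging notes that mention infection signs, can be sketched with simple keyword matching. The sign list, the notes, and the regex approach are illustrative assumptions; a real system would need negation handling and a clinically validated lexicon.

```python
import re

# Sketch: flag notes mentioning infection signs via keyword matching.
# Sign list and example notes are fabricated for illustration.

INFECTION_SIGNS = ["fever", "febrile", "hypotension", "altered mental status"]
PATTERN = re.compile(r"\b(" + "|".join(INFECTION_SIGNS) + r")\b", re.IGNORECASE)

def flag_note(note):
    """Return the infection signs mentioned in a note, lowercased and sorted."""
    return sorted({m.lower() for m in PATTERN.findall(note)})

print(flag_note("Pt febrile overnight, BP dropping, hypotension noted."))
# ['febrile', 'hypotension']
print(flag_note("Ambulating well, no complaints."))  # []
```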

CONCLUSION
Many studies about sepsis detection exist, but very few studies utilize clinical text. Heterogeneous study characteristics made it difficult to compare results; however, the consensus from most studies was that combining structured data with clinical text improves identification and early detection of sepsis. There is a need to utilize the unstructured text in EHR data to create early detection models for sepsis. The lack of utilizing the complete patient history in early prediction models for sepsis is an opportunity for future ML and NLP studies.

FUNDING
Financial support for this study was provided by the Computational Sepsis Mining and Modelling project through the Norwegian University of Science and Technology Health Strategic Area.

AUTHOR CONTRIBUTIONS
MYY and ØN conceptualized the study and design with substantial clinical insight from LTG. MYY conducted the literature search and initial analysis, LTG verified results, and ØN resolved discrepancies. All authors participated in data analysis and interpretation. MYY drafted the manuscript, which LTG and ØN critically revised.

SUPPLEMENTARY MATERIAL
Supplementary material is available at Journal of the American Medical Informatics Association online.

ACKNOWLEDGMENTS
We thank those from the Gemini Center for Sepsis Research group for valuable discussions and recommendations related to clinical databases, missing search terms, and presenting results. Specifically, Ms Lise Husby Høvik (RN), Dr Erik Solligård, Dr Jan Kristian Damås, Dr Jan Egil Afset, Dr Kristin Vardheim Liyanarachi, Dr Randi Marie Mohus, and Dr Anuradha Ravi.