PMC Full-Text Search Results

Items: 6

1.
Figure 3

Figure 3. From: Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis.

Area under the receiver operating characteristic curve (AUROC) of a random forest predicting whether data will be present or missing. (A) Missing completely at random simulation. (B) Missing at random simulation. (C) Missing not at random simulation.

Brett K Beaulieu-Jones, et al. JMIR Med Inform. 2018 Jan-Mar;6(1):e11.
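
The AUROC reported in the Figure 3 caption can be read as the probability that a randomly chosen "present" value receives a higher predicted score than a randomly chosen "missing" one. The sketch below computes it by pairwise comparison in plain Python. The function name `auroc` and the toy scores are illustrative assumptions, not the authors' code, and the article's random forest is not reproduced here.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison (Mann-Whitney formulation).

    scores: predicted scores, higher meaning "more likely present".
    labels: 1 (or True) if the value is present, 0 (or False) if missing.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one present and one missing example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from some classifier predicting presence (1) vs missing (0).
example = auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

An AUROC near 0.5 means presence cannot be predicted from the inputs, while values well above 0.5 mean that missingness depends on the data supplied to the classifier.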
2.
Figure 4

Figure 4. From: Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis.

Imputation accuracy measured by root mean square error (RMSE) across simulations 1-3. (A) Missing completely at random (MCAR). (B) Missing at random (MAR). (C) Missing not at random (MNAR). FI: fancyimpute; KNN: k-nearest neighbors; MICE: Multivariate Imputation by Chained Equations; pmm: predictive mean matching; RF: random forest; SVD: singular value decomposition.

Brett K Beaulieu-Jones, et al. JMIR Med Inform. 2018 Jan-Mar;6(1):e11.
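
As a rough illustration of the metric in the Figure 4 caption, the sketch below scores an imputation by RMSE over only the entries that were masked. Mean imputation stands in as a simple baseline and is not one of the methods named in the caption. The function names and the toy data are assumptions made for this example.

```python
import math


def mean_impute(values):
    """Replace each None with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def imputation_rmse(truth, masked, imputed):
    """RMSE between imputed and true values, restricted to masked positions."""
    sq_errors = [
        (t - i) ** 2
        for t, m, i in zip(truth, masked, imputed)
        if m is None
    ]
    if not sq_errors:
        raise ValueError("no masked entries to score")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy data: the second and fourth values were removed before imputation.
truth = [1.0, 2.0, 3.0, 4.0]
masked = [1.0, None, 3.0, None]
imputed = mean_impute(masked)
rmse = imputation_rmse(truth, masked, imputed)
```

Scoring only the masked positions keeps the untouched observed values, which match the truth exactly, from deflating the error.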
3.
Figure 5

Figure 5. From: Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis.

Imputation root mean square error (RMSE) for a subset of 10,000 patients from simulation 4. A total of 12 imputation methods were tested (x-axis), and each color corresponds to a Logical Observation Identifiers Names and Codes (LOINC) code. The black line shows the theoretical error from random sampling. FI: fancyimpute; KNN: k-nearest neighbors; MICE: Multivariate Imputation by Chained Equations; pmm: predictive mean matching; RF: random forest; SVD: singular value decomposition.

Brett K Beaulieu-Jones, et al. JMIR Med Inform. 2018 Jan-Mar;6(1):e11.
4.
Figure 1

Figure 1. From: Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis.

Two general paradigms are commonly used to describe missing data. In the first, missing data are considered ignorable if the probability of observing a variable has no relation to the value of that variable, and nonignorable otherwise. The second paradigm divides missingness into 3 categories: missing completely at random (MCAR: the probability of observing a variable is not dependent on its value or other observed values), missing at random (MAR: the probability of observing a variable is not dependent on its own value after conditioning on other observed variables), and missing not at random (MNAR: the probability of observing a variable is dependent on its value, even after conditioning on other observed variables). The x-axis indicates the extent to which a given value being observed depends on the values of other observed variables. The y-axis indicates the extent to which a given value being observed depends on its own value.

Brett K Beaulieu-Jones, et al. JMIR Med Inform. 2018 Jan-Mar;6(1):e11.
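
The three categories in the Figure 1 caption can be made concrete with a small simulation. In this minimal sketch the probability of masking a value of `y` is constant (MCAR), depends on a fully observed covariate `x` (MAR), or depends on `y` itself (MNAR). The logistic form, the slope of 3, and the function name are assumptions chosen for illustration and do not reproduce the article's simulations.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_missingness(x, y, mechanism, rate=0.3, seed=0):
    """Return a copy of y with some entries replaced by None.

    x: fully observed covariate; y: variable to mask.
    For roughly symmetric, zero-centered inputs the average masking
    probability is close to `rate` under all three mechanisms.
    """
    rng = random.Random(seed)
    masked = []
    for xi, yi in zip(x, y):
        if mechanism == "MCAR":
            p_missing = rate                            # ignores x and y
        elif mechanism == "MAR":
            p_missing = 2 * rate * sigmoid(3.0 * xi)    # depends on observed x
        elif mechanism == "MNAR":
            p_missing = 2 * rate * sigmoid(3.0 * yi)    # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        masked.append(None if rng.random() < p_missing else yi)
    return masked
```

Under the MNAR branch large values of `y` are masked more often, so the mean of the remaining observed values is biased downward, which is why this case cannot be corrected from the observed data alone.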
5.
Figure 6

Figure 6. From: Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis.

Assessment of multiple imputation for each method. Using simulation 4, missing values were imputed multiple times with each method. The x-axes show the root mean square error (RMSE) between the imputed data and the observed values. The y-axes show the RMSE between multiple imputations of the same data. The axis scales vary between panels to better show the range of variation. The laboratory tests are indicated by the color of the points. The black diagonal line represents unity (y=x). Panels are ordered by each method’s mean deviation (MD) from unity, indicated in the top left corner of each panel. In the last 7 panels, the unity line is not visible because the variation between multiple imputations was close to zero. FI: fancyimpute; KNN: k-nearest neighbors; MICE: Multivariate Imputation by Chained Equations; pmm: predictive mean matching; RF: random forest; SVD: singular value decomposition.

Brett K Beaulieu-Jones, et al. JMIR Med Inform. 2018 Jan-Mar;6(1):e11.
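
The two axes in the Figure 6 caption can be sketched as follows: impute the same masked data several times, then compare the error against the true values with the spread between imputations. Here the between-imputation RMSE is taken as the average over all pairs of runs, which is an assumption; the caption does not state how the article computes it. The two imputers (random draws from the observed values, and the observed mean) are simple stand-ins rather than the methods evaluated in the article.

```python
import itertools
import math
import random


def random_draw_impute(values, rng):
    """Fill each None with a random draw from the observed entries."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


def mean_impute(values, rng):
    """Deterministic baseline: fill each None with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def rmse_at(indices, a, b):
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in indices) / len(indices))


def multiple_imputation_summary(truth, masked, imputer, n_imputations=5, seed=0):
    """Return (mean RMSE to truth, mean pairwise RMSE between imputations)."""
    rng = random.Random(seed)
    idx = [i for i, v in enumerate(masked) if v is None]
    runs = [imputer(masked, rng) for _ in range(n_imputations)]
    to_truth = sum(rmse_at(idx, run, truth) for run in runs) / len(runs)
    pairs = list(itertools.combinations(runs, 2))
    between = sum(rmse_at(idx, a, b) for a, b in pairs) / len(pairs)
    return to_truth, between
```

A deterministic imputer such as `mean_impute` yields a between-imputation RMSE of exactly zero, consistent with the caption's remark that some panels show variation close to zero, whereas random draws from the observed values land near the unity line.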
6.
Figure 2

Figure 2. From: Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis.

Summary of missing data across 143 clinical laboratory measures. (A) After ranking the clinical laboratory measures by the number of total results, the percentage of patients missing a result for each test was plotted (red points). At each rank, the percentage of complete cases for all tests of equal or lower rank was also plotted (blue points). Only variables with a rank ≤75 are shown. The vertical bar indicates the 28 tests that were selected for further analysis. (B) The full distribution of patient median ages is shown in blue, and the fraction of individuals in each age group that had a complete set of observations for tests 1-28 is shown in red. (C) Within the 28 laboratory tests that were selected for imputation analyses, the mean number of missing tests is depicted as a function of age. (D) Within the 28 laboratory tests that were selected for imputation analyses, the mean number of missing tests is depicted as a function of body mass index (BMI). (E) Accuracy of a random forest predicting the presence or absence of all 143 laboratory tests. AUROC: area under the receiver operating characteristic curve. (F) Accuracy of a random forest predicting the presence or absence of the top 28 laboratory tests, by Logical Observation Identifiers Names and Codes (LOINC).

Brett K Beaulieu-Jones, et al. JMIR Med Inform. 2018 Jan-Mar;6(1):e11.
