Teach Learn Med. 2018 Jan 30:1-9. doi: 10.1080/10401334.2017.1414609. [Epub ahead of print]

Quality Evaluation Scores are no more Reliable than Gestalt in Evaluating the Quality of Emergency Medicine Blogs: A METRIQ Study.

Author information

a Department of Emergency Medicine, College of Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.
b Center for Education Research & Innovation, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada.
c Department of Emergency Medicine, University of Alberta, Edmonton, Alberta, Canada.
d Department of Emergency Medicine, McMaster University, Hamilton, Ontario, Canada.
e Department of Health Professions Education at HealthPartners Institute, Bloomington, Minnesota, USA.
f Department of Emergency Medicine, Northwestern University, Chicago, Illinois, USA.
g Department of Emergency Medicine, University of California Los Angeles, Los Angeles, California, USA.
h College of Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.
i Department of Emergency Medicine, University of Minnesota, Minneapolis, Minnesota, USA.
j Department of Emergency Medicine, Queen's University, Kingston, Ontario, Canada.
k Department of Medicine, Western University, London, Ontario, Canada.


Construct: We investigated the quality of emergency medicine (EM) blogs as educational resources.


Online medical education resources such as blogs are increasingly used by EM trainees and clinicians. However, quality evaluations of these resources using gestalt are unreliable. We investigated the reliability of two previously derived quality evaluation instruments for blogs.


Sixty English-language EM websites that published clinically oriented blog posts between January 1 and February 24, 2016, were identified. A random number generator selected 10 websites, and the 2 most recent clinically oriented blog posts from each site were evaluated using gestalt, the Academic Life in Emergency Medicine (ALiEM) Approved Instructional Resources (AIR) score, and the Medical Education Translational Resources: Impact and Quality (METRIQ-8) score, by a sample of medical students, EM residents, and EM attendings. Each rater evaluated all 20 blog posts with gestalt and 15 of the 20 blog posts with the ALiEM AIR and METRIQ-8 scores. Pearson's correlations were calculated between the average scores for each metric. Single-measure intraclass correlation coefficients (ICCs) evaluated the reliability of each instrument.
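
The two statistics named above can be illustrated with a minimal sketch. It is not the authors' analysis code: the abstract does not state which single-measure ICC form was used or how the partially overlapping ratings were handled, so the sketch assumes a complete posts-by-raters matrix and the two-way random-effects, absolute-agreement form, ICC(2,1). The function names and the toy ratings are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_2_1(ratings):
    """Single-measure ICC(2,1) for a complete matrix.

    ratings[i][j] is the score rater j gave to blog post i.
    Assumption: every rater scored every post (no missing cells).
    """
    n, k = len(ratings), len(ratings[0])  # n posts, k raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-post mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Invented toy data: 4 posts rated by 3 raters, once by gestalt
    # and once with a hypothetical instrument score.
    gestalt = [[5, 6, 4], [3, 3, 2], [6, 7, 6], [2, 3, 3]]
    instrument = [[30, 33, 28], [20, 22, 18], [35, 36, 34], [15, 19, 17]]

    # Correlate the per-post average scores of the two metrics.
    mean_gestalt = [sum(row) / len(row) for row in gestalt]
    mean_instrument = [sum(row) / len(row) for row in instrument]
    print("r between averages:", round(pearson_r(mean_gestalt, mean_instrument), 3))

    # Reliability of a single rater's score for each metric.
    print("ICC(2,1) gestalt:", round(icc_2_1(gestalt), 3))
    print("ICC(2,1) instrument:", round(icc_2_1(instrument), 3))
```

The distinction the sketch makes concrete is that the correlation is computed on scores averaged across raters, whereas a single-measure ICC describes how reliable one rater's score is, so a strong correlation between averages can coexist with only fair single-rater reliability.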


Our study included 121 medical students, 88 EM residents, and 100 EM attendings who completed ratings. The average gestalt rating of each blog post correlated strongly with the average scores for ALiEM AIR (r = .94) and METRIQ-8 (r = .91). Single-measure ICCs were fair for gestalt (0.37, IQR 0.25-0.56), ALiEM AIR (0.41, IQR 0.29-0.60) and METRIQ-8 (0.40, IQR 0.28-0.59).


The average scores of each blog post correlated strongly with gestalt ratings. However, neither ALiEM AIR nor METRIQ-8 showed higher reliability than gestalt. Improved reliability may be possible through rater training and instrument refinement.


Keywords: blogs; online educational resources; quality; reliability; social media
