Sleep Med. 2013 Nov;14(11):1199-207. doi: 10.1016/j.sleep.2013.04.022. Epub 2013 Aug 16.

Scoring accuracy of automated sleep staging from a bipolar electroocular recording compared to manual scoring by multiple raters.

Author information

Department of Medicine, University of California, San Diego, La Jolla, CA, United States; Veterans Affairs San Diego Healthcare System, San Diego, CA, United States. Electronic address:



Electroencephalography (EEG) assessment in research and clinical studies is limited by the patient burden of multiple electrodes and by the time needed to score records manually. The objective of our study was to investigate the accuracy of an automated sleep-staging algorithm based on a single bipolar EEG signal.


Three raters each manually scored the polysomnographic (PSG) records of 44 patients referred for sleep evaluation. Twenty-one PSGs were scored by Rechtschaffen and Kales (R&K) criteria (group 1), and 23 PSGs were scored by American Academy of Sleep Medicine (AASM) 2007 criteria (group 2). Majority agreement (at least two of the three raters assigning the same stage) was present in 98.4% of epochs and served as the reference for comparison with automated scoring from a single EEG lead derived from the left and right electrooculogram.
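The majority-agreement reference can be illustrated with a short Python sketch. It assumes each rater's scoring is a list of per-epoch stage labels (the labels "W", "N1", "N2", "N3", "R" and the function names are hypothetical) and that an epoch with no strict majority among the three raters is left without a consensus label; the abstract does not state how such epochs were handled.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_stage(labels: Sequence[str]) -> Optional[str]:
    """Return the stage chosen by a strict majority of raters, or None if none exists."""
    stage, count = Counter(labels).most_common(1)[0]
    return stage if count > len(labels) / 2 else None


def majority_consensus(rater_scores: Sequence[Sequence[str]]) -> List[Optional[str]]:
    """Combine per-rater epoch labels (one list per rater) into a per-epoch consensus.

    Assumption: epochs without a strict majority are marked None.
    """
    return [majority_stage(epoch) for epoch in zip(*rater_scores)]


def majority_coverage(consensus: Sequence[Optional[str]]) -> float:
    """Fraction of epochs that have a majority label (cf. the 98.4% reported)."""
    return sum(s is not None for s in consensus) / len(consensus)


# Hypothetical scores from three raters over five epochs.
raters = [
    ["W", "N1", "N2", "N2", "R"],
    ["W", "N2", "N2", "N3", "R"],
    ["W", "N1", "N2", "R", "N1"],
]
consensus = majority_consensus(raters)
print(consensus)                     # ['W', 'N1', 'N2', None, 'R']
print(majority_coverage(consensus))  # 0.8
```

In this toy example, the fourth epoch receives three different stages and therefore has no majority label.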


The κ coefficients for interrater manual scoring ranged from 0.46 to 0.89. The κ coefficients for the automated algorithm vs each rater's manual scoring ranged from 0.42 to 0.63; against majority agreement across all studies, κ was 0.61 (group 1, κ=0.61; group 2, κ=0.62). The mean positive percent agreement across subjects and stages was 72.6%. It was approximately 80% for wake (78.3%), stage 2 sleep (N2, 80.9%), and stage 3 sleep (N3, 78.1%), decreased slightly to 73.2% for rapid eye movement (REM) sleep, and dropped to 31.9% for stage 1 sleep (N1). Agreement differed by rater, obstructive sleep apnea (OSA) severity, medication use, and signal quality.
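The two agreement statistics above can be sketched in Python, assuming unweighted Cohen's κ and defining positive percent agreement for a stage as the share of reference (majority) epochs of that stage that the automated scorer labels identically. Both definitions are assumptions, since the abstract does not give the exact formulas, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def drop_no_majority(
    reference: Sequence[Optional[str]], test: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Keep only epochs that have a reference (majority) label."""
    pairs = [(r, t) for r, t in zip(reference, test) if r is not None]
    return [r for r, _ in pairs], [t for _, t in pairs]


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)      # chance agreement from marginals
    if p_e == 1.0:  # both scorers used one identical stage throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def positive_percent_agreement(
    reference: Sequence[str], test: Sequence[str], stage: str
) -> float:
    """Percent of reference epochs of `stage` that the test scorer also labels `stage`."""
    idx = [i for i, s in enumerate(reference) if s == stage]
    if not idx:
        raise ValueError(f"no reference epochs of stage {stage!r}")
    return 100.0 * sum(test[i] == stage for i in idx) / len(idx)


# Hypothetical majority labels (None = no majority) and automated labels.
manual = ["W", "W", "N2", "N2", "N2", "R", "N1", None, "N3"]
auto = ["W", "N1", "N2", "N2", "N3", "R", "N2", "W", "N3"]
ref, alg = drop_no_majority(manual, auto)
print(round(cohens_kappa(ref, alg), 4))                 # 0.5102
print(round(positive_percent_agreement(ref, alg, "N2"), 1))  # 66.7
```

scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic and can serve as a cross-check.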


Our study demonstrated that automated sleep scoring from a single channel of forehead EEG yields agreement with majority manual scoring similar to that reported in studies of manual interrater agreement. The benefit of assessing auto-staging accuracy against consensus agreement across multiple raters is most apparent in patients with OSA; additionally, assessing auto-staging accuracy against consensus limited disagreements in patients taking medications and in those with compromised signal quality.


Keywords: Automatic sleep scoring; Electroencephalography; Electrooculography; Polysomnography; Sleep stages; Validation studies

[Indexed for MEDLINE]