LREC Int Conf Lang Resour Eval. 2016 May;2016:3772-3778.

Annotating Logical Forms for EHR Questions.

Author information

School of Biomedical Informatics University of Texas Health Science Center at Houston, Houston TX, USA.
Lister Hill National Center for Biomedical Communications National Library of Medicine, National Institutes of Health, Bethesda MD, USA.


This paper discusses the creation of a semantically annotated corpus of questions about patient data in electronic health records (EHRs). The goal is to provide the training data necessary for semantic parsers to automatically convert EHR questions into structured queries. A layered annotation strategy is used that mirrors a typical natural language processing (NLP) pipeline. First, questions are syntactically analyzed to identify multi-part questions. Second, medical concepts are recognized and normalized to a clinical ontology. Finally, logical forms are created using a lambda calculus representation. We use a corpus of 446 questions asking for patient-specific information. From these, 468 specific questions are found, containing 259 unique medical concepts and requiring 53 unique predicates to represent the logical forms. We further present detailed characteristics of the corpus, including inter-annotator agreement results, and describe the challenges automatic NLP systems will face on this task.
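
The three annotation layers described above can be pictured as a nested record: a question is split into parts, each part carries its normalized concepts, and each part receives one logical form. The following is a minimal sketch under stated assumptions, not the corpus's actual schema. The class names, the example question, the concept identifiers (`CONCEPT_DIABETES`, `CONCEPT_GLUCOSE`), and the predicates (`delta`, `latest`, `has_concept`) are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Concept:
    text: str        # span as written in the question
    concept_id: str  # placeholder identifier, NOT a real ontology code


@dataclass
class SubQuestion:
    text: str                                   # layer 1: one part of a multi-part question
    concepts: List[Concept] = field(default_factory=list)  # layer 2: normalized concepts
    logical_form: str = ""                      # layer 3: lambda-calculus expression as a string


@dataclass
class AnnotatedQuestion:
    original: str
    parts: List[SubQuestion] = field(default_factory=list)

    def unique_concepts(self) -> Set[str]:
        """Collect the distinct concept identifiers across all parts."""
        return {c.concept_id for p in self.parts for c in p.concepts}

    def predicates(self) -> Set[str]:
        """Collect predicate names: identifiers directly followed by '('."""
        found: Set[str] = set()
        for p in self.parts:
            found.update(re.findall(r"([A-Za-z_]\w*)\(", p.logical_form))
        return found


# Hypothetical two-part question; every value below is invented for illustration.
q = AnnotatedQuestion(
    original="Does she have diabetes and what was her last glucose?",
    parts=[
        SubQuestion(
            text="Does she have diabetes",
            concepts=[Concept("diabetes", "CONCEPT_DIABETES")],
            logical_form="delta(lambda x. has_concept(x, CONCEPT_DIABETES))",
        ),
        SubQuestion(
            text="what was her last glucose",
            concepts=[Concept("glucose", "CONCEPT_GLUCOSE")],
            logical_form="latest(lambda x. has_concept(x, CONCEPT_GLUCOSE))",
        ),
    ],
)

print(len(q.parts), sorted(q.unique_concepts()), sorted(q.predicates()))
```

Counting parts, unique concepts, and unique predicates over records of this kind is one way corpus-level figures such as those reported above could be tallied.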


Keywords: electronic health records; question answering; semantic parsing

