Development and application of the ocular immune-mediated inflammatory diseases ontology enhanced with synonyms from online patient support forum conversation

Background Unstructured text created by patients represents a rich, but relatively inaccessible, resource for advancing patient-centred care. This study aimed to develop an ontology for ocular immune-mediated inflammatory diseases (OcIMIDo) as a tool to facilitate data extraction and analysis, and to illustrate its application to online patient support forum data.

Methods We developed OcIMIDo using clinical guidelines, domain expertise, and cross-references to classes from other biomedical ontologies. We developed an approach to add patient-preferred synonyms, text-mined from the oliviasvision.org online forum, using statistical ranking. We validated the approach with split-sampling and comparison to manual extraction. Using OcIMIDo, we then explored the frequency of OcIMIDo classes and synonyms, and their potential association with the natural language sentiment expressed in each online forum post.

Findings OcIMIDo (version 1.2) includes 661 classes, describing anatomy, clinical phenotype, disease activity status, complications, investigations, interventions and functional impacts. It contains 1661 relationships and axioms, and 2851 annotations, including 1131 database cross-references and 187 patient-preferred synonyms. To illustrate OcIMIDo's potential applications, we explored 9031 forum posts, revealing frequent mention of different clinical phenotypes, treatments, and complications. The language sentiment of posts was generally positive (median 0.12, IQR 0.01–0.24). In multivariable logistic regression, the odds of a post expressing negative sentiment were significantly higher for first posts than for replies (OR 3.3, 95% CI 2.8 to 3.9, p < 0.001).

Conclusion We report the development and validation of a new ontology for inflammatory eye diseases, which includes patient-preferred synonyms and can be used to explore unstructured patient or physician-reported text data, with many potential applications.


A. Supplementary
Summary of the key biomedical ontologies.

BFO
The "basic formal ontology" was designed in 2005 for use in supporting information retrieval, analysis and integration in scientific and other domains (9).

DOID
The "human disease ontology" was developed in 2011 and initially included over 8043 terms relating to inherited, developmental, and acquired human disease (10). In 2019 there were over 17930 terms.

HPO
The "human phenotype ontology" was developed in 2008 and included 8000 distinct phenotypic features (11), which have been expanded over time to include ocular symptoms (12). The 2018 HPO release included 1106 terms relating to ocular phenotypes, 968 synonyms related to these terms, and 7702 annotations of ocular phenotypes to 2770 rare disorders (13).

ORDO
The "orphanet rare disease ontology" has developed over the past two decades, becoming a formal ontology in 2014. Terms in ORDO correspond to specific rare diseases and their relationships to genes and other features (14). The 2018 version included 1202 rare eye disease-related entries (13). In 2019 there were 14259 classes.

PATO
The "phenotype and trait ontology" was developed in 2002, and includes 2730 classes relating to phenotypes and traits with associated symptoms (6).

RO
The Open Biological and Biomedical Ontologies Foundry "relations ontology" was developed in 2005 and had 80 terms as of 2018. It is a collection of relations intended for use across a wide variety of biological ontologies (15).

UBERON
The "uber-anatomy ontology" was developed in 2012 and originally consisted of over 6500 classes representing a variety of anatomical entities, including eye anatomy (16).

ICD-10
The "International Classification of Diseases" version 10 began in 1983 and was first used in 1994. ICD-10 contains codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or diseases (18). ICD version 11 is due to come into effect in 2022 and is an ontology-based terminology and classification system.

Read Codes
"Read codes" were introduced in the 1980s to provide a structured vocabulary for standardised electronic coding in primary care general practice in the United Kingdom (19). Read codes are not an ontology.

SNOMED-CT
The "Systematized Nomenclature of Medicine - Clinical Terms" was created in 1999, developed by the UK National Health Service (NHS) and released in 2002. It is a systematically organised ontology-based collection of medical terms, synonyms and definitions, used in electronic health records (20).

Classes
Classes represent the main entities of our domain. Classes can have subclasses to further describe our area of interest. For example, a class, "uveitis" can have "anterior uveitis" and "posterior uveitis" as subclasses.

Relationships
Relationships link classes together to give them deeper meaning. Examples include "adjacent to", "occurs in", and "part of" (see Table 2 for other relationships).
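As a rough illustration of how relationships link classes, each relationship can be stored as a (source, relation, target) triple. The sketch below is not how OcIMIDo is implemented; the `Triple` type, the `sources_for` helper, and the three example triples are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical representation: one triple per relationship between two classes.
Triple = namedtuple("Triple", ["source", "relation", "target"])

# Illustrative triples only; these are not taken from OcIMIDo.
TRIPLES = [
    Triple("iris", "part of", "eye"),
    Triple("retina", "part of", "eye"),
    Triple("anterior uveitis", "occurs in", "iris"),
]

def sources_for(triples, relation, target):
    """Return the classes linked to `target` by `relation`, in input order."""
    return [t.source for t in triples
            if t.relation == relation and t.target == target]

print(sources_for(TRIPLES, "part of", "eye"))  # ['iris', 'retina']
```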

Axioms
Axioms are formal rules (also known as logical relationships) that are inferred from the ontology structure and are machine readable, allowing deeper understanding of how the classes relate to one another, facilitating semantic interoperability. For example, if "retinitis" is a subclass of "posterior uveitis", and "posterior uveitis" is a subclass of "uveitis", then the ontology infers the axiom that "retinitis" is a subclass of "uveitis".
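The subclass inference in the example above can be sketched as a walk up the asserted subclass links. This is a minimal illustration and not the reasoner used for OcIMIDo; the `SUBCLASS_OF` dictionary and the `ancestors` function are hypothetical names, and "anterior uveitis" is added from the Classes example.

```python
# Asserted (direct) subclass links, taken from the examples in the text.
SUBCLASS_OF = {
    "retinitis": {"posterior uveitis"},
    "posterior uveitis": {"uveitis"},
    "anterior uveitis": {"uveitis"},
}

def ancestors(cls, subclass_of):
    """Return every class that `cls` is a direct or inferred subclass of."""
    seen = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            # Follow the parent's own superclasses (transitivity).
            stack.extend(subclass_of.get(parent, ()))
    return seen

print(sorted(ancestors("retinitis", SUBCLASS_OF)))
# ['posterior uveitis', 'uveitis']
```

Only the link from "retinitis" to "posterior uveitis" is asserted; the link to "uveitis" is inferred by following the chain.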

Synonyms
A synonym is a word or phrase that means exactly or nearly the same as a label in the ontology. Synonyms in an ontology are commonly classified as "exact", "related", "broad", or "narrow".

Table 5: Summary of OcIMIDo sources, count of cross-references, and synonyms.

Figure 4: Illustration of the difference that synonym inclusion makes for a given class. Including synonyms increased the amount of data available for analysis; for example, adding two synonyms for mycophenolate increased the number of posts mentioning this drug by 30.
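The effect of synonym inclusion on retrieval can be sketched with a simple whole-word match over post text. This is an illustration under stated assumptions, not the study's pipeline: the function `count_posts_mentioning`, the four example posts, and the choice of "MMF" and "CellCept" as synonyms are hypothetical; the source does not say which two synonyms were added.

```python
import re

def count_posts_mentioning(posts, terms):
    """Count posts containing at least one term as a whole word (case-insensitive)."""
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return sum(1 for post in posts if pattern.search(post))

# Made-up forum posts, for illustration only.
posts = [
    "Started mycophenolate last month.",
    "My consultant switched me to MMF.",
    "Has anyone tried CellCept for uveitis?",
    "Steroid drops are helping.",
]

label_only = ["mycophenolate"]
with_synonyms = label_only + ["MMF", "CellCept"]  # assumed synonyms

print(count_posts_mentioning(posts, label_only))     # 1
print(count_posts_mentioning(posts, with_synonyms))  # 3
```

Matching on the class label alone misses posts that use an abbreviation or brand name, which is the gap that patient-preferred synonyms are meant to close.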