Bioinformatics. 2019 Sep 3. pii: btz678. doi: 10.1093/bioinformatics/btz678. [Epub ahead of print]

Applying Citizen Science to Gene, Drug, and Disease Relationship Extraction from Biomedical Abstracts.

Author information

Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA.



Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from this large volume of literature, many researchers use information extraction algorithms to harvest information from biomedical texts. Information extraction is usually accomplished via a combination of manual expert curation and computational methods. Advances in computational methods usually depend on the time-consuming generation of gold standards by a limited number of expert curators. Citizen science is public participation in scientific research. We previously found that citizen scientists are willing and able to perform named entity recognition of disease mentions in biomedical abstracts, but did not know whether this was also true for relationship extraction.


In this paper, we introduce the Relationship Extraction Module of the web-based application Mark2Cure and demonstrate that citizen scientists can perform relationship extraction. We confirm the importance of accurate named entity recognition for user performance in relationship extraction and identify design issues that impacted data quality. We find that the data generated by citizen scientists can be used to identify relationship types not currently available in the Mark2Cure Relationship Extraction Module. We compare the citizen science-generated data with algorithm-mined data and identify ways in which the two approaches may complement one another. We also discuss opportunities for future improvement of this system, as well as potential synergies between citizen science, manual biocuration, and natural language processing.
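The abstract does not say how individual volunteers' judgments are combined into usable data. As a minimal sketch, assuming each volunteer assigns one relationship label to a given entity pair within an abstract, a simple majority-vote aggregation could look like the following. The record layout, the function name `aggregate_relations`, the thresholds, and the labels are all hypothetical illustrations, not Mark2Cure's actual aggregation method.

```python
from collections import Counter, defaultdict


def aggregate_relations(annotations, min_votes=3, min_agreement=0.5):
    """Combine volunteer labels into one consensus label per entity pair.

    Hypothetical input: an iterable of (abstract_id, entity_a, entity_b, label)
    tuples, one per volunteer judgment. Entity pairs are treated as ordered.
    A pair is kept only if it has at least `min_votes` judgments and its most
    common label wins strictly more than `min_agreement` of them, so ties drop out.
    """
    votes = defaultdict(Counter)
    for abstract_id, entity_a, entity_b, label in annotations:
        votes[(abstract_id, entity_a, entity_b)][label] += 1

    consensus = {}
    for pair, counter in votes.items():
        total = sum(counter.values())
        if total < min_votes:
            continue  # too few volunteers saw this pair
        label, count = counter.most_common(1)[0]
        if count / total > min_agreement:
            consensus[pair] = label
    return consensus


if __name__ == "__main__":
    # Hypothetical toy judgments; identifiers and labels are placeholders.
    toy = [
        ("doc1", "drug_A", "disease_B", "treats"),
        ("doc1", "drug_A", "disease_B", "treats"),
        ("doc1", "drug_A", "disease_B", "treats"),
        ("doc1", "drug_A", "disease_B", "no_relation"),
        ("doc1", "gene_C", "disease_B", "associated_with"),
        ("doc1", "gene_C", "disease_B", "no_relation"),
    ]
    print(aggregate_relations(toy))
```

Here the first pair reaches 3 of 4 votes for one label and is kept, while the second pair has only two judgments and is dropped. Requiring a strict majority is a conservative choice; a real system might instead weight volunteers by their measured accuracy.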


Mark2Cure platform: source code: and analysis code for this paper:


Supplementary data are available at Bioinformatics online.
