    J Digit Imaging. 2009 Aug;22(4):348-56. Epub 2008 Apr 5.

    Development of a Google-based search engine for data mining radiology reports.

    Source

    Mallinckrodt Institute of Radiology, Washington University School of Medicine, 510 South Kingshighway Boulevard, Campus Box 8131, Saint Louis, MO 63110, USA. erinjeri@gmail.com

    Abstract

    The aim of this study is to develop a secure, Google-based data-mining tool for radiology reports using free and open source technologies and to explore its use within an academic radiology department. A Health Insurance Portability and Accountability Act (HIPAA)-compliant data repository, search engine and user interface were created to facilitate treatment, operations, and reviews preparatory to research. The Institutional Review Board waived review of the project, and informed consent was not required. Comprising 7.9 GB of disk space, 2.9 million text reports were downloaded from our radiology information system to a fileserver. Extensible markup language (XML) representations of the reports were indexed using Google Desktop Enterprise search engine software. A hypertext markup language (HTML) form allowed users to submit queries to Google Desktop, and Google's XML response was interpreted by a practical extraction and report language (PERL) script, presenting ranked results in a web browser window. The query, reason for search, results, and documents visited were logged to maintain HIPAA compliance. Indexing averaged approximately 25,000 reports per hour. Keyword search of a common term like "pneumothorax" yielded the first ten most relevant results of 705,550 total results in 1.36 s. Keyword search of a rare term like "hemangioendothelioma" yielded the first ten most relevant results of 167 total results in 0.23 s; retrieval of all 167 results took 0.26 s. Data mining tools for radiology reports will improve the productivity of academic radiologists in clinical, educational, research, and administrative tasks. By leveraging existing knowledge of Google's interface, radiologists can quickly perform useful searches.
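The query flow described in the abstract (form submission, XML response, ranked listing, audit log) can be sketched roughly as follows. This is a minimal illustration in Python rather than the authors' Perl script. The XML layout, the element names (`results`, `result`, `id`, `title`, `snippet`), the sample report IDs, and the `parse_results` and `log_search` helpers are all assumptions made for the example; the abstract does not specify Google Desktop's actual response schema or the log format.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical search response. The real Google Desktop XML schema is not
# given in the abstract, and these two reports are made-up placeholders.
SAMPLE_RESPONSE = """<results count="167">
  <result><id>R-001</id><title>CT chest</title>
    <snippet>... hemangioendothelioma ...</snippet></result>
  <result><id>R-002</id><title>MR liver</title>
    <snippet>... epithelioid hemangioendothelioma ...</snippet></result>
</results>"""


def parse_results(xml_text):
    """Return (total hit count, list of hits in the order the engine ranked them)."""
    root = ET.fromstring(xml_text)
    total = int(root.get("count", "0"))
    hits = [
        {
            "id": r.findtext("id", default=""),
            "title": r.findtext("title", default=""),
            "snippet": r.findtext("snippet", default="").strip(),
        }
        for r in root.findall("result")
    ]
    return total, hits


def log_search(log_file, user, query, reason, hits):
    """Append one audit row: timestamp, user, query, stated reason, returned report IDs."""
    writer = csv.writer(log_file)
    writer.writerow([
        datetime.now(timezone.utc).isoformat(),
        user,
        query,
        reason,
        " ".join(h["id"] for h in hits),
    ])


total, hits = parse_results(SAMPLE_RESPONSE)
audit = io.StringIO()  # stands in for an append-only log file
log_search(audit, "user1", "hemangioendothelioma", "teaching file", hits)

for rank, hit in enumerate(hits, start=1):
    print(rank, hit["id"], hit["title"])
```

Recording the stated reason for the search alongside the query and the returned report IDs mirrors the logging the abstract describes. A real deployment would also need to record which documents the user actually opened.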

    PMID:
    18392657
    [PubMed - indexed for MEDLINE]
    PMCID:
    PMC3043709
    Free PMC Article
