A cost-effective, case-control study on the association between breast cancer and pregnancy through web mining

Annu ORNL Biomed Sci Eng Cent Conf. 2013 May:2013:1-4. doi: 10.1109/BSEC.2013.6618493.

Abstract

We report a case-control epidemiological study of breast cancer conducted by mining personal stories from the Internet. The aim of the study is to test whether mining openly available personal stories from the Internet can be a cost-effective way to make reliable epidemiological discoveries. As a case study, we focus on the association between breast cancer risk and pregnancy, which is well established through controlled clinical survey studies. Specifically, we mined 30,000 online obituary articles. Replicating a case-control study design, our web-mining-based approach confirmed the general trends reported by traditional epidemiological studies. Our study provides promising preliminary evidence that online content mining can be a cost-effective way to discover epidemiological knowledge.
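
As a rough illustration of the kind of analysis a case-control design supports, the sketch below computes an odds ratio with a 95% confidence interval (Woolf's log method) from a 2x2 table of exposure by case status. The abstract does not state which statistic the authors used, so the odds ratio is an assumption here, chosen because it is the conventional measure for case-control data. The function name `odds_ratio`, the definition of exposure, and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (odds ratio, lower, upper) for a 2x2 case-control table.

    The interval is an approximate 95% confidence interval computed with
    Woolf's method on the log odds ratio.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        # The log-based interval is undefined when any cell is zero.
        raise ValueError("all four cell counts must be positive")

    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT results from the study.
    # "Exposed" here stands for whatever pregnancy-related attribute is
    # extracted from each obituary (an assumption).
    estimate, lower, upper = odds_ratio(120, 80, 150, 50)
    print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An odds ratio below 1 would suggest the exposure is less common among cases than among controls, and an interval that excludes 1 would indicate the association is unlikely to be due to chance alone under the usual assumptions.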