PLoS ONE. 2009; 4(4): e5134.
Published online Apr 6, 2009. doi: 10.1371/journal.pone.0005134
PMCID: PMC2661376

Molecular Phenotypes Distinguish Patients with Relatively Stable from Progressive Idiopathic Pulmonary Fibrosis (IPF)

Oliver Eickelberg, Editor

Abstract

Background

Idiopathic pulmonary fibrosis (IPF) is a progressive, chronic interstitial lung disease that is unresponsive to current therapy and often leads to death. However, the rate of disease progression differs among patients. We hypothesized that comparing the gene expression profiles between patients with stable disease and those in which the disease progressed rapidly will lead to biomarker discovery and contribute to the understanding of disease pathogenesis.

Methodology and Principal Findings

To begin to address this hypothesis, we applied Serial Analysis of Gene Expression (SAGE) to generate lung expression profiles from diagnostic surgical lung biopsies in 6 individuals with relatively stable (or slowly progressive) IPF and 6 individuals with progressive IPF (based on changes in DLCO and FVC over 12 months). Our results indicate that this comprehensive lung IPF SAGE transcriptome is distinct from normal lung tissue and other chronic lung diseases. To identify candidate markers of disease progression, we compared the IPF SAGE profiles in stable and progressive disease, and identified a set of 102 transcripts that were at least 5-fold up regulated and a set of 89 transcripts that were at least 5-fold down regulated in the progressive group (P-value≤0.05). The over expressed genes included surfactant protein A1, two members of the MAPK-EGR-1-HSP70 pathway that regulate cigarette-smoke induced inflammation, and Plunc (palate, lung and nasal epithelium associated), a gene not previously implicated in IPF. Interestingly, 26 of the up regulated genes are also increased in lung adenocarcinomas and have low or no expression in normal lung tissue. More importantly, we defined a SAGE molecular expression signature of 134 transcripts that sufficiently distinguished relatively stable from progressive IPF.

Conclusions

These findings indicate that molecular signatures from lung parenchyma at the time of diagnosis could prove helpful in predicting the likelihood of disease progression or possibly understanding the biological activity of IPF.

Introduction

Idiopathic Pulmonary Fibrosis (IPF) is a chronic progressive disease of unknown etiology that is characterized by irreversible scarring in the lung. IPF is one of a subgroup of the diffuse parenchymal lung diseases (DPLD) of unknown origin, represented by the idiopathic interstitial pneumonias (IIPs). IPF is the most common form of IIP, and pathologically is represented by usual interstitial pneumonia (UIP) [1]–[3]. While hypotheses have been put forth, varying from chronic inflammation leading to widespread fibrosis to abnormal wound healing and deregulated epithelial cell function [4]–[9], the basic mechanism of disease pathogenesis remains unknown.

Disease progression is highly variable in IPF. While the 3 to 5 year mortality is 50%, some patients live up to 10 years following diagnosis [10]. The disease course is also variable, ranging from patients who remain stable for protracted periods of time to others who experience rapid stepwise progression with accelerated mortality [11]–[13]. Although predictors of survival [10] and disease progression [14] have included demographic factors, exposures, lung physiology, radiography, and pathology, it remains difficult to predict the prognosis of any one case of IPF. Moreover, none of the prediction models has accounted for differences in molecular features of the pathological process.

Unfortunately, patients generally present in the later stages of disease, and no medical treatment either reverses or slows the progression of IPF. The heterogeneity of disease progression and the lack of available treatment emphasize the importance of early diagnosis, especially with the hope that intervention may be more effective in the early stages of disease. They also underscore the need for biomarkers that not only may predict progression but also may contribute to the discovery of molecular mechanisms involved in disease pathogenesis.

We hypothesized that by comparing the transcriptome of relatively stable and progressive IPF, markers of disease activity would be identified that could lead to biomarker discovery, improved prognostic ability, and further contribute to the understanding of IPF pathogenesis. In this study, we generated the lung expression profiles from pre-treatment, diagnostic surgical lung biopsies using SAGE technology [15] from 6 individuals with relatively stable (or slowly progressive) IPF and compared these profiles to 6 individuals with progressive IPF. In silico analyses of the comprehensive SAGE profiles allowed for the generation of an IPF molecular signature that distinguished relatively stable from progressive patients, and identified genes not previously implicated in IPF. Moreover, the SAGE IPF gene expression profile identified molecular pathways that may be important in disease development and progression.

Results

A summary of the clinical and demographic features is presented in Table 1. The average age was 64.8 years in the progressive group and 66.7 years in the relatively stable group. Both groups included smokers and non-smokers. However, only one female subject was present in the progressive group, whereas 3 were included in the relatively stable group (Table 1). The percent predicted pulmonary function test (PFT) values at baseline and end point for both groups are depicted in Figure 1. The means of the percent predicted PFT values at baseline are not significantly different between the two groups (Table 1). The actual PFT values are depicted in Figure S1. A significant difference between the progressive and the relatively stable group was found for the actual change in DLCO and the change in percent predicted DLCO with a P-value<0.05 based on a Mann-Whitney test. Given that not all samples were collected at equal time intervals between baseline and end point, a time-weighted factor was calculated to ensure the correct group assignment. The time-weighted change in % predicted values between the two groups was significantly different with P-values between 2.2E-3 (DLCO) and 8.7E-3 (FVC).
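As an illustration of the group comparison described above, the following Python sketch computes an exact two-sided Mann-Whitney P-value by enumerating every way of splitting the pooled values into two groups. The function name and the input values are hypothetical and are not study data. The sketch assumes that there are no tied values, and it is not known from this section which software or which variant of the test (exact or approximate) was actually used.

```python
from itertools import combinations


def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney P-value (assumes no tied values)."""
    pooled = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # ranks 1..N, no ties
    n, total = len(x), len(pooled)
    observed = sum(rank[v] for v in x)
    expected = n * (total + 1) / 2  # mean rank sum under the null hypothesis
    extreme = 0
    splits = 0
    for combo in combinations(range(1, total + 1), n):
        splits += 1
        # Count splits whose rank sum is at least as far from the mean.
        if abs(sum(combo) - expected) >= abs(observed - expected):
            extreme += 1
    return extreme / splits


# Hypothetical time-weighted changes in % predicted DLCO (not study data).
progressive = [-31.0, -27.5, -24.0, -22.5, -19.0, -17.5]
stable = [-9.0, -6.5, -4.0, -1.5, 2.0, 5.5]
p_value = mann_whitney_exact(progressive, stable)
print(f"exact two-sided P = {p_value:.4f}")
```

With two groups of six there are 924 possible splits, so a complete separation of the groups gives the smallest attainable exact two-sided P-value, 2/924 (about 2.2E-3). The two values reported above are numerically consistent with 2/924 and 8/924 (about 8.7E-3), although this does not establish how they were computed.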

Figure 1
Forced vital capacity (FVC) and carbon monoxide diffusing capacity (DLCO) values.
Table 1
Clinical and demographic variables

IPF SAGE Transcriptome

For an in-depth assessment of the IPF transcriptome, we generated and analyzed 12 IPF SAGE libraries with an average of 79,578 tags per library. A total of 954,932 transcript tags were sequenced, of which 168,272 were unique. After removal of linker and repetitive sequences the number was reduced to 168,066. Transcript tags with a raw count of one in the entire IPF transcriptome (singletons) were also removed, resulting in 149,291 transcripts. For comparison purposes, the tag counts in each library were normalized to 200,000 tags. We also included 8 other human lung tissue libraries comprising another 500,244 tags that were downloaded from the SAGE Genie (http://cgap.nci.nih.gov/SAGE) or the GEO website (http://ncbi/geo/). All libraries included in this study are described in Table S1. Hierarchical clustering analysis of the 12 IPF and 5 normal lung parenchyma SAGE libraries included in this study demonstrated that IPF samples are distinguishable from normal lung parenchyma (Figure 2A). Three IPF samples clustered together and were dissimilar from the other 9 IPF samples, indicating the possible existence of other subtypes and reflecting the heterogeneity among IPF patients. Though the three samples belong to the relatively stable group (Table S1), it cannot be excluded that this separation might simply be due to normal lung parenchyma present within the surgically removed biopsy sample. We repeated the unsupervised clustering with all 22 SAGE libraries available and noticed that these particular three samples are still more closely related to certain normal tissue samples, though the separation between IPF and normal lung parenchyma is not as efficient as in the first analysis (Figure S2A).
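The singleton removal and the normalization to 200,000 tags per library can be sketched in Python as follows. The function names, library names and tag labels are hypothetical. The sketch assumes that singletons are removed before scaling and that each library is scaled by its own remaining tag total; the order of these two steps is not stated above.

```python
from collections import Counter


def remove_singletons(libraries):
    """Drop tags whose raw count summed over all libraries is exactly one."""
    totals = Counter()
    for counts in libraries.values():
        totals.update(counts)
    return {
        name: {tag: c for tag, c in counts.items() if totals[tag] > 1}
        for name, counts in libraries.items()
    }


def normalize(counts, target=200_000):
    """Scale raw tag counts so that the library total equals `target`."""
    library_size = sum(counts.values())
    return {tag: c * target / library_size for tag, c in counts.items()}


# Hypothetical raw tag counts for two libraries (not study data).
raw = {
    "IPF_A": {"TAG_A": 30, "TAG_B": 10, "TAG_C": 1},
    "IPF_B": {"TAG_A": 12, "TAG_B": 8},
}
filtered = remove_singletons(raw)
normalized = {name: normalize(counts) for name, counts in filtered.items()}
for name, counts in normalized.items():
    print(name, counts)
```

Scaling every library to a common total makes counts comparable between libraries that were sequenced to different depths.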

Figure 2
Analysis of the IPF transcriptome.

Initially, we compared the normal lung parenchyma libraries NB1, NLP-1 and NLP-2 with the 12 IPF SAGE libraries in order to identify genes that are over expressed in IPF and are minimally expressed or absent in normal lung parenchyma. The filter applied for each individual transcript tag selected for a P-value≤0.05 and a fold difference of at least 10; less than 2 counts in the normal libraries; 10 or more counts in the IPF group; and an expression level of at least 5 counts in 50% of all the IPF SAGE libraries, which resulted in 1,121 transcript tags. The tag to gene mapping was performed using SAGE Genie downloads, and the tag to gene classifications of the 1,121 transcript tags show that 80% of the tags can be mapped to well-defined transcripts, 5% match to hypothetical proteins/open reading frames of unknown function and 13% represent unknown transcripts (Figure 2B). A total of 18% of the significantly over expressed transcripts possibly represent novel genes and/or alternative transcripts [16] uniquely expressed in the IPF transcriptome. Careful analysis of the over expressed genes in IPF revealed known genes that have been shown to be highly expressed in IPF, such as S100 calcium binding protein 2, chemokine CXC ligand 14, several collagens, tenascin, metalloprotease 7, and fibronectin. These results confirm previously published comparisons of IPF and normal expression profiles [17]–[19]. However, our results also indicate that other genes with an unknown role in IPF pathogenesis are over expressed in our SAGE IPF libraries when compared to normal lung, such as syndecan 1 (SDC1), suppression of tumorigenicity 5 (ST5, regulator of MAPK1/ERK2 kinase) and centaurin delta 2 (CENTD2), all of which have been associated with lung adenocarcinomas. Interestingly, we did not find significantly down regulated genes in IPF compared to normal lung tissue.
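The per-tag filter described above can be written as a single predicate. The following Python sketch is illustrative only: the function name and the example counts are hypothetical, the fold difference and P-value are taken as precomputed inputs, and the counts in the normal libraries and in the IPF group are assumed to be sums over the respective libraries, which is one possible reading of the criteria.

```python
def passes_ipf_filter(ipf_counts, normal_counts, fold, p_value,
                      min_fraction=0.5):
    """Apply the count-based criteria to one transcript tag.

    ipf_counts / normal_counts hold one count per SAGE library;
    fold and p_value are assumed to be computed elsewhere.
    """
    expressed = sum(1 for c in ipf_counts if c >= 5)
    return (
        p_value <= 0.05
        and fold >= 10
        and sum(normal_counts) < 2       # minimal or absent in normal lung
        and sum(ipf_counts) >= 10        # 10 or more counts in the IPF group
        and expressed >= min_fraction * len(ipf_counts)
    )


# Hypothetical counts across 12 IPF and 3 normal libraries (not study data).
broad = [12, 8, 0, 15, 6, 9, 3, 7, 0, 11, 5, 2]
narrow = [40, 30, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0]
normal = [0, 1, 0]
print(passes_ipf_filter(broad, normal, fold=26.0, p_value=0.001))
print(passes_ipf_filter(narrow, normal, fold=26.0, p_value=0.001))
```

The second tag has the same total count as the first but reaches 5 counts in only 3 of the 12 libraries, so it is rejected by the last criterion; this is the step that guards against tags driven by one or two samples.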

Given that the IPF samples used in this study were selected based on clinical variation in disease progression, we applied more stringent criteria in order to define a clear molecular signature that would distinguish IPF from normal lung parenchyma. We applied the above mentioned criteria and then selected for the expression of at least 5 counts in more than 75% of all the IPF SAGE libraries. This yielded a list of 293 transcript tags, of which 244 matched to well defined genes (Table S2). A T-test analysis showed that the mean expression level of the 293 transcripts in IPF differs significantly from the mean level in normal lung parenchyma (P≤2.7E-31). Furthermore, cluster analysis, including five normal lung SAGE libraries (Table S1), indicated that this signature is sufficient to separate IPF from normal lung by a single-linkage hierarchical algorithm (Figure 2D), a clear improvement in separation when compared with Figure 2A. Interestingly, even when using all 22 SAGE libraries, the 293-signature results in a good separation of most IPF from the normal lung parenchyma samples, demonstrating the strength of the 293-signature (Figure S2B).
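For readers unfamiliar with single-linkage hierarchical clustering, a minimal Python sketch is given here. It is not the software used in this study. The function name, sample names and expression values are hypothetical, and Euclidean distance is an assumption, since the distance measure is not stated above.

```python
from itertools import combinations
from math import dist


def single_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [{name} for name in profiles]

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(profiles[i], profiles[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: gap(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters


# Hypothetical normalized counts for three signature tags (not study data).
profiles = {
    "IPF_1": (40, 35, 2),
    "IPF_2": (38, 30, 1),
    "IPF_3": (45, 33, 3),
    "NL_1": (2, 1, 20),
    "NL_2": (3, 0, 18),
}
groups = single_linkage(profiles, n_clusters=2)
print(sorted(sorted(g) for g in groups))
```

Because single linkage joins clusters through their closest members, it separates groups cleanly only when the gap between groups exceeds the spacing within them, which is why a more selective signature can improve the separation.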

Differentially expressed genes characterizing the progressive and relatively stable disease groups in IPF

After establishing that our newly generated SAGE IPF transcriptome contained sufficient information to distinguish IPF from normal lung parenchyma and other lung diseases, we analyzed the differential expression between progressive and relatively stable IPF. The progressive and relatively stable groups had 446,158 and 508,774 total SAGE tag counts, respectively. After applying a filter for a total sum of at least 2 tag counts in both groups, only 16,089 unique transcript tags were left, of which 13,745 were in common between the two groups; 1,268 were only present in the progressive group and 1,076 were only present in the relatively stable group. To identify significant differentially expressed transcript tags that distinguished the two groups, we also selected for a fold difference ≥5, a minimal tag count in the corresponding group ≥5 counts, and a P-value≤0.05, resulting in 243 differentially expressed transcripts. As a final filter, we selected for an expression level of 4 or more counts in at least 50% of the SAGE libraries representing either of the two groups. In this way, we identified 102 up regulated and 89 down regulated transcripts in the progressive group (Figure 3A, Table 2 and S3). The up regulated genes in the progressive group include surfactant protein A1 (SFTPA1) and also members of the MAPK-EGR1-HSP70 pathway that regulates cigarette-smoke induced inflammation [20]. Other up regulated genes are ADM (adrenomedullin), CCL2 (chemokine ligand 2), PTPRF (protein tyrosine phosphatase receptor F) and SPP1 (osteopontin). Interestingly, we found 26 genes among the list of 102 up regulated transcripts that are also associated with various cancers, such as Heat shock 70KDa protein 1A (HSPA1A), Macropain (PSMA7), Ras homolog gene family member B (RHOB), FK506 binding protein 2 (FKBP2) and Plunc (palate, lung and nasal epithelium carcinoma associated).
None of the above mentioned genes, with the exception of SFTPA1, have been previously correlated with disease progression in IPF. Other candidate molecular biomarkers, not necessarily previously implicated in IPF pathogenesis, were selected by IPA analysis among the differentially expressed genes in the progressive group and are listed in Table 3. Real-time PCR confirmed the over expression of ADM, Plunc, SPP1, and the down regulation of RTKN2 (rhotekin 2) in a subset of samples (n = 4) used for SAGE library construction representing the progressive group. The values obtained for the relatively stable group (n = 4) were arbitrarily set to one in order to calculate a fold difference (Figure 3B). Both ADM and SPP1 have been previously shown to be up regulated in IPF, confirming our results [13], [21], [22]. To examine the cellular distribution of Plunc, we analyzed IPF and normal lung tissue by immunohistochemistry. Plunc was found to be mainly expressed in the secretory/goblet type of bronchial columnar cells. In regions of honeycombing there are bronchial/bronchiolar epithelia (including the secretory type) that stain strongly. It appears that Plunc is also secreted into the mucus that is filling these cystic spaces (Figure 3C). No Plunc expression was detected in normal lung tissue (Figure 3D).
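The first filtering step of this section, in which tags are retained per group and then divided into shared and group-specific sets, can be sketched in Python as follows. The function name, tag labels and counts are hypothetical. The sketch assumes that the threshold of 2 counts is applied to the summed count within each group separately, which is one reading of the description.

```python
def partition_tags(progressive, stable, min_total=2):
    """Split tags into shared and group-specific sets after a count filter."""
    prog = {tag for tag, c in progressive.items() if c >= min_total}
    stab = {tag for tag, c in stable.items() if c >= min_total}
    return {
        "common": prog & stab,
        "progressive_only": prog - stab,
        "stable_only": stab - prog,
    }


# Hypothetical summed tag counts per group (not study data).
progressive = {"TAG_A": 14, "TAG_B": 1, "TAG_C": 3}
stable = {"TAG_A": 9, "TAG_B": 6, "TAG_D": 1}
parts = partition_tags(progressive, stable)
print({name: sorted(tags) for name, tags in parts.items()})

# The three reported subsets add up to the reported number of unique tags.
print(13_745 + 1_268 + 1_076 == 16_089)
```

Under this reading, a tag such as TAG_B, which falls below the threshold in one group but not in the other, is counted as specific to the group in which it passes.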

Figure 3
Differentially expressed genes in the lung parenchyma from the relatively stable and progressive IPF.
Table 2
Top 50 differentially expressed genes in progressive group
Table 3
Candidate Biomarkers for disease progression based on IPA database survey

A molecular signature for disease progression in IPF

To determine whether the 191 differentially expressed genes associated with rapid progression in IPF (102 up and 89 down regulated) represent a molecular signature, we selected for genes with a P-value<0.05 and analyzed the significance of the difference in mean expression level between the two groups. The expression level of 134 of the 191 genes was sufficient to correctly distinguish the progressive from the relatively stable group (Student T-test P-values between 6.5E-3 and 2.4E-5). This expression signature was tested by an unsupervised hierarchical clustering of all IPF lung SAGE libraries used in this study, showing a clear distinction between the progressive and relatively stable groups (Figure 4A). Interestingly, a study by Selman and colleagues described an accelerated and a slowly progressive variant of IPF [13]. The sample size in the Selman microarray based study is small but offers an opportunity to test our 134-signature in an independent cohort. We found that 90 genes (67%) of our 134-gene expression signature were represented on the custom Affymetrix oligonucleotide microarrays [13]. Cluster analysis using those 90 genes was insufficient to clearly distinguish the accelerated variant from the slow variant (Figure 4B). Analysis of the significance of the difference in mean expression level between the accelerated and slow variant groups demonstrated that only 58 out of the 90 genes tested were significant (Student T-test P-value of 8.0E-3). This prompted us to repeat the hierarchical clustering using a smaller set of genes, and the results show an improved separation of both groups (Figure 4C). It is possible that when using the full progressive IPF signature of 134, the clustering will be even more definitive for the accelerated and slow variant. The clustering ‘behavior’ of the dataset might simply reflect the differences in definition of the accelerated and slow variant between Selman's study and our study.
The main distinction is that the slow variant group in Selman's study included subjects with more than 24 months of symptoms, while in our study we selected subjects based on their PFT values within a 12-month period following the initial biopsy. Though the preliminary results are promising, the small sample size does not support any formal conclusions regarding the classification strength of the proposed 134-expression signature.
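The per-gene comparison of mean expression levels between two groups can be illustrated with a short Python sketch of the two-sample Student t statistic with pooled variance. The function name and the expression values are hypothetical, and equal variances in the two groups are assumed, as in the classical Student test. Converting the statistic into a P-value requires the t distribution with the returned degrees of freedom, which a statistics package can provide.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical normalized counts of one gene in six libraries per group.
progressive = [42, 55, 38, 61, 47, 50]
stable = [9, 14, 6, 11, 8, 12]
t_stat, dof = student_t(progressive, stable)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")

# Share of the 134-gene signature represented on the external array.
print(f"{90 / 134:.0%}")
```

With only six libraries per group the test has 10 degrees of freedom, so a gene needs a large and consistent difference in means to reach the significance levels quoted above.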

Figure 4
Heat map SAGE molecular signature.

Pathway analysis of the IPF transcriptome and biomarker selection

Ingenuity pathway analysis (IPA) was applied to select for the main canonical pathways represented in the 1,121 transcript list of over expressed genes in the IPF SAGE Transcriptome, using a Fisher's exact test with a P-value threshold of 0.05. These pathways are the IGF-1 signaling, the ERK/MAPK signaling, the protein ubiquitination, the PI3K/AKT signaling, the cardiac β-adrenergic signaling, the actin-cytoskeleton signaling, the integrin signaling, and the NRF2-mediated oxidative stress response pathway (Figure 2C). Some of the canonical pathways identified in this study have been previously implicated in IPF [23], [24]. Biomarker analysis using the IPA software identified 33 candidate biomarkers in the 293-IPF SAGE transcript signature. Expression of these genes has been detected in various bodily fluids and tissues, such as blood, bronchoalveolar lavage fluid, plasma/serum, sputum, and lung tissue, in various diseases (Table S4). Two genes located in the extracellular space have been detected in sputum as well as in other bodily fluids: complement factor H (CFH) and metallopeptidase inhibitor 1 (TIMP1). CFH is secreted into the bloodstream and has an essential role in the regulation of complement activation, and it acts as an adrenomedullin binding protein [25]. The metallopeptidase inhibitor 1 has been previously detected in interstitial macrophages in human IPF samples [26] and is a key player in the fibrogenic response to bleomycin in C57BL/6 mice [27], [28]. The proteins encoded by the TIMP gene family are natural inhibitors of the matrix metalloproteinases (MMPs), a group of peptidases involved in degradation of the extracellular matrix. Though it is unmistakable that MMPs play an important role in IPF pathogenesis, the exact mechanism by which TIMP1 is activated is still unresolved [29]. Recently it has been shown that the over expression of TIMP1 was an independent prognostic marker in patients with non-small cell lung carcinoma [30].
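The principle of testing a gene list for over-representation of a pathway with Fisher's exact test can be sketched in Python as shown here. This is not the IPA implementation. The function name and all numbers are hypothetical, and a right-tailed (over-representation) test against a fixed background of annotated genes is assumed.

```python
from math import comb


def enrichment_p(overlap, list_size, pathway_size, background):
    """Right-tailed Fisher's exact P-value.

    Probability of drawing at least `overlap` pathway genes when
    `list_size` genes are sampled from `background` genes, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(list_size, pathway_size)
    total = comb(background, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 25 of 900 listed genes fall in a 200-gene pathway,
# with 20,000 annotated genes as background (not study data).
p = enrichment_p(overlap=25, list_size=900, pathway_size=200, background=20_000)
print(f"P = {p:.2e}, significant at 0.05: {p <= 0.05}")
```

By chance, a 900-gene list would be expected to contain about 9 genes from such a pathway, so the test asks how surprising an overlap of 25 or more would be.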

Gene Ontology, Pathway and Network analysis of significant differentially expressed genes among patients with progressive IPF

The identified differentially expressed genes in the progressive group offer insight to the possible pathways and cellular processes that might be involved in IPF progression. For a systematic and unbiased analysis we used the Ingenuity Pathway Analysis program to explore the list of differentially expressed genes. Figure 5A depicts the top canonical pathways that are significantly associated with our dataset and are highly represented in either the up regulated or down regulated list of genes. The significance is determined by a high ratio (or percentage of genes in pathway found in the gene list) and by a high negative logarithm of the P-value; indicating that the pathway is significantly associated with the data and that a large portion of the corresponding canonical pathway may be affected.
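The two quantities used above to rank canonical pathways, the ratio and the negative logarithm of the P-value, can be computed as in the following Python sketch. The function name and the numbers are hypothetical, and a base-10 logarithm is assumed.

```python
from math import log10


def pathway_scores(genes_in_list, pathway_size, p_value):
    """Return (ratio, -log10 P) for one canonical pathway."""
    ratio = genes_in_list / pathway_size  # fraction of the pathway in the list
    return ratio, -log10(p_value)


# Hypothetical pathway: 12 of its 200 genes are in the list, P = 0.001.
ratio, score = pathway_scores(12, 200, 0.001)
print(f"ratio = {ratio:.2f}, -log10(P) = {score:.1f}")

# The 0.05 significance threshold expressed on the same scale.
print(f"threshold = {-log10(0.05):.2f}")
```

On this scale larger values are more significant, and any pathway with a value above roughly 1.3 passes a P-value threshold of 0.05.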

Figure 5
Biological differences between progressive and relatively stable disease groups in IPF.

The most prominent pathways in the progressive group (up regulated list of genes) are integrin signaling, regulation of actin-based motility (cell migration), glycosaminoglycan (mucopolysaccharide) degradation, (LXR/VDR) retinoid X receptor (RXR) activation and chemokine signaling, suggesting an important role for integrin signaling, immune function, bone metabolism and vitamin D3/RXR activation in disease progression. The glycosaminoglycan degradation pathway is a process that seems to be significantly associated with increased disease progression (Figure 5). Glycosaminoglycans (mucopolysaccharides) are carbohydrate molecules or complexes of protein and carbohydrate that form the ground substance of connective tissue. One of these carbohydrates is hyaluronic acid. All substances passing to and from cells must pass through the ground substance. Variations in its composition and viscosity may therefore have an important influence on the exchange of materials between tissue cells and the blood. Another finding is the association of the nuclear RXR (retinoid X receptor)/VDR (vitamin D receptor) activation pathway with disease progression. The nuclear RXR can regulate transcription by forming complexes with other nuclear factors and can be activated by retinoic acid. This pathway has until now been unexplored in pulmonary fibrosis; however, it is important to note that VDR-deficient mice failed to develop experimental allergic asthma, suggesting an important role for the vitamin D endocrine system in the generation of Th2-driven inflammation in the lung [31].

Another clear distinction can be seen between the two groups of patients with IPF when analyzing the percentage of genes associated with various molecular and cellular functions (Figure 5B). During disease progression, an increase is detected in genes associated with cellular growth and proliferation, cellular compromise (stress), cell signaling, cell morphology, cell death, cell cycle and cell movement, molecular functions usually associated with cancer. These results were confirmed by performing a Network analysis. This analysis identified five partially overlapping networks (Table 4) in our list of differentially expressed genes, highlighting similar molecular functions associated with the corresponding networks. The overlap between Networks 1 and 3 is depicted in Figure 6, in which the central role of genes like p38 MAPK, NFκB, HSP70, EGR1, CCL2 and Ras homolog can be easily detected.

Figure 6
Network analysis.
Table 4
Functional network analysis of differentially expressed genes in the progressive group

Discussion

Our results indicate that molecular signatures of gene expression appear to be useful in identifying the presence of IPF and in predicting its activity. We have shown that molecular signatures can distinguish IPF from both normal lung and other chronic lung diseases. Moreover, our findings suggest that molecular signatures from lung parenchyma at the time of diagnosis appear to be helpful in predicting disease progression and may prove valuable in predicting the activity of IPF.

Genome-wide analyses of gene expression have facilitated the identification of gene expression patterns or signatures revealing the complexity of human cancer. Most of the work using large scale gene expression data has been focused on discovering gene expression profiles that can lead to a better understanding of tumor development and proliferation. The strength of gene expression analysis has been shown by the ability to identify new cancer subtypes and predict clinical outcome [32]. A prognostic gene expression signature has been proposed for survival in early-stage lung cancer [33] and was recently validated in a large, training-testing, multi-site, blinded study [34]. Gene expression profiling has also allowed the prediction of breast cancer recurrence [35], which has ultimately led to the development of MammaPrint, a clinical test based on a 70-gene signature that predicts the risk of metastasis in breast cancer patients [36].

While gene expression profiling has proven to be a powerful tool for the identification of specific gene patterns and pathways associated with certain types of human cancers, our findings suggest that these molecular signatures may also prove useful in understanding complex lung diseases, like IPF. The increase of the protein ubiquitination pathway could be associated with an increase of apoptosis of epithelial cells but has not been extensively studied in IPF. There are few studies implicating the PI3K/AKT signaling pathway in IPF. Bleomycin-induced pulmonary fibrosis studies in mice have shown activation not only of TGF-beta but also of phosphatidylinositol 3-kinase (PI3K) and protein kinase B (PKB/AKT) via a Semaphorin (SEMA) 7A-dependent mechanism, and PKB/AKT inhibition diminished TGF-beta-induced fibrosis [37]. SEMA 7A was not found to be differentially expressed in our dataset, though many family members and its receptor integrin beta are involved in the transcriptional profile of IPF. It has been shown that collagen accumulation can be reduced by the administration of PI3K inhibitors [38], implying that the PI3K/AKT pathway might play an important role in pulmonary fibrosis. Deregulation of the PI3K/PTEN/AKT pathway is one of the most common pathway alterations in human malignancy. Significant advances have been made in the understanding of the AKT signaling pathway in oncogenesis and in the development of small molecule inhibitors. Whether this pathway could be targeted in human pulmonary fibrosis remains to be established and could offer new treatment opportunities. The integrin signaling pathway is anticipated to be associated with pulmonary fibrosis since integrins are the primary extracellular matrix (ECM) receptors mediating ECM remodeling [39]. In response to changes in the ECM, integrin signaling also regulates many other interrelated cellular processes like proliferation, survival, cell migration and invasion.
However, further studies in larger cohorts, using real-time PCR, a customized SAGE signature array, or a tissue array, are needed to validate the importance and relevance of these findings for early diagnosis and disease management.

Our results may have a significant impact on the development of early biomarkers for IPF. Identifying biomarkers that could reduce the time to diagnosis may create a window of opportunity for therapeutic intervention, especially in a disease like IPF where the diagnosis is often delayed. While our transcriptional signature for disease progression was developed using lung biopsy samples, 47 of the 134 gene products that were associated with clinical progression have been detected in body fluids (such as blood, plasma/serum, bronchoalveolar lavage fluid or sputum) in various diseases according to Ingenuity Pathway Analysis software. Although for many of these 47 genes the biological function and role in IPF pathogenesis are unknown, these genes and gene products could potentially serve as biomarkers for this disease. Genes like ADM (adrenomedullin), CCL2 (chemokine ligand 2), PTPRF (protein tyrosine phosphatase receptor F) and SPP1 (osteopontin) play a role in the migration of smooth muscle cells and in cell proliferation and/or invasion, implying a potentially more important role of these processes in disease progression. The chemokine CCL2 has been previously detected in metaplastic epithelial cells and vascular endothelial cells of IPF cases, and it was proposed that CCL2 may play a key role in the irreversible progression of IPF [40]. In addition, a decrease of lung fibrosis was detected in CCL2 null mice when exposed to bleomycin [41], [42]. Furthermore, CCL2 has been shown to be elevated in human bronchoalveolar lavage fluid from patients with IPF [42], [43]. The protein was measured in plasma as well, and no significant difference was found between IPF patients and normal controls [44]. Our results indicate that CCL2 is a potential marker of disease progression in IPF. Whether the plasma level of CCL2 correlates with disease progression remains unknown [44].
Interestingly, SPP1 has been localized to the alveolar epithelial cells in IPF lungs, was significantly elevated in bronchoalveolar lavage fluid from IPF patients [21], and has been detected in plasma from patients with idiopathic interstitial pneumonia [45]. Previous studies have shown that SPP1 null mice clearly develop less fibrosis when exposed to bleomycin. It was suggested that SPP1 is secreted by the epithelial cells and has a profibrotic effect [21].

Some of these potential biomarker genes have been implicated in human cancers. Heat shock 70KDa protein 1A (HSPA1A) is up regulated in brain, lung, and liver cancer. Macropain (PSMA7) is increased in brain, breast, and stomach cancer, and plays an important role in colorectal cancer progression, providing a unique target for drug development. The Ras homolog gene family member B (RHOB), a Rho GTPase, is up regulated in brain and breast cancer though down regulated in lung neoplasms. These GTPases are crucial regulators of the actin cytoskeleton and also play an important role in membrane trafficking. FK506 binding protein 2 (FKBP2) and Plunc (palate, lung and nasal epithelium carcinoma associated) are associated with lung cancer. The latter gene belongs to the PLUNC family of proteins, which are postulated to play a role in the innate immune response, and is uniquely expressed in the upper respiratory tract. Studies in cystic fibrosis have shown a significant elevation of Plunc expression in diseased airways [46], [47]. As Plunc can be detected in sputum [48] and bronchoalveolar lavage fluid, it appears to be an ideal candidate biomarker for disease progression in IPF. SAGE and microarray analysis have recently indicated that Plunc is a novel marker that distinguishes gastric hepatoid adenocarcinoma from primary hepatocellular carcinoma [49].

The extensive SAGE IPF transcriptome presented in this investigation demonstrates the complexity and scope of the biological activity involved in IPF. Some of the pathways identified by SAGE profiling have not been previously associated with IPF. Network and pathway analyses have also shown that various signaling pathways can interact or even partially overlap with each other, thereby suggesting that IPF may be the result of multiple, consecutive (or interactive) biological events, possibly triggered by environmental stimuli. However, despite this biological complexity, our findings clearly illustrate that molecular signatures of gene expression in IPF may prove helpful in predicting disease progression among those with IPF. Molecular and cellular functions like cell proliferation, migration, invasion and cell morphology appear to be over represented in the more progressive IPF group; a striking similarity with human cancers. The association with disease progression and the identifiable heterogeneity seen within samples emphasize the importance and the need for an extensive molecular classification of IPF and other forms of interstitial lung disease. The recognition that IPF may have different subtypes that can be distinguished by their molecular patterns could identify novel therapeutic targets and personalize the clinical approach to this complex group of diseases.

Materials and Methods

Study Population

Lung tissue was obtained from patients with IPF who had a definitive diagnosis based on UIP pathology from a surgical lung biopsy [1], [2]. Flash frozen surgical lung biopsy specimens were obtained from 12 patients with IPF who were undergoing initial diagnostic evaluation and were not being treated for their IPF. The protocol was approved by the Institutional Review Board of National Jewish Health and written informed consent was obtained where required. Further processing of the frozen samples was performed at the National Institutes of Health (NIH). This research activity was approved by the Office of Human Subjects Research at the NIH. The 12 specimens were specifically obtained from two groups of patients: progressive IPF or relatively stable IPF. In the progressive group (n = 6), the percent predicted forced vital capacity (FVC) and the percent predicted diffusing capacity of carbon monoxide (DLCO) declined significantly up to 12 months following biopsy (by ≥10% and ≥15%, respectively). The relatively stable group (n = 6) had a relatively uneventful course over the 12 months following surgical lung biopsy, with a decline in percent predicted FVC of <10% or a decline in percent predicted DLCO of <15%. No patient in either group received treatment for IPF prior to lung biopsy.
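
The grouping rule described above can be sketched as a small predicate. This is only an illustration that reads the stated thresholds literally; the function and argument names are hypothetical, and the declines are assumed to be supplied as positive numbers in the same units as the thresholds.

```python
def classify_ipf(fvc_decline: float, dlco_decline: float) -> str:
    """Assign a patient to the progressive or relatively stable group.

    fvc_decline / dlco_decline: decline in percent predicted FVC and DLCO
    over the 12 months after biopsy (positive = worsening). Hypothetical names.
    """
    # Progressive: FVC declined by >= 10% and DLCO declined by >= 15%.
    if fvc_decline >= 10 and dlco_decline >= 15:
        return "progressive"
    # Relatively stable: FVC decline < 10% or DLCO decline < 15%.
    return "relatively stable"
```

Written this way, the two groups are logical complements, so every patient with both measurements falls into exactly one group.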

RNA isolation and SAGE library construction

Total RNA was extracted from frozen lung tissue using the RNAgents total RNA isolation system (Promega, Madison, WI, USA). The quality of total RNA was analyzed using the RNA 6000 Nano Labchip kit on a 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA). On average, 1 to 5 µg of total RNA, as determined by the Ribo-Green RNA Quantification kit (Molecular Probes, Eugene, OR, USA), was used to construct SAGE libraries from 12 IPF samples using Nla III as the anchoring enzyme and BsmF I as the tagging enzyme according to a micro-SAGE protocol [50]. The SAGE library clones were arrayed and inserts were purified and sequenced at Agencourt Bioscience Corporation. The SAGE 2000 software version 4.5 (available at http://www.sagenet.org) was used to extract SAGE tags from the original sequence files, remove duplicate ditags, remove linker sequences, remove one base pair variations of linker sequences and tabulate the occurrence of each tag. Tag sequences, tag counts and gene associations were stored in a Microsoft Access relational database for subsequent analysis. The complete SAGE IPF dataset has been deposited in the GEO database (GSE11665). Other SAGE profiles used in this study were downloaded from the GEO (http://ncbi/geo/) or the SAGE Genie [51] website and are listed in Table S1. P-values for differentially expressed transcripts were calculated according to the sequence odds ratio and significance test (http://cgap.nci.nih.gov/SAGE). Similar results were obtained when using the SAGE software Monte Carlo approach or the significance test available as part of the DiscoverySpace application [52].
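
The tag extraction steps listed above (splitting at the anchoring enzyme site, discarding duplicate ditags, tabulating tags) can be illustrated with a simplified sketch. It is not the SAGE 2000 implementation: the function names, the 10-bp tag length and the 20 to 24 bp ditag window are assumptions made for illustration, and linker removal is omitted.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemers, anchor="CATG", tag_len=10,
                 min_ditag=20, max_ditag=24):
    """Tabulate SAGE tags from concatemer reads (illustrative only).

    anchor is the Nla III recognition site; the ditag length window and
    tag length are assumed defaults, not values taken from the study.
    """
    seen = set()
    counts = Counter()
    for read in concatemers:
        # Pieces between anchor sites are ditags; the first and last
        # pieces are flanking sequence and are dropped.
        for ditag in read.upper().split(anchor)[1:-1]:
            if not (min_ditag <= len(ditag) <= max_ditag):
                continue
            # A ditag and its reverse complement are the same molecule.
            key = min(ditag, revcomp(ditag))
            if key in seen:  # duplicate ditags are counted only once
                continue
            seen.add(key)
            counts[ditag[:tag_len]] += 1            # left tag
            counts[revcomp(ditag[-tag_len:])] += 1  # right tag
    return counts
```

Counting each distinct ditag once is what limits the influence of PCR amplification bias on the final tag counts.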

Hierarchical Clustering

The open source clustering software Cluster 3.0 was used for gene expression data analysis. Cluster 3.0 is an enhanced version of Cluster [53] built for the Microsoft Windows platform. The Cluster program is based on a modified Pearson correlation, and was applied to the normalized SAGE data. Starting with a dataset of 149,291 unique transcript tags, a second filter selecting for a project total (summed expression level across all 22 libraries included in this study) of ≥10 counts was applied, reducing the number of transcript tags to 23,649. This SAGE dataset was then filtered for at least five observations across the 22 libraries with an absolute value ≥2, and for a maximum minus minimum value ≥2. These settings produced a data set of 11,467 transcript tags. This data set was subsequently adjusted by performing median centering and normalization. This procedure resulted in a median-polished (i.e. all row and column-wise median values are close to zero) and normalized (i.e. all row and column magnitudes are close to 1.0) dataset. Next, the dataset was analyzed by applying centered-correlation, complete-linkage clustering, which assembles the dataset into a tree. Items joined by short branches are very similar, whereas longer branches represent decreasing similarity. Results were displayed with the TreeView program [53]. For confirmation of the results and subsequent clustering of small datasets or microarray data, the MultiExperiment Viewer (Version 4.1, January 18th, 2008) was used [54], as well as a modified version specific for SAGE data analysis [55].
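
The filtering and clustering steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Cluster 3.0 code: all function names are hypothetical, centering is shown for rows only, and the distance is taken as 1 minus the centered Pearson correlation.

```python
from math import sqrt
from statistics import median


def filter_tags(counts, min_total=10, min_obs=5, min_abs=2, min_range=2):
    """Keep tags passing the three filters described in the text.

    counts: dict mapping tag -> list of per-library values.
    """
    kept = {}
    for tag, row in counts.items():
        if sum(row) < min_total:  # project total across libraries
            continue
        if sum(1 for v in row if abs(v) >= min_abs) < min_obs:
            continue
        if max(row) - min(row) < min_range:
            continue
        kept[tag] = row
    return kept


def median_center(row):
    """Subtract the row median so the row median becomes zero."""
    m = median(row)
    return [v - m for v in row]


def pearson(x, y):
    """Centered Pearson correlation of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0


def complete_linkage(profiles):
    """Agglomerative clustering; returns merges as (left, right, distance).

    profiles: dict mapping item name -> expression vector.
    """
    def dist(a, b):
        return 1.0 - pearson(profiles[a], profiles[b])

    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for i, ka in enumerate(keys):
            for kb in keys[i + 1:]:
                # Complete linkage: distance between the farthest members.
                d = max(dist(a, b) for a in clusters[ka] for b in clusters[kb])
                if best is None or d < best[0]:
                    best = (d, ka, kb)
        d, ka, kb = best
        merges.append((ka, kb, d))
        clusters[ka + "+" + kb] = clusters.pop(ka) + clusters.pop(kb)
    return merges
```

The list of merges corresponds to the tree: early merges with small distances are the short branches joining very similar items, and later merges are the longer branches.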

Real-time PCR

Equal amounts of total RNA (5 µg) were used in a 20 µl cDNA synthesis reaction primed with oligo-dT (Superscript II; Invitrogen, Carlsbad, CA, USA). Control reactions were prepared in parallel without reverse transcriptase. Prior to cDNA synthesis, residual genomic DNA was removed from total RNA with a DNase I treatment (DNA-free; Ambion, Austin, TX, USA). Quantitative PCR was performed with a 7900HT Fast Real-Time PCR system (Applied Biosystems, Foster City, CA, USA) using SYBR-Green. PCR reactions were performed in triplicate, and the threshold cycle numbers were averaged. Gene expression levels were normalized to ACTB (actin, beta) and PGK1 (phosphoglycerate kinase 1). The relative expression levels were calculated in comparison to the levels in total RNA from normal lung (Ambion, Austin, TX, USA) according to the comparative Ct method, in which the relative expression equals 2^(-ΔΔCt). PCR primers were designed using the Primer 3 interface (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi).
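
As a worked illustration of the comparative Ct calculation, the sketch below averages triplicate Ct values and computes 2^(-ΔΔCt) relative to a calibrator sample (normal lung in this study). The function name and the use of a single reference-gene Ct per sample are assumptions; the text does not state how the ACTB and PGK1 values were combined, and the formula presumes roughly 100% amplification efficiency.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the comparative Ct method.

    Each argument is a list of replicate Ct values (e.g. triplicates).
    """
    # Delta Ct: target gene normalized to the reference gene.
    delta_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    # Delta-delta Ct: sample relative to the calibrator.
    ddct = delta_sample - delta_calibrator
    return 2 ** (-ddct)
```

For example, a target gene with a mean Ct of 22 in the sample and 25 in the calibrator, with the reference gene at 18 in both, gives ΔΔCt = (22 - 18) - (25 - 18) = -3, that is, an 8-fold higher expression in the sample.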

Immunohistochemistry

Five-micron sections of paraffin-embedded tissue were deparaffinized using 5-minute incubations in xylene followed by 100% and 95% ethanol. Slides were rinsed in distilled water and then incubated for 30 minutes in a 3% aqueous solution of hydrogen peroxide at room temperature. Slides were rinsed in distilled water and then transferred to citrate buffer at pH 6.0 for antigen retrieval in a pressure cooker at 125°C for 5 minutes, followed by gradual cooling to room temperature. Slides were rinsed in distilled water and then washed for 5 minutes in Tris-buffered saline with 0.1% Tween (TBS-T). Washed slides were incubated in serum-free protein blocking buffer (DAKO, Carpinteria, CA) for 30 minutes. Blocking buffer was removed without washing and 100 µl of biotinylated primary antibody (#BAF1897, 1:50 dilution in TBS-T, R&D Systems, Minneapolis, MN) was applied to each slide. Slides were incubated in primary antibody solution overnight at 4°C and then washed for 5 minutes in TBS-T. Biotin-labeled antibody detection was carried out using the Vectastain RTU ABC reagent (Vector Laboratories, Burlingame, CA) following the manufacturer's instructions. Staining was visualized with a 5-minute room temperature incubation using DAB chromogen/buffer (DAKO, Carpinteria, CA). The color reaction was stopped in distilled water and slides were counterstained for 3 minutes in hematoxylin, dehydrated in graded ethanols and cleared using xylene prior to coverslipping. All slides were scanned using the Aperio ScanScope XT (Aperio, Vista, CA).

Gene Ontology, Biomarker selection and Functional Network Analysis

Data were analyzed through the use of Ingenuity Pathways Analysis (Ingenuity Systems®, www.ingenuity.com). Ingenuity Pathways Analysis (IPA) is a web-based application that enables the visualization, discovery and analysis of molecular interaction networks within gene expression profiles. All generated gene lists and corresponding expression levels, represented as log2 ratios, were uploaded into IPA for further analysis. Both gene symbols and GenBank accession numbers were used, with no apparent differences in results. These genes, called focus genes, were overlaid onto a global molecular network developed from information contained in the Ingenuity knowledge base. The IPA knowledge base represents a proprietary ontology of over 600,000 classes of biologic objects spanning genes, proteins, cells and cell components, anatomy, molecular and cellular processes, and small molecules. Networks of the focus genes were then algorithmically generated based on their connectivity. The Functional Analysis of a network identified the biological functions and/or diseases that were most significant to the genes in the network. The network genes associated with biological functions and/or diseases in the Ingenuity knowledge base were considered for the analysis. Fisher's exact test was used to calculate a P-value determining the probability that each biological function and/or disease assigned to that network is due to chance alone. Canonical Pathways Analysis identified the pathways from the Ingenuity Pathways Analysis library of canonical pathways that were most significant to the dataset. The significance of the association between the dataset and the canonical pathway was measured in two ways: 1) a ratio of the number of genes from the dataset that map to the pathway divided by the total number of molecules that exist in the canonical pathway; 2) a P-value calculated with Fisher's exact test.
Biomarker Analysis allows the identification of the most relevant molecular biomarker candidates from a dataset based on contextual information such as mechanistic association with a disease or detection in bodily fluids.
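
The two pathway significance measures described above can be illustrated with a short sketch. It is not the IPA implementation, which is proprietary; the function names are hypothetical, and a one-sided (enrichment) form of Fisher's exact test is assumed.

```python
from math import comb


def pathway_ratio(overlap: int, pathway_size: int) -> float:
    """Dataset genes mapping to the pathway / all molecules in the pathway."""
    return overlap / pathway_size


def fisher_enrichment_p(overlap: int, dataset_size: int,
                        pathway_size: int, background_size: int) -> float:
    """One-sided Fisher's exact test P-value for pathway enrichment.

    Probability of seeing at least `overlap` pathway genes when
    `dataset_size` genes are drawn at random from a background of
    `background_size` genes, `pathway_size` of which are in the pathway.
    """
    upper = min(dataset_size, pathway_size)
    total = comb(background_size, dataset_size)
    favorable = sum(
        comb(pathway_size, i)
        * comb(background_size - pathway_size, dataset_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total
```

The ratio describes how much of a pathway is covered by the dataset, whereas the P-value accounts for pathway and dataset size, so a small pathway can have a high ratio without being significant.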

Supporting Information

Figure S1

Lung function test values for individual samples included in this study. Actual DLCO (A) or FVC (B) values are depicted in a scatter dot plot with mean and standard deviation. The progressive group is represented in red (dots) and the relatively stable group in blue (squares).

(0.45 MB TIF)

Figure S2

Unsupervised Hierarchical clustering analysis of all 22 SAGE libraries based on 11,467 transcripts (A) and based on the 293-gene expression signature (B).

(1.02 MB TIF)

Table S1

Summary of SAGE libraries included in this study.

(0.03 MB PDF)

Table S2

Genes over expressed in IPF when compared to normal lung.

(0.07 MB XLS)

Table S3

Differentially expressed genes in progressive group

(0.06 MB XLS)

Table S4

Candidate IPF biomarkers.

(0.02 MB XLS)

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This research was supported by the intramural research programs at the National Heart, Lung, and Blood Institute, and the National Institute of Environmental Health Sciences. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of manuscript.

References

1. ATS. Dyspnea. Mechanisms, Assessment, and Management: A Consensus Statement. Am J Respir Crit Care Med. 1999;159:321–340. [PubMed]
2. ATS. American Thoracic Society/European Respiratory Society International Multidisciplinary Consensus Classification of the Idiopathic Interstitial Pneumonias. This Joint Statement of the American Thoracic Society (ATS), and the European Respiratory Society (ERS) was adopted by the ATS Board of Directors, June 2001 and by The ERS Executive Committee, June 2001. Am J Respir Crit Care Med. 2002;165:277–304. [PubMed]
3. Steele MP, Speer MC, Loyd JE, Brown KK, Herron A, et al. Clinical and Pathologic Features of Familial Interstitial Pneumonia. Am J Respir Crit Care Med. 2005;172:1146–1152. [PMC free article] [PubMed]
4. Gharaee-Kermani M, Gyetko M, Hu B, Phan S. New Insights into the Pathogenesis and Treatment of Idiopathic Pulmonary Fibrosis: A Potential Role for Stem Cells in the Lung Parenchyma and Implications for Therapy. Pharmaceutical Research. 2007;24:819–841. [PubMed]
5. Maher TM, Wells AU, Laurent GJ. Idiopathic pulmonary fibrosis: multiple causes and multiple mechanisms? Eur Respir J. 2007;30:835–839. [PubMed]
6. Selman M, Pardo A. Role of Epithelial Cells in Idiopathic Pulmonary Fibrosis: From Innocent Targets to Serial Killers. Proc Am Thorac Soc. 2006;3:364–372. [PubMed]
7. Strieter RM. Pathogenesis and Natural History of Usual Interstitial Pneumonia: The Whole Story or the Last Chapter of a Long Novel. Chest. 2005;128:526S–532. [PubMed]
8. Thannickal VJ, Horowitz JC. Evolving Concepts of Apoptosis in Idiopathic Pulmonary Fibrosis. Proc Am Thorac Soc. 2006;3:350–356. [PMC free article] [PubMed]
9. Willis BC, Liebler JM, Luby-Phelps K, Nicholson AG, Crandall ED, et al. Induction of Epithelial-Mesenchymal Transition in Alveolar Epithelial Cells by Transforming Growth Factor-{beta}1: Potential Role in Idiopathic Pulmonary Fibrosis. Am J Pathol. 2005;166:1321–1332. [PMC free article] [PubMed]
10. King TEJ, Tooze JA, Schwarz MI, Brown KR, Cherniack RM. Predicting Survival in Idiopathic Pulmonary Fibrosis. Scoring System and Survival Model. Am J Respir Crit Care Med. 2001;164:1171–1181. [PubMed]
11. Martinez FJ, Safrin S, Weycker D, Starko KM, Bradford WZ, et al. The Clinical Course of Patients with Idiopathic Pulmonary Fibrosis. Ann Intern Med. 2005;142:963–967. [PubMed]
12. Kim DS, Collard HR, King TE., Jr Classification and Natural History of the Idiopathic Interstitial Pneumonias. Proc Am Thorac Soc. 2006;3:285–292. [PMC free article] [PubMed]
13. Selman M, Carrillo G, Estrada A, Mejia M, Becerril C, et al. Accelerated Variant of Idiopathic Pulmonary Fibrosis: Clinical Behavior and Gene Expression Pattern. PLoS ONE. 2007;2:e482. [PMC free article] [PubMed]
14. Schwartz DA, Van Fossen DS, Davis CS, Helmers RA, Dayton CS, et al. Determinants of progression in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 1994;149:444–449. [PubMed]
15. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science. 1995;270:484–487. [PubMed]
16. Chen J, Sun M, Lee S, Zhou G, Rowley JD, et al. Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proceedings of the National Academy of Sciences. 2002;99:12257–12262. [PMC free article] [PubMed]
17. Kaminski N. Microarray analysis of idiopathic pulmonary fibrosis. Am J Respir Cell Mol Biol. 2003;29:S32–S36. [PubMed]
18. Kaminski N, Rosas IO. Gene Expression Profiling as a Window into Idiopathic Pulmonary Fibrosis Pathogenesis: Can We Identify the Right Target Genes? Proc Am Thorac Soc. 2006;3:339–344. [PMC free article] [PubMed]
19. Yang IV, Burch LH, Steele MP, Savov JD, Hollingsworth JW, et al. Gene Expression Profiling of Familial and Sporadic Interstitial Pneumonia. Am J Respir Crit Care Med. 2007;175:45–54. [PMC free article] [PubMed]
20. Li C-J, Ning W, Matthay MA, Feghali-Bostwick CA, Choi AMK. MAPK pathway mediates EGR-1-HSP70-dependent cigarette smoke-induced chemokine production. Am J Physiol Lung Cell Mol Physiol. 2007;292:L1297–1303. [PubMed]
21. Pardo A, Gibson K, Cisneros J, Richards TJ, Yang Y, et al. Up-Regulation and Profibrotic Role of Osteopontin in Human Idiopathic Pulmonary Fibrosis. PLoS Medicine. 2005;2:e251. [PMC free article] [PubMed]
22. Vizza CD, Letizia C, Sciomer S, Naeije R, Della Rocca G, et al. Increased plasma levels of adrenomedullin, a vasoactive peptide, in patients with end-stage pulmonary disease. Regulatory Peptides. 2005;124:187–193. [PubMed]
23. Hetzel M, Bachem M, Anders D, Trischler G, Faehling M. Different Effects of Growth Factors on Proliferation and Matrix Production of Normal and Fibrotic Human Lung Fibroblasts. Lung. 2005;183:225–237. [PubMed]
24. Walters DM, Cho HY, Kleeberger SR. Oxidative Stress and Antioxidants in the Pathogenesis of Pulmonary Fibrosis: A Potential Role for Nrf2. Antioxidants & Redox Signaling. 2008;10:321–332. [PubMed]
25. Pío R, Elsasser TH, Martínez A, Cuttitta F. Identification, characterization, and physiological actions of factor H as an adrenomedullin binding protein present in human plasma. Microscopy Research and Technique. 2002;57:23–27. [PubMed]
26. Selman M, Ruiz V, Cabrera S, Segura L, Ramirez R, et al. TIMP-1, -2, -3, and -4 in idiopathic pulmonary fibrosis. A prevailing nondegradative lung microenvironment? Am J Physiol Lung Cell Mol Physiol. 2000;279:L562–574. [PubMed]
27. Swiderski R, Dencoff J, Floerchinger C, Shapiro S, Hunninghake G. Differential expression of extracellular matrix remodeling genes in a murine model of bleomycin-induced pulmonary fibrosis. Am J Pathol. 1998;152:821–828. [PMC free article] [PubMed]
28. Fattman CL, Gambelli F, Hoyle G, Pitt BR, Ortiz LA. Epithelial expression of TIMP-1 does not alter sensitivity to bleomycin-induced lung injury in C57BL/6 mice. Am J Physiol Lung Cell Mol Physiol. 2008;294:L572–581. [PubMed]
29. Pardo A, Selman M. Matrix Metalloproteases in Aberrant Fibrotic Tissue Remodeling. Proc Am Thorac Soc. 2006;3:383–388. [PubMed]
30. Gouyer V, Conti M, Devos P, Zerimech F, Copin M-C, et al. Tissue inhibitor of metalloproteinase 1 is an independent predictor of prognosis in patients with nonsmall cell lung carcinoma who undergo resection with curative intent. Cancer. 2005;103:1676–1684. [PubMed]
31. Wittke A, Weaver V, Mahon BD, August A, Cantorna MT. Vitamin D Receptor-Deficient Mice Fail to Develop Experimental Allergic Asthma. J Immunol. 2004;173:3432–3436. [PubMed]
32. Nevins JR, Potti A. Mining gene expression profiles: expression signatures as cancer phenotypes. Nat Rev Genet. 2007;8:601–609. [PubMed]
33. Beer DG, Kardia SLR, Huang C-C, Giordano TJ, Levin AM, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Med. 2002;8:816–824. [PubMed]
34. Shedden K, Taylor JM, Enkemann SA, Tsao M-S, Yeatman TJ, et al. Gene expression–based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nature Med. 2008;14:822–827. [PMC free article] [PubMed]
35. van 't Veer L, Dai H, van de Vijver M, He Y, Hart A, et al. Expression profiling predicts outcome in breast cancer. Breast Cancer Res. 2003;5:57–58. [PMC free article] [PubMed]
36. Glas A, Floore A, Delahaye L, Witteveen A, Pover R, et al. Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genomics. 2006;7:278. [PMC free article] [PubMed]
37. Kang H-R, Lee CG, Homer RJ, Elias JA. Semaphorin 7A plays a critical role in TGF-beta1-induced pulmonary fibrosis. J Exp Med. 2007;204:1083–1093. [PMC free article] [PubMed]
38. Lee KS, Park SJ, Kim SR, Min KH, Lee KY, et al. Inhibition of VEGF blocks TGF-beta1 production through a PI3K/Akt signalling pathway. Eur Respir J. 2008;31:523–531. [PubMed]
39. DeMali KA, Wennerberg K, Burridge K. Integrin signaling to the actin cytoskeleton. Current Opinion in Cell Biology. 2003;15:572–582. [PubMed]
40. Iyonaga K, Takeya M, Saita N, Sakamoto O, Yoshimura T, et al. Monocyte chemoattractant protein-1 in idiopathic pulmonary fibrosis and other interstitial lung diseases. Hum Pathol. 1994;25:455–463. [PubMed]
41. Gharaee-Kermani M, McCullumsmith R, Charo I, Kunkel S, Phan S. CC-chemokine receptor 2 required for bleomycin-induced pulmonary fibrosis. Cytokine. 2003;24:266–276. [PubMed]
42. Baran CP, Opalek JM, McMaken S, Newland CA, O'Brien JM, Jr, et al. Important Roles for Macrophage Colony-stimulating Factor, CC Chemokine Ligand 2, and Mononuclear Phagocytes in the Pathogenesis of Pulmonary Fibrosis. Am J Respir Crit Care Med. 2007;176:78–89. [PMC free article] [PubMed]
43. Capelli A, Di Stefano A, Gnemmi I, Donner CF. CCR5 expression and CC chemokine levels in idiopathic pulmonary fibrosis. Eur Respir J. 2005;25:701–707. [PubMed]
44. Rosas IO, Richards TJ, Konishi K, Zhang Y, Gibson K, et al. MMP1 and MMP7 as Potential Peripheral Blood Biomarkers in Idiopathic Pulmonary Fibrosis. PLoS Medicine. 2008;5:e93. [PMC free article] [PubMed]
45. Kadota JMS, Mito K, Mukae H, Yoshioka S, Kawakami K, Koguchi Y, Fukushima K, Kon S, Kohno S, Saito A, Uede T, Nasu M. High plasma concentrations of osteopontin in patients with interstitial pneumonia. Respir Med. 2005;99:111–117. [PubMed]
46. Bingle CD, Craven JC. PLUNC: A novel family of candidate host defence proteins expressed in the upper airways and nasopharynx. Hum Mol Genet. 2002;11:937–943. [PubMed]
47. Bingle L, Barnes F, Cross S, Rassl D, Wallace W, et al. Differential epithelial expression of the putative innate immune molecule SPLUNC1 in Cystic Fibrosis. Respiratory Research. 2007;8:79. [PMC free article] [PubMed]
48. Di Y-P, Harper R, Zhao Y, Pahlavan N, Finkbeiner W, et al. Molecular Cloning and Characterization of spurt, a Human Novel Gene That Is Retinoic Acid-inducible and Encodes a Secretory Protein Specific in Upper Respiratory Tracts. J Biol Chem. 2003;278:1165–1173. [PubMed]
49. Sentani K, Oue N, Sakamoto N, Arihiro K, Aoyagi K, et al. Gene expression profiling with microarray and SAGE identifies PLUNC as a marker for hepatoid adenocarcinoma of the stomach. Mod Pathol. 2008;21:464–475. [PubMed]
50. St Croix B, Rago C, Velculescu V, Traverso G, Romans K, et al. Genes Expressed in Human Tumor Endothelium. Science. 2000;289:1197–1202. [PubMed]
51. Boon K, Osorio EC, Greenhut SF, Schaefer CF, Shoemaker J, et al. An anatomy of normal and malignant gene expression. Proceedings of the National Academy of Sciences. 2002;99:11287–11292. [PMC free article] [PubMed]
52. Robertson N, Oveisi-Fordorei M, Zuyderduyn S, Varhol R, Fjell C, et al. DiscoverySpace: an interactive data analysis application. Genome Biology. 2007;8:R6. [PMC free article] [PubMed]
53. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America. 1998;95:14863–14868. [PMC free article] [PubMed]
54. Saeed AI, Bhagabati NK, Braisted JC, Liang W, Sharov V, et al. TM4 Microarray Software Suite. In: Methods in Enzymology. Academic Press; 2006. pp. 134–193. [PubMed]
55. Wang H, Zheng H, Azuaje F. Poisson-Based Self-Organizing Feature Maps and Hierarchical Clustering for Serial Analysis of Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2007;4:163–175. [PubMed]

Articles from PLoS ONE are provided here courtesy of Public Library of Science