Radiomic analysis of lung cancer for the assessment of patient prognosis and intratumor heterogeneity

Objective: To determine whether the radiomic features of lung lesions on computed tomography correlate with overall survival in lung cancer patients.
Materials and Methods: This was a retrospective study involving 101 consecutive patients with malignant neoplasms confirmed by biopsy or surgery. On computed tomography images, the lesions were submitted to semi-automated segmentation and were characterized on the basis of 2,465 radiomic variables. The prognostic assessment was based on Kaplan-Meier analysis and log-rank tests, according to the median value of the radiomic variables.
Results: Of the 101 patients evaluated, 28 died (16 dying from lung cancer), and 73 were censored, with a mean overall survival time of 1,819.4 days (95% confidence interval [95% CI]: 1,481.2-2,157.5). One radiomic feature (the mean of the Fourier transform) presented a difference on Kaplan-Meier curves (p < 0.05). A high-risk group of patients was identified on the basis of high values for the mean of the Fourier transform. In that group, the mean survival time was 1,465.4 days (95% CI: 985.2-1,945.6), with a hazard ratio of 2.12 (95% CI: 1.01-4.48). We also identified a low-risk group, in which the mean of the Fourier transform was low (mean survival time of 2,164.8 days; 95% CI: 1,745.4-2,584.1).
Conclusion: A radiomic signature based on the Fourier transform correlates with overall survival, representing a prognostic biomarker for risk stratification in patients with lung cancer.


INTRODUCTION
Lung cancer is the leading cause of cancer-related death worldwide, accounting for approximately one in five of those deaths (1).
Defining the prognosis of lung cancer is a major challenge, because it can vary dramatically depending on the tumor stage at diagnosis (2). The choice of treatment for a malignant tumor is made primarily on the basis of the international neoplasm staging system, also known as the tumor-node-metastasis (TNM) system (3). However, studies have shown that other clinical factors, such as tumor histological type and the presence of specific genetic mutations, can also affect the prognosis, the clinical decision-making process, and the treatment (4,5).
Computed tomography (CT) is the imaging method most widely used for diagnosing and staging lung cancer (6). In routine clinical practice, lung cancer is assessed on the basis of tumor size, signs of invasion into adjacent structures, lymph node involvement, and lesions suggestive of distant metastases. However, as other studies have shown (2,7), in addition to the TNM staging, other tumor imaging features can also affect prognosis and the therapeutic decision-making process. Those imaging features usually relate to the shape and attenuation pattern of the lesions (such as heterogeneous enhancement, enhancement intensity, spiculated contours, and two-dimensional diameter on the axial plane) and are evaluated in a subjective, qualitative, or semi-quantitative manner (8,9).
Radiomics, on the other hand, has been described as a promising, quantitative, reproducible tool for the characterization of medical images (10,11). Simply put, radiomics performs a massive extraction of quantitative imaging variables and then a computer analysis of those variables, combining them with clinical and biochemical data related not only to diagnosis, but also to clinical outcome, histological data, and genetic mutations, increasing the power of biomarkers and decision support systems (12-14). Radiomics is also able to quantify the spatial complexity of tumors and to identify tumor heterogeneity, which is the presence of multiple histological and genetic subregions within a tumor, a feature that can be related to disease progression and treatment resistance (15,16). In view of recent advances in targeted therapies and immunotherapies, it is now imperative to carry out comprehensive and individualized assessments of neoplasms, and radiomics can do this in a noninvasive, rapid, low-cost manner in routine clinical practice (17,18).
The objective of the present study was to determine whether the radiomic analysis of lung cancer lesions on CT images correlates with prognosis and overall survival in patients with lung cancer.

MATERIALS AND METHODS

Patients
This was a retrospective study. The study was approved by the research ethics committee of our institution. Because of the retrospective nature of the study, the requirement for written informed consent was waived. The initial sample included 126 consecutive patients with lesions consistent with lung cancer, mainly pulmonary nodules, confirmed by histology or surgery, who were referred for further investigation and diagnosis after a multidisciplinary discussion. Patients were diagnosed and treated between 2010 and 2017 at one of the hospitals operated by our institution. Of those 126 cases, 25 were excluded from the analysis: 19 because the standard CT protocol for administration of intravenous iodinated contrast medium was not followed (which affected the image characterization process) or because the images showed significant artifacts; four because there were other opacities adjacent to the tumor (which affected the segmentation process); and two because not all of the clinical data were available. The 101 cases included had diagnostic-quality contrast-enhanced CT images and all the necessary clinical data available for the analysis. The clinical and pathological data were obtained from the electronic medical records of the patients (Table 1).

CT image acquisition
Before any diagnostic or therapeutic intervention, the patients underwent CT in a 16-slice scanner (Brilliance Big Bore; Philips Healthcare, Eindhoven, Netherlands) or in a 128-slice scanner (Aquilion Prime; Toshiba Medical Systems, Tokyo, Japan). In all examinations, the image acquisition and reconstruction protocols were similar, varying depending on the clinical routine of the institution. The chest examinations were performed during a deep inspiratory breath hold, in a single volumetric acquisition in the caudocranial direction, with automatic exposure control, after intravenous administration of 80-100 mL of iodinated contrast medium (flow rate, 3.0 mL/s) adjusted according to patient weight in a rapid bolus injection, followed by injection of 30 mL of saline solution (flow rate, 3.0 mL/s). The images were reconstructed using a 512 × 512 matrix, a slice thickness of 0.5-1.25 mm, a standard filter (for the radiomic analysis), and a hard filter (used for visualization and manual measurement of the lesions in lung window settings). Other typical acquisition parameters were a tube voltage of 120 kVp, a tube current of 39-464 mAs, and a rotation time < 1 s.

Segmentation of the lesions
In order to perform the radiomic analysis, the lesions had to be segmented on the CT scans. The segmentation anatomically separates the structures or tissues seen in imaging examinations. In our study, all lesions were submitted to semi-automated segmentation with the GrowCut tool (3D Slicer, Boston, MA, USA), an interactive segmentation method (19). The GrowCut method has been validated for lung cancer assessment on CT images (20,21). For the semi-automated segmentation, two regions, one inside and one outside the tumor (Figures 1a and 1b), were marked on three slices (axial, sagittal, and coronal) with lung window settings, at a level of −500 HU and a width of 1,400 HU. The tumor tissue was then detected in three dimensions with the GrowCut algorithm (Figure 1c), after which the external portion of the tumor was removed (Figure 1d) and the tumor borders were delimited (Figure 1e). Finally, the tumor imaging data were exported as a structured Digital Imaging and Communications in Medicine for Radiation Therapy file (22) to be used in the radiomic feature extraction process.
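As a small illustration of the lung window settings mentioned above, the sketch below converts a window level and width into the range of attenuation values that is actually displayed. The function names are hypothetical, and the 8-bit display scale is an assumption made for the example; none of this is taken from 3D Slicer.

```python
def window_bounds(level_hu, width_hu):
    """Return the (lower, upper) HU limits of a display window."""
    half = width_hu / 2.0
    return level_hu - half, level_hu + half


def apply_window(hu, level_hu, width_hu):
    """Map one HU value to a display gray level in [0, 255] (assumed 8-bit scale)."""
    lower, upper = window_bounds(level_hu, width_hu)
    clipped = min(max(hu, lower), upper)  # values outside the window saturate
    return round(255 * (clipped - lower) / (upper - lower))


# Lung window reported in the text: level -500 HU, width 1,400 HU.
lower, upper = window_bounds(-500, 1400)
print(lower, upper)  # displayed range: -1200 HU to 200 HU
```

With these settings, everything below −1,200 HU is rendered black and everything above 200 HU is rendered white, which is why aerated lung and soft-tissue lesion borders remain distinguishable while the seed regions are marked.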

Radiomic features
The radiomic feature extraction process consists of a massive calculation of numerical variables that represent the visual content of an image (Figure 2). In this study, the tumors were characterized on the basis of 2,465 quantitative variables, with specific software (23-25): IBEX (University of Texas MD Anderson Cancer Center, Houston, TX, USA); LIRe-JFeatureLib (Institute for Information Technology, Klagenfurt University, Klagenfurt, Austria); and ImageJ (National Institutes of Health, Bethesda, MD, USA). The radiomic features were classified into four main groups: first order, second order, higher order, and shape (11,26). First-order features (gray level and histogram) individually describe the distribution of the tumor pixel values. Second-order features (co-occurrence matrix, run length matrix, and Tamura texture) describe spatial relationships between the tumor voxels. Higher-order features (neighboring gray tone difference matrix, Laplacian-of-Gaussian filters, Gabor filters, Fourier transform, Haar wavelet, and fractal dimension estimate) describe repetitive texture patterns imposed by filters or transforms. Finally, shape features describe the borders and geometric properties of the tumor.
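The distinction between first-order and second-order features can be illustrated with a toy example. The sketch below is not the implementation used by IBEX, LIRe-JFeatureLib, or ImageJ; the function names and the two 4 × 4 patches are hypothetical. It shows two patches with identical histograms (and therefore identical first-order features) that differ in a co-occurrence feature because their gray levels are arranged differently.

```python
from collections import Counter
from math import log2


def first_order(values):
    """Histogram-based features: they ignore where each voxel sits."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    counts = Counter(values)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


def glcm_contrast(image):
    """Contrast of a symmetric co-occurrence matrix built from
    horizontally adjacent pixel pairs (distance 1, angle 0 degrees)."""
    pairs = Counter()
    for row in image:
        for a, b in zip(row, row[1:]):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1  # count both directions to keep it symmetric
    total = sum(pairs.values())
    return sum(((a - b) ** 2) * c / total for (a, b), c in pairs.items())


def flat(image):
    """Flatten a 2D patch into a list of gray levels."""
    return [v for row in image for v in row]


# Hypothetical patches: same gray levels, different spatial arrangement.
halves = [[1, 1, 2, 2] for _ in range(4)]   # two homogeneous halves
stripes = [[1, 2, 1, 2] for _ in range(4)]  # alternating columns

print(first_order(flat(halves)) == first_order(flat(stripes)))  # same histogram
print(glcm_contrast(halves), glcm_contrast(stripes))            # different texture
```

The striped patch yields the higher contrast because every horizontal neighbor pair differs, whereas in the other patch only the pairs at the central boundary do.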

Survival analysis
The prognosis was analyzed on the basis of the correlation between CT radiomic features and overall survival. We used the Kaplan-Meier method to calculate the survival times and the probability of all-cause death. Patients who were still alive or had been lost to follow-up were censored for the calculations of the overall survival probabilities. Patients were grouped according to the median radiomic feature value (14). The log-rank test was used in order to determine the statistical difference between the Kaplan-Meier curves in each group so as to identify the features with the greatest prognostic potential (R survival package; R Foundation for Statistical Computing, Vienna, Austria). Values of p < 0.05 were considered statistically significant.
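The workflow described above (median split, Kaplan-Meier estimate, log-rank comparison) can be sketched as follows. The analysis in the study was run with the R survival package; the plain Python below is only an illustrative re-implementation of the standard formulas, and the eight patients, their feature values, and their follow-up times are hypothetical data invented for the example.

```python
from math import erfc, sqrt
from statistics import median


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.
    events: 1 = death, 0 = censored (alive or lost to follow-up)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_p(times, events, high):
    """Two-group log-rank test; `high` flags the high-feature group."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                     # at risk, both groups
        n1 = sum(1 for u, g in zip(times, high) if u >= t and g)  # at risk, high group
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, high)
                 if u == t and e and g)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 degree of freedom


# Hypothetical data: one radiomic feature, follow-up in days, death indicator.
feature = [90, 95, 100, 105, 112, 118, 125, 130]
days = [2000, 1800, 2200, 1500, 600, 900, 400, 1200]
died = [0, 0, 0, 1, 1, 1, 1, 0]

cutoff = median(feature)               # median split, as in the text
high = [f > cutoff for f in feature]   # True = above the median

high_curve = kaplan_meier([t for t, g in zip(days, high) if g],
                          [e for e, g in zip(died, high) if g])
p_value = logrank_p(days, died, high)
print(cutoff, high_curve, p_value)
```

In a real analysis this comparison would be repeated for each of the radiomic variables, which is why a validated statistical package is preferable to hand-written code.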
RESULTS
The mean of the Fourier transform was the only radiomic feature that showed a statistically significant difference in the Kaplan-Meier curve analysis (Table 2). Patients with a high mean of the Fourier transform (greater than the median of 109.10) were identified as being at high risk, with a hazard ratio of 2.12 (95% CI: 1.01-4.48). That high-risk group was composed of 29 men (57%), with a mean age of 68.1 ± 9.4 years, and 22 women (43%), with a mean age of 67.1 ± 7.5 years (Figure 3). The following clinical stage distribution was seen in this group: T1, T2, T3, and T4 in 16 (31%), 25 (49%), 5 (10%), and 5 (10%) of the patients, respectively; N0, N1, N2, and N3 in 22 (43%), 10 (20%), 15 (29%), and 4 (8%), respectively; and M0 and M1 in 38 (75%) and 13 (25%), respectively. The histological types found in this group were adenocarcinoma, in 30 cases (59%); squamous cell carcinoma, in 10 (19%); small cell carcinoma, in 5 (10%); carcinoid tumor, in 2 (4%); large cell carcinoma, in 1 (2%); unspecified non-small cell lung cancer (NSCLC), in 1 (2%); poorly differentiated neuroendocrine NSCLC, in 1 (2%); and adenosquamous NSCLC, in 1 (2%). Patients with a low mean of the Fourier transform (less than or equal to the median of 109.10) were identified as being at low risk, with a hazard ratio of 0.47 (95% CI: 0.22-0.99). Figure 4 illustrates the tumor heterogeneity quantification in two lesions of the risk groups stratified on the basis of the mean of the Fourier transform. In comparison with the lower-risk lesions, the higher-risk lesions showed greater heterogeneity, characterized by a larger number of peaks in the chart showing the three-dimensional distribution of the gray levels and by the presence of more infiltrating regions in the local energy map.
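The two hazard ratios reported above describe the same comparison seen from opposite reference groups, so one is the reciprocal of the other and the confidence limits swap places. The short check below, with a hypothetical helper name, reproduces the low-risk values from the high-risk ones.

```python
def invert_hazard_ratio(hr, ci_low, ci_high):
    """Swap the reference group: HR becomes 1/HR and the CI bounds trade places."""
    return 1 / hr, 1 / ci_high, 1 / ci_low


# High-risk group versus low-risk group, as reported in the text.
hr, low, high = invert_hazard_ratio(2.12, 1.01, 4.48)
print(round(hr, 2), round(low, 2), round(high, 2))  # 0.47 0.22 0.99
```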

DISCUSSION
Radiomics has proven to be a promising tool in the development of quantitative biomarkers for medical imaging, increasing diagnostic accuracy, improving prognostic assessment, and supporting personalized medicine (11,17). In the present study, we investigated the association between quantitative radiomic features on CT images and overall survival in patients with malignant lung neoplasms. In our sample, a radiomic feature related to tumor heterogeneity (the mean of the Fourier transform) correlated with overall survival.
Studies have shown that the features used for clinical staging in the TNM system are not the only CT features of primary lung tumors that correlate with prognosis. For example, the presence of cavities and the total tumor volume (including non-solid components) are factors that correlate with prognosis in patients with squamous cell NSCLC and adenocarcinoma, respectively (2). Other studies have quantitatively assessed NSCLC using radiomics. Van Timmeren et al. (27) presented three CT features with prognostic value for NSCLC: the mode (most common value) of the image histogram after the use of a Laplacian-of-Gaussian filter, the mean intensity of a volume centered on the voxel with the highest gray level, and the inverse variance of the co-occurrence matrix calculated after a wavelet transform. Carvalho et al. (28) identified the short-run gray-level emphasis of the gray-level run length matrix on positron emission tomography images combined with CT as a feature that correlates with prognosis in patients with lung neoplasms. Aerts et al. (14) identified a radiomic signature associated with survival in patients with NSCLC composed of four features: (I) first-order energy, (II) shape compactness, (III) non-uniformity of the gray-level run length matrix, and (IV) non-uniformity of the gray-level run length matrix after a wavelet transform.
In the present study, we identified a radiomic signature on CT images of lung lesions, related to tumor heterogeneity, that has prognostic value for patients with lung cancer. To our knowledge, this finding has not been reported previously, except in a preliminary study presented in abstract form at a scientific conference. Radiomics-based prognostic assessment on chest CT images is an objective, noninvasive, low-cost method with great potential for use in routine clinical practice, depending only on appropriate scientific validation and definition of models for its inclusion as a tool in actual clinical settings (29). Conceptually, the Fourier transform is used in order to obtain imaging features in the frequency domain. The mean frequency of the spectrum after the fast Fourier transform is associated with variations in smoother or rougher texture patterns, therefore being related to tumor heterogeneity (29). Higher-risk lesions (overall mortality) had higher means of the Fourier transform and rougher, less uniform textures, whereas lower-risk lesions had lower means of the Fourier transform and smoother, more uniform textures.
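The relationship between texture roughness and the Fourier spectrum can be illustrated with a toy calculation. The exact definition of the feature in the extraction software is not given in the text, so the sketch below makes an assumption: it takes the feature to be the mean magnitude of the two-dimensional discrete Fourier transform coefficients. The function names and the two 4 × 4 patches are hypothetical, and a naive transform is used instead of a fast Fourier transform to keep the example dependency-free.

```python
import cmath


def dft2_magnitudes(patch):
    """Magnitudes of the 2D discrete Fourier transform (naive O(N^4) version)."""
    rows, cols = len(patch), len(patch[0])
    mags = []
    for u in range(rows):
        for v in range(cols):
            coeff = sum(
                patch[y][x]
                * cmath.exp(-2j * cmath.pi * (u * y / rows + v * x / cols))
                for y in range(rows)
                for x in range(cols)
            )
            mags.append(abs(coeff))
    return mags


def mean_fourier_magnitude(patch):
    """Assumed feature definition: mean magnitude over all DFT coefficients."""
    mags = dft2_magnitudes(patch)
    return sum(mags) / len(mags)


# Hypothetical patches with the same mean gray level (100).
smooth = [[100] * 4 for _ in range(4)]                                      # uniform
rough = [[50 if (x + y) % 2 else 150 for x in range(4)] for y in range(4)]  # checkerboard

print(mean_fourier_magnitude(smooth), mean_fourier_magnitude(rough))
```

The uniform patch has energy only in the zero-frequency coefficient, whereas the checkerboard adds a strong high-frequency component, so its mean magnitude is larger even though both patches have the same average gray level. That is the sense in which a higher value points to a rougher, less uniform texture.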
Our study has some limitations. First, the sample size was relatively small. We chose to study only contrast-enhanced CT scans of diagnostic quality that were acquired by following an appropriate clinical protocol, rejecting unenhanced images and those with significant artifacts. Although all examinations included in our study followed the same clinical protocol for contrast medium administration, no corrections were made for possible differences in volume and flow rate depending on patient body type or cardiac status, factors that can affect enhancement patterns. In addition, we included examinations performed on CT scanners with and without iterative reconstruction, which can also influence image resolution and the features analyzed. The number of deaths to be used in the overall survival analysis was also limited; only 16 deaths were directly related to lung cancer, which decreased the statistical power of our survival analysis. That is probably due to the fact that we focused on smaller (mainly T1 and T2) lesions, having a multidisciplinary discussion about the best course of action, whereas more aggressive tumors, which are associated with worse prognoses and shorter survival times, were excluded. Furthermore, the various courses of action dependent on the different types of lung neoplasms and the medications available can also affect the survival analysis and the prognosis. Studies like the present one serve as a proof of concept, showing the applicability of the radiomic model as a prognostic tool. However, for the effective inclusion of radiomics in the clinical setting, further studies should be conducted and models for its use within the diagnostic imaging workflow should be established. For example, it is important and advisable to conduct stability and reproducibility analyses of radiomic variables in larger volumes of data, in order to gather more evidence about the robustness of the features and to validate the radiomic approach.
Although radiomics is meant to be more comprehensive and include aspects related to prognostic assessment and to therapeutic decision-making, it has conceptual and methodological bases in common with computer-aided diagnosis. The U.S. Food and Drug Administration has recently defined a set of rules aimed at facilitating the approval process of computer-aided detection systems (30) . These rules stipulate, among other requirements, that the documentation of the products include a detailed description of the patient population for which the system is indicated and a detailed description of the compatible equipment and compatible image acquisition protocols, as well as possible warnings and discussions about the product limitations, including situations in which the device can fail or may not achieve the expected performance level (e.g., because of poor image quality or use with certain subpopulations), as applicable. These rules aim to prevent, or at least minimize, performance variations in the process of image pattern recognition, enabling a more widespread use of the computer solution, assuming that the boundary conditions have been guaranteed. It is expected that something similar will be defined for the computational algorithms used in radiomics.
In conclusion, the present study investigated whether different radiomic methods could be considered effective quantitative biomarkers in images of malignant lung neoplasms. We identified a radiomic CT signature based on the Fourier transform that is potentially useful for prognostic assessment, risk stratification, and quantification of tumor heterogeneity in patients with lung cancer.