- PMC2763559

# Automated Quantification of DNA Demethylation Effects in Cells via 3D Mapping of Nuclear Signatures and Population Homogeneity Assessment^{1}

^{*}Corresponding authors: Arkadiusz Gertych: Email: gertycha@cshs.org, (Phone) 310-423 2090, (Fax) 310-423 7707; Jian Tajbakhsh: Email: jian.tajbakhsh@cshs.org

## Abstract

### Background

Today’s advanced microscopic imaging applies to the preclinical stages of drug discovery that employ high-throughput and high-content three-dimensional (3D) analysis of cells to more efficiently screen candidate compounds. Drug efficacy can be assessed by measuring response homogeneity to treatment within a cell population. In this study topologically quantified nuclear patterns of methylated cytosine and global nuclear DNA are utilized as signatures of cellular response to the treatment of cultured cells with the demethylating anti-cancer agents: 5-azacytidine (5-AZA) and octreotide (OCT).

### Methods

Mouse pituitary folliculostellate TtT-GF cells treated with 5-AZA and OCT for 48 hours, and untreated populations, were studied by immunofluorescence with a specific antibody against 5-methylcytosine (MeC), and 4,6-diamidino-2-phenylindole (DAPI) for delineation of methylated sites and global DNA in nuclei (n=163). Cell images were processed utilizing an automated 3D analysis software that we developed by combining seeded watershed segmentation to extract nuclear shells with measurements of Kullback-Leibler’s (K-L) divergence to analyze cell population homogeneity in the relative nuclear distribution patterns of MeC versus DAPI stained sites. Each cell was assigned to one of the four classes: *similar, likely similar, unlikely similar* and *dissimilar*.

### Results

Evaluation of the different cell groups revealed a significantly higher proportion of cells with *similar* or *likely similar* MeC/DAPI patterns among untreated cells (~100%) and 5-AZA-treated cells (90%), and a lower proportion of such cells (64%) in the OCT-treated population. The latter group also contained *unlikely similar* (28%) and *dissimilar* (7%) cells.

### Conclusion

Our approach was successful in the assessment of cellular behavior relevant to the biological impact of the applied drugs, i.e. the reorganization of MeC/DAPI distribution by demethylation. In a comparison with other metrics, K-L divergence has proven to be a more valuable and robust tool for categorization of individual cells within a population, with potential applications in epigenetic drug screening.

**KEY TERMS:** Cytomics, DNA methylation, dissimilarity, cell population homogeneity, Kullback-Leibler’s divergence, 3D nuclear mapping, watershed, image cytometry, epigenetic drug screening, high-content analysis

## INTRODUCTION

Topological analysis of the distribution of proteinaceous and nucleic acid components of the cell, in particular mammalian cell nuclei, is helpful in understanding cellular functions in the state of health versus disease [1–10]. Correlations between the distribution of cellular proteins and/or fractions of nuclear DNA and certain diseases have allowed mammalian cells to be utilized as useful models in the search for appropriate disease treatment, in the context of systems biology [11,12]. With the availability of today’s more advanced imaging approaches (including confocal laser scanning microscopy, two-photon excitation microscopy, high content cell imaging, and automated tissue scanning), high resolution optical imaging has evolved into an essential tool for moving new chemical entities through the pharmaceutical discovery pipeline utilizing cell-based assays. Imaging advantages for drug discovery are realized through the ability of high-resolution microscopic imaging to measure the spatial and temporal distribution of molecules and cellular components, which is vital to understanding the activity of drug targets at the cellular level. Thus, microscopic imaging applies to the preclinical stages of drug discovery for exploratory studies, target identification and validation, lead generation and optimization, and biomarker discovery [13]. Drug efficiency can be measured by the uniformity of cellular response upon drug application, focusing on what percentage of cells in a population has reacted to the applied drug. More interestingly, compound effects can be evaluated by imaging changes in the distribution patterns of the relevant proteins and/or nucleic acid loci which function as drug targets. This new, cytomic approach [1,2] is gaining momentum by decreasing attrition in the very costly process of drug development.

Epigenetic changes, such as DNA methylation and histone modification, play a key role in cellular differentiation [14–16]. Aberrant global methylation patterns are associated with several cancer types. Methylation pattern imbalances in cancer cells include genome-wide hypomethylation and localized aberrant hypermethylation of CpG dinucleotides (CpG islands) in promoter regions of tumor suppressor genes [17,18]. The reversible nature of epigenetic aberrations constitutes an attractive therapeutic target, and epigenetic cancer therapy with demethylating agents has already been shown to be promising [19]. Demethylating agents cause structural reorganization of the genome in cell nuclei, as they not only alter the DNA methylation load but also influence its spatial distribution [20,21]. Accordingly, in a previous image-based cytometrical approach, we delineated methylcytosine (MeC) and overall DNA in AtT20 mouse pituitary tumor cells by means of immunofluorescence, and revealed significant differences in the patterns of MeC and DAPI-derived signals between untreated cells and a subpopulation of these cells treated with 5-azacytidine (5-AZA) [22], a demethylating agent that has been reported to change methylation patterns on a genomic scale [23]. Therefore, image-based assessment of DNA methylation patterns may provide a powerful technique for characterizing mammalian cells during differentiation and their status of health versus disease, as the underlying molecular processes involve large-scale chromatin reorganization, which is visible by light microscopy [24–29].

Today’s advanced cellular imaging systems can produce multispectral two-dimensional (2D) and 3D data in quantities that often require machine vision support to assess and quantify the degree of individual cell similarity within an entire cell population based on cellular features. Topological analyses typically necessitate the segmentation of cellular regions of interest (ROI), including the entire cell and/or subcellular compartments such as the nuclei. This process involves the delineation of the ROI, recognition of residing patterns, and statistical quantification of these patterns with dedicated algorithms. So far, nuclear features have been analyzed in one of the following three ways: (i) comparing a known or unknown pattern with a reference pattern using statistical tests; (ii) classification of patterns through supervised learning, utilizing decision trees, support vector machines and neural networks; or (iii) clustering, in which the distance between points in feature space is used as a discriminating factor [30]. The features are measurements reflecting complete cellular or just nuclear morphology, fluorescence intensity and texture. For example Strovas et al. normalized the intensity of a variant of green fluorescent protein (GFPuv) from methylotrophy promoter (P_{mxaF}) of single cells to their size, in *Methylobacterium extorquens* AM1 culture. This served as a descriptor of cell-to-cell heterogeneity in growth rate and gene expression in response to antibiotics [31]. Knowles et al. measured protein distribution through radial bright features within nuclei to identify changes in tissue phenotype [32]. Lin et al. employed linear discriminant analysis with nuclear models that were constructed from user-provided training examples to distinguish different cell types [33]. 
Markovian and fractal features [34], Zernike moments, co-occurrence matrices [35] and features generated by Gabor transformation have been commonly used in recognizing subcellular structures [36]. Yet, the sensitivity of texture features depends strongly on the optical system setup, such as focusing, image magnification and object positioning. In the description of cellular structures, the textural, morphological and intensity features are usually complementary.

The use of features in the quantitative description of 3D nuclear architecture is employed in many biological and medical applications, ranging from *in situ* studies of DNA, protein localization and migration in living cells, exploration of the structural aspects of cell division to investigations of the role of nuclear alterations in pathology [6–10,37,38]. These approaches mostly consider the statistical distribution of one target, a protein or DNA fragment (single gene copy or genomic region) to be analyzed. In those cases, a reference pattern detected under specific conditions is usually defined and compared to protein/DNA distribution patterns that result from changes in culture conditions. However, image-based cytometry, which readily considers two or more parameters at the same time, would largely benefit from algorithms that can statistically assess patterns of multiple cellular targets. This is especially valuable in the discovery of pathways that can be targeted in drug discovery. Here we report the development and application of a novel comparison-based approach that provides a statistical measurement on the two classes of DNAs; MeC and DAPI-positive global DNA, as nuclear targets. The algorithm compares the relative distribution of signals derived from these two targets (from two color-channels), projects them onto scatter plots, and then measures the degree of similarities between the plotted signal distributions of cells within a population [22]. This method offers a way to evaluate cellular response to external factors such as drugs and changes in culture conditions *via* a dissimilarity assessment of relevant cellular structures.

Similarity between two data objects is perceived through measurement of the objects’ proximity in a multi-dimensional space, and is used to express the objects’ relationships within a cluster or between clusters obtained through a partitioning process. Distance or similarity measurements between objects forming a cluster have been defined as equivalent notions [39]; however, appropriate metrics are required in order to identify objects with similar or dissimilar profiles. Commonly applied similarity measures can be organized into three groups according to object representation: (1) point-based, including Euclidean and Minkowski distances; (2) set-based, including Jaccard’s, Tanimoto’s, and Dice’s [40] indices; and (3) probabilistic, with Bhattacharyya [41], Kullback-Leibler’s, and correlation-based Mahalanobis [42] distances. In many practical applications the objects are described by discrete features, by which the similarity is assessed [39]. Furthermore, the sample homogeneity as a cluster quality measure can be perceived as an averaged pairwise object similarity [36, 39].
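To make the three groups concrete, the following minimal sketch shows one textbook measure from each group. The function names are hypothetical and chosen for illustration only; treating two empty sets as identical in the Dice index is an assumption, and none of this is the implementation used in the study.

```python
import math

def minkowski_distance(a, b, order=2):
    """Point-based: Minkowski distance (order=2 gives the Euclidean distance)."""
    return sum(abs(x - y) ** order for x, y in zip(a, b)) ** (1.0 / order)

def dice_index(set_a, set_b):
    """Set-based: Dice's index, 2|A and B| / (|A| + |B|), ranging from 0 to 1."""
    if not set_a and not set_b:
        return 1.0  # assumption: two empty sets are treated as identical
    return 2.0 * len(set_a & set_b) / (len(set_a) + len(set_b))

def bhattacharyya_distance(p, q):
    """Probabilistic: -ln of the Bhattacharyya coefficient sum(sqrt(p_i * q_i))."""
    coefficient = sum(math.sqrt(p_i * q_i) for p_i, q_i in zip(p, q))
    return math.inf if coefficient == 0 else -math.log(coefficient)
```

Note that the first and third functions return distances (0 means identical), whereas the Dice index is a similarity (1 means identical), so the values are not directly comparable across groups.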

We utilized the Kullback-Leibler’s measure with its properties in our study. The background of this approach is introduced here. Let us consider a random discrete variable *X* with probability distribution *p* = {*p*_{i}}, where *p*_{i} is the probability for the system to be in the *i*-th state. The measure log(1/*p*_{i}) is called the unexpectedness or surprise [43]. Two extreme states can occur: if *p*_{i} = 1, then the event is certain to happen, and if *p*_{i} ≈ 0 then the event is nearly impossible. Now, consider two discrete distributions *p* = {*p*_{i}} and *q* = {*q*_{i}}, where *p*_{i} and *q*_{i} are the probabilities of occurrence of the *i*-th state in a set of system states. The difference log(1/*q*_{i}) − log(1/*p*_{i}) defines the change of unexpectedness of the probability *p* with respect to the probability *q*. Averaging the unexpectedness of the events over *p*_{i} leads to:

$KL(p\|q)=\sum_{i}p_{i}\left[\log\frac{1}{q_{i}}-\log\frac{1}{p_{i}}\right]=\sum_{i}p_{i}\log p_{i}-\sum_{i}p_{i}\log q_{i}=H(p)+K(p,q)\qquad(1)$
where: *H* (*p*) is the negative of Shannon’s entropy [44] and K(*p, q*) is the measure of information referred to as inaccuracy [45]. *KL*(*p*‖*q*) is nonnegative and delimited by the following constraints:
$\lim_{p_{i}\to 0,\ q_{i}\ne 0}p_{i}\log(p_{i}/q_{i})=0$ and $\lim_{q_{i}\to 0,\ p_{i}\ne 0}p_{i}\log(p_{i}/q_{i})=\infty$.

Function *KL*(*p*‖*q*) is known as the Kullback-Leibler’s divergence [46] of information linked to two probability distributions *p* and *q*. This is also a measure of how different two probability distributions (over the same system states space) are. Typically, *p*_{i} represents data, observations, or a precisely calculated probability distribution, and *q*_{i} represents an “arbitrary” distribution, a model, a description or an approximation of *p*_{i}. Following [46] it is assumed that: (i) 0log(0/*q*_{i}) = 0; and (ii) terms in Eq. (1) where the denominator is zero are treated as undefined and are neglected in order to provide absolute continuity of *p*_{i} with *q*_{i}.

The Kullback-Leibler’s divergence can be used to measure the distance between various kinds of distributions [47]. For instance, it has been employed in medical and systems biology applications including registration of image datasets [48], image segmentation [49], temporal analysis of gene expression [50], clustering of gene expression data [51] and similarity analysis of DNA sequences [52]. The objects’ homogeneity assessment is then performed in two steps. First, distance-based similarity is measured between the combined 2D MeC/DAPI histograms of all nuclei and the histogram of each individual cell nucleus. Second, each nucleus (object) in the population is assigned into one of the predefined categories based on similarities.
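The discrete divergence and the two conventions (i) and (ii) can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function name `kl_divergence` is hypothetical, and the use of natural logarithms is an assumption, since the logarithm base is not stated in the text.

```python
import math

def kl_divergence(p, q):
    """Return KL(p||q) = sum_i p_i * log(p_i / q_i), using natural logarithms."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same states")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # convention (i): 0 * log(0 / q_i) = 0
        if q_i == 0.0:
            continue  # convention (ii): zero-denominator terms are neglected
        total += p_i * math.log(p_i / q_i)
    return total
```

One consequence worth noting: because convention (ii) drops terms rather than letting them diverge, the returned value can become negative when *q* is zero on states where *p* is not, so the strict non-negativity of the divergence holds only when no terms are neglected.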

Assessment of cell population homogeneity is not a trivial task as it is constrained by the imaging modalities and the cell type itself. In a typical setting, the evaluation of cellular response to external factors such as drugs can be achieved with a comparison of the treated population to an untreated (reference) population. However, in this work we present a method to assess each population by itself, in isolation. These populations were analyzed *a posteriori*, (i.e. without prior knowledge of relevant structural information). Regardless, our approach also allows for a global assessment of cellular patterns among populations.

In 3D image analysis of nuclei, the segmentation of the nucleus and the quantification of residing features are the most vital components. A common scheme in existing approaches is the watershed algorithm followed by extraction of pertinent features [53–58]. The aforementioned solutions require the extraction of tens of features for clustering or classifier training for the further application of a pattern recognition task. Hence, an algorithm utilized for feature extraction and pattern recognition, may be restricted by the morphology of a specimen, in which some features are redundant while others are irrelevant. Although some methods for cellular detection and segmentation have been proposed, a general-purpose system that can perform analysis and recognition tasks for a variety of confocal microscope images without necessitating an approach modification or system training (related to the target-specific applications) is still not available.

The main aim of this work is to develop a software system that can be robustly applied to the topological analysis of nuclear targets, such as MeC and DAPI, which will provide useful parameters in the elucidation of epigenetic mechanisms as well as the evaluation of epigenetic drugs tested in cultured cell models. The algorithm developed combines the three major tasks: (1) automated segmentation of nuclei in a cell population, (2) subsequent nuclear pattern extraction, and (3) distance-based statistical measurement of cell dissimilarity using Kullback-Leibler (K-L) divergence. This method considers the strength of statistical evaluation of intra-nuclear MeC/DAPI patterns, especially valuable when cell population homogeneity is difficult to be assessed due to lack of standardized reference and sample size. In this study, we evaluate the potential of using an unsupervised 3D seeded watershed algorithm coupled with K-L divergence measurement to calculate the dissimilarity of mouse pituitary folliculostellate TtT-GF cell response to treatment with the demethylating agents, 5-AZA and OCT. This response was quantitatively measured and displayed as the differential co-distribution of MeC/global DNA signals in treated and untreated cells. A comparison of K-L divergence with other commonly used similarity metrics demonstrates the superior performance of our method.

## MATERIALS AND METHODS

### Cell Culture

TtT-GF cells (ATCC) were grown in serum-containing low glucose Dulbecco's modified Eagle's medium (Invitrogen) supplemented with 10% fetal bovine serum, with addition of 2 mM glutamine and 1% antibiotic/antimycotic (100 units/ml penicillin G sodium, 100 µg/ml streptomycin sulfate) (Invitrogen), in 6% CO_{2}, 37 °C as described by Ben-Shlomo et al. [59]. Cells were plated at 1×10^{5} cells onto coverslips in multi-well plates, and allowed to attach for 24 hours. Then, cells were divided into two groups: (i) two control populations that were not treated for 48 hours (NT-TtT-1, NT-TtT-2), (ii) and two treated populations: AZA-TtT cells treated with 1 µM 5-azacytidine (Sigma-Aldrich) and OCT-TtT cells treated with 100 nM octreotide (Sigma-Aldrich), both for 48 hours.

### Immunofluorescence and Imaging

In order to preserve the three-dimensional structure, cells cultured on coverslips in 12-well microplates were fixed with 4% paraformaldehyde/phosphate buffered saline (PBS) (Sigma-Aldrich) and permeabilized as previously described in [60,61]. Subsequently, cellular RNA was removed with RNase A (Novagen), particularly because transfer RNA (tRNA) contains methylated cytosine as previously described [22]. Cells were depurinated with 2N HCl and blocked with 2% BSA/PBS prior to application of antibodies: a monoclonal mouse 5-MeC antibody (EMD Biosciences) followed by a secondary Alexa 488-linked goat anti-mouse polyclonal IgG (Invitrogen). The specimens were counterstained with DAPI, and 3D imaging was performed using a confocal laser scanning microscope TCS SP2 (Leica Microsystems Inc.) equipped with a multi-line argon laser (458 nm, 488 nm, 514 nm) for Alexa 488 (MeC), and a 405 nm diode laser line for excitation of DAPI fluorescence: serial optical 2D sections were collected at increments of 200–300 nm with a Plan-Apo 63× 1.4 oil immersion lens; pinhole size was 1.0 airy unit. To avoid bleed-through, the imaging of each channel was acquired sequentially. The typical image size was 1024 × 1024, with a respective voxel size of 116 nm × 116 nm × 230.5 nm (*x*, *y*, and *z* axes), and resolution was 8 bits per pixel in all channels. Example images of NT-TtT cells are presented in Figure 1. Fluorescence intensity of 5-MeC and DAPI signals, *I*_{MeC} and *I*_{DAPI}, from optical sections were recorded into separate 3D channels.

### Image Analysis

Image analysis was performed in three main steps (see Figure 2): (1) 3D image segmentation resulting in the delineation of a 3D shell for each individual nucleus; (2) extraction of MeC and DAPI signal intensity distribution within each 3D shell; and (3) dissimilarity assessment of MeC and DAPI signal distribution patterns between each individual nucleus and a reference pattern derived from the entire cell population. This workflow was designed based on the images taken from the NT-TtT-1 population and on the following assumptions. The background in each image stack was considered to be quasi-uniform, meaning that there are very small to zero low-frequency fluctuations or trends in the background through a single image plane or across the depth of the image stack. Moreover, all images in each stack are assumed to be acquired under nearly identical conditions and modality settings, so the drift of the settings during acquisition can be considered minimal and thus neglected. In order to reduce computational complexity during the segmentation phase, the image resolution was decreased by a factor of four in the *x* and *y* directions, for this step only. The developed methodology was subsequently applied to all image stacks.

#### STEP 1: 3D Segmentation of Nuclei

The *I*_{DAPI} and *I*_{MeC} image stacks were combined in the following way: *I*(*x*,*y*,*z*) = max(*I*_{DAPI}(*x*,*y*,*z*), *I*_{MeC}(*x*,*y*,*z*)), thus the intensity of the output image *I* is always a maximum of the intensities in corresponding channels at pixel position (*x*,*y*,*z*) (Fig. 3A). To separate the nuclei from the background a histogram of image *I* was constructed. We apply the technique described in [62] yielding the threshold value *T*_{b} that splits the histogram into two parts: a main peak representing the background, and a histogram tail reflecting intensities of the nuclear content. A binary image was obtained in which background pixels and nuclear content were converted to the values 0 and 1, respectively. This image was then subjected to enhancement by means of 3D morphological operations (*closing* and *filling holes*), yielding a refined binary image *I*_{b} (Fig. 3B). We note that in *I*_{b} the majority of nuclei were distinct. However, some nuclei touch (or nearly so) one another to form larger clusters. These two groups of objects were processed separately to better delineate all nuclei.
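The channel combination and binarization described above can be sketched on a tiny stack stored as nested lists indexed `[z][y][x]`. The function names are hypothetical. The threshold `t_b` is passed in as a given value because the thresholding technique of [62] is not described here, and mapping voxels strictly above `t_b` to 1 is an assumption about the boundary case. The morphological closing and hole filling are not shown.

```python
def combine_channels(i_dapi, i_mec):
    """Voxel-wise maximum of two equally shaped 3D stacks (nested lists [z][y][x])."""
    return [[[max(d, m) for d, m in zip(row_d, row_m)]
             for row_d, row_m in zip(plane_d, plane_m)]
            for plane_d, plane_m in zip(i_dapi, i_mec)]

def binarize(stack, t_b):
    """Map voxels above the threshold t_b to 1 (nuclear content), all others to 0."""
    return [[[1 if value > t_b else 0 for value in row] for row in plane]
            for plane in stack]

# Toy 1 x 2 x 2 stacks with 8-bit intensities (made-up values for illustration).
i_dapi = [[[10, 200], [5, 0]]]
i_mec = [[[50, 20], [5, 180]]]
combined = combine_channels(i_dapi, i_mec)
binary = binarize(combined, t_b=40)
```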

A reduction of the original resolution by a factor of four of images *I*_{b} and *I* creates two down-sampled images, *I*'_{b} and *I*', respectively. Labeling and counting of the binary objects in *I*'_{b} was carried out according to Haralick et al. [34], and the volume of each object was found. A mean volume value, *T*_{vol}, served as the criterion to split the image *I*'_{b} into two binary masks, one with small components, *I*'_{bs}, and one with large components, *I*'_{bl} (Fig. 3C). Then, all voxels of image *I*' under the mask *I*'_{b} were replaced by a constant value *T*_{b}, creating an image *I*'_{m} that models the nuclei (Fig. 3D). Such an approach is useful for object segmentation, because the resulting image is comprised of intensities equal to or lower than the automatically defined threshold *T*_{b}. This model is used to create 3D seeds that define the location of each nucleus, and also serves as the input for the seeded watershed segmentation technique.
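The split of binary objects by their mean volume can be illustrated as follows. This is a sketch under stated assumptions: the labeling here is a plain breadth-first flood fill with 6-connectivity, which stands in for (and is not claimed to be) the labeling procedure of [34]; objects whose volume equals the mean are assigned to the small group; and the function names are hypothetical.

```python
from collections import deque

def label_volumes(mask):
    """Label 6-connected foreground objects in a 3D mask [z][y][x]; return {label: volume}."""
    depth, height, width = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * width for _ in range(height)] for _ in range(depth)]
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    volumes = {}
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if mask[z][y][x] == 0 or labels[z][y][x] != 0:
                    continue  # background voxel, or voxel already labeled
                label = len(volumes) + 1
                labels[z][y][x] = label
                queue, volume = deque([(z, y, x)]), 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    volume += 1
                    for dz, dy, dx in steps:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                                and mask[nz][ny][nx] and labels[nz][ny][nx] == 0):
                            labels[nz][ny][nx] = label
                            queue.append((nz, ny, nx))
                volumes[label] = volume
    return volumes

def split_by_mean_volume(volumes):
    """Return (t_vol, small_labels, large_labels) using the mean volume as criterion."""
    if not volumes:
        raise ValueError("no objects to split")
    t_vol = sum(volumes.values()) / len(volumes)
    small = sorted(k for k, v in volumes.items() if v <= t_vol)  # assumption: ties are small
    large = sorted(k for k, v in volumes.items() if v > t_vol)
    return t_vol, small, large
```

For example, a mask containing one isolated voxel and one six-voxel block yields a mean volume of 3.5, so the isolated voxel falls into the small group and the block into the large group.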

Next, image *I*'_{m} was subjected to smoothing by two anisotropic Gaussian filters, *G*_{s} and *G*_{l}, for small and large binary components, respectively. The infinite Gaussian kernel is approximated by a finite mask whose size is defined by *N*_{x}, *N*_{y} and *N*_{z}, representing the mask size in each direction. The smoothing effect in 3D is controlled by three parameters, *G*(σ_{x}, σ_{y}, σ_{z}). To assure that smoothing can produce a signal strong enough to detect a seed, the approximated filter kernels were adaptively adjusted to the relative volume of the binary objects in *I*'_{bl} and *I*'_{bs}, respectively. The kernel size is adjusted first. We chose a spherical model for cell nuclei, and allocated three kernels for each *x*, *y* and *z* axis of a sphere. This approach provides a predefined number of filter kernels that fit the hypothetical nuclear size, in our case seven (*n* = 7). Since the image voxels in our data stacks are not isotropic, *N*_{z} can be almost twice as much compared to *N*_{x} = *N*_{y}, and the filter size can therefore be calculated from *T*_{vol} ≥ *n* · *N*_{x}*N*_{y}*N*_{z}. Substituting *N*_{z} = 2*N*_{x} and *N*_{y} = *N*_{x}, the filter size $N_{x}\le\sqrt[3]{T_{vol}/(2n)}$ can be derived as the largest odd number satisfying this inequality. Thus, mean volumes of binary objects in *I*'_{bs} and *I*'_{bl} can be used to calculate the filter size *N*_{x} for kernels *G*_{s} and *G*_{l}. Second, the remaining filter coefficients σ_{x}, σ_{y}, and σ_{z} were empirically set to one half of the mask size in each direction. In general, the sizes of the *G*_{s} and *G*_{l} kernels are proportional to the mean volume of binary objects under the respective masks, and so are the corresponding filter coefficients. Also, the size of *G*_{l} is never smaller than that of *G*_{s}.

The image *I*'_{m} is separately smoothed once (by each kernel) to obtain the images *I*'_{ms} and *I*'_{ml}. The larger the kernel is, the smoother the created surface of the ROI (nucleus) will be. After filtering, the results were combined into one output binary image according to:

where *trh* denotes a threshold function expressed as:

and where *Q*(*x*,*y*,*z*) is an image, *T* is the threshold, *I*'_{f} is the output image, *I*'_{ms} and *I*'_{ml} are the smoothed components, ⊗ denotes element-by-element multiplication and ∪ is the matrix logical union.

The smoothing procedure produces slowly varying intensity fields in *I*'_{ms} and *I*'_{ml} with maxima and local plateaus resembling blobs in 3D space which are located inside the nuclei, with intensities oscillating around *T*_{b}. The location and size of the maxima depend on the smoothing kernel and the nucleus size. The thresholding of the smoothed image at the level of *T*_{b} yields binary seeds in *I*'_{f}, with one seed per nucleus (Fig. 3E). Small seeds were eliminated and converted to background.

The watershed algorithm [63] in its original form has several well-known limitations; it typically over-segments the image and does not take into account image-inherited cues such as intensity gradients, topology and content of segmented objects. Thus, the seeds serve as *a priori* knowledge about segmented structures and form numerous points for algorithm initialization. Such an approach has the potential to generate a number of unique regions that closely matches the number of seeds. In this study we extend the existing implementation of the 2D seeded watershed method [64,65] to obtain 3D nuclear shells (Fig. 2F). During this segmentation each nucleus receives a label for further identification and visualization. Then, the segmented image *I*'_{s} was up-sampled by a factor of four with the nearest neighbor interpolation technique, resulting in the image *I*_{s} that contains the 3D nuclear shells. This image can also be superimposed onto *I*_{DAPI} or *I*_{MeC} and displayed, as shown later in Figure 4.

#### STEP 2: Extraction of MeC and DAPI patterns

A powerful aspect of scatter plots is their ability to depict mixture models of simple relationships between variables. These relationships can reflect cellular patterns as specific signatures, in which the variables can be nuclear structures as shown in the case of DNA methylation patterns versus DAPI-stained DNA [22]. These nuclear entities are not static and reorganize during cellular differentiation, as well as upon the application of demethylating agents. Earlier we showed that such reorganizations can be dynamically monitored by scatter plotting the two types of DNA, with their differential distribution becoming visible as changes in the plotted patterns. In this case, we first individually segmented nuclei to create three-dimensional ROIs (3D-shells). Then, we plotted the fluorescent MeC and DAPI signal distributions within these shells. Utilizing K-L divergence, the degree of similarity between two scatter plots can be easily measured, and reflects the similarity of target (MeC and DAPI signals) topology between two cell nuclei (in Kullback-Leibler sense).
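A normalized scatter plot of this kind is, in effect, a joint 2D histogram of paired MeC and DAPI intensities taken from the voxels inside one nuclear shell. The sketch below is an illustration only: the function name and the optional `bin_width` parameter are hypothetical, and the inputs are assumed to be two equally long lists of integer intensities already extracted from one shell.

```python
from collections import Counter

def joint_histogram(mec_values, dapi_values, bin_width=1):
    """Normalized 2D histogram {(mec_bin, dapi_bin): probability} of paired intensities."""
    if len(mec_values) != len(dapi_values) or not mec_values:
        raise ValueError("need two non-empty intensity lists of equal length")
    counts = Counter((m // bin_width, d // bin_width)
                     for m, d in zip(mec_values, dapi_values))
    total = sum(counts.values())
    return {cell: count / total for cell, count in counts.items()}

# Four voxels of one hypothetical nucleus (made-up 8-bit intensities).
pattern = joint_histogram([10, 10, 200, 200], [30, 30, 30, 90])
```

Storing only the occupied cells in a dictionary keeps the representation sparse, since a full 256 × 256 grid for 8-bit channels is mostly empty for a single nucleus.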

#### STEP 3: Nuclear Pattern Analysis by means of Kullback-Leibler’s divergence

In our approach, we applied the K-L divergence as a statistical measure of dissimilarity between two normalized scatter plots: the value of *q*_{i} denotes a probability of occurrence of intensity *i* in an analyzed nucleus outlined by 3D shells and *p*_{i} signifies a reference scatter plot component. The reference scatter plot is constructed from all individual plots. To the best of our knowledge, no such work on identification of nuclear patterns based on Kullback-Leibler’s measure has been reported so far. Therefore, this is an innovative way to perform an intra-population assessment of cells with regard to their homogeneity in response to environmental changes in culture, and is especially suitable for high-throughput multi-parameter analyses.
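The comparison of each nucleus against a population-wide reference can be sketched as follows. Two points are assumptions rather than statements from the text: the reference is built by pooling the raw counts of all nuclei before normalizing (averaging the normalized plots would be another possible reading), and natural logarithms are used. The function names are hypothetical, and each nucleus is represented by a dictionary of counts per (MeC, DAPI) histogram cell.

```python
import math
from collections import Counter

def reference_histogram(nucleus_counts):
    """Pool per-nucleus count histograms and normalize the sum to probabilities."""
    pooled = Counter()
    for counts in nucleus_counts:
        pooled.update(counts)  # adds the counts cell by cell
    total = sum(pooled.values())
    return {cell: count / total for cell, count in pooled.items()}

def divergence_from_reference(reference, counts):
    """KL(p||q) with p = reference and q = normalized histogram of one nucleus."""
    total = sum(counts.values())
    kl = 0.0
    for cell, p_i in reference.items():
        q_i = counts.get(cell, 0) / total
        if p_i > 0 and q_i > 0:  # zero terms are skipped, as in conventions (i) and (ii)
            kl += p_i * math.log(p_i / q_i)
    return kl

# Two hypothetical nuclei with mirrored patterns over two histogram cells.
nuclei = [{(0, 0): 8, (1, 1): 2}, {(0, 0): 2, (1, 1): 8}]
reference = reference_histogram(nuclei)
values = [divergence_from_reference(reference, counts) for counts in nuclei]
```

In this toy case the pooled reference is uniform over the two cells, so both nuclei deviate from it by the same amount.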

The K-L divergences represent distinctive and relative measurements derived from a unique cell population. A comparison of K-L values between experiments, in principle, requires identical reference distributions to be applied. However, a lack of reproducibility in sample preparation, together with drift and instability of imaging modality settings, is the primary constraint in determining such a universal reference. In order to reduce the influence of these constraints, and to make the K-L values more descriptive, we introduced four soft-qualifiers for defining the similarity degree of a cell versus the entire cell population. These degrees are associated with particular ranges of K-L divergences derived for two idealized Gaussian distributions. For the multivariate *d*-dimensional Gaussian densities given by *G*(**x**,**μ**, **Σ**) = (2π)^{−d/2} |**Σ**|^{−1/2} exp(−0.5(**x**− **μ**)^{T} **Σ**^{−1} (**x**−**μ**)) the Kullback-Leibler’s divergence is expressed by:

$KL(p_{G}\|q_{G})=\frac{1}{2}\left[\log\frac{|\mathbf{\Sigma}_{q}|}{|\mathbf{\Sigma}_{p}|}+\mathbf{tr}\left(\mathbf{\Sigma}_{q}^{-1}\mathbf{\Sigma}_{p}\right)+(\boldsymbol{\mu}_{q}-\boldsymbol{\mu}_{p})^{T}\mathbf{\Sigma}_{q}^{-1}(\boldsymbol{\mu}_{q}-\boldsymbol{\mu}_{p})-d\right]\qquad(4)$

where: **x** is the random variable, **μ** is the vector of means, **Σ** is the covariance matrix, **tr** is the trace function, and |·| is the determinant of a matrix.

The K-L divergence in Eq. (4) between two one-dimensional univariate Gaussian distributions $p_{G}(x)=N(x;\mu_{p},\sigma_{p}^{2})$ and $q_{G}(x)=N(x;\mu_{q},\sigma_{q}^{2})$, with *x* as the random variable, comes down to [60]:

$KL_{G}(\mu_{p},\sigma_{p};\mu_{q},\sigma_{q})=\log\frac{\sigma_{q}}{\sigma_{p}}+\frac{\sigma_{p}^{2}+(\mu_{q}-\mu_{p})^{2}}{2\sigma_{q}^{2}}-\frac{1}{2}\qquad(5)$

Furthermore, assuming that σ_{p} ≈ σ_{q} and that σ can be substituted instead of σ_{p} and σ_{q} in Eq. (5), we obtain *KL*_{G}(μ_{p},σ;μ_{q},σ) = (μ_{q} − μ_{p})^{2}/(2σ^{2}), where the numerator reflects the distance between the peaks of the two Gaussian distributions. The *KL*_{G} in the simplified formula can also be related to the fraction of the distributions’ overlap area and used as a way of articulating dissimilarity. Also, when μ_{p} − μ_{q} is expressed as a multiple of σ, the *KL*_{G} value depends solely on that multiple: peak separations of 1σ, 2σ and 3σ give *KL*_{G} = 0.5, 2 and 4.5, respectively. Table 1 illustrates the four soft-qualifiers defining the similarity degree of *KL*_{G} divergence linked to σ, obtained on the basis of the aforementioned assumption. The four soft-qualifiers are defined as: *similar* for *KL*_{G} ∈ [0, 0.5), *likely similar* for *KL*_{G} ∈ [0.5, 2), *unlikely similar* for *KL*_{G} ∈ [2, 4.5), and *dissimilar* for *KL*_{G} ∈ [4.5, ∞). Thus this procedure can be perceived as a classification process. As a side note, the K-L divergence between two bivariate normal densities is a function of Pearson’s correlation coefficient [66].
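The link between the class boundaries and the Gaussian model can be checked numerically. The sketch below uses the standard closed form of the K-L divergence between two univariate Gaussians (in nats) and the four ranges quoted above; the function names are hypothetical, and assigning negative values to *similar* is an assumption, since the stated ranges start at 0.

```python
import math

def kl_gauss_1d(mu_p, sigma_p, mu_q, sigma_q):
    """K-L divergence between N(mu_p, sigma_p^2) and N(mu_q, sigma_q^2), in nats."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def soft_qualifier(kl_value):
    """Map a K-L value to one of the four classes using the ranges given in the text."""
    if kl_value < 0.5:
        return "similar"  # assumption: values below 0 are also placed here
    if kl_value < 2.0:
        return "likely similar"
    if kl_value < 4.5:
        return "unlikely similar"
    return "dissimilar"

# Equal standard deviations and peaks separated by k * sigma give k**2 / 2.
for k in (0, 1, 2, 3):
    value = kl_gauss_1d(0.0, 1.0, float(k), 1.0)
    print(k, value, soft_qualifier(value))
```

With equal standard deviations the logarithmic term vanishes, which is why the boundaries 0.5, 2 and 4.5 coincide with peak separations of one, two and three standard deviations.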

### Evaluation of Similarity Measures

Three commonly used similarity metrics including Mahalanobis, Bhattacharyya distances, and Dice’s index were implemented into the image analysis workflow together with the proposed K-L divergence and then applied to NT-TtT-1, AZA-TtT and OCT-TtT cellular images. Since none of these metrics have been documented for assessing cell culture homogeneity through 2D methylation pattern histograms, we compared their performance to determine the most appropriate approach for measuring demethylating effects by nuclear topology. Unlike the method and system validation characteristics such as accuracy and reliability that are based on individual results, the characteristic of the uncertainty of results delivered by a classification method needs to be determined on a method-to-method based comparison [67]. Therefore, using the uncertainty as a validation characteristic raises the objectivity of our comparative evaluation. In our case we used similarity values of nuclei within a cell population. Assuming that a similarity metric can label a nucleus in a way that it reflects its natural proximity to other nuclei in the feature (nuclear pattern) space, then such labeling should have a low uncertainty.

Our evaluation steps were as follows: (i) each of the tested metrics yielded a similarity value for all nuclei; (ii) the nuclei were grouped into classes based on the assigned similarity value, applying a minimum-distance criterion in the class-forming scheme and generating up to six classes; (iii) clustering results were evaluated as described in [67], with the entropies of the results calculated as a measure of uncertainty, so that the lowest entropy indicates the least uncertainty of results produced by the evaluated method. Finally (iv), a normalized certainty was used for method comparison [67]:

where *M* is the number of classes used in the classification scheme, and *Entropy*_{M} is calculated from the results of the similarity measure classification into *M* classes.
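
A minimal sketch of step (iv), assuming that the normalized certainty takes the form 1 − *Entropy*_{M} / log2(*M*). This form and the base-2 logarithm are assumptions for illustration and may differ from the expression given in [67]; the function names and class labels are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(labels):
    """Shannon entropy (base 2) of the class-label frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def normalized_certainty(labels, m):
    """ASSUMED form: 1 - Entropy_M / log2(M).

    1.0 means all nuclei fall into one class; 0.0 means they are spread
    evenly over all M classes.
    """
    if m < 2:
        raise ValueError("need at least two classes")
    return 1.0 - entropy_bits(labels) / math.log2(m)

# Hypothetical class assignments of eight nuclei into M = 4 classes.
labels = [0, 0, 0, 0, 0, 0, 1, 2]
certainty = normalized_certainty(labels, 4)
```

Dividing by log2(*M*) makes scores comparable across schemes with different numbers of classes, which matters here because up to six classes were generated.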

## RESULTS

Untreated (NT-TtT-1, NT-TtT-2) and treated (OCT-TtT and AZA-TtT) mouse pituitary tumor cells (n=163 cells in total) were imaged and then analyzed with our in-house, MATLAB-based software. Following our algorithm, the three-dimensional nuclear shells were first delineated (Fig. 3), and then for each nucleus within an image field the fluorescent signals derived from MeC-specific staining and DAPI staining were mapped as respective scatter plots. The K-L divergences of the distribution of MeC and DAPI signals between individual plots (nuclei) and the reference plot (cumulative plot from all nuclei) were then calculated. The algorithm displays the K-L values and the digital ROI for each cell nucleus, as shown in Figure 4. Six nuclei (two from each of the OCT-TtT, AZA-TtT and NT-TtT-1 cell groups) illustrating different nuclear MeC and DAPI patterns were selected as examples for visualization purposes. The fields appearing in these figures are smaller than the complete microscopic field of view. Figure 3 shows the earlier intermediate steps of the algorithm described in the methods section, followed by the actual results in Figure 4.
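
The comparison of each nucleus against the cumulative reference can be sketched as below. This is an illustrative Python outline rather than the authors' MATLAB code: the bin count, the 8-bit intensity range, the direction KL(nucleus ‖ reference), the natural logarithm, and the toy voxel values are all assumptions.

```python
import math
from collections import Counter

def joint_histogram(mec, dapi, bins=16, max_value=256):
    """Normalized 2D histogram of paired (MeC, DAPI) voxel intensities."""
    width = max_value / bins
    counts = Counter(
        (min(int(m / width), bins - 1), min(int(d / width), bins - 1))
        for m, d in zip(mec, dapi)
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over sparse histograms; eps guards empty reference bins."""
    return sum(pv * math.log(pv / max(q.get(cell, 0.0), eps))
               for cell, pv in p.items())

# Hypothetical voxel intensities per nucleus: (MeC values, DAPI values).
nuclei = {
    "n1": ([10, 30, 200, 220], [20, 40, 180, 240]),
    "n2": ([12, 28, 90, 100], [18, 44, 60, 70]),
}

# Reference plot: voxels pooled from all nuclei in the population.
pooled_mec = [v for mec, _ in nuclei.values() for v in mec]
pooled_dapi = [v for _, dapi in nuclei.values() for v in dapi]
reference = joint_histogram(pooled_mec, pooled_dapi)

kl_values = {name: kl_divergence(joint_histogram(m, d), reference)
             for name, (m, d) in nuclei.items()}
```

Because the reference pools every nucleus, any bin occupied by a single nucleus is also occupied in the reference, so the `eps` guard only matters if a different reference is substituted.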

The applicability of the K-L divergence was tested for the categorization of nuclear patterns with significantly different DAPI signal distributions. One-dimensional MeC and DAPI histograms were generated for each of the two 5-AZA-treated as well as the two OCT-treated nuclei, and plotted next to their respective 2D joint MeC/DAPI diagrams (Figure 5). This separation shows that both signals, MeC and DAPI, differ in their intensities (indicated by the curves’ shapes) between cells, which can be interpreted as the result of differences between cells in their response to the demethylating agents.

**...**

Based on the definition of soft-qualifiers in Table 1, we have chosen four categories into which the processed nuclei fall: *similar*, *likely similar*, *unlikely similar*, and *dissimilar*.

This categorization helps to characterize a cell population in a quantitative and readable fashion (Table 2). The classification was performed in two ways: (i) using solely the MeC histogram, and (ii) using joint MeC/DAPI histograms, of individual cells versus the entire population. In the first case a combined MeC histogram was used as the reference distribution. The outcome provides statistical information about the number of cells that fall into each category. Different cell populations can then be compared based on their category statistics.
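
As a hedged sketch, the category statistics for one population could be tallied as follows, using the interval boundaries of the four soft-qualifiers. The function names and the example K-L values are hypothetical and are not data from the study.

```python
from collections import Counter

# Upper bounds of the half-open KL_G intervals for the four soft-qualifiers.
CATEGORIES = [
    (0.5, "similar"),
    (2.0, "likely similar"),
    (4.5, "unlikely similar"),
    (float("inf"), "dissimilar"),
]

def categorize(kl):
    """Map a non-negative K-L value to its soft-qualifier."""
    if kl < 0:
        raise ValueError("K-L divergence is non-negative")
    for upper, name in CATEGORIES:
        if kl < upper:
            return name
    return "dissimilar"  # only reached when kl is infinite

def category_percentages(kl_values):
    """Percentage of nuclei per category, including empty categories."""
    counts = Counter(categorize(v) for v in kl_values)
    return {name: 100.0 * counts[name] / len(kl_values)
            for _, name in CATEGORIES}

# Hypothetical K-L values for a five-nucleus population.
population = [0.1, 0.3, 0.7, 1.9, 2.5]
summary = category_percentages(population)
```

Reporting every category, including those with no members, keeps the summaries of different populations directly comparable.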


**...**

Utilizing the joint MeC/DAPI patterns in the categorization of the four groups of cell populations revealed that all NT-TtT-1 cells are classified as at least *likely similar*, with a majority of 76% being *similar*. This signifies a relatively high homogeneity of MeC versus DAPI distribution within the NT-TtT-1 cell population. Likewise, 74.5% *similar*, 23.5% *likely similar* and 2% *unlikely similar* cells were found in the NT-TtT-2 population. Our assessment of untreated cells revealed that the distribution of the cell categories was quite consistent in populations with different numbers of cells. In comparison, OCT-TtT cells display a higher portion (64%) of *likely similar* cells and, to a lesser degree (36%), also *unlikely similar* cells. The AZA-TtT cells show a very low degree of dissimilarity, with 90% *similar* and 10% *likely similar* cells. However, one can note that their intracellular architecture differs from that of NT-TtT and OCT-TtT cells in that fewer loci are seen within AZA-TtT cell nuclei than in nuclei of the remaining cultures.

Utilizing only MeC histograms to categorize cells yielded no *dissimilar* cells in any of the four tested populations. The NT-TtT-1 and NT-TtT-2 control cell populations contained identical fractions of *similar* cells (approximately 88%). In NT-TtT-1, 12% of cells were classified as *likely similar*; in NT-TtT-2, 10% were *likely similar* and 2% *unlikely similar*. OCT-treated cells were allocated almost equally (28–35%) among the *similar*, *likely similar*, and *unlikely similar* categories. The cell population treated with 5-AZA consisted predominantly of *similar* cells (97%), with only one cell (3.3%) classified as *likely similar*.

The cell categorization was implemented into the image visualization and analysis software we developed, as shown in Fig. 6. Such visualization is a valuable feature of image-based cytometry, providing dual information of cell behavior/category and localization within the sample environment. Processed images of the three cell groups used in this study underwent a visual check by an expert (J.T.) and the dissimilarity evaluations between cells matched the automated analytical results.

**...**

In our definition of soft qualifiers, normality of the sampled population was assumed. In order to evaluate normality of the individual MeC/DAPI distributions, we estimated two Gaussian components by means of the expectation-maximization clustering algorithm [68] in each of the segmented nuclei of the NT-TtT-1 population (Fig. 7). The components estimated in this way constitute approximately 75% of data points of each nucleus. In addition using Lilliefors’ statistical tests [69] we tested a null hypothesis, which considers that the data derives from a multivariate family of normal distributions. This test was performed for each nucleus and separately for each dimension. The null hypothesis was not rejected at the 5% significance level. Therefore, we assume that the scatter plots obtained throughout our experiments can be approximated by multivariate Gaussian components.

**...**

Selected similarity metrics, including the Mahalanobis and Bhattacharyya distances as well as Dice’s coefficient and the proposed K-L divergence, were calculated for each of the individual two-dimensional MeC and DAPI plots and the combined distribution. The normalized certainty (Eq. 6) of the results determined by the different metrics is presented in Table 3.

**...**

Our comparison of the most applicable metrics indicates that in the majority of cases (73%) the normalized certainty reached its highest value when the classification was based on the K-L divergence. Moreover, if more than two classes are considered in the classification scheme (the more frequent scenario), the proposed K-L similarity measure achieves the highest certainty scores in even more of the cases (91%).

## DISCUSSION

The main goal of this study was to develop an automated image analysis tool suitable for measuring the effects of demethylating agents through the differential analysis of relevant nuclear structures, as represented by methylated CpG-dinucleotides (MeCs) and global DNA, in cells. For this purpose, a dedicated tool was designed that performs three sequential steps on individual cells within a population: (1) unsupervised segmentation of 3D imaged cell nuclei via a seeded watershed algorithm, (2) multi-channel quantitative distribution analysis of nuclear entities, and (3) similarity testing of cells with regard to their distribution profiles by means of Kullback-Leibler’s divergence measurement. Our experience with mouse pituitary tumor cells confirms that demethylating agents can exert the two known effects: (i) a decrease in the number of MeCs in global DNA [70], and (ii) the subsequent decondensation of highly compact heterochromatic regions of the genome, which leads to spatial reorganization in the nucleus and affects nuclear architecture [28]. The image analysis we developed utilizes these coexisting phenomena to measure and display the relevant changes in intensity distribution of the two types of signals that reflect them: (a) MeC signals created through immunofluorescence targeting of methylated cytosine and (b) DAPI signals generated by subsequent counter-staining of the same cells, as DAPI binds to AT-rich DNA, the main component of highly repetitive and compact heterochromatic sequences. Our computational approach minimized the usual obstacles in automated cellular analysis, such as intra-specimen variation in background and in morphological properties of nuclei, including size, shape, and structural density. Furthermore, cellular clustering, seen for some types of cells such as pituitary tumor cells in culture, can create poor contrast between nuclear borders. The implementation of the seeded watershed algorithm here allowed for a conservative separation of nuclei. In addition, the change of object resolution during image processing accelerated processing by reducing computational complexity. The segmentation masks can be overlaid onto the corresponding raw MeC/DAPI images for visual assessment of segmentation accuracy. It should be noted that visual classification of the composite MeC/DAPI signals can be very time consuming and quite subjective compared with automated, computer-aided classification. This is especially true when large sets of image data with a highly non-geometrical distribution of nuclear targets need to be processed. With automation, both the delineation of the nuclei and the topological quantification of the complex patterns are streamlined, and results are produced with higher confidence. The developed method is scalable and suitable for high-content, high-throughput analysis of cells in both research and industrial settings.

In previous studies we showed that the nuclear distribution of MeC versus DAPI signals, displayed as a 2D scatter plot, can serve as a signature by which cells differing in their state of differentiation or in treatment can be distinguished [22]. We also observed that untreated and drug-treated cells of the same kind display different degrees of dissimilarity within their populations, as judged by the resulting scatter plots. This led to the development of the synthesized image analysis method described here, which utilizes the resulting scatter plots in a statistical fashion to assess structural and behavioral dissimilarity within a cell population. These features are generally studied in relation to a variety of cell biological applications. Our aim was to develop and test an algorithm that can be meaningfully and robustly applied to the evaluation of demethylating agents such as 5-azacytidine and octreotide. However, the developed algorithm can be flexibly utilized for similar topological studies in which nuclear entities and their distribution are targeted in a biological context. Specifically, the modular integration of the K-L divergence measure is a valuable feature that allows for the statistical evaluation of cells when the targets do not have a consistent location within the considered ROI, such as the nucleus. Furthermore, our analyses indicate that when only 1D histograms of MeC signal distribution were utilized for K-L divergence measurement, the homogeneity assessment differed significantly from that obtained with the joint 2D MeC/DAPI histograms. The exclusion of DAPI causes a shift of the cells to the lower categories, suggesting that the DAPI signal is a meaningful dynamic parameter, as it increases the differential resolution in the image-based analysis of nuclear methylation patterns. This can be reconciled with the aforementioned biological effects on nuclear DNA.
In particular, heterochromatin decondensation, as a secondary effect of global demethylation, results in the relocation of heterochromatic sites within the nucleus (which is associated with genome destabilization). As a consequence of these conformational and organizational changes of the DAPI-positive nuclear sites, the same DAPI signal intensity is spread out over a higher number of voxels. Thus, both MeC and DAPI have dynamic patterns in the cell nucleus that become more discernible in a joint 2D plot than in a 1D MeC plot, or even when the two signals are separately displayed in one dimension (see Figure 5). Notably, our snapshots of untreated cells also display dissimilarity in MeC/DAPI signal distribution, although to a much lesser degree than treated cells (Figure 4). We assume that this could be because the cells were in different cell cycle phases, as this study did not apply any synchronizing agents, for two reasons: (i) to minimize other induced effects that could interfere with demethylation, and (ii) to more closely model the *in vivo* situation, in which cells within their native tissue environment are naturally not synchronized. Therefore, untreated cells that display a lower MeC signal load may represent replicating cells in S-phase that had not completed methylation of *de novo* synthesized DNA strands, as delay times between the two processes of replication and methylation have been reported for various types of cells [71,72].

Our approach directly illustrates the distribution of voxel intensities. The changes of these distributions are derived from the underlying changes in the topology (spatial patterns) of global DNA in response to drug treatment. Consequently, we are able to demonstrate that when the topological nuclear distribution patterns of methylated cytosine and global DNA are converted into two-dimensional histograms, they can be utilized as differential biosignatures in the evaluation of cellular response to treatment with demethylating anti-cancer agents. This characteristic is in line with the larger purpose of our approach, namely to create a rapid image analysis method that is of low complexity and therefore computationally inexpensive with potential for high-throughput cell screening tasks.

Other statistical methods such as cluster or bimodal analyses [73,74], commonly utilized in gene expression analysis [75], are important when targets (with respective intensities) have a definite location (coordinates). These methods are valuable in assessing ratio-labeling of targets when hybridized to arrayed nucleic acid fragments that are immobilized and have defined coordinates on the supporting material (DNA microarrays) [76–78], or when hybridized to genomic loci with known chromosomal locations on metaphase chromosomes of normal cells [79]. In contrast, nuclear targets such as genomic loci on largely decondensed DNA or proteinaceous entities may vary strongly in their localization between nuclei. In these cases, the K-L divergence becomes of value, as it does not require dealing with absolute target coordinates for similarity testing. Moreover, unlike *k*-means clustering or bimodal analysis of gene expression, the K-L approach tolerates the occurrence of null categories that may not be filled by any object (in this case, a nucleus). Fig. 6 shows an example in which the fourth category, namely *dissimilar*, is not represented by any of the nuclei in the tested population (no red-colored nucleus is present in Fig. 6B).

The Kullback-Leibler’s divergence is a valuable method for quantitating dissimilarities within a cell population and this measure can be applied to any multi-color cellular assay that utilizes topological information of intracellular structures to assess cellular behavior. Our comparison of the metrics most frequently used for similarity measurements demonstrates that the K-L method produces the highest certainty (least uncertainty) for the nuclear MeC/DAPI pattern analysis within the imaged cell populations. Moreover, the Pearson’s correlation coefficient between two distributions can be directly calculated from the K-L divergence if the distributions are normal, especially in cases when correlating samples do not have equal size. However, proving normality of multimodal distributions may increase computational complexity in practical cases. A way of identifying a distribution’s normal components described and implemented in our study supports the suitability of K-L divergence to be used for our data, especially in determining the soft qualifiers, because in our study the majority of the acquired 2D signals had a normal distribution.

We observed the robustness of the K-L divergence against potential intra-experimental data variability introduced through the biochemical processing of specimens or the modality settings between imaging sessions, both of which may additively alter the intensity levels within the MeC and the DAPI channels. We did not detect any difference in K-L divergences, which was confirmed by the fact that the shape of the scatter plots remained unchanged. In contrast, influences of a multiplicative nature may skew the results of all types of metrics. Additionally, the K-L divergence measurement has the advantage of being independent of image rotation and the inherent anisotropy of confocal microscopy images.

As one would expect, statistical methods in the form of similarity measures gain more confidence when applied to large datasets, in this case large cell populations with thousands of nuclei. To our pleasant surprise, the K-L divergence outperformed the comparative metrics when utilized for smaller cell populations of only around twenty cells. This underlines not only the robustness of the method, but also its flexibility in dealing with a high dynamic range in sample size. This characteristic is quite valuable in connection with the current limited capabilities of our imaging systems that are restricted in the field of view size when acquiring highest-resolution 3D images. Thus, it is necessary to collect and tile multiple image stacks in order to obtain a complete picture of the entire sample. The robustness of the K-L measurement allows it to be applied across the entire tiled image. Such an approach could be helpful in the assessment of relationships between single cells and their macro- and micro-level neighborhoods for studying intra- and inter-population functional relationships through epigenetic effects such as DNA methylation via tissue diagnostics in disease pathology and cell-based assays for compound screening in drug development.

## Acknowledgments

This work was supported by the US Navy Bureau of Medicine and Surgery, the National Science Foundation, and the National Institutes of Health.

## Footnotes

^{1}We thank Dr. Anat Ben-Shlomo (Cedars-Sinai Medical Center) for providing treated and control TtT-GF cells.

## REFERENCES
