# Identification of microRNAs with regulatory potential using a matched microRNA-mRNA time-course data

^{1}School of Mathematics and Statistics,

^{2}Sydney Bioinformatics, Centre for Mathematical Biology, University of Sydney, Sydney,

^{3}Blood Stem Cell and Cancer Research Unit, St Vincent Centre for Applied Biomedical Research, Department of Haematology, St Vincent's Hospital and St Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW, Australia

## Abstract

Over the past decade, a class of small RNA molecules called microRNAs (miRNAs) has been shown to regulate gene expression at the post-transcription stage. While early work focused on the identification of miRNAs using a combination of experimental and computational techniques, subsequent studies have focused on identification of miRNA-target mRNA pairs as each miRNA can have hundreds of mRNA targets. The experimental validation of some miRNAs as oncogenic has provided further motivation for research in this area. In this article we propose an odds-ratio (OR) statistic for identification of regulatory miRNAs. It is based on integrative analysis of matched miRNA and mRNA time-course microarray data. The OR-statistic was used for (i) identification of miRNAs with regulatory potential, (ii) identification of miRNA-target mRNA pairs and (iii) identification of time lags between changes in miRNA expression and those of its target mRNAs. We applied the OR-statistic to a cancer data set and identified a small set of miRNAs that were negatively correlated to mRNAs. A literature survey revealed that some of the miRNAs predicted to be regulatory were indeed oncogenic or tumor suppressors. Finally, some of the predicted miRNA targets have been experimentally validated.

## INTRODUCTION

MicroRNAs (miRNAs) are short non-coding RNAs, approximately 20 nucleotides long, that control gene expression by either repressing the translation of mRNA into proteins or directing the cleavage of mRNA in nematodes and higher organisms including humans. miRNAs play an important role in various biological processes e.g. the miRNAs lin-4 and let-7 have been shown to regulate the larval development in *Caenorhabditis elegans* (1,2). Though some human miRNAs have been shown to be oncogenic or tumor suppressors (3–7), the functions of most human miRNAs are currently unknown. Perhaps one of the reasons for this is the fact that a single miRNA targets possibly hundreds of mRNAs thereby making it hard to determine a miRNA's function without first accurately identifying its target mRNAs. The target identification process commonly involves two steps—(i) identification of miRNA–mRNA pairs using prediction models (8,9) and (ii) experimental validation of the relevant miRNA–mRNA pairs. In addition to target identification, miRNA research has focused on microarray analysis for experimental validation of oncogenic miRNAs (3–7), comparison of miRNA normalization techniques (10) and identification of coexpressed miRNAs (11).

To identify miRNAs that regulate mRNAs, one needs to co-analyze the changes in miRNA and mRNA expressions. Once the expression profiles of miRNAs and mRNAs have been obtained using microarray experiments, statistical methods are required to determine the association between the two expression profiles. Yona *et al*. (12) evaluated several measures of similarity between expression profiles of genes, e.g. Euclidean distance, Pearson's correlation and Spearman's rank correlation. The authors observed that the best metric varied from one data set to another though Spearman's rank correlation was consistently among the best performers. In our analysis, we used the Spearman's rank correlation as one of the measures for evaluating the association between miRNAs and mRNAs.

An important component of miRNA–mRNA data integration is the knowledge of potential mRNA targets for each miRNA. There are a number of popular target prediction algorithms such as PicTar (9), miRanda (13) [implemented in miRBase (14)] and TargetScanS (8), as well as methods that combine different prediction algorithms, e.g. miRGen (15). Each of these prediction algorithms has its strengths and weaknesses. For example, TargetScanS focuses on identification of target mRNAs by searching for 5′-dominant mRNA sites. Therefore, it is likely to miss targets that contain 3′-compensatory sites. Other algorithms such as miRanda consider both 5′-dominant and 3′-compensatory sites for target identification. While TargetScanS and miRanda focus on identification of targets for each miRNA separately, PicTar considers the combinatorial effect of coexpressed miRNAs for target prediction.

A recent paper (13) showed that TargetScanS, miRanda and PicTar have almost identical sensitivity values, where sensitivity was calculated as ‘true positives/(true positives + false negatives)’. Here, true positives corresponded to the number of experimentally validated miRNA-target mRNA pairs that were predicted by an algorithm and false negatives corresponded to the number of experimentally validated miRNA-target mRNA pairs that were not predicted by the algorithm. However, miRanda predicted nearly double the number of miRNA-target mRNA pairs predicted by the other two algorithms. The number of predicted miRNA-target mRNA pairs could be reduced by considering the intersection of two or more algorithms at the cost of lower sensitivity values. Currently only the intersection of TargetScanS and PicTar has a sensitivity value close to those returned for the individual algorithms (13).

While these predictive algorithms provided a good starting point, they returned a few hundred mRNAs as potential targets. In order to have better biological interpretation, developing statistics to identify miRNA–mRNA pairs that are most likely to be of biological significance is an important goal. To this end, we developed an odds-ratio (OR) statistic for measuring the association between putative miRNA-target mRNA pairs and identifying regulatory miRNAs.

Recently, Huang *et al*. (16) used a Bayesian model to determine the posterior probability of an mRNA being targeted by a miRNA. Unlike our approach, which focuses on the identification of regulatory miRNAs, they focused on filtering the predicted miRNA-target mRNA pairs using an Expectation-Maximization approach. Cheng and Li (17) used the changes in expression profiles of mRNAs and knowledge of predicted miRNA-target mRNA pairs to infer whether a miRNA is regulatory. Our approach is based on the matched analysis of miRNA and mRNA expression data and considers a miRNA to be regulatory if and only if the changes in the expression profiles of the miRNA and its predicted target mRNAs are correlated.

Time-course studies provide information that could often be missed in a cross-sectional study based on a single time point. Currently, typical microarray time-course data are short with uneven time points and very few replicates [for a detailed review, see (18)]. Therefore, standard time series analysis methods such as the Fourier transform are usually not applicable. In this article, we have used the moderated *t*^{2}-statistic (19) and the moderated *F*-statistic (20), methods that have been developed for handling short time-course microarray data.

In this article we propose an integrative analysis of miRNA and mRNA data that incorporates time information to identify (i) miRNAs that are likely to regulate gene expression and (ii) their target mRNAs. We first describe the OR-statistic and later demonstrate the potential value of OR-statistic using a data set obtained from a cancer study.

## METHODS

To identify regulatory miRNAs in matched microRNA-mRNA time-course data, we performed a number of distinct steps—(i) data pre-processing, (ii) identification of differentially expressed (DE) miRNAs, (iii) identification of regulatory miRNAs and (iv) identification of mRNAs that were negatively correlated to the relevant miRNAs. Application of an integrative approach to the last two steps is the main focus of this article. We applied the OR-statistic as well as the gene set test (GST)-based methods to a longitudinal time-course cancer dataset to illustrate this approach.

### Experimental data

The cancer dataset corresponded to a drug study involving a multiple myeloma cell line U266, consisting of six time points—0, 2 h, 4 h, 8 h, 24 h and 48 h with two biological replicates per time point for both miRNA and mRNA. The same RNA sample for each time-course was hybridized to both the miRNA and mRNA microarrays. The miRNA expression profiles were determined using two-color Exiqon arrays V8.1 and the mRNA expression profiles were determined using Human Genome U133 Plus 2.0 Affymetrix arrays. Raw expression data from Exiqon array was extracted using the image analysis package Spot (21).

### Statistical analysis

#### Data Pre-processing

#### Differentially expressed miRNA (mRNA)

We fit a linear model and tested the null hypothesis that there was no change in expression at any time point *x* with respect to time point 0, where *x* = 2, 4, 8, 24, and 48 h. The *P*-values for the *F*-test were adjusted for multiple comparisons using the BH correction method (28) and miRNAs with adjusted *P*-values less than 0.05 were considered to be statistically significant and differentially expressed.
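The BH adjustment mentioned above can be illustrated with a minimal Python sketch. It assumes the raw *F*-test *P*-values are already available; the function name `bh_adjust` and the example values are hypothetical, and the analysis described in this article was carried out in R.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for five miRNAs.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = bh_adjust(raw)
# Indices of miRNAs called DE at the 0.05 cut-off used in the text.
de = [i for i, q in enumerate(adj) if q < 0.05]
```

In this toy input, two raw *P*-values just below 0.05 no longer pass the cut-off after adjustment, which is the intended effect of the correction.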

#### Discretized expression profiles

We obtained *t*-statistic for the null hypothesis H_{0}: *μ*_{tg} = *μ*_{0g} versus the alternate hypothesis H_{1}: *μ*_{tg} ≠ *μ*_{0g}, where *μ*_{tg} is the average expression of mRNA *g* at time point *t* and *μ*_{0g} is the average expression at time point 0. Let **M**_{g} = [*m*_{1}, …, *m*_{k}]^{T} denote the classification of *t*-statistic as up-regulated, down-regulated or non-differentially expressed for the *g*-th mRNA, where *k* is the number of time points excluding time point 0. In other words, *m*_{i} takes the values +1, –1 or 0 based on whether the *g*-th mRNA is up-regulated, down-regulated or non-differentially expressed, respectively, at time point *t* with respect to time point 0. We will henceforth refer to **M**_{g} as the discretized expression profile for the *g*-th mRNA. Similar discretized expression profiles were obtained from miRNA expression data. It should be noted that the *t*-statistics were discretized using the limma function *decideTests*.
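To make the discretized profile concrete, the following Python sketch turns per-time-point *t*-statistics into a vector of +1, –1 and 0. It is a simplified stand-in, not a port of limma's *decideTests*: it assumes that adjusted *P*-values are supplied alongside the *t*-statistics, and the function name `discretize`, the cut-off `alpha` and the numbers are hypothetical.

```python
def discretize(t_stats, adj_p_values, alpha=0.05):
    """Return a ternary profile: +1 up, -1 down, 0 not differentially expressed.

    ASSUMPTION: significance is decided by a simple cut-off on adjusted
    p-values; the sign of the t-statistic gives the direction of change.
    """
    profile = []
    for t, p in zip(t_stats, adj_p_values):
        if p < alpha and t > 0:
            profile.append(1)
        elif p < alpha and t < 0:
            profile.append(-1)
        else:
            profile.append(0)
    return profile


# Hypothetical t-statistics and adjusted P-values at 2, 4, 8, 24 and 48 h
# (each compared with time point 0), giving k = 5 elements.
M_g = discretize([3.9, -0.4, -5.2, 1.1, 6.0],
                 [0.01, 0.80, 0.002, 0.40, 0.001])
```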

### Integrative analysis

We propose a few approaches for measuring the strength of association between a miRNA and its predicted target mRNAs.

#### OR-statistic

Let **W** and **M** denote the discretized expression profiles for miRNA and mRNA, respectively. Let **M**^{1}, …, **M**^{j} denote the discretized expression profiles of the *j* mRNAs that are predicted as targets of a miRNA. As a first step, we focused on whether there was a change in expression rather than the direction of change. Let *a*, *b*, *c* and *d* be counts, taken over the time points and the *j* predicted targets, that are defined using |*w*_{r}|, |*m*^{s}_{r}| and *I*(*x* = *y*), where |*w*_{r}| denotes the absolute value of the *r*-th element of miRNA vector **W**, |*m*^{s}_{r}| denotes the absolute value of the *r*-th element for the *s*-th target mRNA, and *I*(*x* = *y*) is an indicator function that takes the value 1 if the condition is satisfied and 0, otherwise. The variables *a*, *b*, *c* and *d* were used to populate a 2 × 2 contingency table (Table 1) and obtain the OR.

Let Odds_{1} = *b*/*a* and Odds_{2} = *d*/*c*. Then, OR = Odds_{2}/Odds_{1} = *ad*/(*bc*), where the counts are taken over G_{mRNA}, the set of *j* mRNAs that were predicted as a miRNA's targets.
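As an illustration of how the contingency counts and the OR might be computed from discretized profiles, here is a minimal Python sketch. The cell layout is an assumption chosen to be consistent with Odds_{1} = *b*/*a* and Odds_{2} = *d*/*c*: *a* counts (no miRNA change, no mRNA change), *b* counts (no miRNA change, mRNA change), *c* counts (miRNA change, no mRNA change) and *d* counts (miRNA change, mRNA change). The function names and the example profiles are hypothetical.

```python
def or_table(W, targets):
    """Count (a, b, c, d) over all time points and all predicted targets.

    ASSUMED layout: a = neither changes, b = only the mRNA changes,
    c = only the miRNA changes, d = both change.
    """
    a = b = c = d = 0
    for M in targets:
        for w, m in zip(W, M):
            mirna_changed = abs(w) == 1
            mrna_changed = abs(m) == 1
            if not mirna_changed and not mrna_changed:
                a += 1
            elif not mirna_changed and mrna_changed:
                b += 1
            elif mirna_changed and not mrna_changed:
                c += 1
            else:
                d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """OR = Odds_2 / Odds_1 = (d/c) / (b/a); None if any cell is zero."""
    if 0 in (a, b, c, d):
        return None  # the Results section skips tables with an empty cell
    return (d / c) / (b / a)


# Hypothetical miRNA profile and three predicted-target profiles (k = 5).
W = [0, 1, -1, 0, 1]
targets = [[0, -1, 1, 0, 0], [0, 0, 1, 1, -1], [0, -1, 0, 0, -1]]
a, b, c, d = or_table(W, targets)
value = odds_ratio(a, b, c, d)
```

Because only absolute values are used, a target that moves in the same direction as the miRNA contributes to *d* just as one that moves in the opposite direction does; direction is handled separately in the negative-correlation step.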

The null hypothesis H_{0}: OR = 1, i.e. that a change in the expression of predicted target mRNAs is independent of a change in the miRNA's expression, is tested using a chi-squared test with one degree of freedom. Alternatively, H_{0} can be tested using a *G*-test (29) if |*O*_{i} – *E*_{i}| > *E*_{i} for the *i*-th cell in the 2 × 2 contingency table. Here, *O*_{i} denotes the observed value for the *i*-th cell, *E*_{i} denotes the expected value for the *i*-th cell, and 1 ≤ *i* ≤ 4.
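The two tests can be sketched in Python using only the standard library. This is an illustrative version under stated assumptions: the chi-squared statistic is the plain Pearson form without continuity correction (the text does not say whether one was applied), and the *p*-value uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x/2)). The function names and the example table are hypothetical.

```python
import math


def expected_counts(a, b, c, d):
    """Expected cell counts under independence, in the order a, b, c, d."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [rows[i] * cols[j] / n for i in range(2) for j in range(2)]


def _p_value_1df(stat):
    # Upper tail of chi-squared with 1 degree of freedom.
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def chi_squared_test(a, b, c, d):
    """Pearson chi-squared statistic and p-value (no continuity correction)."""
    observed = [a, b, c, d]
    expected = expected_counts(a, b, c, d)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, _p_value_1df(stat)


def g_test(a, b, c, d):
    """G statistic, 2 * sum(O * ln(O / E)), and its 1-df p-value."""
    observed = [a, b, c, d]
    expected = expected_counts(a, b, c, d)
    stat = 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)
    return stat, _p_value_1df(stat)


# Hypothetical table with every cell non-zero.
chi_stat, chi_p = chi_squared_test(40, 10, 20, 30)
g_stat, g_p = g_test(40, 10, 20, 30)
```

For tables that are far from independence the two statistics usually lead to the same decision, which is in line with the agreement between the two tests reported in the Results.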

Since different miRNA-target prediction algorithms return different results for the same miRNA, the OR-statistic is dependent on the prediction algorithm. Therefore, a ranking of miRNAs based on the OR-statistic would vary from one algorithm to another. This problem is similar to many statistical problems in clinical studies that require meta-analysis techniques. In the absence of the ability to determine the optimal prediction algorithm, one solution is to combine the results from several miRNA-target prediction algorithms and determine the overall rank of a miRNA. To this end, we propose the use of Fisher's combined test (30) with the test statistic χ^{2} = −2 Σ_{i=1}^{n} ln(*p*_{i}), where *p*_{i} denotes the *p*-value obtained using the OR-statistic for the *i*-th algorithm and *n* denotes the number of algorithms. Here, the χ^{2}-statistic has a chi-squared distribution with 2 × *n* degrees of freedom. It should be noted that the *p*_{i} values are not independent as the results are obtained for the same data set using miRNA-target prediction algorithms with partial overlap. Therefore, the *p*-value for the chi-squared test should be treated with caution. In this article, we calculated the χ^{2}-statistic for only those miRNAs that were predicted as regulatory using each of the *n* miRNA-target prediction algorithms. Although all the ranked miRNAs had regulatory potential, miRNAs that were ranked high by two or more algorithms were ranked high overall and were more likely to be regulatory.
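Fisher's combined test is simple enough to sketch in a few lines of Python. The version below uses the closed-form upper tail of a chi-squared distribution with an even number of degrees of freedom, so it needs no external library; the function name `fisher_combined` and the example *p*-values are hypothetical, and all inputs are assumed to be strictly greater than 0.

```python
import math


def fisher_combined(p_values):
    """Return (statistic, p-value) for Fisher's combined test.

    statistic = -2 * sum(ln p_i), referred to chi-squared with 2n df.
    For even df = 2n the upper tail is
    exp(-x/2) * sum_{i=0}^{n-1} (x/2)^i / i!.
    """
    n = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total


# Hypothetical p-values for one miRNA from four prediction algorithms.
stat, combined_p = fisher_combined([0.01, 0.03, 0.20, 0.04])
```

As noted in the text, the *p*_{i} values are not independent here, so the combined value is better read as a ranking score than as an exact probability.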

#### OR-statistic with time lag

Since a change in miRNA expression may not necessarily produce an instantaneous change in target mRNA's expression, we expanded our previously discussed model to incorporate a delayed change in target mRNA's expression. We considered five different time lags (Table 2) and, for each time lag, performed the following steps:

- calculated OR for each miRNA;
- tested the null hypothesis that OR = 1 using a chi-squared test; and
- obtained miRNAs that had OR > 1 and *p*-values lower than 0.05 for the chi-squared test.

It should be noted that for each time lag, the vectors **W** and **M** corresponded to matched time points. For example, for time-lag 1, **W** = [*w*_{2}, *w*_{4}, *w*_{8}, *w*_{24}]^{T} and **M** = [*m*_{4}, *m*_{8}, *m*_{24}, *m*_{48}]^{T}, where *w*_{i} and *m*_{i} denote the ternary value at time point *i* h.
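The matching of time points for a given lag can be sketched as a pair of list slices. This Python example assumes that time-lag ℓ shifts the mRNA profile by ℓ sampled time points, which reproduces the time-lag 1 example given above; the other lags are extrapolated from that example rather than read from Table 2. The function name `align_for_lag` and the profiles are hypothetical.

```python
# Sampled time points (in hours) after time point 0.
TIME_POINTS_H = [2, 4, 8, 24, 48]


def align_for_lag(W, M, lag):
    """Pair miRNA value at position r with mRNA value at position r + lag.

    ASSUMPTION: a lag of L drops the last L miRNA entries and the first
    L mRNA entries, so both vectors keep len(W) - L matched elements.
    """
    if not 0 <= lag < len(W):
        raise ValueError("lag must be between 0 and len(W) - 1")
    k = len(W)
    return W[:k - lag], M[lag:]


# Hypothetical discretized profiles at 2, 4, 8, 24 and 48 h.
W = [0, 1, -1, 0, 1]
M = [0, 0, -1, 1, 0]
W1, M1 = align_for_lag(W, M, 1)  # miRNA at 2..24 h, mRNA at 4..48 h
```

Each additional lag removes one matched pair, so with five time points the largest lag leaves a single pair per target, and the counts in the contingency table shrink accordingly.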

#### Negatively correlated miRNA–mRNA pairs

While it is important to identify regulatory miRNAs, for experimental validation of a miRNA's regulatory effect, it is equally important to determine mRNAs that are negatively correlated to it. Therefore, we identified target mRNAs whose expression levels changed in the opposite direction to that of miRNA.

Let *e* = Σ_{r} *I*(*w*_{r}*m*_{r} = −1) and *f* = *m* – *a*. Here, we dropped the superscript *s* from *m*_{r} as we considered miRNA–mRNA pairs one at a time. The variables *e* and *f* were used to populate Table 3 and obtain the odds of a change in the discretized miRNA's expression profile resulting in a change in the opposite direction in its target mRNA. If the odds-value was greater than one, then the miRNA–mRNA pair could be considered to be negatively correlated.
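A rough Python sketch of the opposite-direction odds for a single miRNA–mRNA pair follows. The count *e* is taken directly from the indicator *I*(*w*_{r}*m*_{r} = −1). The definition of *f* is not fully legible in this copy of the text, so the sketch assumes, purely for illustration, that *f* is the number of remaining matched time points; that choice, the function name and the profiles are all hypothetical.

```python
def opposite_direction_odds(W, M):
    """Odds that miRNA and mRNA change in opposite directions.

    e counts time points where w_r * m_r == -1.
    ASSUMPTION: f counts all other matched time points (len(W) - e).
    """
    e = sum(1 for w, m in zip(W, M) if w * m == -1)
    f = len(W) - e
    if f == 0:
        return float("inf")  # every time point shows an opposite change
    return e / f


# Hypothetical matched profiles: three of five time points are opposite.
odds = opposite_direction_odds([1, 1, -1, 0, 1], [-1, -1, 1, 0, 0])
```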

#### Gene set test-based methods

In addition to the OR-statistic, we used two GST-based methods for identifying regulatory miRNAs. In principle, the GST-based methods are similar to the gene set enrichment analysis (GSEA) (31) that is used to determine whether a group of genes, selected on the basis of *a priori* biological knowledge, e.g. genes in a biological pathway or belonging to the same gene ontology, has an expression profile different from that for the remaining genes. Here, we determined whether a group of mRNAs, predicted as targets of a particular miRNA, has a change in expression for a change in the relevant miRNA's expression. The two GST-based methods are described below:

- *Correlation-coefficient-based GST method (CC-GST):* We obtained *t*-statistic for the null hypothesis H_{0}: *μ*_{tg} = *μ*_{0g} versus the alternate hypothesis H_{1}: *μ*_{tg} ≠ *μ*_{0g}, where *μ*_{tg} is the average expression of miRNA (mRNA) *g* at time point *t* and *μ*_{0g} is the average expression at time point 0. Next, for a given miRNA, mi_{v}, we obtained the Spearman's correlation coefficient for all mi_{v}-mRNA pairs using the *t*-statistic. Let *X* denote the set of correlation coefficients for those mRNAs that were predicted as targets of mi_{v} and let *Y* denote the set of remaining correlation coefficients. Since changes in miRNA expression are negatively correlated to changes in mRNA expression, we tested the null hypothesis H_{0}: *μ*_{X} = *μ*_{Y} versus the alternate hypothesis H_{1}: *μ*_{X} < *μ*_{Y}, where *μ*_{X} denotes the average correlation coefficient for set *X* and *μ*_{Y} denotes the average correlation coefficient for set *Y*. We used the Wilcoxon rank-sum test (32) for testing the null hypothesis and if the *P*-value (after adjusting for multiple comparisons) was lower than a pre-determined cut-off value, e.g. 0.05, then we considered mi_{v} to be statistically significant.
- *F-statistic-based GST method (F-GST):* For each mRNA, we obtained the *F*-statistic for the null hypothesis that there was no change in expression with respect to time point 0. Next, for a given miRNA, mi_{v}, we tested the null hypothesis H_{0}: *μ*_{X} = *μ*_{Y} versus the alternate hypothesis H_{1}: *μ*_{X} ≠ *μ*_{Y}, where *μ*_{X} denotes the average of *F*-statistics for mRNAs that were predicted as targets of mi_{v} and *μ*_{Y} denotes the average of *F*-statistics for the remaining mRNAs. We used the Wilcoxon rank-sum test for hypothesis testing and if the *P*-value (after adjusting for multiple comparisons) was lower than a pre-determined cut-off value, e.g. 0.05, then we considered mi_{v} to be statistically significant.
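The two building blocks of the CC-GST method, Spearman's correlation and a one-sided Wilcoxon rank-sum test, can be sketched in standard-library Python. This is an illustrative approximation rather than the implementation used in the article: the rank-sum *p*-value uses the normal approximation without tie or continuity correction, and the Spearman function assumes neither input is constant. All function names and numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((u - mx) * (v - my) for u, v in zip(rx, ry))
    sx = math.sqrt(sum((u - mx) ** 2 for u in rx))
    sy = math.sqrt(sum((v - my) ** 2 for v in ry))
    return cov / (sx * sy)


def rank_sum_less(x, y):
    """One-sided p-value for 'x tends to be smaller than y'.

    Normal approximation to the Mann-Whitney U statistic,
    without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z)


# Hypothetical t-statistics over five time points.
rho = spearman([2.1, 3.5, -1.0, 4.2, 0.3], [-2.0, -3.1, 1.5, -4.0, 0.2])
# Hypothetical correlation sets: X for predicted targets, Y for the rest.
p_less = rank_sum_less([-0.9, -0.8, -0.7], [0.1, 0.2, 0.3, 0.4])
```

With only five time points the correlation coefficients take few distinct values, which is one reason the Discussion treats correlation-based results on short time courses with caution.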

## RESULTS

### Data pre-processing

We used standard pre-processing methods for single-color (mRNA) and two-color (miRNA) arrays. However, unlike the usual print-tip loess normalization method (33) for two-color gene expression data, we used global loess normalization. As evident in Table 4, there were too few highly expressed miRNAs to allow for estimation of print-tip based loess lines at highly expressed ‘spots’. The small number of highly expressed miRNAs could be due to the fact that multiple-species miRNAs were placed on the same array and most of the ‘spots’ would not have hybridized with miRNAs extracted from cancer patients.

### Differentially expressed (DE) miRNAs

Using a linear model for the experimental data set, we obtained 726 DE miRNAs, out of which 193 miRNAs corresponded to a known or predicted human miRNA giving a total of 135 unique human miRNAs as DE. As mentioned earlier in the ‘Methods’ section, the DE miRNAs were obtained by testing the null hypothesis that there was no change in miRNA expression at any time point *x* with respect to time point 0, where *x* = 2, 4, 8, 24 and 48 h. We denote the set of these 135 miRNAs as **U** and it would be used later for identifying the regulatory miRNAs.

While miRNAs with adjusted *p*-values lower than a pre-determined cut-off value may be regulatory, a better method of determining regulatory miRNAs would take into account changes in expressions of both miRNA and mRNA. This is because a miRNA that is regulatory should be (i) differentially expressed over the time course and (ii) associated with changes in its target mRNAs expressions.

We found that the discretized expression profiles were usually vectors of all 0s (Table 5), which was in agreement with the widely known observation that at any given time point, the majority of miRNAs/mRNAs are not DE. It should be noted that the discretized expression profiles were obtained by testing pairwise hypotheses (see ‘Methods’ section), which is different from the hypothesis used for determining **U**. Therefore, the two results were slightly different.

For the cancer study, we considered five different time lags (Table 2) and determined the corresponding 2 × 2 contingency table (see ‘Methods’ section). Since the majority of miRNAs/mRNAs had no change in expression, many of the OR contingency tables (i.e. Table 1) had one or more elements as 0. If even one of the elements was 0, the OR was not calculated and the results in Table 6 are based on only those miRNAs for which every element in the OR matrix was strictly greater than 0.

### Estimating regulatory miRNAs

We obtained *p*-values for the null hypothesis H_{0}: OR = 1 for (i) different miRNA-target prediction algorithms and (ii) different time lags. We considered four different miRNA-target prediction algorithms—(i) PicTar, (ii) TargetScanS, (iii) miRBase and (iv) miRGen (intersection of target mRNAs returned by PicTar (4-way) and TargetScanS). Figure 1 shows the concordance between the miRNA rankings for time-lag 0 using the four different algorithms. The rankings were obtained using the *p*-values for the OR-statistic such that the miRNA with the lowest *p*-value was ranked 1. While the rankings obtained using PicTar and TargetScanS were quite similar, they differed from that obtained using miRBase. Since the rankings were not consistent, we used the Fisher's combined test to obtain the overall rank of miRNAs. The *G*-test and the chi-squared test selected the same miRNAs as regulatory for every combination of miRNA-target prediction algorithm and time-lag. Also, the overall ranks obtained using *G*-test and chi-squared test were similar with only some of the miRNAs being ranked slightly different. We decided to use only the chi-squared test's results for the rest of the analysis.

**...**

For a particular time lag, we considered a miRNA to be regulatory if (i) the *p*-value for the chi-squared test (based on the OR-statistic) was statistically significant, (ii) the OR value was greater than 1, and (iii) the miRNA was DE (i.e. miRNA was found in set **U**). We obtained 20 miRNAs of interest and some of these were identified as regulatory for more than one time lag, e.g. hsa-miR-16 was found to be regulatory for time-lag 0 and time-lag 1. We obtained 33 miRNA-time lag combinations of interest and hsa-miR-16 was ranked the highest with a time lag of 0. Other top-ranked miRNAs included hsa-miR-30b (time-lag 1), hsa-miR-20a (time-lag 1), hsa-miR-148a (time-lag 2) and hsa-miR-181c (time-lag 2).
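The three selection criteria can be combined in a short filter. The Python sketch below assumes that, for one time lag, each miRNA already has an OR value and a chi-squared *p*-value, and that the set **U** of DE miRNAs is available; the function name, the data layout and the placeholder miRNA names are hypothetical and do not correspond to results from this study.

```python
def regulatory_mirnas(results, de_set, alpha=0.05):
    """Return miRNAs meeting all three criteria, ordered by p-value.

    results maps a miRNA name to (odds_ratio, p_value); odds_ratio is
    None when the contingency table had an empty cell.
    """
    selected = [
        (p, name)
        for name, (odds_ratio, p) in results.items()
        if p < alpha                                   # (i) significant
        and odds_ratio is not None and odds_ratio > 1  # (ii) OR > 1
        and name in de_set                             # (iii) in set U
    ]
    return [name for p, name in sorted(selected)]


# Placeholder inputs for a single time lag.
results = {
    "miR-A": (4.2, 0.001),   # passes all criteria
    "miR-B": (0.6, 0.010),   # significant but OR < 1
    "miR-C": (3.0, 0.200),   # OR > 1 but not significant
    "miR-D": (2.5, 0.020),   # passes, but not in U
    "miR-E": (1.8, 0.030),   # passes all criteria
}
U = {"miR-A", "miR-B", "miR-C", "miR-E"}
hits = regulatory_mirnas(results, U)
```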

It should be noted that since the dataset was longitudinal, considering the miRNA/mRNA expressions at different time points as independent and using the moderated *t*-statistic to discretize the expression profiles may have resulted in some false positives and false negatives e.g. (34).

### Negative correlation between miRNA and mRNA

For each regulatory miRNA, we determined mRNAs that were negatively correlated using matched miRNA–mRNA discretized expression profiles. For example, while considering miRNA–mRNA correlation for time-lag 1 (Table 2), a change in miRNA expression at time point *t* = 2 h was matched to a change in mRNA expression at time point *t* = 4 h.

Since a single mRNA maps to *i* probesets on the Affymetrix chip (*i* ≥ 1), we obtained the values *e*_{i} and *f*_{i} for each miRNA-probeset combination, where *e*_{i} and *f*_{i} denote the elements *e* and *f* in Table 3 for the *i*-th miRNA-probeset combination. Next we calculated the odds of negative correlation as a ratio of these values. For each miRNA-target prediction algorithm, we found a few miRNA–mRNA pairs with odds greater than one; these pairs were therefore considered negatively correlated.

Figure 2 shows the log_{2}-fold change over the time course for hsa-miR-16 and some of the negatively correlated target mRNAs. These target mRNAs were selected as PDCD4, CREBL2 and RAB11FIP2 have been experimentally validated as hsa-miR-16's targets (3). BAG5 was selected as Bcl2 is a known hsa-miR-16 target (3), but it was identified using proteomics and not mRNA microarray data. Perhaps hsa-miR-16 regulates Bcl2 expression via BAG5.

Figure 2. Log_{2}-fold change values for hsa-miR-16 and some of its target mRNAs identified using the OR-statistic. The horizontal blue lines correspond to 1.5-fold change (log_{2} value of 0.58).

Since each target mRNA maps to multiple probe-sets, the values in Figure 2 represent the median values per time point. For each target mRNA, we observed some variability in fold change values among the probesets and this is shown in Figure 3.

### Concordance with results obtained using GST-based methods

We also obtained regulatory miRNAs using the two GST-based methods as follows:

- *CC-GST*: For time-lag 0, we obtained statistically significant miRNAs for each of the four miRNA-target prediction algorithms. We considered a miRNA to be regulatory if it was statistically significant using CC-GST and DE (i.e. miRNA was found in set **U**). Unlike the result obtained using the OR-statistic, we did not find any miRNA that was common to all four algorithms.
- *F-GST*: We obtained statistically significant miRNAs for each of the four miRNA-target prediction algorithms. We considered a miRNA to be regulatory if it was statistically significant using F-GST and DE (i.e. miRNA was found in set **U**). We found 19 miRNAs common to all four target prediction algorithms. However, only six of these miRNAs were also obtained using the OR-statistic. In fact, hsa-miR-16, the highest ranked miRNA obtained using the OR-statistic, was not found using the F-GST method. Therefore, there was little agreement between the miRNAs returned by the two methods.

## DISCUSSIONS

In this article, we propose an OR-statistic for integrating miRNA and mRNA expression profiles using time-course data and obtaining the miRNA–mRNA pairs of interest. Since miRNAs are a part of gene regulatory mechanism, they could be possible targets for drug development.

A literature search revealed that some of the miRNAs identified using the OR-statistic have been shown to be oncogenic or tumor suppressors. For example, hsa-miR-16, the highest ranked miRNA in our analysis, has been linked to chronic lymphocytic leukemia (35) and mantle cell lymphoma (36). Similarly, hsa-miR-20a has been linked to breast cancer and lung cancer (37). hsa-miR-148 has been shown to target DNA methyltransferase 3b (DNMT3b) gene (38) and a reduced expression of DNMT3b has been shown to induce apoptosis of cancer cells (39). Another miRNA, hsa-miR-21 (time-lag 0 and rank 22), has been shown to be oncogenic in multiple myeloma cells (40).

We observed that only 26.6% of the DE miRNAs mapped to known human miRNAs. Although miRNA sequences are highly conserved across species, many miRNAs that are found in species such as mouse, and are likely to have human counterparts, have currently not been validated in humans. In fact, only 30.6% of the probes on Exiqon arrays were mapped to a known human miRNA. Another possible reason could be that ~10% of the probes on Exiqon arrays correspond to computationally predicted or poorly characterized miRNAs. It is likely that some of these miRNAs are present in humans but are currently unannotated.

The OR-statistic could be extended to incorporate some of the combinatorial effects of miRNA-based gene regulation. Instead of evaluating a miRNA's regulatory potential, one could look at miRNAs that are co-expressed and determine this group's regulatory potential as a unit. However, this would require either *a priori* knowledge of miRNA modules or a model-based approach to miRNA module identification. Recently Joung *et al*. (11) used an evolutionary algorithm to identify miRNA-modules and matched them to mRNA-modules but currently their method is not applicable to time-course data.

Due to the nature of short time-course data, the calculation of correlation between miRNA and mRNA based on actual expression values introduces too much noise. Therefore, we chose to discretize the miRNA/mRNA expression profiles into vectors of 0, +1 and –1. Though there could be a loss of information owing to this discretization, we believe that the reduction in noise outweighs this potential problem. Moreover, our approach could be easily adapted to longer time-course data and the OR-statistic could be calculated using Pearson's correlation-coefficient (based on actual expressions) or Spearman's correlation-coefficient (based on moderated *t*-statistic) with correlation-coefficients above a threshold being discretized to 1 and those below the threshold being discretized to 0.

The OR-statistic, CC-GST, and F-GST are different metrics for identifying regulatory miRNAs. For time-lag 0, we obtained results using not only the OR-statistic but also the GST-based methods. For time-lag 0, the CC-GST method did not return any miRNA that was common to all the four miRNA-target prediction algorithms. This raises concerns about the reliability of correlation-coefficient based methods for short time-course data because a miRNA that is identified by several algorithms is more likely to be regulatory compared to one that is identified by only one algorithm. Unlike the OR-statistic that can be used for all possible time lags (Table 2), the CC-GST can only be used for time-lag 0. For the remaining time lags, there are very few (≤4) data points making it hard to distinguish between genuine correlations and those by chance. However, for a longer time-series data set, this will not be a limitation. We also used the F-GST method to obtain regulatory miRNAs for time-lag 0. The results obtained using the F-GST method and the OR-statistic were not in agreement. The F-GST method returned 19 regulatory miRNAs and the OR-statistic-based method returned 20 regulatory miRNAs but only six miRNAs were common to the two methods. It should be noted that the F-GST method cannot be used to determine the time lag between changes in miRNA expression and mRNA expression. Since the identification of time lag is essential for any experimental validation of results, the F-GST method may be of limited use.

If we had *a priori* biological knowledge of the miRNAs that were regulatory (i.e. a gold standard), we could have compared the results obtained using F-GST and OR-statistic. We would have preferred the method that had more ‘true’ regulatory miRNAs in the list but currently such a gold standard is unavailable. A literature survey (7) revealed that some of the miRNAs identified using the OR-statistic have been experimentally validated in multiple myeloma patients. However, currently only one of the miRNAs returned by the F-GST method has been experimentally validated. Therefore, the experimental data favor OR-statistic as the method of choice for identifying regulatory miRNAs.

Finally, both OR-statistic and GST-based methods are dependent on the quality of miRNA-target prediction algorithms. Since currently there is no one algorithm that outperforms others in terms of sensitivity and specificity, we used the popular algorithms and obtained miRNAs of interest by combining the various algorithm-specific results. As the accuracy of miRNA-target prediction improves, the accuracy of these methods will also improve. The methods described in this article have been implemented in R (41). The GST-based analysis is a part of the mirGst package and the R scripts related to the integrative analysis are available upon request.

## FUNDING

Australian Research Council grant DP0770395 (to Y.Y. and V.J.); St. Vincent's Hospital Haematology Research Fund (to D.M. and M.L.); Arrow Bone Marrow Transplant Foundation (to M.L.) and Sydney Medical Research Foundation (to D.M.); financial support from Arrow BMT Foundation and Sydney Medical Research Foundation (to M.L. and D.M.). Funding for open access charge: School of Mathematics and Statistics' Research Incentive Scheme (University of Sydney).

*Conflict of interest statement*. None declared.

## ACKNOWLEDGEMENTS

The authors would like to acknowledge Dr Tiffany Khong and Dr Andrew Spencer of the Myeloma Research Group, Alfred Hospital, Melbourne, Australia for providing the treated cell line samples. The authors would like to thank the two reviewers for their comments and feedback that helped in improving the paper. The authors would like to thank Dr Ru-Fang Yeh for discussions relating to this work.

## REFERENCES

*C. elegans*. Cell. 1989;57:49–57. [PubMed]

**Oxford University Press**

*Nucleic Acids Research*. May 2009; 37(8): e60.
