Identification of a three-gene expression signature of poor-prognosis breast carcinoma

Background: The clinical course of breast cancer is difficult to predict on the basis of established clinical and pathological prognostic criteria. Given the genetic complexity of breast carcinomas, it is not surprising that correlations with individual genetic abnormalities have also been disappointing. The use of gene expression profiles could result in more accurate and objective prognostication.
Results: To this end, we used real-time quantitative RT-PCR assays to quantify the mRNA expression of a large panel (n = 47) of genes previously identified as candidate prognostic molecular markers in a series of 100 ERα-positive breast tumor samples from patients with known long-term follow-up. We identified a three-gene expression signature (BRCA2, DNMT3B and CCNE1) as an independent prognostic marker (P = 0.007 by univariate analysis; P = 0.006 by multivariate analysis). This "poor prognosis" signature was then tested on an independent panel of ERα-positive breast tumors from a well-defined cohort of 104 postmenopausal breast cancer patients treated with primary surgery followed by adjuvant tamoxifen alone. Although the signature was associated with shorter relapse-free survival in univariate analysis (P = 0.029), it did not persist as an independent prognostic factor in multivariate analysis (P = 0.27).
Conclusion: Our results confirm the value of gene expression signatures in predicting the outcome of breast cancer.

Breast cancer initiation and progression involve multiple molecular alterations, many of which are reflected by changes in gene expression in malignant cells. Many clinical studies have attempted to correlate altered expression of individual genes with breast cancer outcome, often with contradictory results; examples include ERBB2, CCND1, MYC, UPA and PAI1 [1][2][3]. These genes are therefore likely to have limited predictive power when considered in isolation, but their clinical relevance may increase when several genes are considered together.
The recent development of effective tools for monitoring gene expression on a large scale is providing new insights into the involvement of gene networks and regulatory pathways in various tumor processes [4]. It has also led to the discovery of new diagnostic and prognostic indicators, and to the identification of new molecular targets for drug development [5]. These tools include cDNA microarrays, which can be used to explore the expression of thousands of genes at a time, and real-time RT-PCR assays for more accurate and quantitative studies of the expression of a smaller number of selected candidate genes.
In this study, we used real-time quantitative RT-PCR assays to quantify the mRNA expression of 47 candidate prognostic molecular markers in a series of 100 ERα-positive breast tumor samples. We identified a three-gene expression signature (BRCA2, DNMT3B and CCNE1) associated with poor clinical outcome. We then tested this "poor prognosis" signature on an independent panel of ERα-positive breast tumor samples from a well-defined cohort of 104 postmenopausal breast cancer patients with known long-term follow-up, all treated with primary surgery followed by adjuvant tamoxifen alone.

Patients and samples
We analyzed samples from two series of women with primary unilateral ERα-positive breast carcinoma. ERα-positive status was determined both at the protein level (by the dextran-coated charcoal method until 1988 and by enzyme immunoassay thereafter) and at the mRNA level by real-time quantitative RT-PCR assay [6].
The first series consisted of 100 women whose breast tumors were excised at Centre René Huguenin from 1977 to 1987. The patients (mean age 58.1 years, range 34-91) were pre- or postmenopausal (37 and 63 patients, respectively). Sixty patients received adjuvant therapy, consisting of chemotherapy alone in 14 cases, hormone therapy alone in 15 cases, and both treatments in 31 cases. The standard prognostic factors are presented in Table 1. The median follow-up was 9.3 years (range 1.4-16.2 years). Thirty-seven patients relapsed within 10 years after surgery. The first relapse events consisted of local and/or regional recurrences in 11 patients, metastases in 22 patients, and both events in four patients.
The second series consisted of 104 post-menopausal women whose breast tumors were excised at Centre René Huguenin from 1980 to 1994. The patients (mean age 70.9 years, range 54-86) all received post-operative adjuvant hormone therapy consisting of tamoxifen (20 mg daily for 3-5 years) and no other treatment. The standard prognostic factors are reported in Table 2. The median follow-up was 5.9 years (range 1.4-18.1 years). Thirty-one patients relapsed within 10 years after surgery. The first relapse events consisted of local and/or regional recurrences in five patients, metastases in 24 patients, and both events in two patients.
Complete clinical, histological and biological information was available for both series of breast cancer patients; no radiotherapy or chemotherapy was given before surgery, and full follow-up took place at Centre René Huguenin. The histological type of the tumor and the number of positive axillary nodes were established at the time of surgery. The malignancy of infiltrating carcinomas was scored according to the Scarff-Bloom-Richardson (SBR) histoprognostic system.
Immediately after surgery, tumor samples from both series were stored in liquid nitrogen until total RNA extraction.

Real-time RT-PCR
(1) Theoretical basis
Quantitative values are obtained from the cycle number (Ct value) at which the increase in fluorescent signal associated with the exponential growth of PCR products starts to be detected by the laser detector of the ABI Prism 7700 Sequence Detection System (Perkin-Elmer Applied Biosystems, Foster City, CA), using the PE Biosystems analysis software according to the manufacturer's instructions.
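The Ct definition above can be illustrated with a short sketch. It assumes a baseline-corrected fluorescence trace with one reading per cycle and a fixed detection threshold, and it interpolates linearly between the two cycles that bracket the threshold. The function name `threshold_cycle` is hypothetical, and the ABI Prism software's own baseline and threshold rules may differ.

```python
def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which a fluorescence trace first reaches threshold.

    fluorescence[i] is the baseline-corrected reading after cycle i + 1.
    Returns None if the threshold is never reached (no detectable product).
    Linear interpolation between bracketing cycles is an illustrative choice.
    """
    for i, value in enumerate(fluorescence):
        if value >= threshold:
            if i == 0:
                return 1.0
            previous = fluorescence[i - 1]  # reading after cycle i (below threshold)
            # Linear interpolation between cycle i and cycle i + 1.
            return i + (threshold - previous) / (value - previous)
    return None


# Idealised trace that doubles every cycle; it crosses 0.1 between cycles 4 and 5.
trace = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32]
print(threshold_cycle(trace, 0.1))  # about 4.25
```

Because the reading before the crossing is always below the threshold and the crossing reading is at or above it, the denominator of the interpolation is always positive.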
The precise amount of total RNA added to each reaction (based on optical density) and its quality (i.e. lack of extensive degradation) are both difficult to assess. We therefore also quantified transcripts of the TBP gene (GenBank accession NM_003194), which encodes the TATA box-binding protein (a component of the DNA-binding protein complex TFIID), as an endogenous RNA control, and normalized each sample on the basis of its TBP content.
Results, expressed as N-fold differences in target gene expression relative to the TBP gene and termed "Ntarget", were determined by the formula

$$N_{\text{target}} = 2^{\Delta C_{t,\text{sample}}}$$

where the ΔCt value of the sample was determined by subtracting the average Ct value of the target gene from the average Ct value of the TBP gene, i.e. $\Delta C_{t,\text{sample}} = \overline{C_t}(\text{TBP}) - \overline{C_t}(\text{target})$.
The Ntarget values were then rescaled so that, in each tumor series, the tumor sample containing the smallest amount of target gene mRNA had an Ntarget value of 1.
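A minimal sketch of the ΔCt calculation and the rescaling step follows. It assumes ideal amplification efficiency (one doubling per cycle, as the 2^ΔCt formula implies) and that replicate Ct values are averaged with an arithmetic mean. The function names `n_target` and `rescale_to_series_minimum` are hypothetical.

```python
from statistics import mean


def n_target(target_cts, tbp_cts):
    """N-fold target expression relative to TBP: 2 ** (mean Ct(TBP) - mean Ct(target))."""
    delta_ct = mean(tbp_cts) - mean(target_cts)
    # A target crossing the threshold earlier than TBP (lower Ct) gives delta_ct > 0.
    # Assumes 100% efficiency for both assays (one doubling per cycle).
    return 2 ** delta_ct


def rescale_to_series_minimum(values):
    """Divide every Ntarget by the series minimum so the lowest tumour equals 1."""
    lowest = min(values)
    return [v / lowest for v in values]


raw = [
    n_target([24.5, 25.5], [27.0, 27.0]),  # delta Ct = 2  -> 4.0
    n_target([27.0, 27.0], [26.0, 26.0]),  # delta Ct = -1 -> 0.5
    n_target([26.0, 26.0], [26.0, 26.0]),  # delta Ct = 0  -> 1.0
]
print(rescale_to_series_minimum(raw))  # [8.0, 1.0, 2.0]
```

Rescaling is done per series, so Ntarget values are comparable within a series but not necessarily between the two series.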
(2) Primers and probes
Primers and probes for TBP and the 47 target genes were chosen with the assistance of the computer program Oligo 5.0 (National Biosciences, Plymouth, MN). We searched the dbEST, htgs and nr databases to confirm the total gene specificity of the nucleotide sequences chosen for the primers and probes, and the absence of single nucleotide polymorphisms. In particular, primer pairs were selected to be unique relative to the sequences of closely related family member genes and of corresponding retropseudogenes. To avoid amplification of contaminating genomic DNA, one of the two primers or the probe was placed at the junction between two exons. The specificity of the PCR amplicons was verified by agarose gel electrophoresis. The 47 target genes tested in this study are listed in Table 3.
(3) RNA extraction
Total RNA was extracted from frozen tumor samples using the acid-phenol guanidinium method. RNA quality was assessed by electrophoresis through agarose gels and staining with ethidium bromide, with the 18S and 28S RNA bands visualized under ultraviolet light.
(5) PCR amplification
All PCR reactions were performed using an ABI Prism 7700 Sequence Detection System (Perkin-Elmer Applied Biosystems), with either the TaqMan® PCR Core Reagents kit or the SYBR® Green PCR Core Reagents kit (Perkin-Elmer Applied Biosystems). The thermal cycling conditions comprised an initial denaturation step at 95°C for 10 min, followed by 50 cycles of 95°C for 15 s and 65°C for 1 min.

Statistical Analysis
The distributions of gene mRNA levels were characterized by their median values and ranges. Relationships between mRNA levels of the different target genes and […] were judged significant at confidence levels greater than 95% (P < 0.05).
To visualize the efficacy of a molecular marker in discriminating between two populations (in the absence of an arbitrary cutoff value), we summarized the data in a ROC (receiver operating characteristic) curve [7]. This curve plots sensitivity (true-positive rate) on the Y axis against 1 − specificity (false-positive rate) on the X axis, considering each value as a possible cutoff. The AUC (area under the curve) was calculated as a single measure of the discriminative efficacy of a molecular marker. When a molecular marker has no discriminative value, the ROC curve lies close to the diagonal and the AUC is close to 0.5. When a marker has strong discriminative value, the ROC curve moves toward the upper left-hand corner (or the lower right-hand corner) and the AUC approaches 1.0 (or 0).
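The AUC can be computed without drawing the curve: it equals the probability that a randomly chosen relapsed tumor has a higher marker value than a randomly chosen non-relapsed one, with ties counted as one half (the Mann-Whitney formulation). The sketch below relies on that standard identity, which is not necessarily the procedure used in [7]; `roc_auc` is a hypothetical name.

```python
def roc_auc(relapsed, non_relapsed):
    """ROC-AUC as P(marker in a relapsed tumour > marker in a non-relapsed one).

    Tied pairs count 1/2, which matches the trapezoidal area under the
    empirical ROC curve obtained by trying every observed value as a cutoff.
    """
    score = 0.0
    for r in relapsed:
        for n in non_relapsed:
            if r > n:
                score += 1.0
            elif r == n:
                score += 0.5
    return score / (len(relapsed) * len(non_relapsed))


print(roc_auc([3.1, 5.4, 2.2], [1.0, 2.2, 0.7]))  # 8.5 / 9, about 0.944
```

Under this convention a marker whose expression is lower in patients who relapse (as reported for CCND1) yields an AUC below 0.5, corresponding to a curve bowed toward the lower right-hand corner.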
Hierarchical clustering was performed using the GenANOVA software [8].
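The algorithm implemented in GenANOVA is not described here, so the following is only a generic sketch of how agglomerative hierarchical clustering can divide samples into two groups. Average linkage, Euclidean distance and the use of log2-transformed Ntarget values as input are illustrative assumptions, and `two_cluster_split` is a hypothetical name.

```python
import math
from itertools import combinations


def two_cluster_split(profiles):
    """Average-linkage agglomerative clustering stopped at two clusters.

    profiles[i] is the (assumed log2-transformed) expression vector of sample i.
    Returns two lists of sample indices.
    """
    clusters = [[i] for i in range(len(profiles))]  # start: one sample per cluster

    def linkage(c1, c2):
        # Mean Euclidean distance over all cross-cluster sample pairs.
        total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 2:
        # Closest pair of clusters; combinations() yields index pairs with i < j.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # delete the higher index first so i stays valid
        del clusters[i]
        clusters.append(merged)
    return clusters


# Two low-expression and two high-expression samples (three genes each).
profiles = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
print(sorted(sorted(c) for c in two_cluster_split(profiles)))  # [[0, 1], [2, 3]]
```

Recomputing every linkage at each step costs roughly cubic time in the number of samples, which is acceptable for a series of about 100 tumors but would call for a cached distance matrix on larger data.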
Relapse-free survival (RFS) was defined as the interval between diagnosis and detection of the first relapse (local and/or regional recurrence, and/or metastasis).
Survival distributions were estimated by the Kaplan-Meier method [9], and the significance of differences between survival rates was ascertained using the log-rank test [10]. Cox's proportional hazards regression model [11] was used to assess prognostic significance.
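As an illustration of the survival comparison, the sketch below implements the Kaplan-Meier estimator and the two-group log-rank test from their textbook definitions, with RFS coded as a follow-up time plus an event flag (1 = first relapse, 0 = censored at last follow-up). It is not the authors' software, the function names are hypothetical, and the Cox model is omitted.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each distinct relapse time.

    events[i] is 1 if patient i relapsed at times[i], 0 if censored then.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = censored = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                relapses += 1
            else:
                censored += 1
            i += 1
        if relapses:
            survival *= 1 - relapses / at_risk
            curve.append((t, survival))
        # Patients relapsing or censored at t leave the risk set afterwards.
        at_risk -= relapses + censored
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, p-value)."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)               # at risk, both groups
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)  # at risk, group A
        d = sum(1 for tt, e, _ in pooled if tt == t and e)         # relapses at t
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the risk sets at t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))  # [(1, 0.75), (3, 0.375), (4, 0.0)]
chi2, p = log_rank([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(p < 0.05)  # True
```

The sketch assumes at least one relapse occurs while both groups are still at risk; otherwise the variance is zero and the test is undefined.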

mRNA expression of 47 genes in 100 ERα-positive breast tumors
The results for the 47 genes are summarized in Table 4, with medians and ranges of mRNA levels in patients who relapsed (n = 37) and those who did not (n = 63).
Seven genes showed significantly different expression according to relapse status (P < 0.05), namely BRCA2, DNMT3B, CCNE1, HMMR/RHAMM, MKI67, TERT and CCND1. The prognostic performance of these seven genes was also assessed using ROC-AUC analysis. BRCA2 emerged as the most discriminatory marker of relapse status (ROC-AUC, 0.696). The mRNA expression of this gene, as well as of DNMT3B, CCNE1, HMMR/RHAMM, MKI67 and TERT, was higher in patients who relapsed than in patients who did not, whereas CCND1 mRNA expression alone was lower in patients who relapsed.
The prognostic value of a two-gene expression signature based on only BRCA2 and DNMT3B was lower than that of the three-gene expression signature. The addition of HMMR/RHAMM and/or MKI67 to the three-gene signature provided no additional prognostic value.
Using a Cox proportional hazards model, we also assessed the prognostic value, for RFS, of parameters that were significant or near-significant (P < 0.2) in univariate analysis, i.e. SBR grade, lymph-node status (Table 1) and the three-gene expression signature (Figure 1A). Only the prognos[…]. The prognostic significance of these three parameters for RFS, calculated in terms of the relative risk, did not change after adjustment for age and macroscopic tumor size (data not shown).

Validation of the three-gene expression signature in an independent series of 104 ERα-positive postmenopausal breast tumor samples
The results for each of the three genes are summarized in Table 5, with medians and ranges of mRNA levels in the 31 patients who relapsed and the 73 patients who did not, as well as ROC-AUC values. As in the initial tumor series, BRCA2, DNMT3B and CCNE1 mRNA levels were significantly higher in patients who relapsed than in those who did not.
On hierarchical clustering of the samples, the three-gene expression signature divided the 104 patients into two subgroups (n = 30 and n = 74) of similar sizes to those obtained in the initial patient population (n = 35 and n = 65).
Multivariate analysis based on a Cox proportional hazards model showed that, among the parameters that were significant or near-significant (P < 0.2) in univariate analysis, i.e. SBR grade, lymph-node status, macroscopic tumor size (Table 2) and the three-gene expression signature (Figure 1B), only SBR grade was an independent predictor of RFS (P = 0.00023); the three-gene expression signature was not an independent predictor (P = 0.27).

Discussion
We used real-time quantitative RT-PCR assays to quantify the mRNA expression of 47 genes previously identified as candidate prognostic molecular markers in 100 ERα-positive breast tumor samples. We identified a three-gene expression signature (BRCA2, DNMT3B and CCNE1) with independent prognostic significance in breast cancer (P = 0.007 by univariate analysis; P = 0.006 by multivariate analysis). This "poor prognosis" signature was then tested on an independent set of 104 ERα-positive breast tumors from a well-defined cohort of postmenopausal breast cancer patients treated with primary surgery followed by adjuvant tamoxifen alone. It was significant in univariate analysis (P = 0.029), but not in multivariate analysis (P = 0.27). We have previously published individual data for 18 of these 47 genes, namely ERBB1-4 [12]; MYC [13]; TERT [14]; CCND1 [15]; CGB, CGA, ERα, ERβ, PR, PS2 [16]; AR [17]; DNMT3B [18]; PAI1, PAI2 and UPA [19], obtained using the same real-time RT-PCR method but in a heterogeneous series of 130 ERα-positive and ERα-negative breast tumors.
Large-scale real-time quantitative RT-PCR is a promising complement and/or alternative to cDNA microarrays for molecular tumor profiling. cDNA microarrays have been used to identify gene expression profiles associated with poor outcome in breast cancer [20][21][22][23][24][25][26], but discrepancies have been reported. For example, only 2 of the 456 genes identified by Sorlie et al. [21] were among the 70 genes identified by van de Vijver et al. [24].
These discrepancies may be due to the clinical, histological and ethnic heterogeneity of breast cancer, but also to the fact that breast tumors consist of many different cell types: not only tumor epithelial cells, but also other epithelial cell types, stromal cells, endothelial cells, adipocytes and infiltrating lymphocytes. Real-time RT-PCR requires smaller starting amounts of total RNA (about 1-2 ng per target gene) than cDNA microarrays do, making it more suitable for analyzing small tumor samples, cytopuncture specimens and microdissected samples. Real-time RT-PCR also has a linear dynamic range of at least four orders of magnitude, so samples do not need to contain equal starting amounts of RNA. It is also better suited than cDNA microarrays to detecting small variations in gene expression and to quantifying weakly expressed genes (e.g. TERT in the present study), as well as to distinguishing between closely related family member genes or alternatively spliced transcripts (e.g. the p14/ARF, p16/CDKN2A and p15/CDKN2B gene cluster in the present study). Finally, real-time quantitative RT-PCR is a reference method for nucleic acid quantification in terms of performance, accuracy, sensitivity and throughput, and is more appropriate for routine use in clinical laboratories, being simple and rapid and yielding good inter-laboratory agreement and statistical confidence values.
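The four-orders-of-magnitude figure can be related to Ct values with a short calculation. Assuming ideal amplification (efficiency E = 1, i.e. one doubling per cycle, the same assumption that underlies the 2^ΔCt formula used for Ntarget), a 10^4-fold difference in starting template corresponds to a Ct difference of about 13 cycles:

```latex
\[
N_0 \propto (1 + E)^{-C_t}
\quad\Longrightarrow\quad
\frac{N_{0,1}}{N_{0,2}} = 2^{\,C_{t,2} - C_{t,1}} \qquad (E = 1)
\]
\[
2^{\Delta C_t} = 10^{4}
\quad\Longrightarrow\quad
\Delta C_t = \log_2 10^{4} = \frac{4}{\log_{10} 2} \approx 13.3 \text{ cycles}
\]
```

With an efficiency below 100%, the base of the exponent is smaller than 2, so the same 10^4-fold input range would span somewhat more than 13.3 cycles.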
In this study, we chose to include well-known genes involved in breast carcinogenesis that have been reported in the literature and that represent a broad range of cellular functions, such as cell cycle control, cell-cell interactions, signal transduction, apoptosis and angiogenesis (Table 3). Many important genes were not studied, but our results nevertheless demonstrate the usefulness of real-time RT-PCR by identifying a potentially useful gene expression signature with prognostic significance.
The comparison of median target gene mRNA levels between patients who did and did not relapse provided two interesting results: (a) ERBB2 mRNA levels were very similar between the two subgroups, with ROC-AUC values close to 0.5 (ROC-AUC, 0.573), confirming that the ERBB2 mRNA expression level is not a major prognostic factor in breast cancer; (b) ESR1/ERα mRNA levels were not different between the two subgroups (ROC-AUC, 0.530), suggesting that the ESR1/ERα mRNA expression level in ERα-positive tumors is not predictive of outcome.
The three-gene expression signature predictive of subsequent relapse status comprised genes involved in cell cycle control (CCNE1), DNA methylation (DNMT3B) and DNA damage repair (BRCA2). This gene expression signature is an interesting candidate for routine clinical use, especially as the three genes encode well-characterized proteins for which specific antibodies are already commercially available. Furthermore, the three protein products are amenable to pharmacological control.
CCNE1 codes for cyclin E, a protein involved in regulating the early G1 to late G1 phase "restriction point traversal", an irreversible commitment to undergo one cell division [27]. We found that high CCNE1 mRNA levels were associated with poor outcome, confirming published data suggesting that cyclin E upregulation may be a major prognostic marker in breast cancer [28][29][30][31].
BRCA2 codes for a ubiquitously expressed tumor suppressor protein involved in processes fundamental to all cells, including DNA repair, DNA recombination and cell cycle checkpoint control [32]. We found that high BRCA2 mRNA levels were associated with poor outcome and correlated strongly and positively with cell proliferation. Hierarchical clustering analysis of the 47 genes identified BRCA2 as the leading gene in a cluster of proliferation genes that also included TERT, BRCA1, HMMR/RHAMM and MKI67 (data not shown). We also observed a strong positive correlation between BRCA2 and MKI67, which encodes the proliferation-related Ki-67 antigen (Spearman rank correlation test: r = +0.670, P < 10⁻⁷). These strong associations between BRCA2, HMMR/RHAMM and MKI67 mRNA expression explain why four- and five-gene expression signatures, comprising HMMR/RHAMM alone or together with MKI67, showed no additional prognostic value relative to the three-gene signature.
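The Spearman coefficient quoted above is the Pearson correlation computed on ranks rather than on raw mRNA levels, which makes it insensitive to the skewed, multi-log distribution of Ntarget values. A minimal sketch follows, assuming tied values receive the average of their ranks (the usual convention); it returns r only, not the P value, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j carry ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)


# A monotonic but non-linear relationship still gives r = 1.
print(spearman_r([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
```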
Our results for BRCA2 expression ex vivo are in keeping with reports from several authors [33,34] showing that BRCA2 mRNA expression is upregulated in rapidly proliferating cells in vitro. Our results are also in agreement with those of Egawa et al. [35] showing that high BRCA2 expression carries a poor prognosis in breast cancer. This link between BRCA2 overexpression and poor outcome should be taken into account when evaluating future BRCA2-based therapeutic approaches to breast cancer.
Finally, DNMT3B, the third gene in our expression signature, codes for one of the three functional DNA methyltransferases (DNMT1, DNMT3A and DNMT3B) that catalyze the transfer of methyl groups to the 5-position of cytosine (DNA methylation). We previously showed that, among these three enzymes, only DNMT3B overexpression is associated with poor outcome in breast cancer [18]. DNMT3B, like DNMT3A, is known to act as a de novo methyltransferase at CpG sites. Abnormal DNA methylation is thought to be a major early event in tumor development: tumors are characterized by widespread genomic hypomethylation, which leads to chromosomal instability, and by localized DNA hypermethylation, which may contribute to tumorigenesis by silencing tumor suppressor genes [36].

Conclusions
In conclusion, by studying the expression of 47 genes previously identified as candidate prognostic markers in breast cancer, we identified a three-gene expression signature (BRCA2, DNMT3B and CCNE1) with prognostic significance. The practical value of this signature remains to be validated in large prospective randomized studies.

Authors' contributions
Real-time RT-PCR assays were carried out by ST and IG. IB and RL interpreted the results and performed the bioinformatics and statistical analyses.