Expression analysis of mitotic spindle checkpoint genes in breast carcinoma: role of NDC80/HEC1 in early breast tumorigenicity, and a two-gene signature for aneuploidy

Background: Aneuploidy and chromosomal instability (CIN) are common abnormalities in human cancer. Alterations of the mitotic spindle checkpoint are likely to contribute to these phenotypes, but little is known about somatic alterations of mitotic spindle checkpoint genes in breast cancer.
Methods: To obtain further insight into the molecular mechanisms underlying aneuploidy in breast cancer, we used real-time quantitative RT-PCR to quantify the mRNA expression of 76 selected mitotic spindle checkpoint genes in a large panel of breast tumor samples.
Results: The expression of 49 (64.5%) of the 76 genes was significantly dysregulated in breast tumors compared to normal breast tissues: 40 genes were upregulated and 9 were downregulated. Most of these changes in gene expression during malignant transformation were observed in epithelial cells. Alterations of nine of these genes, and particularly NDC80, were also detected in benign breast tumors, indicating that they may be involved in pre-neoplastic processes. We also identified a two-gene expression signature (PLK1 + AURKA) which discriminated between DNA aneuploid and DNA diploid breast tumor samples. Interestingly, some DNA tetraploid tumor samples failed to cluster with DNA aneuploid breast tumors.
Conclusion: This study confirms the importance of previously characterized genes and identifies novel candidate genes that could be activated for aneuploidy to occur. Further functional analyses are required to clearly confirm the role of these newly identified genes in the molecular mechanisms involved in breast cancer aneuploidy. The novel genes identified here, and/or the two-gene expression signature, might serve as diagnostic or prognostic markers and form the basis for novel therapeutic strategies.


Introduction
A very large proportion of cancers consist of cells with an abnormal chromosome content, a feature known as aneuploidy [1]. Aneuploidy is often associated with chromosomal instability (CIN), a condition in which cancer cells show a high rate of chromosomal gain and loss compared with normal cells.
The mechanisms underlying CIN, although poorly understood, are likely to include defects in the mitotic machinery used to segregate duplicated chromosomes between daughter cells [2]. Mounting evidence points to the mitotic spindle checkpoint as the point of failure in CIN. The normal function of the spindle checkpoint is to ensure that all chromosomes are correctly aligned in metaphase cells and properly attached to the mitotic spindle before chromosome separation can proceed. As with other phenotypes characteristic of cancer, it was first thought that nucleotide mutations in genes that control chromosome stability were responsible for CIN. However, somatic point mutations in mitotic-spindle-checkpoint genes, including MAD1, BUB1 and BUBR1/BUB1B, are infrequent [3]. One possible explanation for this paradox is that mitotic-spindle-checkpoint genes are mainly altered at the transcriptional level. Indeed, amplification and overexpression of AURKA (which encodes aurora-A kinase) have been observed in breast tumors and other cancers exhibiting aneuploidy [4]. PLK1 and NEK2 mRNA and protein expression is also elevated in a wide variety of tumors and cancer cell lines [5,6]. However, despite the importance of the mitotic spindle checkpoint in CIN, no detailed analysis of mitotic spindle checkpoint gene expression in tumors has yet been performed.
The recent development of effective tools for large-scale analysis of gene expression is providing new insights into the involvement of gene networks and regulatory pathways in various tumor processes [7]. It has also led to the discovery of new diagnostic and prognostic indicators, and to the identification of new molecular targets for drug development [8]. These tools include cDNA microarrays, which can be used to explore the expression of thousands of genes at a time, and real-time RT-PCR assays for more accurate quantitative studies of the expression of a smaller number of selected candidate genes.
As aneuploidy is common in breast cancer and is associated with a poor prognosis [9], we examined the expression of selected mitotic spindle checkpoint genes in breast tumors. We used real-time quantitative RT-PCR to measure the mRNA expression of a large number of selected genes in DNA aneuploid breast tumor samples, in comparison with DNA diploid breast tumor samples. We assessed the expression level of 76 genes known to be involved in various molecular mechanisms associated with the mitotic spindle checkpoint (Table 1). We identified nine genes involved in early breast tumorigenesis, and also a two-gene expression signature (PLK1 + AURKA) associated with aneuploid status.

mRNA expression of 76 mitotic-spindle-checkpoint genes in invasive breast tumors relative to normal breast tissue
To select for further study those mitotic-spindle-checkpoint genes whose expression is dysregulated in breast tumors, we quantified the mRNA expression of the 76 selected genes in 10 invasive breast tumors relative to 5 normal breast tissues.
mRNA of all 76 genes was reliably quantifiable by means of real-time quantitative RT-PCR (Ct < 35) in both invasive breast tumors and normal breast tissues.
Forty (52.6%) of the 76 genes were significantly upregulated (P < 0.05) in the invasive breast tumors compared to the normal breast tissues (Table 2). The expression of 20 of these 40 upregulated genes was markedly higher (> 3-fold) in the breast tumors. The most strongly upregulated gene was NEK2 (29-fold).
In contrast, only 9 (11.8%) of the 76 genes were significantly down-regulated (P < 0.05) in the invasive breast tumors compared to the normal breast tissues, and none showed markedly lower expression (> 3-fold) in the breast tumors.

Relationship between the mRNA expression of the 20 markedly upregulated genes and steps of breast tumor progression
To determine whether the 20 genes showing marked upregulation (> 3-fold) in the invasive breast tumors are altered at an early step of breast tumorigenicity, we analyzed their mRNA expression in 9 normal breast tissues, 14 benign breast tumors, 14 ductal carcinoma in situ (DCIS) of the breast, 11 invasive ductal grade I breast tumors and 12 invasive ductal grade III breast tumors (Table 3).
With the exception of CCNB3, the expression of all 20 genes increased from benign breast tumors to DCIS.
Only TACC3, NEK2, AURKA and PLK1 expression increased from benign breast tumors to invasive ductal grade I breast tumors, while expression of all 20 genes (except CCNB3 and UBD) increased from grade I to ductal grade III breast tumors. Figure 1 shows the mRNA levels of three characteristic genes (NDC80, NEK2 and AURKB) in the different sample types. Figure 2 shows the order in which these genes are dysregulated during the different steps of breast tumor progression.
In the same set of 60 samples, we also examined the expression of the proliferation-associated gene MKI67, which encodes the proliferation-related antigen Ki-67. MKI67 only showed significant overexpression in ductal carcinoma in situ (DCIS) and invasive ductal grade III breast tumors (Table 3).

mRNA expression of the 20 markedly upregulated genes in breast cancer cell lines and in primary cultures of epithelial cells and fibroblasts from normal breast tissues and breast tumor cells
To determine in which tumor cell type (epithelial cells or stromal cells) the mitotic-spindle-checkpoint genes were upregulated, we measured the RNA levels of the 20 markedly upregulated genes in 12 breast cancer cell lines (five ERα-positive and seven ERα-negative cell lines). As compared to normal breast tissues, all 20 selected genes (except UBD) showed marked upregulation in the 12 breast cancer cell lines (median 3.9- to 87-fold), suggesting that these 19 genes are expressed in epithelial cells and upregulated in tumor epithelial cells (Table 4). Interestingly, the expression of these genes was generally higher in ERα-negative breast tumor cell lines than in ERα-positive lines. Despite the small number of cell lines analysed, seven genes (AURKB, TPX2, CDC20, BUB1, CCNA2, AURKA, and CCNB1) were upregulated significantly (p < 0.05) more strongly in the ERα-negative cell lines. These genes are probably not estrogen-regulated, but are rather upregulated mainly in undifferentiated breast tumors (i.e. ERα-negative tumors), independently of ERα status. Individual expression levels of these genes in the 12 breast tumor cell lines are shown in Additional File 1.
As tumors are composed not only of tumor epithelial cells but also of fibroblasts (the main cell type of the stromal compartment), we also measured the expression of the same 20 genes in primary cultures of epithelial cells and fibroblasts from normal breast tissues and breast tumor cells. We confirmed that these genes were expressed in epithelial cells and, to a lesser extent, in stromal fibroblasts, and that they were all upregulated in tumor epithelial cells, as compared to normal epithelial cells (Table 4).

Relationship between the mRNA expression level and DNA amplification level of the 20 markedly upregulated genes
One of the 20 markedly upregulated genes (AURKA) has previously been shown to be upregulated by a DNA amplification mechanism [4]. Thus, to obtain further insight into the molecular mechanisms leading to overexpression of these 20 markedly upregulated genes, we used both real-time quantitative RT-PCR and high-resolution array CGH to quantify the mRNA expression and DNA amplification of these genes in a series of 39 breast tumors (Table 5). Five of these genes (NEK2, PLK1, BIRC5, TPX2 and AURKA) displayed DNA amplification (or polysomy) in more than 30% of breast tumors. Interestingly, 3 of these 5 genes (BIRC5, TPX2 and AURKA) showed significantly higher mRNA levels in amplified tumors than in unamplified tumors. It is noteworthy that the other two genes (NEK2 and PLK1), which showed similar mRNA levels in amplified and unamplified breast tumors, are located on chromosome arms (1q and 16p, respectively) showing polysomy and no DNA amplification in breast tumors [10,11].

mRNA expression of the 49 dysregulated genes in 23 individual DNA aneuploid breast tumors and 24 DNA diploid breast tumors
The expression level of the 49 dysregulated genes identified in our screening study was then determined in a series of 23 DNA aneuploid breast tumors and 24 DNA diploid breast tumors (Table 6). Twenty-four (49.0%) of the 49 dysregulated genes were significantly upregulated in the 23 DNA aneuploid breast tumors relative to the DNA diploid breast tumors, while only one gene (FZR1) among the 49 dysregulated genes was significantly down-regulated (P < 0.05; Table 7).
In the same set of 47 samples, we examined the expression of MKI67 and ESR1/ERα. As CIN of cancer cells could also be caused by telomere erosion [12], we examined the expression of the TERT gene encoding telomerase reverse transcriptase. MKI67 and TERT were significantly upregulated in the 23 DNA aneuploid breast tumors, while ESR1/ERα expression was similar in the diploid and aneuploid breast tumor subgroups (Table 7).
Prediction Analysis for Microarrays (PAM) and Class Prediction results obtained with the BRB Array Tools software packages were then used to identify a gene expression signature capable of discriminating between DNA aneuploid and DNA diploid breast tumors. Class Prediction identified a signature composed of 8 genes (PLK1, AURKA, CCNB1, BUB1, TACC3, CDC20, CDC2 and TPX2), while PAM identified a signature composed of only two genes (PLK1 and AURKA) that were also present in the Class Prediction signature.
Finally, hierarchical clustering of the 47 samples, based on PLK1 and AURKA expression, subdivided the patient population into three groups with significantly different ploidy (P = 0.0000015; figure

Validation of the two-gene expression signature in an independent series of breast tumor samples
To validate our two-gene expression signature for tumor ploidy, we analyzed six additional classical DNA aneuploid breast tumors (1.10 ≤ ploidy index ≤ 1.90). All six tumors fell into the DNA aneuploid group (n = 5) or  Recent studies suggest that abnormal division of tetraploid cells might facilitate genetic changes that give rise to aneuploid cancers and therefore that tetraploidy could be a transitional step between diploid status and classical aneuploid status [1]. Thus, we also analyzed 8 DNA tetraploid breast tumors (1.90 ≤ ploidy index ≤ 2.05) with our two-gene expression signature. All but one of these DNA tetraploid breast tumors fell into the DNA aneuploid group (n = 3) or the intermediate group (n = 4) (figure 5). It is noteworthy that the DNA tetraploid tumor (5081-T) included in the DNA diploid group had a low SPF value.
As the validation set includes a limited number of breast tumor samples, this two-gene expression signature capable of discriminating between DNA aneuploid and diploid breast tumors needs to be further validated in a large prospective randomized study.

Discussion
To obtain further insight into the molecular mechanisms leading to aneuploidy in breast cancer, we used real-time quantitative RT-PCR to quantify the mRNA expression of a large number of selected genes in various types of breast tumor.
Real-time quantitative RT-PCR is a promising alternative to cDNA microarrays for molecular tumor profiling. In particular, real-time RT-PCR is far more precise, reproducible and quantitative than cDNA microarrays. Real-time RT-PCR is also more useful for analyzing weakly expressed genes, such as TERT in the present study. Finally, real-time RT-PCR requires smaller amounts of total RNA (about 2 ng per target gene), and is therefore suitable for analyzing small (benign or malignant) and microdissected tumor samples.
We studied a number of genes involved in various molecular mechanisms associated with the mitotic spindle checkpoint, and particularly genes already known to be altered (mainly at the transcriptional level) in various cancers [13][14][15]. These genes mainly encode proteins involved in mitotic spindle formation, centrosome cohesion and duplication, kinetochore-mitotic spindle interactions, CDK-cyclin complexes, and sister chromatid separation (see list in Table 1). This analysis was by no means exhaustive, and many possibly relevant genes were certainly missed, but it nevertheless demonstrates the ability of real-time RT-PCR to identify potentially useful marker genes.
To investigate whether these genes are involved early in breast tumorigenesis (i.e. the transition from normal breast tissue to benign breast tumors and DCIS) or in tumor progression (i.e. the transition from invasive ductal grade I to invasive ductal grade III breast tumors), we studied the expression level of the 20 markedly upregulated genes in a large panel of breast tissues, including normal breast tissues, benign breast tumors, DCIS, and grade I and III invasive ductal breast tumors (Table 3 and Figure 2). Like MKI67, which encodes the proliferation-related antigen Ki-67, the expression of most of these genes (except CCNB3 and UBD) increased during the transition from grade I to ductal grade III breast tumors. Twelve genes (NDC80, BUB1, CDC2, CCNA2, BUB1B, TACC3, TPX2, ZWINT, CCNB2, AURKB, NEK2 and BIRC5) showed marked upregulation in ductal grade III breast tumors (more than 10-fold higher than in normal breast tissues), as well as in the breast tumor cell lines (up to 70-fold higher than in normal breast tissues). Most of these genes were specifically altered in tumor epithelial cells during malignant transformation.

These results are in total agreement with the literature showing a strong link between aneuploidy/CIN and tumor grade, i.e. between mitotic spindle checkpoint pathways and cell proliferation pathways. Indeed, several of the mitotic spindle checkpoint genes identified in this study (in particular TPX2, NEK2, AURKA and PLK1) have previously been included in a "proliferation signature" discriminating histological grades I and III [16], or in a "poor prognosis" signature [17,18].
These genes also showed marked upregulation in DCIS (higher than in ductal grade I breast tumors), confirming the major role of mitotic spindle checkpoint genes in pre-invasive lesions of the most common human cancers [19,20].
We identified a two-gene expression signature (PLK1 + AURKA) associated with aneuploidy. PLK1 and AURKA are well-known mitotic spindle checkpoint genes that encode mitotic kinases (polo-like kinase-1 and aurora A, respectively). These enzymes are emerging as critical regulators of centrosome cycling and formation of the bipolar mitotic spindle [23][24][25]. These two genes are overexpressed in many types of solid tumor. AURKA lies within a region of human chromosome arm 20q13 that is amplified in breast cancer [4], as confirmed here (Table 5). Further in vitro studies (cultured cells) and in vivo studies (animal models) will be required for full confirmation of the role of these two genes in the molecular mechanisms leading to breast cancer aneuploidy. Based on our two-gene expression signature, we subdivided the patient population (n = 47) into three groups with significantly different ploidy, namely a DNA diploid group (n = 17), a DNA aneuploid group (n = 19), and an intermediate group (n = 11) including both DNA aneuploid and DNA diploid tumors (figure 3). Interestingly, the SPF values of all the DNA diploid tumors in the intermediate group were high, confirming the relationship between aneuploidy and proliferation. A large prospective randomized study will be necessary to confirm the existence of this intermediate group and to determine the diagnostic and prognostic relevance of these 3 subgroups.
It is also noteworthy that the expression of the TERT gene, encoding telomerase reverse transcriptase, was significantly upregulated in DNA aneuploid breast tumors compared to DNA diploid breast tumors, confirming that aneuploidy may also be caused by telomere erosion [12].
Based on this two-gene expression signature, some DNA tetraploid tumor samples failed to cluster in the DNA aneuploid breast tumor group, in keeping with the observation that aneuploidy can be preceded by tetraploidy [26].
In conclusion, this study confirms the strong relationship between aneuploidy and proliferation. Among a panel of 76 mitotic spindle checkpoint genes, we identified several genes of interest whose expression status might serve to guide individual breast cancer patient management. Some of the genes identified here are already used to predict tumor recurrence and the response to treatment, while AURKA and PLK1 are frequently included in "poor prognosis" signatures [17,18,27]. Multivariate analyses will be necessary to assess the potential of our 2-gene signature as compared to existing gene-expression signatures such as Mammaprint® and Oncotype DX®, and an already identified gene expression signature of genomic instability, to improve grading of breast tumors [28] or to predict the clinical outcome of breast cancer patients [29]. AURKA amplification induces resistance to taxol [30], and several aurora kinase inhibitors and polo-like kinase 1 inhibitors are in the preclinical development phase [6][31][32][33]. Finally, the finding that NDC80/HEC1 is involved early in breast carcinogenesis suggests that it too may have clinical relevance.

Patients and Samples
To characterize gene expression signatures associated with breast tumor ploidy, we analyzed samples of 47 primary breast tumors (23 DNA aneuploid and 24 DNA diploid tumors) excised from women at our institution. Samples containing more than 70% of tumor cells were considered suitable for this study. Tumor cellularity was assessed on hematoxylin and eosin-stained tissue sections. Immediately after surgery, the tumor samples were placed in liquid nitrogen until RNA extraction.
The patients met the following criteria: primary unilateral non metastatic breast carcinoma; complete clinical, histological and biological information available; no radiotherapy or chemotherapy before surgery; and full follow-up at our institution.
Standard prognostic factors are shown in Table 6. The median follow-up was 7.8 years (range 26 months to 11.25 years).
The patients had physical examinations and routine chest radiography every 3 months for 2 years, then annually. Mammograms were done annually.
To validate and explore our gene expression signature associated with tumor ploidy, we analyzed 14 additional DNA aneuploid breast tumors, comprising 6 classical aneuploid and 8 DNA tetraploid breast tumors.
To investigate the relationship between the mRNA levels of candidate genes and breast cancer progression, we also analyzed the RNA of 14 benign breast tumors, 14 ductal carcinoma in situ (DCIS) of the breast, 11 invasive ductal grade I breast tumors, and 12 invasive ductal grade III breast tumors. Standard prognostic factors for the 11 invasive ductal grade I breast tumors and 12 invasive ductal grade III breast tumors are indicated in Additional File 2, along with standard prognostic factors for the 10 invasive breast tumors used for initial screening of the dysregulated genes.
Patients' consent and approval from the Local Ethical Committee (Breast Group of René Huguenin Hospital) were obtained prior to the use of these clinical materials for research purposes, in accordance with the Declaration of Helsinki. The biological collection has been recorded at the French Ministry of Research (N°DC-2008-355). Nine specimens of adjacent normal breast tissue from breast cancer patients or normal breast tissue from women undergoing cosmetic breast surgery were used as sources of normal RNA.

Primary cell culture and differential isolation of epithelial cells and fibroblasts from normal breast tissues and breast tumor cells
To determine which cells (epithelial cells and/or fibroblasts) overexpressed mitotic-spindle-checkpoint genes, we measured the RNA levels of the selected genes in primary cultures of epithelial cells and fibroblasts from normal breast tissues and breast tumor cells.
Breast tumors and normal tissues were minced with a scalpel and incubated overnight with Liberase Blendzyme 2 (Roche Applied Science, Meylan, France) for enzymatic dispersion. Organoids and aggregated cells (epithelial fraction) and isolated cells (fibroblast fraction) were separated by filtration and centrifugation. The fibroblast fraction was cultured in Ham's F10 medium containing L-glutamine (3 mM), insulin (5 mg/mL), T3 (1 nM), hydrocortisone (1 mg/mL), kanamycin (0.1 mg/mL), and 10% fetal calf serum. The epithelial fraction was cultured in the same conditions, plus epidermal growth factor (5 ng/mL), transferrin (5 mg/mL) and 5% human serum (instead of fetal calf serum). Cells were incubated in humidified air with 5% CO2 at 37°C, and the medium was changed three times a week. Cells were cultured for two weeks before RNA extraction. Epithelial cells and fibroblasts were identified by their morphological features and by detecting epithelial (keratin 19) and fibroblast marker expression with real-time RT-PCR.

Flow cytometric DNA analysis and S-phase fraction (SPF) classification
Cell preparation and DNA staining were performed as previously described [34]. Flow cytometry (FCM) was performed on a FACScalibur device (Becton Dickinson, CA, USA). Cell cycle analysis was performed with the Modfit LT 2.0 program (Verity Software House, Topsham, ME). The DNA-diploid peak was located on DNA histograms by using an external standardization procedure with normal human lymphocytes positioned in the fifth part of the red fluorescence scale. DNA ploidy and the S-phase fraction (SPF) were obtained after gating on a dot plot (FL2-width versus FL2-area), selecting a representative amount of debris and excluding doublets.
The DNA ploidy pattern was expressed as the DNA index (DI), i.e. the ratio between the mean fluorescence channel number of the tumor G0/G1 peak and that of the diploid G0/G1 reference peak. Rules established during a previous inter-laboratory control procedure [35] were applied when using the cell-cycle software models. The tumors were classified as follows based on the DNA index. A tumor showing a single peak with a DNA index between 0.95 and 1.10 was classified as DNA diploid. If an additional peak was present, containing at least 10% of total cell counts and showing a corresponding G2M peak, the tumor was placed in one of the following DNA aneuploid subcategories: DNA aneuploid, with a DI between 1.10 and 1.90 or > 2.05; or DNA tetraploid, with a DI between 1.90 and 2.05. There were no hypodiploid (DI < 0.95) or multiploid (several aneuploid peaks) tumors in this series. The ploidy-adjusted SPF was categorized as low, intermediate or high, based on the 33rd and 66th percentiles. The debris and aggregate subtraction options were used when appropriate.
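The DNA index cut-offs described above can be illustrated with a short Python sketch. This is only an illustration, not the procedure actually run in the cell-cycle software: the function name `classify_ploidy` is hypothetical, the handling of values falling exactly on a boundary (1.10, 1.90, 2.05) is an assumption because the text does not specify it, and the additional requirements on the aneuploid peak (at least 10% of cell counts, a matching G2M peak) are assumed to have been checked beforehand.

```python
def classify_ploidy(dna_index: float) -> str:
    """Assign a ploidy class from the DNA index (DI) of the tumor G0/G1 peak.

    Cut-offs follow the text. Boundary handling is an assumption:
    intervals are treated as closed on the left, and 2.05 is counted
    as tetraploid.
    """
    if dna_index < 0.95:
        return "hypodiploid"      # not observed in this series
    if dna_index < 1.10:
        return "diploid"          # 0.95 <= DI < 1.10
    if dna_index < 1.90:
        return "aneuploid"        # 1.10 <= DI < 1.90
    if dna_index <= 2.05:
        return "tetraploid"       # 1.90 <= DI <= 2.05
    return "aneuploid"            # DI > 2.05


if __name__ == "__main__":
    # Hypothetical DI values, for illustration only.
    for di in (1.00, 1.45, 2.00, 2.30):
        print(di, classify_ploidy(di))
```

Multiploid tumors (several aneuploid peaks) would need one DI per peak and are deliberately left out of this sketch.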

Real-time RT-PCR
(1) RNA extraction
Total RNA was extracted from breast specimens by using the acid-phenol guanidinium method. The quantity of the RNA samples was accurately measured by using a NanoDrop spectrophotometer, and their quality was determined by agarose gel electrophoresis, staining with ethidium bromide, and visualization of the 18S and 28S RNA bands under ultraviolet light.

(2) Theoretical basis
Real-time PCR reactions are characterized by the point during cycling when amplification of the PCR product is first detected, rather than the amount of PCR product accumulated after a fixed number of cycles. The parameter Ct (threshold cycle) is defined as the fractional cycle number at which the fluorescence generated by cleavage of a TaqMan probe (or by SYBR Green dye-amplicon complex formation) passes a fixed threshold above baseline. The increase in the fluorescence signal associated with exponential growth of PCR products is detected by the laser detector of the ABI Prism 7700 Sequence Detection System (Perkin-Elmer Applied Biosystems, Foster City, CA), using PE Biosystems analysis software, according to the manufacturer's manuals.
The precise amount of total RNA added to each reaction mix (based on optical density) and its quality (i.e. lack of extensive degradation) are both difficult to assess. We therefore also quantified transcripts of two endogenous RNA control genes involved in two cellular metabolic pathways, namely TBP (Genbank accession NM_003194), which encodes the TATA box-binding protein, and RPLP0 (NM_001002), which encodes human acidic ribosomal phosphoprotein P0. Each sample was normalized on the basis of its TBP (or RPLP0) content.
Results, expressed as N-fold differences in target gene expression relative to the TBP (or RPLP0) gene, and termed "Ntarget", were determined as $N_{target} = 2^{\Delta Ct_{sample}}$, where the ΔCt value of the sample is determined by subtracting the average Ct value of the target gene from the average Ct value of the TBP (or RPLP0) gene [36,37].
The Ntarget values of the samples were subsequently normalized such that the median of the nine normal breast tissue Ntarget values was 1.
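As an illustration of the calculation just described, the following Python sketch computes Ntarget from replicate Ct values and then rescales a series so that the median of the normal-tissue values is 1. The function names and the Ct values in the usage example are hypothetical. The base of 2 assumes a PCR efficiency of 100%, as implied by the formula.

```python
from statistics import mean, median


def n_target(ct_target, ct_reference):
    """Return Ntarget = 2 ** (mean Ct of reference gene - mean Ct of target gene).

    ct_target and ct_reference are lists of replicate Ct values for the
    target gene and for the endogenous control (TBP or RPLP0).
    """
    delta_ct = mean(ct_reference) - mean(ct_target)
    return 2.0 ** delta_ct


def normalize_to_normals(values, normal_values):
    """Rescale Ntarget values so that the median of the normal tissues equals 1."""
    reference = median(normal_values)
    return [value / reference for value in values]


if __name__ == "__main__":
    # Hypothetical duplicate Ct values, for illustration only.
    tumor = n_target(ct_target=[24.0, 24.2], ct_reference=[27.0, 27.2])
    normals = [n_target([27.0, 27.0], [27.0, 27.0]),
               n_target([26.0, 26.0], [27.0, 27.0]),
               n_target([28.0, 28.0], [27.0, 27.0])]
    print(normalize_to_normals([tumor], normals))
```

A target whose Ct is lower than that of the reference gene (earlier detection) therefore yields an Ntarget above 1 before normalization.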

(3) Primers and controls
Primers for TBP, RPLP0 and the 76 target genes were chosen with the assistance of the Oligo 5.0 computer program (National Biosciences, Plymouth, MN).
To avoid amplification of contaminating genomic DNA, one of the two primers was placed at the junction between two exons. In general, amplicons were between 70 and 120 nucleotides long. Gel electrophoresis was used to verify the specificity of PCR amplicons.
The 76 target genes tested in this study are listed in Table 1. They were selected from the literature for their potential involvement in molecular mechanisms associated with the mitotic spindle checkpoint.
cDNA synthesis and PCR conditions were as described elsewhere [37]. Experiments were performed with duplicates for each data point. All patient samples with a CV of Ct values higher than 10% were retested.
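The retest rule can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficient of variation (CV) is computed here with the sample standard deviation (n - 1 denominator), which the text does not specify, and the function names are hypothetical.

```python
from statistics import mean, stdev


def ct_cv_percent(ct_values):
    """Coefficient of variation of replicate Ct values, in percent.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    return 100.0 * stdev(ct_values) / mean(ct_values)


def needs_retest(ct_values, threshold_percent=10.0):
    """Return True when the CV of the replicate Ct values exceeds the threshold."""
    return ct_cv_percent(ct_values) > threshold_percent


if __name__ == "__main__":
    # Hypothetical duplicates, for illustration only.
    print(needs_retest([25.0, 25.4]))   # closely agreeing duplicates
    print(needs_retest([20.0, 30.0]))   # widely divergent duplicates
```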

High-resolution array CGH (comparative genomic hybridization)
Tumor samples were analyzed with the Agilent Human Genome CGH Microarray 44K. DNA samples for array CGH were labeled as previously described [38]. Briefly, 1 μg each of breast tumor DNA and commercial pooled human normal genomic DNAs (Promega, Madison, WI) was digested with 5 μg of AluI (50 units) and 5 ml of RsaI (50 units) (Promega, Madison, WI) and labeled by random priming with CY3- and CY5-dUTP, respectively (Agilent Technologies, Massy, France). The labeled solutions were then filtered on a Microcon YLM-30 column (Millipore, Billerica, MA), denatured and hybridized with unlabeled Cot-1 DNA (Invitrogen, Carlsbad, CA) to the CGH arrays. After hybridization in an oven rotating at 15 rpm (Model1012, Sheldon Manufacturing, Cornelius, OR), the slides were washed and scanned with the Agilent G2565AA Microarray Scanner.

Statistical Analysis
As the mRNA levels did not fit a Gaussian distribution, (a) the mRNA levels in each subgroup of samples were expressed as the median and range rather than the mean and coefficient of variation, and (b) relationships between the molecular markers and clinical and histological parameters were analyzed with the chi-square test (link between two qualitative parameters) or the non parametric Mann-Whitney U test (link between one qualitative parameter and one quantitative parameter) [39]. Differences between two populations were considered significant at confidence levels greater than 95% (p < 0.05).
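For readers unfamiliar with the Mann-Whitney U test, the sketch below shows how the U statistic can be obtained from pooled ranks. It is an illustration only and not the software used in this study: the two-sided p-value relies on a normal approximation with tie correction and no continuity correction, which is an assumption, and exact tables would be preferable for very small groups. Function names are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for group x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    # Variance of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p_value


if __name__ == "__main__":
    # Hypothetical normalized mRNA levels, for illustration only.
    tumors = [3.1, 5.4, 2.8, 7.9, 4.2]
    normals = [1.0, 0.8, 1.3, 0.9]
    print(mann_whitney_u(tumors, normals))
```

U counts the number of (tumor, normal) pairs in which the tumor value is the larger one, with ties counted as one half, so it depends only on the ordering of the values and not on their distribution.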
To visualize the capacity of a given molecular marker to discriminate between two populations (in the absence of an arbitrary cutoff value), we summarized the data in a ROC (receiver operating characteristics) curve [40]. ROC curves plot sensitivity (true positives) on the Y axis against 1-specificity (false positives) on the X axis, considering each value as a possible cutoff. The AUC (area under curve) was calculated as a single measure of the discriminatory capacity of each molecular marker. When a molecular marker has no discriminatory value, the ROC curve lies close to the diagonal and the AUC is close to 0.5. In contrast, when a molecular marker has strong discriminatory value, the ROC curve moves to the upper left-hand corner and the AUC is close to 1.0.
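The construction of the ROC curve and its AUC can be sketched in a few lines of Python. This is a minimal illustration, assuming that higher marker values indicate the positive class (for example, DNA aneuploid tumors) and that both classes are present. The function names and the example values are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points as (1 - specificity, sensitivity) pairs.

    Each observed value is used in turn as a cut-off: a sample is called
    positive when its score is >= the cut-off. labels are truthy for the
    positive class and falsy for the negative class.
    """
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, l in zip(scores, labels) if l and s >= cutoff)
        false_pos = sum(1 for s, l in zip(scores, labels) if not l and s >= cutoff)
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical marker levels; 1 = DNA aneuploid, 0 = DNA diploid.
    scores = [6.2, 4.8, 3.9, 3.5, 1.2, 0.9]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc(roc_points(scores, labels)))
```

With this construction the AUC equals the Mann-Whitney U statistic of the positive group divided by the product of the two group sizes, which links the ROC summary to the rank test used for group comparisons.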
A gene expression signature associated with tumor ploidy was sought with the BRB Array Tools program, using the Prediction Analysis for Microarrays (PAM) and Class Prediction results modules.

Additional material
Additional file 1: mRNA levels of the 20 markedly upregulated genes in ERα-negative and ERα-positive breast cancer cell lines.
Additional file 2: Characteristics of the 33 breast tumors (10 for pre-screening, 11 invasive grade I and 12 invasive grade III).