Integrative genomics analysis of chromosome 5p gain in cervical cancer reveals target over-expressed genes, including Drosha

Background Copy number gains and amplifications are a characteristic feature of cervical cancer (CC) genomes, for which the underlying mechanisms are unclear. These changes may possess oncogenic properties by deregulating tumor-related genes. Gain of the short arm of chromosome 5 (5p) is the most frequent karyotypic change in CC. Methods To examine the role of 5p gain, we performed a combination of single nucleotide polymorphism (SNP) array, fluorescence in situ hybridization (FISH), and gene expression analyses on invasive cancer and on various stages of CC progression. Results The SNP and FISH analyses revealed copy number increase (CNI) of 5p in 63% of invasive CC, which arises at later stages of precancerous lesions in CC development. We integrated chromosome 5 genomic copy number and gene expression data to identify key target genes that are over expressed as a consequence of 5p gain. One of the candidates identified was Drosha (RNASEN), a gene that is required in the first step of microRNA (miRNA) processing in the nucleus. Other 5p genes identified as targets of CNI play a role in DNA repair and cell cycle regulation (BASP1, TARS, PAIP1, BRD9, RAD1, SKP2, and POLS), signal transduction (OSMR), and mitochondrial oxidative phosphorylation (NNT, SDHA, and NDUFS6), suggesting that disruption of pathways involving these genes may contribute to CC progression. Conclusion Taken together, we demonstrate the power of integrating genomics data with expression data in deciphering tumor-related targets of CNI. Identification of 5p gene targets in CC denotes an important step towards biomarker development and forms a framework for testing these genes as molecular therapeutic targets.


Background
The short arm of chromosome 5 (5p) frequently undergoes nonrandom changes in cervical cancer (CC) by exhibiting both copy number increase and deletions. Gain of 5p due to frequent appearance of isochromosome 5p in squamous cell carcinoma has been documented by karyotypic and chromosomal comparative genomic hybridization analyses [1][2][3][4]. Paradoxically, 5p also exhibits frequent loss of heterozygosity, which occurs early in the development of CC [5,6]. These findings suggest the presence of important proliferation-regulating genes on chromosome 5p involved in malignant progression of cervical epithelium.
Despite the successful use of pap-smear screening programs in early detection and treatment of CC, this tumor remains a major cause of cancer deaths in women worldwide [7]. CC progresses by distinct morphological changes from normal epithelium to carcinoma through low-grade squamous intraepithelial lesions (LSIL) and high-grade SILs (HSIL). Currently, no biological or genetic markers are available to predict which precancerous lesions progress to invasive CC. Although infection with high-risk human papillomavirus (HPV) is recognized as an essential initiating event in cervical tumorigenesis, this alone is not sufficient for progression to invasive cancer [8]. In spite of recent progress in the molecular aspects of CC, the genetic basis of progression of precursor SILs to invasive cancer in the multi-step progression of CC remains poorly understood [9]. Therefore, identification of other "genetic hits" in CC is important in understanding its biology.
Chromosomal gain and amplification is a common cellular mechanism of gene activation in tumorigenesis [10]. The aim of the present study was to examine the contribution of chromosome 5 copy number alterations (CNA) in CC tumorigenesis and identify copy number driven gene expression changes. We performed single nucleotide polymorphism (SNP) array and fluorescence in situ hybridization (FISH) analysis on invasive cancer and identified 5p CNI in a high frequency of primary tumors and cell lines. To unravel the consequence of 5p CNI on transcription, we utilized the Affymetrix U133A gene expression array and identified a number of over expressed genes on 5p, which include RNASEN, POLS, OSMR, and RAD1. These data, thus, suggest that multiple transcriptionally activated genes on 5p act as driver genes in the progression of CC.

Tumor specimens and cervical cancer cell lines
A total of 219 specimens were utilized in the present study in various investigations. These include 9 cell lines, 148 primary tumors, 42 pap smears, and 20 normal cervical tissues. The cell lines (HT-3, ME-180, CaSki, MS751, C-4I, C-33A, SW756, HeLa, and SiHa) were obtained from the American Type Culture Collection (ATCC, Manassas, VA) and grown in tissue culture as per the supplier's specifications. Twenty age-matched normal cervical tissues from hysterectomy specimens obtained from Columbia University Medical Center (CUMC), New York, were used as controls after enrichment for epithelial cells by microdissection. Cytologic specimens were collected using the ThinPrep Test Kit (Cytyc Corporation, Marlborough, MA). After visualization of the cervical os, the ectocervix was sampled with a spatula and endocervical cells were obtained with a brush rotated 360 degrees. Exfoliated cells were immediately placed in PreservCyt Solution (Cytyc Corporation, Marlborough, MA) for routine processing by a cytopathologist. Pap smears were collected from normal and precancerous lesions by simultaneous preparation of slides from the same spatula for both cytology and FISH. FISH slides were immediately fixed in 3:1 methanol and acetic acid, and stored at 4°C until hybridization. A total of 42 pap smears obtained from CUMC, with the diagnosis rendered by a cytopathologist as normal/squamous metaplasia/ASCUS (N = 10), LSIL (N = 13) or HSIL (N = 19), were used for FISH analysis. The diagnosis of all HSILs was also confirmed by a biopsy. Of the 148 primary tumors, 93 were obtained as frozen tissues and 55 as formalin-fixed paraffin-embedded tissues.
All primary invasive cancer specimens were obtained from patients evaluated at CUMC, Instituto Nacional de Cancerologia (Bogota, Colombia) [11], and the Department of Gynecology of Campus Benjamin Franklin, Charité-Universitätsmedizin Berlin (Germany) with appropriate informed consent and approval of protocols by institutional review boards. All primary tumors were diagnosed as squamous cell carcinoma (SCC) except five that were diagnosed as adenocarcinoma (AC). Clinical information such as age, stage and size of the tumor, and follow-up data after initial diagnosis and treatment was collected from the review of institutional medical records. Tissues were frozen at -80°C immediately after resection and were embedded in tissue freeze medium (OCT) before microdissection. All primary tumor specimens were determined to contain at least 60% tumor by examination of hematoxylin and eosin (H&E) staining of adjacent sections. High molecular weight DNA and total RNA from tumor, normal tissues, and cell lines were isolated by standard methods. The integrity of all RNA preparations was tested by running formaldehyde gels, and samples that showed evidence of degradation were excluded from the study.

Microarray analysis
The Affymetrix 250 K NspI SNP chip was utilized for copy number analysis as per the manufacturer's protocol. Briefly, 250 ng of genomic DNA was digested with NspI, and generic linkers were added, followed by PCR amplification, end-labeling, and fragmentation following standard protocols. Hybridization, washing, acquisition of raw data using GeneChip Operating Software (GCOS), and generation of .CEL files were performed by the Affymetrix Core facility at our institute. We utilized 79 CC cases (9 cell lines and 70 primary tumors enriched for tumor cells by microdissection) and 7 microdissected normal cervical squamous epithelial samples as controls to serve as reference for copy number analysis. SNP data of test samples and normal cervical epithelial specimens were loaded into dChip to calculate signal intensity values using the perfect match/mismatch (PM/MM) difference model, followed by normalization of signals within chip and between chips using model-based expression [12,13]. DNA copy numbers were inferred by dChip from the signal intensity values using the Hidden Markov Model. Arrays with > 93% call rates were included in the analysis as per the Affymetrix manual. Copy number data was obtained for chromosome 5 using Cyto-Band information files from the dChip website [14]. Both the raw copy number and the log2 ratio (signal/mean signal of normal samples at each SNP) were computed to estimate copy number changes in chromosome view. In the raw copy number view, copy numbers < 1.5 were considered as deletion and copy numbers of 2.5 or more as gain. All the original data files were submitted to Gene Expression Omnibus (GEO Accession number: GSE10092).
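The copy number thresholds and the log2 ratio described above can be illustrated with a minimal Python sketch. This is not the dChip implementation: the function names and the "neutral" label are assumptions introduced here for illustration, and only the cut-offs (< 1.5 for deletion, 2.5 or more for gain) and the ratio definition come from the text.

```python
import math


def log2_ratio(signal, normal_signals):
    """Log2 of a tumor signal over the mean signal of normal samples at one SNP."""
    mean_normal = sum(normal_signals) / len(normal_signals)
    return math.log2(signal / mean_normal)


def call_copy_number(raw_copy_number, loss_below=1.5, gain_at_or_above=2.5):
    """Classify an inferred raw copy number with the cut-offs stated in the text.

    The label "neutral" for values between the two cut-offs is an assumption.
    """
    if raw_copy_number < loss_below:
        return "deletion"
    if raw_copy_number >= gain_at_or_above:
        return "gain"
    return "neutral"


if __name__ == "__main__":
    # Toy values, not data from the study.
    for value in (1.2, 2.0, 2.5, 4.1):
        print(value, call_copy_number(value))
```

The sketch treats each value independently; in the study, copy numbers were inferred along the chromosome by a Hidden Markov Model before such cut-offs were applied.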
We utilized the Affymetrix U133A oligonucleotide microarray (Santa Clara, CA) containing 14,500 probe sets for gene expression analysis. RNA isolated from 30 CC cases (21 primary tumors enriched for tumor cells by microdissection and 9 cell lines) and 20 microdissected normal cervical squamous epithelial samples was utilized for expression studies. Biotinylated cRNA preparation and hybridization of arrays were performed by the standard protocols supplied by the manufacturer. Arrays were subsequently developed and scanned to obtain quantitative gene expression levels. Expression values for the genes were determined using the Affymetrix GeneChip Operating Software (GCOS) and the Global Scaling option, which allows a number of experiments to be normalized to one target intensity to account for the differences in global chip intensity. The .CEL files obtained from the GCOS software were processed and normalized by the dChip algorithm as described above. An average percent present call of 54% was obtained among all samples, which is expected for high quality RNA as per the manufacturer. Arrays were normalized at the PM/MM probe level by invariant set normalization, using a median intensity array from the normal samples as the baseline array [12,13]. Following normalization, model based expression values were calculated using the PM/MM data view to fit the model for all probe sets. All original data files were deposited to GEO (Accession number: GSE9750). To obtain a list of differentially expressed gene signatures, we compared all normal with all tumor samples using the criteria of 1.75-fold change between the group means at 90% confidence interval and a significance level of P < 0.05. All negative expression values for each probe set were truncated to 1 before calculating fold changes, and probe sets with a present call in < 10% of samples in each group were excluded.
A list of differentially expressed genes identified on chromosome 5 was used in all subsequent supervised analyses using the same criteria between various groups to obtain relevant gene signatures.
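The filtering criteria above can be sketched in a few lines of Python. This is a simplified illustration, not the dChip procedure: the function names are hypothetical, the P value is taken as a precomputed input, the 90% confidence bound on the fold change is not modeled, and reading the present-call rule as "exclude a probe set when fewer than 10% of samples are present in both groups" is an assumption.

```python
from statistics import mean


def truncate(values, floor=1.0):
    """Raise negative or very small expression values to the floor (1 in the text)."""
    return [max(v, floor) for v in values]


def is_differential(tumor, normal, p_value, tumor_present, normal_present,
                    min_fold=1.75, alpha=0.05, min_present=0.10):
    """Apply the fold-change, P value, and present-call filters to one probe set.

    tumor_present and normal_present are the fractions of samples with a
    present call in each group.
    """
    # Assumed reading: drop probe sets that are rarely detected in both groups.
    if tumor_present < min_present and normal_present < min_present:
        return False
    ratio = mean(truncate(tumor)) / mean(truncate(normal))
    changed = ratio >= min_fold or ratio <= 1.0 / min_fold
    return changed and p_value < alpha


if __name__ == "__main__":
    # Toy values, not data from the study.
    print(is_differential([400, 360], [100, 120], 0.01, 1.0, 1.0))
    print(is_differential([150, 150], [100, 100], 0.01, 1.0, 1.0))
```

Truncating before averaging keeps negative model-based expression values from producing negative or undefined ratios.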

Fluorescence in situ hybridization (FISH) and HPV typing
FISH was performed by standard methods on frozen tissue sections fixed in 3:1 methanol:acetic acid, tissue microarrays prepared from paraffin-embedded tissues, and pap smears fixed in 3:1 methanol:acetic acid. A dual color locus specific probe set containing spectrum orange labeled EGR1 (maps to 5q31) and spectrum green labeled D5S23/D5S721 (maps to 5p15.2) was obtained from Vysis (Downers Grove, IL). Hybridization signals on 100-500 interphase cells on DAPI counterstained slides were scored on a Nikon Eclipse epi-fluorescence microscope equipped with Applied Imaging CytoVision software (San Jose, CA). Scoring of FISH signals on frozen and paraffin-embedded tissue sections was restricted to tumor cells based on the identification of areas of tumor on adjacent H&E sections by the pathologist (MM). FISH signal scoring on pap smear slides was restricted to large and atypical epithelial cells. Presence of signals suggestive of gain in at least 3% of cells was considered positive, and the results were correlated with parallel cytomorphologic findings. Human papillomavirus types were identified as described earlier [15].
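The positivity rule for FISH scoring can be written as a short Python sketch. The function names are hypothetical, and treating "signals suggestive of gain" as three or more 5p signals per cell is an assumption for illustration; the 3% cut-off is the one stated above.

```python
def fraction_with_gain(signal_counts, min_copies=3):
    """Fraction of scored cells carrying at least min_copies 5p signals."""
    gained = sum(1 for count in signal_counts if count >= min_copies)
    return gained / len(signal_counts)


def is_fish_positive(signal_counts, min_copies=3, min_fraction=0.03):
    """A specimen is positive when at least 3% of scored cells show gain."""
    return fraction_with_gain(signal_counts, min_copies) >= min_fraction


if __name__ == "__main__":
    # Toy specimen: 100 scored cells, 5 of them with four 5p signals.
    cells = [2] * 95 + [4] * 5
    print(fraction_with_gain(cells), is_fish_positive(cells))
```

A low cut-off of this kind only makes sense when scoring is restricted to the relevant cells, as described above for tumor areas and atypical epithelial cells.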

Identification of 5p gain as the most frequent genomic alteration in invasive CC
Affymetrix 250 K NspI SNP array analysis was performed on a panel of 79 CC cases (70 primary tumors and 9 cell lines) to identify genome-wide copy number alterations (CNA) (unpublished data). The dataset of chromosome 5 CNA from this analysis was utilized in the present study. CNA of chromosome 5 was found in 42 (53.2%) CC cases. Of these, 5p exhibited copy number gains in 34 (43%) cases, while no detectable copy number losses were found on this chromosomal arm (Figure 1A). 5p was the most commonly gained region in the CC genome (see Additional file 1). On the other hand, gain of the long arm of chromosome 5 (5q) was rare, with only 3 (3.8%) tumors showing CNI. However, copy number losses on 5q were found in 25 (31.6%) tumors. Of these, 17 had concurrent 5p gains and 5q losses, while the remaining 8 showed only 5q deletion (Figure 1B). Among the tumors that exhibited 5p CNI, the entire 5p was gained and no minimal region of duplication or amplification could be delineated. Similarly, deletions on 5q covered large regions, often spanning the entire chromosomal arm, and no consensus minimal deletion could be identified (Figure 1B). These data demonstrate that chromosome 5p is a frequent target of CNI in CC, while accompanying deletions on 5q were found less frequently. To identify the clinical significance, we evaluated the association of chromosome 5 CNA with pathologic features such as histology, age, stage and size of the tumor, treatment outcome, and HPV type by univariate analyses and found no significant associations. These data thus suggest that chromosome 5 CNA is a critical genetic alteration that may occur early in the development of CC.
Figure 1 Identification of chromosome 5p genomic alterations in cervical cancer.
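The frequencies reported above follow directly from the case counts, as the short Python check below shows. The variable names are illustrative; the counts are those given in the text, and the sum of 5p-gain cases and 5q-loss-only cases is simply compared with the reported total of 42 cases with chromosome 5 CNA.

```python
total_cases = 79  # 70 primary tumors and 9 cell lines

gain_5p = 34          # tumors with 5p gain
loss_5q = 25          # tumors with 5q loss
loss_5q_with_5p = 17  # 5q loss together with 5p gain
gain_5q = 3           # tumors with 5q gain

loss_5q_only = loss_5q - loss_5q_with_5p
cases_5p_gain_or_5q_loss = gain_5p + loss_5q_only


def percent(count, total=total_cases):
    """Percentage of all cases, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    print("5q loss only:", loss_5q_only)
    print("5p gain or 5q loss:", cases_5p_gain_or_5q_loss,
          percent(cases_5p_gain_or_5q_loss))
    print("5p gain %:", percent(gain_5p))
    print("5q loss %:", percent(loss_5q))
    print("5q gain %:", percent(gain_5q))
```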

FISH validation of 5p gain in CC
To validate the 5p gain identified by SNP array, we performed FISH analysis on 101 CC cases using a cocktail of two probes containing the spectrum green-labeled 5p15.2 locus and the spectrum orange-labeled 5q31 region. These include an independent panel of 55 tumors on a paraffin-embedded tissue microarray and an additional 46 tumors as frozen sections or pap smears. The latter include 23 tumors studied by SNP array (see Additional file 2). A total of 64 (63%) tumors showed evidence of increased copies (3 or more) of 5p (Figure 1C-E). An average of 4.4 copies (range 3-11) of 5p15 signals was found among the 64 cases that exhibited gain, while only 2.6 copies (range: 1-8) of the 5q31 region were present (Figure 1C-E). These data, thus, suggest that the 5p CNI is independent of the ploidy of the tumor and support the SNP data showing the gain of 5p and associated loss of 5q.
All the tumors that exhibited 5p gain by SNP array also showed gain by FISH. For example, the tumors T-207, T-218, and T-1981 showed simultaneous high copy numbers of 5p and loss of 5q by SNP array analysis. The FISH results on the same tumors are in complete agreement with the SNP data (Figure 1). These results, thus, validate the SNP data and establish 5p CNI as the most frequent genetic alteration in CC.

Chromosome 5p gain is a late genetic event in CC progression
CC progresses through distinct morphological changes during the transition from normal epithelium to carcinoma through low- and high-grade SILs. To identify the earliest stage in CC development at which the 5p CNI occurs, we used a FISH assay on 42 consecutively ascertained pap smears simultaneously diagnosed by cytology as normal, squamous metaplasia or atypical squamous cells of undetermined significance (ASCUS) (N = 10), LSIL (N = 13) and HSIL (N = 19). Five of 19 (26.3%) HSILs showed four or more copies (range 4-7) of 5p (Figure 1F). Of these, three HSILs exhibited tetrasomy 5, while two others showed evidence of 5p gain (5-7 copies vs. 3-4 copies of 5q) (Figure 1F). No evidence of gain of 5p was found in any specimens diagnosed as LSIL, normal, squamous metaplasia or ASCUS. Thus, these data suggest that 5p gain is a relatively late event in the progression of CC.
The biological behavior of HSILs varies, and only a small proportion progress to invasive cancer if left untreated [16][17][18]. Cytologic characterization alone does not permit HSILs at risk for progression to be distinguished from those that regress or persist. Because of this, all HSILs are currently treated by surgical excision or with an ablative therapy. Identification of genetic signatures defining the subset of high-risk HSILs could alter the treatment strategies. Chromosome 5p gain may serve as such a genetic marker in predicting the progression of HSILs.

Identification of transcriptional targets of 5p gain, including Drosha, in CC
We have shown 5p CNI as the most frequent genomic alteration in CC by combined SNP (see Additional file 1) and FISH analyses. We hypothesize that the increased 5p dosage may result in deregulation of genes that may confer oncogenic properties to its host cell. To identify such transcriptional targets on 5p, we utilized gene expression profiling data on Affymetrix U133A array analysis of 20 normal squamous epithelial samples (age range, 27-64 yr; Mean ± SD, 46.9 ± 7.6) and 30 CC cases (21 primary tumors; age range 28-70 yr; Mean ± SD, 48.3 ± 11.3; and 9 cell lines). Initial identification of differentially expressed gene signatures on chromosome 5 in CC was obtained by comparison of all probe sets on chromosome 5 present in U133A array between tumors and normal that exhibit significant (P < 0.05) differences using the criteria described in materials and methods. This algorithm identified 122 non-redundant probe sets with significant differences in expression levels in tumors compared to normal. This unique CC chromosome 5 gene signature, which distinguishes normal from tumor, includes 26 probe sets with down-regulated expression and 96 probe sets with increased expression (see Additional file 3). We anticipate that this differentially expressed gene data set will be useful in identifying target genes of CNI of 5p and loss of 5q in CC. Therefore, we focused our attention on this gene dataset in all subsequent supervised analyses of gene expression.
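The integration step, relating 5p copy number status to expression, can be sketched in Python as a comparison of tumors with and without 5p gain. This is an illustrative outline under stated assumptions rather than the analysis pipeline used in the study: the function name, the data layout, and the toy gene and sample names are hypothetical, the 1.75-fold cut-off is reused from the methods, and no significance test is included.

```python
from statistics import mean


def genes_upregulated_with_gain(expression, has_5p_gain, min_fold=1.75):
    """Return {gene: fold change} for genes higher in tumors with 5p gain.

    expression: {gene: {sample: expression value}}
    has_5p_gain: {sample: True or False}, e.g. from SNP array calls
    """
    hits = {}
    for gene, by_sample in expression.items():
        # Truncate values below 1 before averaging, as in the methods.
        gain = [max(v, 1.0) for s, v in by_sample.items() if has_5p_gain[s]]
        no_gain = [max(v, 1.0) for s, v in by_sample.items() if not has_5p_gain[s]]
        if not gain or not no_gain:
            continue  # cannot compare without samples in both groups
        ratio = mean(gain) / mean(no_gain)
        if ratio >= min_fold:
            hits[gene] = ratio
    return hits


if __name__ == "__main__":
    # Toy data, not values from the study.
    status = {"T1": True, "T2": True, "T3": False, "T4": False}
    expr = {
        "GENE_A": {"T1": 400, "T2": 360, "T3": 100, "T4": 120},
        "GENE_B": {"T1": 100, "T2": 110, "T3": 100, "T4": 105},
    }
    print(genes_upregulated_with_gain(expr, status))
```

Restricting the input to probe sets already over expressed in tumors relative to normal tissue would focus the comparison on candidate targets of the gain.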
Although a similar type of analysis identified a down-regulated gene signature on 5q in invasive CC, no specific signature associated with 5q deletion could be identified (see Additional file 5). Analysis performed to identify the down-regulated genes on 5q using all probes on chromosome 5q on the U133A array showed a total of 17 down-regulated genes (EGR1, PITX1, MAST4, GALNT2, ATP10B, DUSP1, HBEGF, RMND5B, HMGCR, CAST, CLTB, GX3, SPINK5, LOC653314, CXCL14, ISL1, and PIK3R1) in CC compared to normal cervical epithelium. Thus, these data suggest that the 5p gain is a critical genetic change in CC and the genes identified as a consequence of 5p gain may be important in its tumorigenesis. These data further suggest that the 5q loss may have little consequence to CC biology and may represent a by-stander genetic alteration associated with 5p gain.

Discussion
We provide multiple levels of evidence to support that genomic gain of chromosome 5p is an important genetic target in CC development. First, our SNP analysis identified 5p gain as the most frequent genetic alteration in invasive CC (see Additional file 1). By FISH, we confirmed this finding in an independent cohort of CC specimens and, among precancerous lesions, detected 5p gain only in high-grade SILs. Several previous studies have identified recurrent gain of 5p in many types of human cancers [19], including CC [1][2][3][4][20][21][22]. Gain of 5p also appears to arise at later passages in HPV-immortalized cervical keratinocytes, and its acquisition confers the ability to invade collagen in tissue culture [23]. This close recapitulation of 5p gain in later stages of an in vitro model and in clinical specimens from CC patients provides strong evidence that this change occurs late in development and may play a role in invasion.
Figure 2 Supervised analysis of over expressed genes identified as a consequence of gain of chromosome 5p in cervical cancer. Significantly differentially expressed genes were identified by filtering all the over expressed genes on chromosome 5p between tumors that showed gain of 5p and tumors without 5p gain. In the matrix, each row represents the gene expression relative to the group mean and each column represents a sample (shown on top). T represents primary tumor; CL represents cell line. The dendrogram on the left shows unsupervised clustering of genes differentially expressed between tumors with and without gain. The names of genes are shown on the right. The scale bar (-2 to +2) at the bottom represents the level of expression, with intensities of blue representing a decrease and red an increase in expression. The groups within tumors shown at top represent no gain of chromosome 5p (I) and 5p gain (II).
Of these genes, RNASEN (Drosha) over expression was identified in all tumors with 5p gain ascertained by SNP analysis (Figure 2). This finding suggests that RNASEN is one of the critical targets of 5p CNI that may play a major role in tumor progression. Drosha executes the initial step in microRNA (miRNA) processing by cleaving pri-miRNA to release pre-miRNA. Drosha is also involved in pre-rRNA processing, with specificity for double-stranded RNA [25]. Drosha over expression was shown to regulate proliferation and to predict poor prognosis in esophageal cancer [26]. Drosha copy number gain and over expression were shown to influence global miRNA profiles in CC [27]. miRNAs play critical roles in various biological processes including cancer, where miRNA fingerprinting can distinguish tumors of different lineages [28]. Although the role of Drosha over expression in cancer is not well studied, a number of possibilities exist. Over expressed Drosha may process pri-miRNAs more efficiently, resulting in increased levels of mature miRNAs, and the resulting miRNAs may affect transcription of several mRNAs that in turn affect the production of other pri-miRNAs [29]. In the context of its role in miRNA processing, our data suggest that Drosha over expression due to 5p gain is likely an important mechanism in later stages of CC progression.
Previous studies have shown that the oncostatin M receptor (OSMR) gene is gained and over expressed in CC, which is associated with adverse clinical outcome [30,31]. Oncostatin M (OSM) is a cytokine of the IL-6 family, and its biological activity is mediated through the receptor complex. Upon ligand binding, OSMR can activate signaling pathways implicated in cancer such as STAT and PI3K/AKT, and mediates inhibition of tumor growth [32]. The angiogenic factor VEGF is induced upon OSM stimulation in cervical cancer cell lines, suggesting that OSMR over expression contributes to CC tumorigenesis [31].
Our expression analysis also showed a number of genes that possess functions related to nucleic acid binding, DNA repair, and mitotic cell cycle (BASP1, TARS, PAIP1, BRD9, RAD1, SKP2, and POLS). Of these, the S-phase kinase-associated protein 2 (SKP2) plays a critical role in coordinating the G1/S transition and cell cycle progression, forms the substrate recognition subunit of the SCF ubiquitin-protein ligase complex, and inhibits the tumor suppressor function of FOXO1. Over expression of SKP2 was found in many tumor types, consistent with a role as an oncogene, and is associated with poor clinical outcome [33]. RAD1 is a component of the 9-1-1 cell-cycle checkpoint response complex that plays a major role in DNA repair [34]. However, its role in cancer is not well understood. Three nuclear genes (NNT, SDHA, and NDUFS6) encoding mitochondrial proteins that play a role in oxidative phosphorylation (OxPhos) were also over expressed as a consequence of 5p gain. The mitochondrial OxPhos system plays a key role in energy production, the generation of free radicals, and apoptosis, the hallmark features of cancer cells [35]. Since tumor cells display enhanced biosynthesis capacity, a key feature of the metabolic transformation of tumor cells that supports growth and proliferation, the mitochondrial OxPhos system may stimulate signaling pathways critical in tumor progression. Although nothing is known about these genes in cancer, it remains to be determined whether one or more of these genes act individually or synergistically as oncogenes in regulating the metabolic transformation in CC.
Since genetic activation of therapy targets such as ABL, C-KIT, Her2/neu, and EGFR has been successfully demonstrated to be essential for treatment response [36], our finding of 5p gene targets such as RNASEN, SKP2, and OSMR emphasizes the need for functional analysis and for dissecting the signaling cascades involving these genes, with the ultimate aim of obtaining the therapeutic targets needed for cure and prevention of this devastating cancer.

Conclusion
In summary, we integrated multiple genomic data to identify 5p gain as the most recurrent chromosomal alteration, one that occurs at the stage of high-grade precancerous lesions in the development of CC. We identified over expressed genes associated with 5p gain that play a role in miRNA processing, signal transduction, DNA repair and mitotic cycle, and oxidative phosphorylation, suggesting a functional role for this chromosomal region in progression of CC. Thus, the genes identified here will form a basis for functional testing of 5p gain, and the gene expression levels can be used as a biomarker to identify patients with aggressive disease. Further studies in the context of 5p gain will allow deciphering critical gene targets to develop molecular based therapies for CC.
Figure 3 Relative expression of differentially expressed genes as a consequence of 5p gain in relation to GAPDH in normal tissues and in tumors with and without 5p gain. Genes are shown in the top left corner of each panel.