Pathway-specific differences between tumor cell lines and normal and tumor tissue cells
© Ertel et al. 2006
Received: 23 January 2006
Accepted: 02 November 2006
Published: 02 November 2006
Cell lines are used in experimental investigation of cancer but their capacity to represent tumor cells has yet to be quantified. The aim of the study was to identify significant alterations in pathway usage in cell lines in comparison with normal and tumor tissue.
This study utilized a pathway-specific enrichment analysis of publicly accessible microarray data and quantified the gene expression differences between cell lines, tumor, and normal tissue cells for six different tissue types. KEGG pathways that are significantly different between cell lines and tumors, cell lines and normal tissues and tumor and normal tissue were identified through enrichment tests on gene lists obtained using Significance Analysis of Microarrays (SAM).
Cellular pathways that were significantly upregulated in cell lines compared to tumor cells and normal cells of the same tissue type included ATP synthesis, cell cycle, oxidative phosphorylation, purine, pyrimidine and pyruvate metabolism, and proteasome. Results on metabolic pathways suggested an increase in the velocity of nucleotide metabolism and RNA production. Pathways that were downregulated in cell lines compared to tumor and normal tissue included cell communication, cell adhesion molecules (CAMs), and ECM-receptor interaction. Only a fraction of the significantly altered genes in tumor-to-normal comparison had similar expressions in cancer cell lines and tumor cells. These genes were tissue-specific and were distributed sparsely among multiple pathways.
Significantly altered genes in tumors compared to normal tissue were largely tissue-specific. Among these genes, downregulation was the major trend. In contrast, cell lines contained large sets of significantly upregulated genes that were common to multiple tissue types. Pathway upregulation in cell lines was most pronounced in metabolic pathways, including nucleotide metabolism and oxidative phosphorylation. Signaling pathways involved in adhesion and communication of cultured cancer cells were downregulated. The three-way pathway comparison presented in this study sheds light on the differences in the use of cellular pathways by tumor cells and cancer cell lines.
Cell lines derived from tumors and tissues comprise the most frequently used living systems in research on cell biology. Limitations on the abundance of tissue samples necessitate the use of animal models and cell lines in studies of tumor-related phenomena. Cancer cell lines have been used extensively in screening studies of drug sensitivity and the effectiveness of anticancer drugs. Other studies using cultured cells aimed to determine phenotypic properties of cancer cells such as proliferation rate, migration capacity and ability to induce angiogenesis. In other studies, human cultured cells were used to create tumors in mouse models.
Whether measurements on cell lines provide information about the metastatic behavior of cancer cells in vivo is currently under investigation. Unsupervised classification of gene expression profiles of cancer tissue and cancer cell lines results in separate clustering of cancer cell lines from tissue cells for both solid tumors and blood cancers. Sets of genes responsible for differences between solid tumors and cell lines in their response to anticancer drugs have been identified in the Serial Analysis of Gene Expression (SAGE) Database. The cell lines that best represent given tumor tissue types were determined with the use of a quantitative tissue similarity index. The results were striking: only 34 of the 60 cell lines used in the analysis were most similar to the tumor types from which they were derived. The study provided valuable information about the selection of the most appropriate cell lines for pharmaceutical screening programs and other cancer research. In a more recent work, Sandberg et al. used a meta-analysis based on Gene Ontology (GO) to identify the gene function groups for which cell lines differed most significantly from tumors. Genes involved in cell-cycle progression, protein processing and protein turnover, as well as genes involved in metabolic pathways, were found to be upregulated (an increase in expression reflected by mRNA transcript levels) in cell lines, whereas genes for cell adhesion molecules and membrane signaling proteins were downregulated (a decrease in expression reflected by mRNA transcript levels) in cell lines in comparison with tumors. To build on this approach, functional enrichment analysis based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [8, 9] can be used to illustrate causal relationships between genes (gene products).
While GO is organized into hierarchical annotations in the context of normal cellular function, the KEGG database organizes the genes (gene products) into pathway reaction maps and functional complexes, including some disease-specific pathways.
The present study focuses on pathway specific differences in gene expression patterns between cancer cell lines and tumors as well as cancer cell lines and normal tissue and tumors and normal tissue. Extension of microarray data analysis to three-way comparison allows for the identification of gene expression patterns unique to cell lines. Such patterns might have arisen due to factors related to the cell culture environment. We used publicly accessible microarray data available for normal and cancer tissues and associated NCI60 cell lines in a pathway-specific quantitative analysis of gene expression profiles. A dominant theme that emerged from our analysis was that pathway-specific gene expression differences between cancer cell lines and cancer tissue were similar both in magnitude and direction to corresponding differences between cell lines and normal tissue cells. Cell cycle associated differences between normal and tumor tissue were amplified in cell lines. Results on metabolic pathways suggested an increase in the velocity of RNA and DNA production and increased flow of metabolites in the oxidative phosphorylation pathway. On the other hand, a small fraction of significantly altered genes in tumor-to-normal comparison had similar expressions in cancer cell lines and tumor cells. These genes were tissue-specific and were positioned sparsely along multiple pathways.
Materials and methods
Microarray data presented by Staunton et al. and Ramaswamy et al. were used in the three-way comparison of gene expression patterns in cell lines, tumors and normal tissue.
Quality of probe set annotations
The quality of the Hu6800 GeneChip annotation was assessed because this platform is several versions behind current human microarrays. While the Hu6800 design is old and probe designs have since been greatly improved, the quality of probe annotation is maintained through regular updates by Affymetrix. The annotations used in this study are based on the July 12th, 2006 update of Affymetrix annotations according to the March 2006 (NCBI Build 36.1) version of the human genome. Gene annotations for the Hu6800 GeneChip obtained from Webgestalt (web-based gene set analysis toolkit) were compared with those obtained from the Affymetrix website on August 7th, 2006. Out of the 7129 probesets on the chip, 6058 had the same annotations from both Webgestalt and Affymetrix. Of the remaining 1071 probesets, 692 were not annotated, 288 were annotated in the Affymetrix list but not in Webgestalt, 28 were annotated in Webgestalt but not in Affymetrix, and 63 (~1%) had conflicting annotations in Webgestalt and Affymetrix. Only 42 probesets (~0.59% of the 7129 on the chip) for genes belonging to any known KEGG pathway had discrepancies between Webgestalt and Affymetrix. While there were very few probes with discrepant annotations in any given pathway, this list of 42 probes was enriched for the Antigen processing and presentation, Natural killer cell mediated cytotoxicity, Cell adhesion molecules (CAMs), Type I diabetes mellitus, and SNARE interactions in vesicular transport pathways. A review of this probe list revealed that the discrepancies were merely due to updates and minor revisions to the official gene symbol that may reflect increased understanding of these genes' functions. Genes associated with KEGG pathways represent a subset of well-studied and sequenced genes. Overall, the probe sets of genes belonging to KEGG pathways have well-established and reliable annotations on the Hu6800 GeneChip. Annotations retrieved from Webgestalt were used for the remainder of the analysis.
Gene expression data were normalized for each tissue type by computing the Robust Multichip Average (RMA) [13, 14] directly from the Affymetrix .CEL files for cell line, tumor, and normal samples. RMA consists of three steps: background adjustment, quantile normalization and, finally, summarization. The quantile normalization method uses data from all arrays in an experiment to form the normalization relation [13, 14]. The expression measure generated by RMA is on the log base 2 scale.
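The quantile normalization step can be illustrated with a minimal sketch. This is not the Bioconductor implementation used in the study; it assumes each array is a plain list of intensities of equal length, breaks ties by input order rather than averaging them, and the function name `quantile_normalize` is chosen here for illustration only.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Force every array to share the same empirical distribution.

    arrays: list of equal-length lists, one list of intensities per array.
    Returns a new list of lists in the original probe order.
    Assumption: ties are broken by input order (no tie averaging).
    """
    n_probes = len(arrays[0])
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(values) for values in arrays]
    rank_means = [mean(col[r] for col in sorted_arrays) for r in range(n_probes)]

    normalized = []
    for values in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: values[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two toy arrays with three probes each.
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
    # -> [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After this step every array contains the same set of values, so differences between arrays reflect only the rank of each probe within its array.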
Normalized data were generated using the Bioconductor (package for R)  implementation of RMA. R 2.3.1  was first installed on an Intel Xeon machine running a Windows Professional Operating System. The Biobase 1.10.1 (dated 20 June 2006) package, which contains the base functions for Bioconductor, was installed by accessing the getBioC.R script directly from the Bioconductor website . The "ReadAffy" command was used to load all .CEL files for a single tissue type. The RMA expression measures for each tissue type were computed using the "rma" function with default settings, including the Perfect Match Adjustment Method setting as Perfect Match Only so that expression signal calculation was based upon the perfect match values from each probe set as described in . The RMA computed expression values were written out to a comma-separated text file.
The resulting expression values for each sample were checked against the average expression across cell line, tumor, and normal populations by calculating their correlation coefficients. Two anomalous samples (one normal tissue sample from colon and one tumor sample from prostate) had correlations well outside those of the remaining population (R < 0.9) and were removed; RMA for those tissues was recomputed excluding the suspect samples. The RMA-generated gene expression data for the Affymetrix chips were clustered in TIGR MeV Version 3.1 using hierarchical clustering with average linkage and the Pearson correlation coefficient as the distance metric. For each of the six tissues under consideration, the cell line samples clustered together in a single branch distinct from the branches containing tumor and normal tissue samples. This result confirmed that all the cell line samples have characteristics that are significantly different from the tumor tissue.
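The correlation check described above can be sketched as follows. This is a simplified illustration rather than the procedure actually run: it assumes the samples of one population are held in a dictionary of expression vectors, compares each sample with the mean profile of all samples passed in, and uses hypothetical names (`pearson`, `flag_outliers`).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_outliers(samples, threshold=0.9):
    """Return names of samples whose correlation with the mean profile
    falls below the threshold (R < 0.9 in the text).

    samples: dict mapping sample name -> list of log2 expression values.
    """
    names = list(samples)
    n_genes = len(samples[names[0]])
    mean_profile = [
        sum(samples[name][g] for name in names) / len(names)
        for g in range(n_genes)
    ]
    return [name for name in names if pearson(samples[name], mean_profile) < threshold]


if __name__ == "__main__":
    toy = {
        "s1": [1, 2, 3, 4, 5],
        "s2": [2, 3, 4, 5, 6],
        "s3": [1, 2, 3, 4, 6],
        "s4": [1, 2, 4, 4, 5],
        "suspect": [5, 1, 4, 2, 3],
    }
    print(flag_outliers(toy))
```

With few samples a single outlier pulls the mean profile toward itself, so in practice the check is more reliable on populations with many arrays.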
Significance analysis for gene expression
The Significance Analysis of Microarray Data (SAM) implementation in the TIGR MeV Version 3.1 software was used to identify those genes that had statistically significant differences in expression between tumor samples, cell lines, and normal tissue. SAM analysis was performed using all default parameters and adjusting the delta-value to obtain the maximum number of genes while maintaining a conservative false discovery rate of zero. A list of significant genes was identified for the cell line-tumor, cell line-normal, and normal-tumor combinations for each of the six tissue types. When the set of significant genes was deleted from the microarray data, clustering analysis based on the remaining genes interspersed microarray datasets for cell lines with corresponding datasets for tissue.
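For orientation, the per-gene statistic at the core of SAM, the relative difference d = (mean_a - mean_b) / (s + s0), can be sketched as below. This is only the two-class unpaired statistic; the permutation step that SAM uses to choose delta and estimate the false discovery rate is omitted, the fudge factor `s0` is simply passed in rather than estimated from the data, and the function name is hypothetical. It is not the TIGR MeV implementation.

```python
from math import sqrt


def sam_relative_difference(group_a, group_b, s0=0.0):
    """Two-class unpaired SAM-style relative difference for one gene.

    group_a, group_b: expression values of the gene in each class.
    s0: small positive constant that damps genes with tiny variance
        (assumed given here; SAM estimates it from all genes).
    Requires s + s0 > 0, i.e. some within-class variation or s0 > 0.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Pooled within-class sum of squares.
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    s = sqrt((1.0 / n_a + 1.0 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)


if __name__ == "__main__":
    # Positive values mean higher expression in the first class.
    print(sam_relative_difference([7.9, 8.1, 8.0], [6.1, 5.9, 6.0], s0=0.1))
```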
Identification of significantly altered pathways
Two different methods were used for identifying significantly altered pathways. First, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [8, 9] were identified as significantly altered by performing a functional enrichment analysis on genes identified as significant by SAM analysis. The analysis was carried out using the Webgestalt system, comparing significant genes obtained by SAM against all genes on the Affymetrix HU6800 array, for each comparison under study. A p-value for pathway enrichment was obtained using the hypergeometric test documented in . Four different p-value cutoffs (0.001, 0.01, 0.05 and 0.1) were used in order to assess the dependence of significant pathway identification on the p-value. This process was also applied to subsets of significant genes, for example, the intersection of significant genes from (CL - N) and (T - N).
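A hypergeometric enrichment p-value of this kind can be computed as in the sketch below. It assumes a one-sided (over-representation) test, which is the usual choice for enrichment analysis; the exact options used inside Webgestalt are not restated here, and the function and argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(n_array, n_pathway, n_significant, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_array:       genes on the array (reference set)
    n_pathway:     array genes annotated to the pathway
    n_significant: genes called significant (e.g. by SAM)
    n_overlap:     significant genes that fall in the pathway
    """
    upper = min(n_pathway, n_significant)
    favorable = sum(
        comb(n_pathway, i) * comb(n_array - n_pathway, n_significant - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_array, n_significant)


if __name__ == "__main__":
    # Toy numbers: 10 genes on the array, 4 in the pathway,
    # 3 significant genes of which 2 are in the pathway.
    p = hypergeom_enrichment_p(10, 4, 3, 2)
    print(round(p, 4))  # -> 0.3333
    for cutoff in (0.001, 0.01, 0.05, 0.1):
        print(cutoff, p < cutoff)
```

Looping over the four cutoffs, as in the usage example, shows directly how the set of pathways called significant depends on the chosen threshold.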
A second method was applied to KEGG pathway genes in order to detect changes that were not apparent on a single-gene basis. For this method, KEGG pathways were deemed significantly altered if at least 80% of the genes for that pathway contained on the HU6800 array were shifted in the same direction for a given comparison. For each of the six tissues, three-way comparisons were performed between averaged cell line, tumor, and normal samples. Similar examples of how significant changes in functional pathways are revealed by a population of related genes that are not evident from observations of a single gene are found in [20, 21].
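The 80% rule of the second method can be written as a short sketch. It assumes averaged log2 expression values per gene for the two groups being compared, held in dictionaries keyed by gene identifier, and treats a gene as shifted whenever the two averages differ at all; the names used are hypothetical.

```python
def pathway_direction(avg_a, avg_b, pathway_genes, fraction=0.8):
    """Classify a pathway as shifted 'up' or 'down' in group A relative to B.

    avg_a, avg_b:  dicts mapping gene -> averaged log2 expression.
    pathway_genes: genes of the pathway that are present on the array.
    Returns 'up', 'down', or None if fewer than `fraction` of the genes
    move in the same direction.
    """
    genes = [g for g in pathway_genes if g in avg_a and g in avg_b]
    if not genes:
        return None
    n_up = sum(1 for g in genes if avg_a[g] > avg_b[g])
    n_down = sum(1 for g in genes if avg_a[g] < avg_b[g])
    if n_up / len(genes) >= fraction:
        return "up"
    if n_down / len(genes) >= fraction:
        return "down"
    return None


if __name__ == "__main__":
    cell_line = {"g1": 8.0, "g2": 7.0, "g3": 9.0, "g4": 6.0, "g5": 5.0}
    tumor = {"g1": 7.0, "g2": 6.0, "g3": 8.0, "g4": 5.0, "g5": 6.0}
    # 4 of 5 genes are higher in the cell line, which meets the 80% rule.
    print(pathway_direction(cell_line, tumor, ["g1", "g2", "g3", "g4", "g5"]))
```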
Number of significant genes identified by SAM in cell line-to-tumor (CL - T), cell line-to-normal (CL - N), and tumor-to-normal (T - N) comparisons.
Columns: CL-T (upregulated %) | CL-N (upregulated %) | T-N (upregulated %) | CL-T ∩ CL-N | (T-N) - (T-N ∩ CL-T) | (T-N ∩ CL-N) - (T-N ∩ CL-N ∩ CL-T)
SAM genes that were upregulated in cell lines compared to tumors in all the 6 tissues considered in the study (CL - T).
ATP synthase, H+ transporting, mitochondrial F1 complex, beta polypeptide
Oxidative phosphorylation, ATP synthesis
ATP synthase, H+ transporting, mitochondrial F0 complex, subunit C3 (subunit 9)
ATP synthesis, Oxidative phosphorylation
complement component 1, q subcomponent binding protein
chromobox homolog 3 (HP1 gamma homolog, Drosophila)
chaperonin containing TCP1, subunit 5 (epsilon)
CDC20 cell division cycle 20 homolog (S. cerevisiae)
Ubiquitin mediated proteolysis, Cell cycle
cyclin-dependent kinase inhibitor 3 (CDK2-associated dual specificity phosphatase)
chromatin assembly factor 1, subunit A (p150)
cytoskeleton associated protein 1
CDC28 protein kinase regulatory subunit 1B
CDC28 protein kinase regulatory subunit 2
cytochrome c oxidase subunit 8A (ubiquitous)
DNA (cytosine-5-)-methyltransferase 1
dynein, light chain, LC8-type 1
EBNA1 binding protein 2
high-mobility group box 2
kinesin family member 2C
MCM3 minichromosome maintenance deficient 3 (S. cerevisiae)
MCM4 minichromosome maintenance deficient 4 (S. cerevisiae)
MCM7 minichromosome maintenance deficient 7 (S. cerevisiae)
mitochondrial ribosomal protein L12
NADH dehydrogenase (ubiquinone) Fe-S protein 8, 23kDa (NADH-coenzyme Q reductase)
phosphoribosylaminoimidazole carboxylase, phosphoribosylaminoimidazole succinocarboxamide synthetase
proliferating cell nuclear antigen
polymerase (RNA) II (DNA directed) polypeptide G
Purine metabolism, RNA polymerase, Pyrimidine metabolism
protein arginine methyltransferase 1
Selenoamino acid metabolism, Nitrobenzene degradation, Aminophosphonate metabolism, Tryptophan metabolism, Histidine metabolism, Androgen and estrogen metabolism, Tyrosine metabolism
proteasome (prosome, macropain) subunit, alpha type, 1
proteasome (prosome, macropain) subunit, beta type, 2
proteasome (prosome, macropain) subunit, beta type, 5
proteasome (prosome, macropain) subunit, beta type, 6
proteasome (prosome, macropain) 26S subunit, non-ATPase, 14
RAN binding protein 1
splicing factor, arginine/serine-rich 9
small nuclear ribonucleoprotein polypeptide A
small nuclear ribonucleoprotein polypeptides B and B1
small nuclear ribonucleoprotein polypeptide C
small nuclear ribonucleoprotein D2 polypeptide 16.5kDa
small nuclear ribonucleoprotein D3 polypeptide 18kDa
small nuclear ribonucleoprotein polypeptide E
small nuclear ribonucleoprotein polypeptide F
small nuclear ribonucleoprotein polypeptide G
transcription elongation factor B (SIII), polypeptide 1 (15kDa, elongin C)
Ubiquitin mediated proteolysis
tubulin, gamma 1
thioredoxin reductase 1
Pyrimidine metabolism, One carbon pool by folate
ubiquitin-conjugating enzyme E2C
Ubiquitin mediated proteolysis
ubiquitin-conjugating enzyme E2S
Comparison of results obtained from this study with those based on Gene Ontology Processes by Sandberg et al. 
Related GO category
Direction of regulation in cell lines with respect to tumors
Gene Ontology Study 
ATP synthesis coupled proton transport (GO:0015986)
Cell cycle (GO:0007049)
One carbon pool by folate
Nucleotide biosynthesis (GO:0009165)
Oxidative phosphorylation (GO:0006119)
Ubiquitin-dependent protein catabolism (GO:0006511); Modification-dependent protein catabolism (GO:0019941)
Purine nucleotide metabolism (GO:0006163)
Nucleobase, nucleoside, nucleotide and nucleic acid metabolism (GO:0006139)
Protein biosynthesis (GO:0006412)
Nucleobase, nucleoside, nucleotide and nucleic acid metabolism (GO:0006139)
Cell adhesion molecules (CAMs)
Cell adhesion (GO:0007155)
Cell adhesion (GO:0007155)
Complement and coagulation cascade
Complement activation (GO:0006956)
Cell adhesion (GO:0007155)
Cell adhesion (GO:0007155)
Phenol metabolism (GO:0018958)
Gene expression changes in metabolic pathways
Gene expression pattern changes in cell cycle
Perhaps the most obvious feature of this color map is how subtle the changes in (T - N) comparisons are relative to (CL - T) and (CL - N) comparisons in all six tissues under consideration. Genes such as CCNA2, CCNB1, CDC20, CDK4, and MCM2 through MCM7 are consistently upregulated in cell lines compared to tumors and normal tissue. On the other hand, genes such as CCND1, CCND3, CDC16, and CDK2 do not exhibit a readily recognizable pattern. A multitude of gene expression profiles in the cell cycle may point towards the same disease process.
SAM genes common to cancer cell lines and tumor cells
Genes that were identified by SAM in both (T - N) and (CL - N) comparisons but not in (CL - T) comparisons; (T - N ∩ CL - N) – (T - N ∩ CL - N ∩ CL - T).
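The set expression in this caption can be made concrete with a short sketch. The gene identifiers below are placeholders rather than results from the study, and the function name is hypothetical; the point is only that (T - N ∩ CL - N) minus its overlap with (CL - T) leaves the genes altered relative to normal tissue in both tumors and cell lines, but not different between cell lines and tumors.

```python
def shared_tumor_cell_line_genes(tn, cln, clt):
    """Genes significant in both T-N and CL-N but not in CL-T.

    tn, cln, clt: sets of gene identifiers called significant by SAM
    in the tumor-normal, cell line-normal and cell line-tumor comparisons.
    """
    common = tn & cln              # (T-N ∩ CL-N)
    return common - (common & clt)  # remove (T-N ∩ CL-N ∩ CL-T)


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    tn = {"geneA", "geneB", "geneC"}
    cln = {"geneB", "geneC", "geneD"}
    clt = {"geneC", "geneE"}
    print(shared_tumor_cell_line_genes(tn, cln, clt))  # -> {'geneB'}
```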
Our study shows that a large portion of genes implicated in the emergence and progression of cancer have similar gene expression values in tumors and cancer cell lines, indicating the value of cultured cell lines in cancer research. However, the pair-wise comparisons of gene expression profiles of CL, T, and N across all tissues illustrate that there are pronounced changes in gene expression specific to cell lines (CL - T; CL - N) that may not represent a disease process. This study also identified the signaling and metabolic pathways in cell lines that have distinctly different gene expression patterns from those associated with normal and tumor tissue. Pathway-specific gene expression changes in (CL - T) and (CL - N) comparisons were more consistent than those in (T - N) comparisons in the set of six tissues under consideration. Just as the gene expression changes in tumor – normal tissue comparison were largely tissue-specific, the significantly altered pathways among tumor – normal comparisons were limited to a small number of tissues. Functional enrichment analysis allows us to explore significant changes in pathways despite heterogeneous changes in gene expression across different tissues. Cellular pathways that were significantly upregulated in cell lines compared to tumor cells and normal cells of the same tissue type included ATP synthesis, cell cycle, oxidative phosphorylation, purine, pyrimidine and pyruvate metabolism, and proteasome. Results on metabolic pathways suggested an increase in the velocity of nucleotide metabolism and RNA production.
The dominant trend in the gene expression profiles along significantly altered pathways in cell lines appeared to be upregulation of genes when compared to either tumor or normal tissue. Exceptions included genes in the cell adhesion molecules, cell communication, ECM-receptor interaction, focal adhesion, and complement/coagulation cascade pathways. The apparent downregulation of the complement/coagulation cascade in cell lines may be due to the heterogeneous mixture of cells in tumor samples, which include immune cells as well as tissue-specific cells.
The composition of the cell culture medium may be the reason why gene expression patterns that differentiate cancer cell lines from tumor tissue are similar to those patterns that differentiate between cell lines and normal tissue. Typical cell culture medium is replete with metabolites, growth factors, and cytokines, among others, for which cells normally must compete in vivo. Multicellular interfaces with which tumor cells interact in vivo are not replicated for cells grown in cell culture plates [26–29]. The differences in environmental selection pressures may help explain the differential gene expression patterns between the tumor tissue and the cell lines. Our finding about the upregulation of oxidative phosphorylation in cell lines is supported by previous metabolic studies [30, 31]. The documentation of gene expression differences along signaling and metabolic pathways is important in compound screening during the drug discovery process. Compounds may act differently on pathways that are significantly altered between cell lines and tumor tissue. Recent studies are taking advantage of technological advances in microfluidics and tissue engineering to develop three-dimensional cell culture systems that aim to simulate in vivo conditions. Whether cell lines can be made to mimic tumor cell gene expression patterns by altering the culture medium conditions is a question yet to be fully explored.
This study was supported by the National Institutes of Health (NIH) grant #232240 and by the National Science Foundation (NSF) grant #235327.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.