In Silico analysis of Gastric carcinoma Serial Analysis of Gene Expression libraries reveals different profiles associated with ethnicity
© Ossandon et al. 2008
Received: 04 October 2007
Accepted: 27 February 2008
Published: 27 February 2008
Worldwide, gastric carcinoma shows marked geographical variation and a worse outcome in patients from the West than in those from the East. Although these differences have been explained by better diagnostic criteria, improved staging methods and more radical surgery, emerging evidence supports the concept that gene expression differences associated with ethnicity might contribute to this disparate outcome. Here, we collected datasets from 4 normal and 11 gastric carcinoma Serial Analysis of Gene Expression (SAGE) libraries from two different ethnicities. All normal SAGE libraries and 7 tumor libraries were from the West, and 4 tumor libraries were from the East. These datasets were compared by Correspondence Analysis and Support Tree analysis, and specific differences in tag expression were identified by Significance Analysis for Microarray. Tag-to-gene assignments were performed with CGAP-SAGE Genie or TAGmapper. The analysis of the global transcriptome shows a clear separation between normal and tumor libraries, with 90 tags differentially expressed. A clear separation was also found between the West and the East tumor libraries, with 54 tags differentially expressed. Tag-to-gene assignments identified 15 genes, 5 of them with significantly higher expression in the West libraries than in the East libraries. qRT-PCR in cell lines of western and eastern origin confirmed these differences. Interestingly, two of these genes (COL1A1 and KLK10) have been associated with aggressiveness. In conclusion, we found that in silico analysis of SAGE libraries from two different ethnicities reveals differences in gene expression profiles. These expression differences might help to explain the disparate outcome between the West and the East.
Gastric carcinoma is the second leading cause of cancer-related death worldwide and has marked geographical variations [1–3]. The observed advantage in 5-year survival rate of patients from the East over those from the West may reflect differences in diagnostic criteria, better staging methods and more radical surgery. However, emerging evidence supports the concept that ethnicity might contribute to the disparate gastric carcinoma outcomes between the East and the West [4, 5]. Serial Analysis of Gene Expression (SAGE) is a comprehensive profiling method that allows for global, unbiased and quantitative characterization of transcriptomes. A major advantage of SAGE is that, once normalized, the tag levels generated by a single experiment can be compared directly with those of any other available experiment. To gain insight into the differences between gastric carcinoma transcriptomes that might explain the disparate outcomes between the East and the West, here we compare datasets of fifteen SAGE libraries derived from normal and gastric tumor tissues from Japanese and American gastric cancer patients, using Correspondence Analysis, Support Tree and Significance Analysis for Microarray to select significant tags and genes. We found specific genes differentially expressed between normal and tumor SAGE libraries, as well as between tumor libraries from the East and the West. These differentially expressed genes could explain the worse survival rate in the West in comparison to the East.
Serial Analysis of Gene Expression data
Fifteen gastric SAGE libraries (4 normal and 11 tumor) from the Cancer Genome Anatomy Project (CGAP) were combined for the analysis. Only libraries with 10 bp tags and the same cutting enzymes (BsmFI and NlaIII) were included in this study. The normal libraries consist of tissue pools (GSM784 and GSM14780) or microdissected samples (CGAP_MD_13S and CGAP_MD_14S) and were produced by El-Rifai et al in Virginia, USA. The gastric tumor libraries consist of seven libraries, three microdissected (CGAP_MD_HG7, CGAP_MD_HS29, CGAP_MD_G329), two primary tumors (GSM757 and GSM2385) and two xenografts (GSM758 and GSM14760), all from western patients and produced by El-Rifai et al, also in Virginia, USA ("West tumor libraries"), and 4 libraries (GSM7800, GSM8505, GSM8867 and GSM9103), all from Japanese patients and produced by Oue et al in Hiroshima, Japan ("East tumor libraries"). A database containing 121,409 different tags was generated from libraries that have between 9,000 and 34,000 unique tags. Only library GSM9103 was removed, because its unique tag count was too low (around 6,000 unique tags). The frequency of each tag was normalized by dividing it by the total tag number of the corresponding library and multiplying by 200,000 tags (CGAP normalization format). A selection process was then applied to reduce noise from the very large number of tags collected. The selection criteria were i) "tags found in all normal libraries" vs. "tags found in all tumor libraries" and ii) "tags found in all West tumor libraries" vs. "tags found in all East tumor libraries".
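The normalization and tag selection described above can be illustrated with a minimal Python sketch. The function names and the toy tag counts are hypothetical and are not taken from the study. The reading of the selection criterion as "keep tags present in every library of a group" is an assumption based on the wording of the text.

```python
SCALE = 200_000  # tags per library after normalization, as stated in the text


def normalize_library(tag_counts, scale=SCALE):
    """Rescale raw tag counts so that the library sums to `scale` tags."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library has no tags")
    return {tag: count * scale / total for tag, count in tag_counts.items()}


def tags_in_all(libraries):
    """Return the tags observed (count > 0) in every library of a group.

    Assumption: this is one reading of "tags found in all ... libraries".
    """
    tag_sets = [{t for t, c in lib.items() if c > 0} for lib in libraries]
    return set.intersection(*tag_sets) if tag_sets else set()


# Toy data with made-up 10 bp tags and counts (illustrative only).
west_libraries = [
    {"AAAAAAAAAA": 30, "CCCCCCCCCC": 70},
    {"AAAAAAAAAA": 10, "GGGGGGGGGG": 40},
]

west_normalized = [normalize_library(lib) for lib in west_libraries]
west_shared_tags = tags_in_all(west_libraries)
print(west_normalized[0])
print(west_shared_tags)
```

Normalizing before comparison matters because the libraries differ widely in depth (9,000 to 34,000 unique tags), so raw counts are not directly comparable across libraries.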
The Institute for Genomic Research software MultiExperiment Viewer was used to perform the following analyses: i) Correspondence Analysis (COA), to explore associations between samples that tend to have similar profiles; ii) Support Tree, to show the statistical support for samples with similar profiles after repeating the analysis at least 1000 times by resampling with replacement (bootstrap method); and iii) Significance Analysis for Microarray (SAM), to select tags whose expression was significantly different between samples. The association of tags to genes was performed with SAGE Genie, or with TAGmapper when no association was found by SAGE Genie. To predict functional classes of annotated genes, the FatiGO+ tool of Babelomics [13, 14] was applied. The unadjusted p-value given by Babelomics was used because the small number of genes analyzed made it more appropriate than the False Discovery Rate (FDR)-adjusted value.
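The bootstrap idea behind the Support Tree step can be sketched in a few lines of Python. This is a simplified stand-in and not the MultiExperiment Viewer implementation: instead of rebuilding a full tree, it resamples tags with replacement and counts how often one library remains the nearest neighbor of another. The library names, the Euclidean distance and the toy profiles are all assumptions made for illustration.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_neighbor(profiles, name):
    """Name of the library closest to `name` among all other libraries."""
    others = (n for n in profiles if n != name)
    return min(others, key=lambda n: euclidean(profiles[name], profiles[n]))


def bootstrap_support(profiles, a, b, iterations=1000, seed=0):
    """Fraction of bootstrap replicates in which `b` is the nearest neighbor of `a`.

    Each replicate resamples tag positions (columns) with replacement.
    """
    rng = random.Random(seed)
    n_tags = len(profiles[a])
    hits = 0
    for _ in range(iterations):
        idx = [rng.randrange(n_tags) for _ in range(n_tags)]
        resampled = {name: [vec[i] for i in idx] for name, vec in profiles.items()}
        if nearest_neighbor(resampled, a) == b:
            hits += 1
    return hits / iterations


# Toy normalized profiles over four tags (hypothetical values).
profiles = {
    "tumor_1": [10, 0, 5, 8],
    "tumor_2": [9, 1, 6, 7],
    "normal_1": [0, 10, 20, 1],
}
print(bootstrap_support(profiles, "tumor_1", "tumor_2"))
```

A support value close to 1 means that the grouping does not depend on a handful of tags, which is the kind of evidence the Support Tree analysis is used for here.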
Quantitative Real-Time Reverse-Transcription PCR
Quantitative real-time reverse-transcription PCR (qRT-PCR) was performed on two western cell lines (AGS, N87) and one eastern cell line (MKN45). Total RNA was extracted using Trizol (Invitrogen Life Technologies, Carlsbad, CA) according to the manufacturer's recommendations. RNA concentration was determined by measuring absorbance at 260 nm, and quality was verified by the integrity of 28S and 18S rRNA after ethidium bromide staining of total RNA samples subjected to 0.8% agarose gel electrophoresis. Total cDNA was synthesized with MMLV (Moloney Murine Leukemia Virus) reverse transcriptase (ThermoScript RT; Invitrogen Life Technologies, Carlsbad, CA). Reverse transcription was performed using 1 µg of total cellular RNA to generate cDNA. qRT-PCR was performed using a LightCycler-FastStart DNA Master SYBR Green I kit (Roche Molecular Biochemicals, Mannheim, Germany). We designed gene-specific primers for human PDGFRA (5' AGCTGATCCGTGCTAAGGAA 3' and 5' CGACCAAGTCCAGAATGGAT 3') and RPL13 (5' GAGGAGGCGGAACAAGTCC 3' and 5' TCAGCAGAACTGTCTCCCTTC 3'); amplification conditions are available upon request. A single melt-curve peak was observed for each product, confirming the purity of all amplified cDNA products. The qRT-PCR results were normalized to GAPDH (5' CGGGAAGCTTGTCATCAATGG 3' and 5' CATGGTTCACACCCATGACG 3'), which had minimal variation in all cell lines tested. Analysis was performed with LightCycler software 3.0. Crossing points (beginning of the PCR exponential phase) were assessed by the second derivative maximum method and plotted against the concentrations of the standards.
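The quantification step, in which crossing points are plotted against the concentrations of the standards, amounts to fitting a standard curve and inverting it. The Python sketch below assumes a straight-line fit of crossing point against log10 concentration and a simple ratio to the reference gene. The function names and all numeric values are hypothetical, and the LightCycler software may differ in detail.

```python
import math


def fit_standard_curve(concentrations, crossing_points):
    """Least-squares fit of Cp = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(crossing_points) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, crossing_points))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(cp, slope, intercept):
    """Invert the standard curve to estimate a concentration from a crossing point."""
    return 10 ** ((cp - intercept) / slope)


def relative_expression(target_cp, target_curve, ref_cp, ref_curve):
    """Target quantity divided by reference-gene quantity (e.g. GAPDH)."""
    return quantify(target_cp, *target_curve) / quantify(ref_cp, *ref_curve)


# Hypothetical dilution series (arbitrary units) and crossing points.
standards = [1000, 100, 10, 1]
standard_cps = [20.0, 23.32, 26.64, 29.96]
curve = fit_standard_curve(standards, standard_cps)

# Made-up sample crossing points; the same curve is reused for both genes
# only to keep the example short.
print(relative_expression(24.0, curve, 21.0, curve))
```

Dividing by the reference-gene quantity corrects for differences in input RNA between cell lines, which is why a gene with minimal variation across the lines is chosen as the reference.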
Tags with consistent expression in normal and tumor SAGE libraries
Selection of discriminatory tags between East and West SAGE libraries
Mapping SAGE tags to genes
Tags with significantly different expression between the West and the East tumor SAGE libraries, as identified by Significance Analysis for Microarray. Only the tags that were successfully associated with a specific gene are shown. The tags are sorted in descending order of significance, first the tags highly expressed in the East and then those highly expressed in the West.
N° of West libraries where present
West tumor average (Tags per 200,000)
N° of East libraries where present
East tumor average (Tags per 200,000)
Platelet-derived growth factor receptor, alpha polypeptide
H2.0-like homeo box 1 (Drosophila)
Ribosomal protein L13
Alcohol dehydrogenase 1C (class I), gamma polypeptide
Fc fragment of IgG binding protein
Alcohol dehydrogenase 4 (class II), pi polypeptide
Aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid binding protein; 3-alpha hydroxysteroid dehydrogenase, type III)
Mitogen-activated protein kinase 13
Mitochondrial trans-2-enoyl-CoA reductase
Epithelial membrane protein 1
Collagen, type I, alpha 1
Coiled-coil domain containing 12
Validation of genes differentially expressed between East and West tumor SAGE libraries
Our results, based on two unsupervised analyses, COA and Support Tree, are highly suggestive of different expression profiles among tumor SAGE libraries, along with differences between normal and tumor samples. These differences in expression levels might influence the recognized better survival of patients from the East in comparison to those from the West. Both COA and Support Tree show the two groups of normal libraries (microdissected and non-microdissected samples) mixed indistinctly, suggesting that the heterogeneity of a normal sample is not reduced by microdissection. This might be explained by the multiple cell activities of normal cells compared with tumor cells. Among tumor libraries, however, a tight grouping of microdissected tumors was found. This finding suggests that increasing the purity of the sample improves the homogeneity of the results. The proximity of the xenografts to one another also points to an increase in homogeneity, but they differ from the microdissected tumor samples since they group in different subclusters. This difference is probably due to subtle changes in the transcriptomes caused by a different genetic environment, such as the microenvironment provided by the surrounding animal tissue. On the other hand, the non-microdissected libraries were more scattered in the COA, probably because of sample contamination and heterogeneity.
The FatiGO+ results show that the tumor cells are characterized by up-regulation of genes related to cell organization, biogenesis and cell proliferation, and by down-regulation of genes related to cell-to-cell communication. After searching for specific differences between the West and the East tumor libraries, we found that the most significantly different tags have higher expression in the East tumors than in the West tumors. Thus, average expression appears to fall further in the West samples than in the East samples, probably because of broader gene repression.
Of the 5 genes identified with significantly higher expression in the West libraries, at least two (COL1A1 and KLK10) have been associated with invasiveness and disease progression [9, 15]. COL1A1 has been reported to be associated with more advanced tumor stage in 46 gastric carcinoma cases. KLK10 has been reported to be up-regulated in gastric as well as colorectal carcinomas and associated with invasion and more advanced clinical stage in both tumor types. In addition, KRT17 has been found up-regulated in human esophageal squamous cell carcinoma (ESCC) and associated with invasiveness. Another gene, EMP1, has been associated with highly proliferative cell types in mouse brain tumors. Only the CCDC12 gene has no available clinical data and also lacks GO annotations. The qRT-PCR analysis on cell lines confirmed the SAGE results and validated the over-expression of PDGFRA and RPL13 in the East tumor libraries.
In summary, we report that the predominant up-regulation of invasive and metastatic genes in the West tumor libraries might result in a more malignant disease with poorer survival. Taken together, these findings suggest that differentially expressed genes might help to explain the observed differences in gastric carcinoma outcome between the East and the West. Finally, our analysis is an example of how computational biology can effectively assist biomedical researchers in identifying the molecular mechanisms of disease.
We thank David S. Holmes and Gonzalo Riadi from Center for Bioinformatics and Genome Biology, Life Science Foundation – Andres Bello University, Santiago, Chile and Wael El-Rifai from Surgical Oncology Branch Vanderbilt Ingram Cancer Center, Vanderbilt University, Nashville, TN, USA, for helpful discussion of the manuscript. This work was supported by Chilean government research grants FONDECYT 1030130 and FONIS SA06I20019 to AHC.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.