Pan-cancer analysis of tumor metabolic landscape associated with genomic alterations

Although metabolic alterations are one of the hallmarks of cancer, it remains unclear how the metabolic landscape is reconstituted during cancer progression and which genetic alterations underlie its heterogeneity within cancer cells. Here, the configuration of the metabolic landscape according to genetic alteration is examined across 7648 subjects representing 29 cancers. The metabolic landscape and its reconfiguration with accumulated mutations retained the characteristics of the tissue of origin. However, there were some common patterns across cancers in terms of the association with cancer progression. Carbohydrate and pyrimidine metabolism showed the highest positive correlation with tumor metabolic burden, and they were also common poor prognostic pathways in several cancer types. We additionally examined whether genetic alterations were associated with the heterogeneity of the metabolic landscape. Genetic alterations associated with each metabolic pathway differed between cancers; however, they were among the cancer drivers in most cancer types.

Electronic supplementary material: The online version of this article (10.1186/s12943-018-0895-9) contains supplementary material, which is available to authorized users.


Preprocessing for transcriptomic and genomic data of TCGA
We used publicly available cancer genome and transcriptome data from the TCGA projects. Using the 'TCGAbiolinks' R package [1], we downloaded level 3 RNA sequencing data for 32 solid cancers from the TCGA data portal (https://portal.gdc.cancer.gov/) on Dec 12th, 2017; the data were obtained with Illumina HiSeq RNASeqV2 (Illumina, San Diego, CA, USA).
For each TCGA project, we normalized mRNA transcripts using the 'TCGAanalyze_Normalization' function and filtered out low-expression genes with the 'TCGAanalyze_Filtering' function. We merged the transcriptome data of all TCGA projects into one large-scale expression matrix for pan-cancer analysis. Clinical information, including vital status, follow-up time, and time of death, was collected in the same manner. Information on the microsatellite instability (MSI) or hypermutated status of both STAD and COAD was downloaded from the website of The cBioPortal for Cancer Genomics (http://www.cbioportal.org) on Dec 23rd, 2017. We downloaded pre-compiled, curated somatic mutation data for 32 solid cancer types from the TCGA projects, provided by 'TCGAmutations' as an R data package (https://github.com/PoisonAlien/TCGAmutations).
The pre-compiled data were derived from the latest analysis data (January 28th, 2016), which were downloaded via the Broad Institute GDAC Firehose pipeline. Mutation data were analyzed and summarized using the maftools package [2]. We excluded three cancer types (malignant mesothelioma, uterine corpus endometrial carcinoma, and skin cutaneous melanoma) from the analysis because of insufficient mutation data in the pre-compiled data; consequently, genomic and transcriptomic data from a total of 29 cancer types were used in the present study.

Calculating enrichment scores of metabolic pathways
To analyze the cancer type-specific metabolic landscape, we used 26 metabolic pathways defined by Reactome [3] across 29 cancer types. Single-sample gene set enrichment analysis (ssGSEA) was applied to the curated gene sets of the Reactome metabolic pathways to define the metabolic profile of each cancer sample. We implemented ssGSEA [4] using the curated gene sets from canonical pathways (MSigDB C2, Broad Institute; version 3.0) with the GSVA R/Bioconductor package [5,6]. To obtain the functional enrichment scores of metabolic pathways for each sample, we extracted the enrichment scores of the 26 Reactome pathways related to metabolism (Supplementary Table 1) [3]. The enrichment scores of the Reactome metabolic pathways were normalized to z-scores across all samples. In total, we analyzed 7648 samples with transcriptomic and genomic data (the number of samples and the abbreviation for each cancer type are summarized in Supplementary Table 2).
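The z-score normalization step can be illustrated with a minimal Python sketch. It is not the code used in the study (which relied on R): the function name `zscore_across_samples` and the input layout are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def zscore_across_samples(scores):
    """Z-score each pathway's enrichment scores across all samples.

    scores: dict mapping a pathway name to a list of per-sample
    enrichment scores (hypothetical layout, for illustration only).
    """
    normalized = {}
    for pathway, values in scores.items():
        mu = mean(values)
        sd = stdev(values)  # assumption: sample SD (n - 1 denominator)
        if sd == 0:
            # A constant pathway carries no information; map it to zeros.
            normalized[pathway] = [0.0 for _ in values]
        else:
            normalized[pathway] = [(v - mu) / sd for v in values]
    return normalized
```

After this step every pathway has mean 0 and unit standard deviation across samples, so pathways with different raw score ranges contribute on a comparable scale to downstream analyses.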

Two-dimensional metabolic landscape mapping
To visualize differences in the metabolic landscape, a dimension reduction method, t-distributed stochastic neighbor embedding (t-SNE), was used [7]. Briefly, t-SNE maps similar samples to nearby points so that local similarity is preserved. The similarity between a sample and the other samples is modeled with a Gaussian whose width is determined by the perplexity, which can be interpreted as the effective number of neighbors. We set the perplexity to 30.
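To make the role of the perplexity concrete, the sketch below computes the Gaussian conditional similarities of one sample to its neighbors and searches for the Gaussian width (sigma) that yields a target perplexity. It illustrates the standard t-SNE definition only and is not the implementation used in the study; the function names are hypothetical and the bisection bounds are arbitrary choices.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Gaussian similarities p(j|i) of one sample i to the other samples,
    given its squared distances to them."""
    d_min = min(sq_dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target=30.0, tol=1e-4, max_iter=200):
    """Bisection (in log space) for the sigma that gives the target perplexity.

    The target must be below the number of neighbors to be reachable.
    """
    lo, hi = 1e-10, 1e10  # arbitrary search bounds (assumption)
    sigma = 1.0
    for _ in range(max_iter):
        sigma = math.sqrt(lo * hi)  # geometric midpoint
        perp = perplexity(conditional_probs(sq_dists, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma  # too many effective neighbors: narrow the Gaussian
        else:
            lo = sigma  # too few effective neighbors: widen the Gaussian
    return sigma
```

With a perplexity of 30, each sample's Gaussian is tuned so that roughly 30 other samples contribute meaningfully to its neighborhood, regardless of how densely that region of the data is populated.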

Differentially mutated genes according to each metabolic pathway
For each metabolic signature, the samples of each tumor type were divided into two groups according to the median value of the enrichment score. For every gene, the difference in mutation frequency between the low and high enrichment score groups was evaluated with Fisher's exact test. Genes with a false discovery rate-corrected p-value below 0.05 were regarded as significantly differentially mutated genes.
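The procedure above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names and input layout are hypothetical, samples with a score equal to the median are assigned to the low group, the test is two-sided, and the Benjamini-Hochberg procedure is assumed for the false discovery rate correction (the text does not name the method).

```python
from math import comb
from statistics import median


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated samples in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_mutated_genes(scores, mutations, alpha=0.05):
    """scores: sample -> enrichment score of one pathway.
    mutations: gene -> set of samples carrying a mutation in that gene.
    Returns {gene: adjusted p-value} for genes below alpha."""
    cutoff = median(scores.values())
    high = {s for s, v in scores.items() if v > cutoff}
    low = set(scores) - high  # assumption: ties at the median go to "low"
    genes = sorted(mutations)
    pvals = []
    for gene in genes:
        mutated = mutations[gene]
        a = len(mutated & high)  # mutated, high score
        c = len(mutated & low)   # mutated, low score
        pvals.append(fisher_exact_two_sided(a, len(high) - a, c, len(low) - c))
    qvals = benjamini_hochberg(pvals)
    return {g: q for g, q in zip(genes, qvals) if q < alpha}
```

In this sketch the correction is applied across all genes tested for one pathway in one tumor type, which matches the description of testing "on all genes"; correcting across pathways as well would be a stricter alternative.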

Cancer driver mutation
We compared the significantly differentially mutated genes for each metabolic signature with cancer drivers. Cancer drivers were taken from the pre-computed driver lists, derived with various algorithms, that are deposited in DriverDBv2 (http://driverdb.tms.cmu.edu.tw/driverdbv2/) [8]. This database provides cancer drivers identified by multiple algorithms and allows the user to choose the number of algorithms by which a gene must be identified as a driver. We chose the maximal number of computational algorithms for each cancer type to define cancer drivers.
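The comparison can be pictured as a set operation, as in the hedged sketch below. It does not use any DriverDBv2 interface: the input layout, the function names, and the reading of "number of algorithms" as a minimum support threshold are assumptions made for illustration.

```python
def drivers_by_support(algorithm_calls, min_algorithms):
    """Genes called as drivers by at least `min_algorithms` algorithms.

    algorithm_calls: dict mapping an algorithm name to the set of genes
    it reports as drivers for one cancer type (hypothetical layout).
    """
    support = {}
    for genes in algorithm_calls.values():
        for gene in genes:
            support[gene] = support.get(gene, 0) + 1
    return {gene for gene, n in support.items() if n >= min_algorithms}


def driver_overlap(diff_mutated, drivers):
    """Shared genes and the fraction of differentially mutated genes
    that are also drivers."""
    diff_mutated, drivers = set(diff_mutated), set(drivers)
    shared = diff_mutated & drivers
    fraction = len(shared) / len(diff_mutated) if diff_mutated else 0.0
    return shared, fraction
```

For example, with toy calls from three hypothetical algorithms, `drivers_by_support(calls, 2)` keeps only the genes reported by at least two of them, and `driver_overlap` then reports which differentially mutated genes fall in that driver set.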

Statistical analysis
The correlation between the TMB and the enrichment scores of metabolic pathways was assessed using Spearman's correlation test. The prognostic value of each metabolic pathway enrichment score for overall survival was evaluated using Cox proportional hazards regression analysis in each cancer type as well as in the pan-cancer data. All