Validation and comparison of the molecular classifications of pancreatic carcinomas

Four molecular classifications of pancreatic ductal adenocarcinoma (PDAC), biologically and clinically relevant and based on gene expression profiles, have been established in recent years: the Collisson’s, Moffitt’s (“tumor” and “stroma” classifications), and Bailey’s classifications. The aim of this study was to validate the prognostic value of the Moffitt’s classifications and to compare the Collisson’s, Moffitt’s, and Bailey’s classifications in a large series of samples. We collected clinical and gene expression data of PDAC samples from 15 public data sets, resulting in a total of 846 primary cancer samples, including 601 with survival annotation. All samples were classified according to each of the four multigene classifiers. We confirmed the independent prognostic value of the Moffitt “tumor”, Moffitt “stroma”, and Bailey’s classifications, but not that of the Collisson’s classification. Despite a relatively low gene overlap, all classifications were associated with pathological grade, an important prognostic feature and a reflection of the intrinsic molecular characteristics of tumors. The concordance rate in terms of “good-prognosis” vs. “poor-prognosis” prediction was relatively high (from 73 to 86%) between the three “tumor” classifications based on tumor gene lists (Collisson, Moffitt “tumor”, Bailey), but low (from 50 to 60%) with the Moffitt’s stroma classification based on stroma genes. Multivariate analysis incorporating the four classifiers together retained the Moffitt “stroma” and Bailey classifications as significant variables, highlighting the complementarity of classifiers based on tumor epithelium (Bailey) and tumor stroma (Moffitt stroma). Our results reinforce the clinical validity of subtyping in PDAC, which should be regarded as a collection of separate diseases.
Although their clinical utility remains to be demonstrated, the clinical interest of the subtypes, notably those from the Bailey’s and Moffitt’s “stroma” classifiers that show independent prognostic value, will be reinforced by the identification of new biomarkers and/or therapeutic targets in each subtype for designing and testing novel specific targeted therapies. Electronic supplementary material: The online version of this article (10.1186/s12943-017-0739-z) contains supplementary material, which is available to authorized users.

Pancreatic ductal adenocarcinoma (PDAC) is one of the most aggressive human cancers [1]. Its incidence is rising [2] and therapeutic advances have had only limited impact. As demonstrated for breast cancer [3], the identification of molecular subtypes allows a better definition of the clinical heterogeneity of cancers and the design of targeted therapeutic strategies. Recently, three studies have identified biologically and clinically relevant molecular PDAC subtypes based on gene expression profiles. In 2011, Collisson et al. defined three subtypes ("classical", "quasimesenchymal", "exocrine-like") based on surgically microdissected epithelial tumor samples and associated with overall survival (OS) in multivariate analysis in 27 informative samples [4]. Moffitt et al. "separated" the stroma from the epithelial pancreatic tumor by virtual microdissection and identified two "stroma subtypes" ("normal" and "activated") with different OS in a 108-patient series, and two "tumor-specific subtypes" ("classical" and "basal-like") with different OS in a 125-patient series with subsequent validation in 96 patients [5]. Bailey et al. defined four subtypes ("squamous", "pancreatic progenitor", "immunogenic", "aberrantly differentiated endocrine exocrine (ADEX)"), associated with different OS in multivariate analysis in a series of 96 patients [6].
Two important questions remain regarding these classifications, which were established in relatively small series. The first concerns the robustness of their prognostic value, notably that of the Moffitt's classifications, the Collisson's and Bailey's classifications having been recently challenged in series of 118 patients [7] and 364 patients [8], respectively. The second question is whether these classifications provide redundant clinical information regarding outcome prediction for individual patients. Here, we analyzed a large series of 846 patients with two objectives: to validate the prognostic value of the Moffitt classifications and to compare the four classifications.

Results and discussion
We collected clinical and gene expression data of PDAC samples from 15 public data sets (Additional file 1) selected according to the following criteria: availability of data in the GEO, ArrayExpress, EGA, or TCGA databases, and presence of at least 20 samples. Three data sets [4,6,9] had been previously used across the three earlier studies, as indicated by a column listing the sets previously used in these studies. The final set contained 846 primary cancer samples, including 819 non-microdissected samples and 27 Collisson's epithelium microdissected samples (3% of cases). Before analysis, expression data were normalized as described [10]. Briefly, we first normalized each DNA microarray-based data set separately, using quantile normalization for the available processed data from non-Affymetrix-based sets (Agilent, Illumina), and Robust Multichip Average (RMA) with the non-parametric quantile algorithm for the raw Affymetrix data sets. Then, we mapped hybridization probes across the different technological platforms. We used SOURCE (http://smd.stanford.edu/cgi-bin/source/sourceSearch) and EntrezGene (Homo sapiens gene information db, release from 09/12/2008, ftp://ftp.ncbi.nlm.nih.gov/gene/) to retrieve and update the non-Affymetrix gene chip annotations, and NetAffx Annotation files (www.affymetrix.com; release from 01/12/2008) for the Affymetrix annotations. The probes were then mapped according to their EntrezGeneID. For the TCGA data, we used the available normalized RNA-Seq data, which we log2-transformed. Finally, we determined the molecular subtype of each sample in each data set separately, as defined in the original publications: the three Collisson's subtypes [4], the two Moffitt's "tumor-specific subtypes" [5], the two Moffitt's "stroma subtypes" [5], and the four Bailey's subtypes [6].
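The two generic transformations named in this paragraph, quantile normalization and log2 transformation, can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which followed [10] and RMA for Affymetrix data); the function names, the toy values, the tie-breaking rule, and the pseudocount are assumptions made for illustration only.

```python
import math


def quantile_normalize(columns):
    """Quantile-normalize equally long sample columns (one list per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by position, a simplification compared
    with production implementations.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption to handle zeros."""
    return [math.log2(v + pseudocount) for v in values]


# Toy data: three samples (columns) measured on four genes.
samples = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.0, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(samples)
print(normalized[0])
print(log2_transform([0.0, 1.0, 3.0]))
```

After normalization every sample has the same set of values, only ordered differently, which is what makes samples within one data set comparable before subtype assignment.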
We then compared the four classifications according to several criteria. Regarding the gene composition, the crossing of the four gene lists (Bailey: 707 genes; Collisson: 62 genes; Moffitt's "tumor": 50 genes; Moffitt's "stroma": 48 genes) showed (Fig. 2a, Additional file 3) many more genes in common between the Bailey's, Collisson's, and Moffitt's "tumor" lists (respectively derived from bulk tumor tissues, microdissected tumor tissues, and bulk tumor tissues with virtual microdissection retaining the tumor epithelial cell genes) than between each of them and the Moffitt's "stroma" list, derived from tumor tissues with virtual microdissection retaining the stromal genes only. Thirty-seven of 62 Collisson's genes (58%) and 32 of 50 Moffitt's "tumor" genes (64%) were included in the Bailey's list, in which they represented only 5% of genes, whereas 8 of 62 Collisson's genes (13%) were included in the Moffitt's "tumor" list, in which they represented 16% of genes. There was only one gene in common between the Bailey's or Collisson's lists and the Moffitt's "stroma" list (Additional file 4). The mean percentage of common genes between each list and the three other ones was 3% for the Bailey's list, 24% for the Collisson's list, 27% for the Moffitt "tumor" list, and 1% for the Moffitt "stroma" list, indicating little overlap.
Several methodological explanations account at least in part for this discrepancy, as reported for prognostic signatures in breast cancer [11,12]: different samples (whole-tumor for Bailey, microdissection for Collisson, and virtual microdissection for Moffitt), different patients, different technological platforms (DNA microarrays, RNA-Seq) with different tested gene sets for DNA microarrays, and different methods of data handling, notably different cut-offs of significance for the retained genes. However, the discordance may be only apparent, because discriminator genes, even if they differ across classifiers, may be involved in the same pathways or cell processes.
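Because the gene lists differ greatly in size, the same number of shared genes gives a different percentage depending on which list serves as the denominator (for example, 8 shared genes are 13% of 62 genes but 16% of 50 genes). The short Python sketch below shows this pairwise overlap computation. The gene identifiers and list sizes are hypothetical placeholders, not the published signatures, and the function name is an assumption.

```python
def overlap_stats(list_a, list_b):
    """Return (number of shared genes, % of list A shared, % of list B shared)."""
    a, b = set(list_a), set(list_b)
    shared = a & b
    return len(shared), 100.0 * len(shared) / len(a), 100.0 * len(shared) / len(b)


# Hypothetical gene identifiers: a short list and a longer list sharing 3 genes.
short_list = ["G1", "G2", "G3", "G4", "G5"]
long_list = [f"G{i}" for i in range(3, 23)]  # G3 .. G22, 20 genes

n_shared, pct_short, pct_long = overlap_stats(short_list, long_list)
print(n_shared, pct_short, pct_long)
```

Reporting both percentages, as done in the text, avoids overstating the overlap when one list is much longer than the other.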
We compared the correlations between all classifications and clinicopathological data (Additional file 2). Some classifications were associated with age (Moffitt "tumor"), pathological type (Moffitt "tumor"), pathological tumor size (Moffitt "stroma", Collisson, Bailey), whereas all were associated with pathological grade, an important prognostic feature of PDACs. Based on the glandular cell differentiation, mitotic index, and nuclear atypia [13], the grade directly reflects important molecular characteristics of tumors, likely explaining its association with all these molecular classifications.
Next, we assessed the concordance of the four classifications in terms of assignment to the poor-prognosis and good-prognosis groups in the 601-sample series. We combined the Bailey's pancreatic progenitor, immunogenic and ADEX subtypes into a single good-prognosis group because their survival curves were not different. Similarly, we combined the Collisson's classical and exocrine-like subtypes into a single good-prognosis group. In the Moffitt's "tumor classification", the "basal-like" subtype represented the poor-prognosis group, as did the "activated" subtype in the "stroma classification". We then compared the results of the classifications by using two-way contingency-table analyses. All comparisons showed significant correlations.
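A two-way contingency-table comparison of two binary prognostic calls can be sketched as follows in Python, computing the concordance rate and Cramer's V from a 2x2 table. This is a minimal illustration with made-up counts, without continuity correction and assuming no empty row or column; the function names are assumptions and the code is not the authors' implementation.

```python
import math


def contingency_2x2(calls_a, calls_b):
    """Cross-tabulate two lists of binary calls (True = poor-prognosis)."""
    table = [[0, 0], [0, 0]]
    for a, b in zip(calls_a, calls_b):
        table[int(a)][int(b)] += 1
    return table


def concordance_rate(table):
    """Fraction of samples assigned to the same prognostic group by both classifiers."""
    total = sum(sum(row) for row in table)
    return (table[0][0] + table[1][1]) / total


def cramers_v(table):
    """Cramer's V from the Pearson chi-square statistic (no continuity correction)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # assumes no empty row or column
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    return math.sqrt(chi2 / (n * k))


# Made-up calls for 100 samples from two hypothetical classifiers.
calls_a = [False] * 50 + [True] * 50
calls_b = [False] * 40 + [True] * 10 + [False] * 10 + [True] * 40
table = contingency_2x2(calls_a, calls_b)
print(table, concordance_rate(table), round(cramers_v(table), 3))
```

The concordance rate counts raw agreement, whereas Cramer's V measures the strength of association on a 0 to 1 scale, so the two figures are complementary rather than interchangeable.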
The concordance rate (Additional file 5) was high (from 73 to 86%) between the three classifications based on gene lists derived from tumor tissues, and decreased when considering the concordance with the Moffitt's "stroma classification". Analysis based on Cramer's V statistic (Fig. 2b) showed that the relation was strong between the Bailey's and Moffitt's "tumor" classifications and between the Bailey's and Collisson's classifications, substantial between the Moffitt's "tumor" and Collisson's classifications, and low between each "tumor" classification (Bailey, Moffitt's "tumor", and Collisson) and the Moffitt's "stroma classification". With regard to the Cramer's V values, the models showing the best and the worst agreements with the other ones were the Bailey's classification and the Moffitt's "stroma classification", respectively. Thus, despite the limited gene overlap, three of the four gene lists tested showed significant agreement in the outcome predictions for individual patients, probably tracking a common set of biologic phenotypes likely related in part to pathological grade. Of note, such high concordance further ...
Fig. 1 Overall survival in patients with pancreatic cancer according to the four molecular classifications. a Kaplan-Meier OS curves according to the two Moffitt's "tumor" subtypes. b Similar to a, but according to the two Moffitt's "stroma" subtypes. c Similar to a, but according to the four Bailey's subtypes. d Similar to a, but according to the three Collisson's subtypes. P-value is for the log-rank test.
Finally, we compared the prognostic value of all classifications. The 2-year OS rates in the Bailey's classification were 23%, 48%, 56% and 46% in the squamous, pancreatic progenitor, immunogenic, and ADEX subtypes, respectively (p = 5.78E-08, Fig. 1c). In multivariate analysis, the Bailey's classification remained significant (p = 1.69E-02, Table 1).
The 2-year OS rates in the Collisson's classification were 25%, 44%, and 45% for the quasi-mesenchymal, exocrine-like, and classical subtypes, respectively (p = 1.65E-03, Fig. 1d), but this classification lost its prognostic value in multivariate analysis (Table 1), as previously reported [7]. The comparison of the three other multivariate analyses including the clinical variables together with a molecular classification (Moffitt "tumor", Moffitt "stroma", Bailey) showed the most significant p-value with the Moffitt's "tumor classification". However, multivariate analysis incorporating the four classifiers retained as significant the Moffitt's "stroma" and Bailey's classifications, suggesting independent prognostic value (Table 1). Similar results were observed in uni- and multivariate analyses when the 27 Collisson's microdissected samples were excluded from analyses (data not shown), suggesting no impact of microdissection on our results.
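As an illustration of how a 2-year OS rate can be read from a Kaplan-Meier curve such as those in Fig. 1, the Python sketch below implements the product-limit estimator on made-up follow-up data. The function names, the follow-up times (in months), and the censoring pattern are hypothetical; the sketch does not reproduce the study's data or software, and it omits the log-rank test and confidence intervals.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each distinct death time.

    times  : follow-up durations (here in months)
    events : True if death was observed, False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function, 1.0 before the first death)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Made-up cohort of six patients.
months = [6, 12, 18, 24, 30, 36]
died = [True, True, False, True, False, True]
curve = kaplan_meier(months, died)
print(round(survival_at(curve, 24), 3))  # estimated 2-year OS in this toy cohort
```

Censored patients reduce the number at risk without lowering the curve, which is why a 2-year OS estimate can differ from the raw proportion of patients known to be alive at 24 months.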

Conclusion
This prognostic analysis of molecular subtypes in PDAC is, to our knowledge, the largest series reported to date and the first study comparing these four promising classifications. We confirmed for the first time the independent prognostic value of the Moffitt's classifications, and confirmed that of the Bailey's classification, but did not confirm that of the Collisson's classification. The gene overlap between all classifiers was low; there were many more common genes between the Collisson's and Moffitt's "tumor" gene lists and the Bailey's gene list, derived in part or in totality from tumor cells, than between each of them and the Moffitt's "stroma" gene list, derived from stromal genes only. Despite this limited overlap, all classifications were associated with pathological grade. The concordance in terms of outcome predictions was relatively high (from 73 to 86%) between the three classifications based on gene lists derived from tumor tissues, and low when considering the concordance with the Moffitt's stroma classification. Despite the higher prognostic value of the Moffitt's "tumor classification" taken alone, the multivariate analysis incorporating the four classifiers together retained as independent variables the Moffitt's "stroma" and Bailey's classifications, highlighting the complementarity of classifiers based on tumor epithelium and stroma.
Our study has some limitations related to the retrospective nature of the data sets and associated biases, including the lack of survival annotation for a portion of the samples (survival was available for 601 of the 846 samples). However, our results reinforce the clinical validity of subtypes in PDAC. Of course, their clinical utility [14], or ability to improve patients' management and outcome, remains to be demonstrated in prospective clinical trials. The clinical potential of subtyping is important. It provides new insights into the molecular pathophysiology of pancreatic cancer, which may be used to tailor therapies. Many phase III clinical trials have failed to show benefit of tested agents in unselected patients with advanced-stage pancreatic cancer, although benefit was observed in occasional patients who may represent a given subtype in which these agents are selectively effective. For example, Collisson et al. defined preclinical models of their three subtypes and showed that gemcitabine and erlotinib were preferentially active in different subtypes [4]. Subtyping may also provide prognostic support in a clinical setting where the choice and timing of therapies is critical. For example, in early-stage disease, the subtypes could help select patients with resectable disease for either immediate surgery (for the good-prognosis subtypes) or neoadjuvant chemotherapy (for the poor-prognosis subtypes), which ultimately should affect outcome and quality of life. Nonetheless, the clinical utility of subtyping remains to be prospectively demonstrated before any use in clinical routine.
Fig. 2 Comparison of the four molecular classifications. a Venn diagram comparing the gene lists of the four subtype classifications. b Heatmap of Cramer's V statistic reflecting the strength of the correlations between the classifiers in terms of assignment to the poor-prognosis and good-prognosis groups. The V statistic values are color-coded according to the scale shown below the heatmap.
Nevertheless, from a conceptual point of view, the strong biological and prognostic differences observed suggest that PDAC should be regarded as a collection of separate diseases, each subtype providing a more homogeneous and favorable setting for identifying new prognostic and/or therapeutic targets and testing new therapies. Three take-home messages derive from our results. First, we need to identify, probably from the Moffitt's "stroma" and Bailey's classifiers given their complementary prognostic value, new biomarkers and/or therapeutic targets, which will reinforce the clinical interest of subtypes. Potential therapeutic targets include, for example, immune modulators such as checkpoint inhibitors in the Bailey's immunogenic subtype, drugs "normalizing" the TP53, TP63 and KDM6A pathways frequently altered in the Bailey's squamous subtype, or drugs targeting PDAC stromal components, notably the pancreatic stellate cells or specific fibroblast subsets [15], in the Moffitt's "stroma" subtype. Second, PDAC subtypes are predictive of prognosis, with the mesenchymal subtype being the worst of all, as already reported in breast cancer with the basal subtype [3]. Third, the pancreatic tumor microenvironment contributes to prognosis and adds prognostic information to classifiers based on "tumor epithelium" genes.