Kinome expression profiling and prognosis of basal breast cancers

Background Basal breast cancers (BCs) represent ~15% of BCs. Although overall poor, prognosis is heterogeneous. Identification of good- versus poor-prognosis patients is difficult or impossible using the standard histoclinical features and the recently defined prognostic gene expression signatures (GES). Kinases are often activated or overexpressed in cancers, and constitute targets for successful therapies. We sought to define a prognostic model of basal BCs based on kinome expression profiling. Methods DNA microarray-based gene expression and histoclinical data of 2515 early BCs from thirteen datasets were collected. We searched for a kinome-based GES associated with disease-free survival (DFS) in basal BCs of the learning set using a metagene-based approach. The signature was then tested in basal tumors of the independent validation set. Results A total of 591 samples were basal. We identified a 28-kinase metagene associated with DFS in the learning set (N = 73). This metagene was associated with immune response and particularly cytotoxic T-cell response. On multivariate analysis, a metagene-based predictor outperformed the classical prognostic factors, both in the learning and the validation (N = 518) sets, independently of the lymphocyte infiltrate. In the validation set, patients whose tumors overexpressed the metagene had a 78% 5-year DFS versus 54% for other patients (p = 1.62E-4, log-rank test). Conclusions Based on kinome expression, we identified a predictor that separated basal BCs into two subgroups of different prognosis. Tumors associated with higher activation of cytotoxic tumor-infiltrative lymphocytes harbored a better prognosis. Such classification should help tailor the treatment and develop new therapies based on immune response manipulation.


Background
Breast cancer (BC) is heterogeneous. Gene expression profiling has identified molecular subtypes with different biological features and different outcome [1][2][3][4][5], including basal BCs. Basal BCs, which represent~15-20% of invasive BCs are high-grade tumors, frequently do not express hormone receptors (HR) and ERBB2, and have the worst prognosis overall [6,7]. Yet, basal tumors show prognostic heterogeneity. Both the standard histoclinical features and the recently defined prognostic gene expression signatures (GES) fail to identify patients who will relapse and patients who will not respond to chemotherapy [8]. Defining the molecular bases of this heterogeneity should help better understand these tumors, identify new therapeutic targets and more reliable predictors of survival and therapeutic response.
Kinases, which constitute~1.7% of human genes [9], are activated or overexpressed in cancers [10], and constitute current or future targets for successful therapies [11]. Previously, we identified a 16-kinase GES that improved the prognostic classification of luminal BCs [12]. A similar approach was successfully applied to 44 estrogen receptor (ER)-negative BCs, including ERBB2-positive tumors and less than 50% of basal tumors [13]. To our knowledge, the "kinome approach" has never been applied to basal BCs only.
We tested the hypothesis that the expression of kinase genes may distinguish good-from poor-prognosis basal tumors.

Patients' selection
Institut Paoli-Calmettes (IPC) and public retrospective data sets from BC samples profiled using oligonucleotide microarrays were collected (Additional file 1, Table S1). All were pre-treatment samples of invasive non-inflammatory and non-metastatic adenocarcinomas. Microarray data from our set are available through Gene Expression Omnibus (series entry GSE21653).
The "IPC" training set included 261 patients who underwent initial surgery in our institution between 1992 and 2007. Each patient gave written informed consent and this study has been approved by our institutional ethics committee. All samples were histologically reviewed in a standardized fashion by a pathologist (JJ) to asses the ER, progesterone receptor (PR) and ERBB2 status by immunohistochemistry (IHC), and the percent of cancer cells (superior to 60%). Antibodies used (Dak ® ) were SP1 clone for ER, PgR636 clone for PR and Herceptest™ for ERBB2. The cut-off for positivity was 1% of stained tumor cells for HR, and ERBB2 status (0-3+ score, DAKO HercepTest kit scoring guidelines) was defined as positive if 3+ or 2+ with amplification confirmed by in situ hybridization. Vascular invasion and lymphocytic infiltration were assessed by Hematoxylin and Eosin Staining (HES).

Gene expression data analysis
Details are given in Additional file 2 (Supplementary Material). Data sets were processed as previously described [24]. Briefly, for the Agilent sets, we applied quantile normalization to available processed data. Regarding the Affymetrix sets, we used Robust Multichip Average (RMA) with the non-parametric quantile algorithm as normalization parameter [25]. Quantile normalization or RMA was done in R using Bioconductor and associated packages. The five molecular subtypes were determined using the single sample predictor (SSP) classifier [26].
Other analyses were centered on 771 kinase and kinase-interacting genes, based on an update of the initial kinome description [9,13]. This list was matched with genes available on the Affymetrix U133 Plus 2.0 microarrays used to profile the IPC tumor set, finally retaining 661 genes (Additional file 3, Table S2). Analyses were both unsupervised and supervised. Supervised t-test analysis searched for genes upregulated in basal samples compared to at least one of the four other molecular subtypes, with 5% significance and a false discovery rate (FDR) lower than 5%. To circumvent the problem of dissymmetry of variables with a number of samples inferior to the number of genes being tested [14,[27][28][29][30][31], we grouped the resulting genes with correlated and interdependent expression (gene subsets) in a single "metagene". Metagene expression value is the mean of the normalized expression values of all genes in the respective gene subset. Each metagene is treated as if it were a single gene, thereby reducing data dimension. We defined our metagenes by hierarchical clustering using data median-centered on genes, Pearson correlation as similarity metrics and centroid linkage clustering [32]. We identified robust gene clusters (minimal cluster size and minimal Pearson correlation were 15 and 0.6, respectively) using the quality-threshold (QT) clustering method [32]. A metagene was then computed for each selected cluster, and its prognostic incidence (as continuous value) evaluated using a Cox regression univariate analysis. Once a metagene associated with DFS (5% level significance) was defined, its expression value was used to divide the training set into two subgroups then tested for association with DFS. The cut-off was defined as the best threshold dividing the population into two subgroups with the greater DFS difference, "Metagene-Low" (expression value inferior to the threshold) and "Metagene-High" (expression value above) subgroups. This cut-off was applied to basal tumors of each validation series, and the define subgroups were then pooled before prognostic analysis.
We tested the prognostic value of two recently reported classifiers associated with survival in basal BCs: the medullary BC (MBC) classifier [33] and the HER2derived prognostic predictor (HDPP) [34] associated with survival in both ERBB2+ and basal tumors. We also tested three multigene signatures identified as prognostic in breast cancer, independently of molecular subtypes: the Genomic Grade Index [16], the 76-gene signature [15], and the 70-gene signature [5]. Ontology analysis was done using Ingenuity Pathway Analysis (IPA) software (Redwood City, CA, USA) [35]. We also determined if immune signatures [36] were overrepresented in one prognostic subgroup using the gene set enrichment analysis (GSEA) algorithm and 1000 permutations [37].

Statistical analysis
Correlations between sample groups and histoclinical factors were calculated with the Fisher's exact test and the t-test when appropriate. DFS was calculated from the date of diagnosis until date of first relapse or death using the Kaplan-Meier method, and follow-up was measured to the date of last news for event-free patients. Survival curves were compared with the logrank test. Univariate and multivariate prognostic analyses used the Cox regression method. Univariate analyses tested classical histoclinical factors: age (≤50 years vs. > 50), pathological tumor size (pT≤20 mm vs. > 20), lymph node status (pN positive vs. negative), SBR grade (I vs. II-III), IHC ER status (negative vs. positive), peritumoral vascular invasion (negative vs. positive) and lymphocytic infiltrate. Data regarding the delivery of adjuvant chemotherapy and hormone therapy were also analyzed. Analyses included also binary classifications based on the immune metagene, the MBC and HDDP classifiers (good vs. poor-prognosis subgroups). Multivariate analyses tested all variables with a p-value inferior to 0.05 in univariate analysis and excluded patients with one or more missing data. All statistical tests were two-sided at the 5% level of significance. Analyses were done using the survival package (version 2.30), in the R software (version 2.9.1). Our analysis adhered to the reporting recommendations for tumor marker prognostic studies (REMARK) [38].

Identification of a prognostic kinase expression signature
Five hundreds and ninety-one out of 2515 tumors were basal, including 73/261 in our IPC series and 518/2254 in the public sets (Table 1). These tumors displayed classical histoclinical features of basal BC (Additional file 4, Table S3). Clinical outcome, available for 2109 patients, correlated with subtypes with 5-year DFS of 83% for luminal A, 60% for luminal B, 77% for normallike, 61% for basal, and 61% for ERBB2-overexpressing.
The 73 IPC basal tumors were used as training set for identifying a prognostic kinase GES from the 661-gene list. Supervised analysis identified 581 genes differentially expressed in basal versus at least one other subtype, including 360 genes overexpressed in basal tumors (Additional file 3, Table S2). Within this series most of the patients (90%) received adjuvant chemotherapy. Twenty-five patients developed relapse or death with a median time-to-relapse of 19 months, and forty-eight patients remained disease-free with a median follow-up of 64 months. The 5-year DFS was 63%. Hierarchical clustering of these tumors with the 360-gene set ( Figure  1A) revealed two main clusters, I (n = 24) and II (n = 49), with 5-year DFS superior in cluster I (77% versus 56%; p = 0.22, log-rank test; Figure 1B). QT clustering identified three gene clusters with a major role in this discrimination ( Figure 1A, and Additional file 5, Table  S4). One included 21 genes not related to any specific biologic function. A second cluster was associated with the cell cycle. The third cluster (thereafter designed immune cluster) contained 28 genes, which for many were involved in immune signaling (e.g. BLK, BTK, FYN, SYK, ITK, JAK3, LCK, LCP2, PRKCB, and ZAP70). Visually, lower expression of this cluster was associated with more relapses ( Figure 1A). We built a metagene for each gene cluster, and analyzed their correlation with DFS using univariate Cox regression analysis. Only the immune metagene correlated with DFS (HR = 0.32,  Table S5). Resampling with 100,000 iterations showed only a 0.8% probability to find a metagene built from 28 random genes with similar or better prognostic correlation than the immune metagene. We defined two subgroups of basal tumors according to the immune metagene expression value: "Immune-High" if above the value threshold (n = 25) and "Immune-Low" if under (n = 48). No histoclinical factor including the lymphocyte infiltrate was different between the two subgroups (Additional file 7, Table  S6). Survival was different, with 91% 5-year DFS in "Immune-High" subgroup versus 49% in "Immune-Low" (p = 0.005, log-rank test, Figure 2). On univariate analysis ( They remained significant on multivariate analysis ( Table 2). We also performed a similar analysis on genes underexpressed in basal tumors, but it did not allow the identification of any robust gene clusters.

Validation of the prognostic signature
The expression of the immune metagene was studied in the independent panel of 518 basal tumors not used to define the predictor. Follow-up for DFS was annotated for 380 patients: 158 developed relapse or death with a median time-to-relapse of 30 months, and 222 remained disease-free with a median follow-up of 93 months. The 5-year DFS was 60%. At least 25 out of 28 (mean = 27) genes included in the immune metagene were common to each separate set (Additional file 1, Table S1). A total of 122 patients were defined as "Immune-High" and 396 as "Immune-Low". Their histoclinical features (including the lymphocyte infiltrate available in 56 out of 518 tumors) were well balanced, except SBR grading, more frequently II-III in the "Immune-High" subgroup ( Table  3). The 5-year DFS was 78% in the "Immune-High" subgroup and 54% in the "Immune-Low" one (p = 1.62E-04, log-rank test, Figure 3A), confirming the prognostic value of the immune metagene. Analysis by data set showed that the mean difference of 5-year DFS between "Immune-high" and "Immune-low" cases was 25% (95% CI, [13 -37], p = 0.0038, T-test). On univariate analysis (

Comparison with existing classifiers
Two prognostic multigenic models have been reported in basal BC: the MBC and HDPP classifiers [33,34]. We assessed their prognostic value in the present 518 basal tumors. On univariate analysis, the MBC classifier correlated with DFS, with a HR for relapse of 0.59 (95% CI [0.43-0.82], p = 0.0017) for good-prognosis patients as compared with poor-prognosis patients. In multivariate analysis including this classifier, our immune metagene classifier and lymph node status showed that both genomic classifiers were significant, whereas node involvement was not ( Table 4), suggesting that the multigenic models have independent prognostic value. The HDPP classifier confirmed its prognostic value for ERBB2-overexpressing tumors in our series (n = 214), but not in the 518 basal samples: 5-year DFS was 63% for the good-prognosis patients versus 61% for the poor-prognosis patients (p = 0.62, log-rank test).
We also assessed the prognostic impact of three published major prognostic expression signatures recently reported in early breast cancer. In each data set, each sample was assigned a good or a poor prognosis based on each signature. Data sets were then pooled, and survival was compared between the predicted good-prognosis and poor-prognosis subgroups. Univariate DFS analysis performed in the basal subtype showed that none of these classifiers was associated with survival (Table 5). These results show the absence of informative value of these signatures in the basal subtype, by contrast with our classifier.
Prognostic and/or predictive value of the immune classifier?
To determine the link of the immune metagene with metastatic risk and/or with response to chemotherapy, we analyzed -within the series of 518 basal BCs -the 187 cases with available follow-up who had not received  any adjuvant systemic therapy. In this set, "Immune-High" patients had a longer DFS than "Immune-Low" patients with 5-year DFS of 82 vs. 55% respectively (p = 4.75E-03, log-rank test; Figure 3B). Next, we studied the capacity of our model to predict pathological complete response (pCR) after anthracycline-based neoadjuvant chemotherapy. Information was available for two data sets with the following regimens: weekly paclitaxel and fluorouracil-doxorubicin-cyclophosphamide (55 patients with pCR and 70 without) [22], and fluorouracil-epirubicin-cyclophosphamide or docetaxel followed with docetaxel-epirubicin (34 patients with pCR and 99 without) [23]. We identified 98 basal cases out of the 258 included samples. "Immune-High" patients experienced more pCR (59%) than "Immune-Low" patients (43%), but the difference was not significant (Odds ratio = 1.87, 95%CI [0.57-6.40], p = 0.29, Fisher's exact test).
Altogether, these observations suggested that the immune metagene is associated with relapse risk, whereas its association with response to chemotherapy deserves to be tested in larger series.

The immune kinase metagene correlates with cytotoxic Tcell response
We next sought to elucidate the type of immune response associated with our metagene. Ontology analysis of the 28 genes using IPA software confirmed association with many pathways involved in immune response [35], particularly in lymphocyte activation processes, such as "T-cell receptor signaling", "CD28 signaling in T helper cells", "NK cell signaling", "PLC signaling", "Role of NFAT in regulation of the immune response", "NF-kB signaling", or "IL2 signaling" (Additional file 8 - Table S7, and Additional file 9 - Figure  S1). The upregulation of BTK, CD3E, FYN, ITK, LCK, LCP2, PRKCs, SYK, ZAP70 and JAK3 clearly identified an activated profile of the lymphocytic lineage.
To better explore the molecular differences between "Immune-High" and "Immune-Low" basal BCs, we searched for the genes differentially expressed between the two subgroups in the IPC series using the whole genome and not only the kinome. Supervised analysis (0.1% FDR) identified 532 differential genes. Most of them (n = 506) were overexpressed in "Immune-High" samples (Additional file 10, Table S8A). Ontology analysis showed that these genes were particularly involved in immune response, and more specifically in adaptive immunity (Additional file 10, Table S8B). To confirm this observation, we applied GSEA using reported T-cell, CD8+ T-cell and B-cell expression signatures [36]. As shown in Additional file 11 ( Figure S2), an enrichment of genes involved in T-cell, CD8+ T-cell and B-cell signatures was found in "Immune-High" cases.

Discussion
Basal BCs are poor-prognosis tumors, which require both improvement of our ability to predict the clinical outcome for better tailoring treatment and identification of new therapeutic targets. Their prognosis is heterogeneous, and it is currently impossible to predict which patients will or will not relapse using classical histoclinical factors or the recently reported prognostic GES, notably those currently tested in clinical trials [39]. In the same way, the HDDP classifier [34] identified using ERBB2+ tumors, failed to classify basal samples. Prognostic analyses should be done per subtype [40].
Analysis of kinase and kinase-related genes might help develop new targeted therapies. We report a kinasebased model that divides basal BCs into two subgroups with balanced histoclinical factors but different survival (25% difference for 5-year DFS). This model is based on the expression of an immune 28-gene metagene. Identified in a learning set, its prognostic value was confirmed in an independent data set of 518 cases. The model outperformed the individual current prognostic factors on multivariate analysis, both in the learning and validation sets. Patients with high expression of the immune metagene had a better DFS than other patients. This prognostic value remained when applied to patients treated without any adjuvant chemotherapy, suggesting a link with the metastatic potential. An additional link with chemosensitivity cannot be excluded as "Immune-High" patients experienced a higher, but non significant, pCR rate than "Immune-Low" patients.
Thus, we show that the immune response, and notably the adaptive cytotoxic T H 1-cell response [47], influence survival of basal BC patients. Despite the small size of the independent population with lymphocyte infiltrate data available, which does not allow to really conclude about the impact of the quantity of lymphocyte infiltrate on the expression of immune response-related genes, the absence of correlation between the immune metagene and lymphocyte infiltration in our cohort and in two independent data sets [5,8] as well as the function of genes, suggest that this influence does not depend on the degree of lymphocyte infiltrate, but on the efficiency of its cytotoxic activation status. The differential expression of these "immune genes" is probably also due to a variable expression of epithelial-derived molecules [13,42,48], which activate (in "Immune-High" cases) or repress (in "Immune-Low" cases) the local immune response to the tumor. These hypotheses deserve further investigation to understand the respective role of tumorinfiltrative lymphocytes and cancer cells on cancer history.

Conclusions
In conclusion, we propose a robust prognostic subdivision of basal BC based on the expression of 28 genes, involved in immune response and notably the cytotoxic T-cell response. Tumors associated with higher activation of cytotoxic tumor-infiltrative lymphocytes have a better prognosis, and are likely to better respond to chemotherapy. Such classification should help tailor treatment. Furthermore, since adaptive immunity seems to play a pivotal role [49] immune response manipulation might be an efficient way of treating or preventing these poor-prognosis tumors [47,50].

Additional material
Additional file 1:   Table S6: Histoclinical comparison of the two  basal subgroups defined with the immune metagene (IPC series).
Additional file 8: Table S7: Ingenuity canonical pathways associated with the immune-related cluster.
Additional file 9: Figure S1: Biological network of genes included in or associated with our 28-gene model. A fine-tuning between inhibitor (phosphatases) and activator (kinases) signals regulates lymphocyte anti-tumor immunity. AK and Pyk2 are two of the major kinases that become tyrosine phosphorylated following lymphocyte stimulation. Both are associated to Lck. Lck (lymphocyte specific kinase) and Fyn are cytoplasmic tyrosine kinases of the Src family expressed in Tcells and natural killer (NK) cells, under the T cell receptor (TCR) or Natural cytotoxicity receptor (NCR). Their activity is critical for T and NK cell receptors-mediated signaling, leading to normal T-and NK-cell development and activation. Increased Fyn transcript and protein content in T cells can be observed with high T cell activity. Square 1. LAT is a linker protein essential for activation of T lymphocytes. Its rapid tyrosine-phosphorylation upon TCR stimulation recruits downstream signaling molecules for membrane targeting and activation. LAT is a substrate for Syk/Zap70 kinase and an immediate substrate for both Lck and Syk kinases. Its phosphorylation is an early event leading to T-cell activation. Both Lck and Syk phosphorylate the ITAM-like motifs on LAT, which is essential for induction of the interaction of LAT with downstream signaling molecules such as Grb2, PLC-γ1 and for activation of MAPK-ERK pathways. ZAP70 is thus at the crossroad of several signaling pathways that control lymphocyte development and function and cell survival in response to a wide variety of activator signals coming from the NCR, TCR or other receptor involved in anti-tumor immunity. Square 2. Cytokines receptors express at the membrane also regulate lymphocyte activation through the JAK-STAT signaling pathway. Square 3. In B, T and NK cells, the inhibition of these kinases is mostly mediated by protein tyrosine phosphatases (PTP), regrouping members of the SHP family (SHP-1, SHP-2) or LYP family. These proteins inhibit effector phase functions by dephosphorylating a wide spectrum of phospho-proteins involved in hematopoietic cell signaling.
Additional file 10: Table S8: Genes differentially expressed between the "Immune-High" and "Immune-Low" basal tumor subgroups in the IPC set. (A) Summary of the 532 genes differentially expressed (Student's t-test). (B) Canonical pathways associated with the genes overexpressed in "Immune-High" tumors in the IPC set.
Additional file 11: Figure