Cell culture
The murine renal cancer RENCA cell lines were maintained in RPMI-1640 (Eurobio) supplemented with 10% (v/v) FBS and 1% (v/v) penicillin-streptomycin, and incubated at 37 °C with 5% CO2. For the generation of GFP-expressing cells, a lentiviral vector (pRRLsin-MND-eGFP-WPRE) was obtained from the vectorology platform of the University of Bordeaux (Vect’UB), and used for infection of RENCA cells. Authentication of parental cell line was conducted by Microsynth on December 2020.
Implantation,-extraction and tumor cell purification
For sub-capsular implantations, 1 × 105 GFP-RENCA cells were injected under the left kidney capsule of 6 weeks old female BALB/c mice (Charles River Laboratories), whilst for intravenous injections 5 × 106 cells were injected into the caudal vein. When a mouse from a group reached an endpoint, all mice from that group were sacrificed, and both primary tumors or lung metastases were collected. For tumor cell extraction and purification, tissues were cut into small pieces with a scalpel and digested with Collagenase I and Collagenase II (Liberase TL, Roche) for 1 h at 37 °C. Subsequently, digested tissues were filtered using cell strainers (100 μm, 70 μm and 40 μm) and cultured in complete RPMI-1640. Cell cultures were checked daily and passaged as necessary for around 2 weeks, until GFP-negative cells could not be detected.
GFP-RENCA purity was assessed either by fluorescence microscopy or flow cytometry using BD Accuri C6 (BD Bioscience). Finally, GFP-RENCA cell lines were collected for analysis or re-implanted into a new set of mice for the subsequent in-vivo passage.
Mice were housed in the animal facility of Bordeaux University (Animalerie Mutualisée, Université de Bordeaux, France). All animal experiments were approved by the “Ministère de l’Enseignement Supérieur, de la Recherche et de l’Innovation (MESRI)” (authorization numbers 2,016,072,015,478,042; 2,015,110,618,597,936 and 2,015,070,315,335,217), and were carried out in accordance with the approved protocols.
RNA extraction, transcriptomic and qPCR gene expression analyses
Total RNA was extracted using the RNeasy Plus Mini Kit (#74134, Qiagen), according to the manufacturer’s protocols. For the analysis of the transcriptomic profiles of generated cancer cell lines, we used SurePrint G3 Mouse Gene Expression Microarrays (G4852A, Agilent). Instead, for Real-Time qPCR analysis, 1 μg of total RNA was reverse-transcribed using the high-capacity cDNA reverse transcription kit (Applied Biosystems, 4,368,814). Then, cDNAs were analyzed using either EurobioProbe or EurobioGreen master mix (Eurobio Scientific), and StepOne Real-Time PCR System (Applied Biosystems). Human HPRT1 or mouse α-Tubulin were used as housekeeping genes. List of primers is summarized in Supplementary Table S3.
Elisa
Human plasma samples were collected from UroCCR cohort and analyzed by ELISA, according to the manufacturer’s protocols: human SAA2 (DLdevelop, DL-SAA2-Hu-96 T), Human CFB (Abcam, ab137973).
Transcriptome data generation and analysis
Counts of samples by group and passages:
Group
|
P0
|
P2
|
P3
|
P4
|
P5
|
P6
|
---|
Parental cell line
|
2
|
NA
|
NA
|
NA
|
NA
|
NA
|
KPT
|
NA
|
2
|
2
|
3
|
5
|
5
|
T-LM
|
NA
|
2
|
3
|
4
|
6
|
4
|
L-LM
|
NA
|
1
|
5
|
3
|
4
|
3
|
From the log2 scale normalized data set, Principal Component Analysis (PCA, function prcomp of stats R package (v3.6.2) with the parameter center = T) was performed on parental cell line (P0) and each passage resulting in P0-P3, P0-P5 and P0-P6 PCA.
For the P0-P6 PCA, genes with the most important association were selected by keeping genes whose contributions were above the mean of all contributions for PC1 and PC2. This resulted in a set of 5194 genes. The mean value was computed for each passage P0 to P6 and used for the heatmap using the pheatmap R package (v1.0.12, clustering_method = “ward. D2”, scale = “row”).
Differential Expression Analysis (DEA) was performed between the passage 3 and 6 for each group separately (KPT, T-LM, K-LM) with the limma R package (v.3.42.2 [41]). A gene was considerate as a Differentially Expressed Gene (DEG) if its adjusted p-value was ≤0.01 (Benjamini & Hochberg (BH) method (CIT4)). The results are summarized in the following table.
|
KPT
|
T-LM
|
K-LM
|
---|
Up-regulated DEG
|
454
|
714
|
223
|
Down-regulated DEG
|
210
|
145
|
92
|
Total DEG
|
664
|
859
|
315
|
Then, enrichment analysis was done for DEG sets of each group separately. To perform enrichment analyses we used hypergeometric test (enricher function of clusterProfiler R package [42]; v3.10.1) with go_terms.mgi downloaded on Mouse Genome Database (MGD) at the Mouse Genome Informatics website (URL: http://www.informatics.jax.org [43] (04/2019). A GO term was considered significantly enriched if its adjusted p-value was ≤0.05 (BH method). Then, we computed a z-score value as an additional indicator of the direction of the dis-regulation of the GO term as: z-score = (up−down)/sqrt(count) where up / down are the number of assigned genes up- or down-regulated, respectively, in the GO term [44]. Finally, we searched for commons GO terms between each possible combination of KPT, T-LM and K-LM groups. The top 5 GO terms were selected for each group and combined group based on the gene counts and the z-score.
Biomarker strategy
Using the transcriptomics data obtained from our cell lines, we generated a list of highly differentially and progressively expressed genes. This list corresponded to the intersection of the gene sets selected as described in the steps (i) and (ii).
(i) Highly differentially expressed genes between P0 and P6.
To select genes with the highest differential expression between the parental cell line and the P6 passage, we used a z-score approach. First, we computed the log fold change (logFC) for each gene, and then the mean and standard deviation of all the logFC, obtaining at the end a z-score for each gene. A gene was considered as highly differentially expressed if its absolute z-score value was ≥2.58 (corresponding to a p-value of 0.01) and if its absolute logFC was ≥2.
(ii) Genes with progressive expression patterns.
We captured genes with a progressive expression pattern following the same direction during all passages with the limma R package. For this, we identified DEG (adjusted p-value ≤0.01) for each following comparison: P0-P6, P0-P2, P2-P3, P3-P4, P4-P5 and P5-P6. A gene was considered as progressive if it was differentially expressed for the P0-P6 comparison and if the direction of its differential expression was the same through all other comparisons. Stable states with no significant difference between P0-P2, P2-P3, P3-P4, P4-P5 and P5-P6 comparisons were allowed. To ensure that the genes were specific to each generated cell line, we then selected only progressive genes through all passages and late passages, by keeping progressive genes differentially expressed in P0-P6, P0-P6 and P4-P5, and P0-P6 and P5-P6.
Next, from our selected highly differentially and progressively expressed gene list, we converted Mus Musculus gene names to Human gene names using the conversion table from Biomart website (https://www.ensembl.org [45]).
Subsequently, we generated potentially clinically relevant gene signatures using the TCGA-KIRC cohort, as described in the following steps (iii) and (iv).
(iii) Selection of prognostic human genes.
For each gene, the KIRC cohort was segregated in 3 groups based on the expression. Then, we fitted a Cox proportional hazard regression model based on overall survival (OS) and disease-free survival (DFS) time. The cox proportional and log rank hazard ratio (HR) values were computed according with the differential expression (up or down) of the gene identified in the previous steps. A gene was selected if its HR was ≥2 and if the adjusted p-value (BH method) of its log-rank test was ≤0.01 for OS or DFS.
(iv) Aggregation of relevant genes as signatures.
To measure the clinical relevance of the resulting signatures, we used the SigCheck R package (v2.18, [46]). To separate samples into groups, we computed a score for each sample which corresponded to the mean value over all the expression values in the signature (scoreMethod = “High” in the sigCheck function). Patients were then ranked by their scores and split in 3 groups (high, medium, low) to perform a log rank test and compute associated HR. Signatures were tested for OS of all patients and DFS of only non-metastatic M0 patients.
(v) Validation of the signatures.
We validated the significance of each signature after adjustment for clinical variables (Fuhrman grade and TNM stage) with a multivariate Cox regression analysis (ggforest function). To show that the signatures were significantly more associated with OS and DFS outcomes than random predictors, we compared the performance of each signature with 1000 signatures composed of the same number of randomly-selected genes (sigCheckRandom function).
TCGA KIRC cohort
TCGA Kidney Renal Clear Cell Carcinoma (KIRC) HiSeqV2 data were downloaded from XenaBrowser [47]. We chose log2(x + 1) transformed RSEM normalized count (version 2017-10-13) as recommended on its web site (https://xenabrowser.net/; page: “dataset:gene expression RNAseq-IlluminaHiSeq percentile”). We removed genes whose expression had null values in more than 2 third of all samples (2360 genes). Complementary associated clinical data were downloaded from cbioportal (www.cbioportal.org/, [48]). Only primary tumor samples were used remaining to 533 patients whose 352 had a M0 status.
Human patient samples
Patient samples (tumor tissue and plasma) from the UroCCR cohort were used with associated clinical data (clinicaltrial.gov, NCT03293563). Eligible patients for SUVEGIL and TORAVA trials were at least 18 years of age and had metastatic ccRCC histologically confirmed, with the presence of measurable disease according to Response Evaluation Criteria in Solid Tumors v1.1.
For SUVEGIL and TORAVA cohorts, patients did not received previous systemic therapy for RCC, and were eligible for sunitinib or bevacizumab treatment in the first-line setting. Patients were ineligible if they had symptomatic or uncontrolled brain metastases, an estimated lifetime less than 3 months, uncontrolled hypertension or clinically significant cardiovascular events (heart failure, prolongation of the QT interval), history of other primary cancer. All patients gave written informed consent. Tumors were assessed at baseline and then every 12 weeks by thoracic, abdominal, pelvic and bone CT scans. Brain CT scans were performed in case of symptoms. This cohort includes patients from the SUVEGIL (24 patients) and TORAVA (35 patients) trials. The SUVEGIL trial (clinicaltrial.gov, NCT00943839) was a multi-center prospective single-arm study. The goal of the trial is to determine whether a link exists between the effectiveness of therapy with sunitinib malate and development of blood biomarkers in patients with kidney cancer. Patients received oral sunitinib (50 mg per day) once daily for 4 weeks (on days 1 to 28), followed by 2 weeks without treatment. Courses repeat every 6 weeks in the absence of disease progression or unacceptable toxicity. The TORAVA trial (clinicaltrial.gov, NCT00619268) was a randomized prospective study. Patient characteristics and results have been previously described [43]. Briefly, patients aged 18 years or older with untreated metastatic ccRCC were randomly assigned (2: 1: 1) to receive the combination of bevacizumab (10 mg kg− 1 iv every 2 weeks) and temsirolimus (25 mg iv weekly) IFN-α (9 mIU i.v. three times per week), or one of the standard treatments: sunitinib (50 mg per day orally for 4 weeks followed by 2 weeks off) [49]. These studies were approved by the ethic committee at each participating center and run in agreement with the International Conference on Harmonization of Good Clinical Practice Guideline. Blood samples were collected during the inclusion visit (baseline).
Validation of signatures in a cohort treated with immunotherapy
The relevance of the KPT, K-LM and T-LM signatures, as well as SAA2, CFB and SAA2-CFB combination were also tested in a two cohorts treated with either everolimus (mTOR inhibitor, 130 patients) or nivolumab (anti-PD-1, 181 patients) immunotherapies [13]. We used.
SigCheck R package (v2.22, [46] with sigCheck function (scoreMethod = “High”). Signatures were tested for OS and Progression-Free Survival (PFS). To show that the signatures were significantly more associated with OS and PFS outcomes than random predictors, we compared the performance of each signature with 1000 signatures composed of the same number of randomly-selected genes (sigCheckRandom function).
Computational analysis
Machine learning analysis for right-censored data was done using the Random Survival Forest (RSF) algorithm, using the RSF implementation of the randomForestSRC R package. All RSF models were fitted using 1000 trees with the log-rank splitting rule. The impact of covariates on the DMFS curves was assessed by using the forest-averaged minimal depth (similar to the analysis done in Ref A3).
To analyze clinical DMFS data we developed a mathematical model that is fully described in Ref A4. First, we developed a mechanistic model of primary tumor growth and metastasis dissemination assuming Gompertzian kinetics for tumor growth and metastasis dissemination being proportional to the primary tumor growth. The full metastatic process was described by the following transport equation
$$\left\{\begin{array}{c}{\partial}_t{\rho}^i\left(t,v\right)+{\partial}_v\left({g}^i(v){\rho}^i\left(t,v\right)\right)= 0\\ {}{g}^i\left({V}_0\right){\rho}^i\left(t,{V}_0\right)={\mu}^i{V}_p^i(t)\\ {}{\rho}^i\left( 0,v\right)= 0\end{array}\right.$$
where the function ρi(t,ν) represents the distribution of metastatic tumors with size ν at time t for the individual patient i. The growth function gi is defined by gi(ν) = (αi-βi log(ν))ν and μi is the individual metastasis dissemination rate. The primary tumor volume Vip(t) follows \({V}_p^i(t)={e}^{\left(\frac{\upalpha^i}{\upbeta^i}\left(1-{e}^{\left(-{\upbeta}^it\right)}\right)\right)}\).
Second, the individual time to recurrence TTRi (considered as the time elapsed from diagnosis to the appearance of the first visible metastasis) was defined as a function of three individual parameters: TTRi = f(Vidiag, αi, μi), where Vidiag, αi and μi are the volume at diagnosis, growth rate and metastasis dissemination rate respectively. Specifically, Nvis(TTRi) = 1 with \({N}_{vis}(t)={\int}_{{\mathrm{V}}_{vis}}^{+\infty }{\uprho}^i\left(t,v\right) dv\) with Vvis a threshold assumed to represent minimal visible size of a tumor and taken to Vvis = 5 mm. in diameter. We then assumed log-normal distributions for the parameters (αi, μi) with parameters log(αi) = log(αpop) + ηi(α) where ηi(α) ~ N(O, ω2(α)) and log(μi) = log(μpop) + ηiμ where ηiμ ~ N(O, ω2μ). The population distribution of TTRs allows to define the population DMFS curves as DFMS(t; αpop, μpop, ωa, ωμ) = P[TTR > t].
We analyzed three possible effects of the most important covariates on tumor growth and metastasis dissemination and selected the effect with the lowest difference between the model predictions and the clinical data.
For a given patient, we mathematically computed 10,000 in silico replicates with identical values for the covariates FG, CFB and Saa2, and calculated the proportions of virtual patients with metastasis at diagnosis and patients without metastasis after 5 years. We defined the “predicted TTR” as the median of the 10,000 TTR. Harrell’s C-index was computed using the predicted TTR and the clinical TTR values, allowing for right-censored data. Confidence intervals were calculated bootstrapping the data 100 times and using the percentile method.
Statistical analyses
Mann-Whitney U test and unpaired two-tail Student t-test were used for in vivo and in vitro, respectively, experiments. p-values < 0.05 were considered statistically significant (* p < 0.05; ** p < 0.01; *** p < 0.001). Statistical analyses were performed using either R studio (R v3.6.2 [50], R studio v1.2.5033 [51]) or GraphPad Prism (version 6.00 for Windows, La Jolla California USA, www.graphpad.com). Survival analysis were performed using survival ([52], v3.2–7) survminer (v0.4.8). Analysis of transcriptomics, methylomics, enrichment and signature computations were performed using R.
For Transwell migration assay, Tissues processing and immunofluorescent analysis, Low-coverage whole-genome sequencing and Methylomics data generation and analysis, see Supplementary Materials and Methods.