Modeling survival in colon cancer: a methodological review

The Cox proportional hazards model is the most widely used model for survival analysis because of its simplicity. The fundamental assumption in this model is the proportionality of the hazard function. When this condition is not met, other modifications or other models must be used for analysis of survival data. We illustrate in this review several methodological approaches to deal with the violation of the proportionality assumption, using survival in colon cancer as an illustrative example.


Background
Several methods for estimating survival probability of populations from patient samples have been proposed since the first systematic approach in 1950 [1]. One of the oldest and most straightforward methods for analyzing survival data is to compute the Life Table. This method, proposed by Berkson and Gage [1] for studying cancer survival, uses an enhanced frequency distribution approach. To compute a Life Table, the range of survival times for all patients is divided into subintervals. For each interval, one computes the number and proportion of cases that entered the interval "alive." the number and proportion of cases that "died", and the number of cases that were lost or "censored" in the respective time interval. An observation is censored if the subject leaves the study or is alive when the study ends. Appropriate manipulation of these quantities allows estimation of parameters of interest related to the survival distribution.
While the Life Table method worked well for a homogeneous sample, it did not address a primary goal of cancer research, namely to determine whether or not certain continuous and/or categorical variables are related to the sur-vival times. This need led to the application of regression methods for analyzing survival data. The standard multiple linear regression model is not well suited to survival data for several reasons; among these are (i) survival times are typically not normally distributed, and (ii) censored data is commonplace, resulting in missing values for the dependent variable (survival time). Early attempts to circumvent these problems involved applying the log transform to survival time, but this worked well only when censoring was present in a very small percentage of the observations (Everitt and Rabe-Hesketh, [2]). Two important developments that have greatly enhanced survival analysis methods are the derivation of a nonparametric method for constructing a survival curve from censored data by Kaplan and Meier [3], and the Proportional Hazards (PH) model proposed by Cox [4]. With the rapid improvements in the graphics capabilities of personal computers over the last 20 years, the use of the Kaplan-Meier method has become so popular that survival curves are often referred to as "Kaplan-Meier curves". An example of a survival curve estimated using the Kaplan-Meier approach is shown in Figure 2. The Cox model, a rnultivariate semiparametric regression model, is now the most widely used in clinical studies to characterize disease progression on existing cases by revealing the importance of covariates. It is the most popular model for survival analysis due to its simplicity. The proportional hazards model is the most general of the regression models because it is not based on any assumptions concerning the nature or shape of the underlying-survival distribution. The model assumes that the underlying hazard rate (rather than survival time) is a function of the independent variables (covariates); no assumptions are made about the nature or shape of the hazard function.
The hazard function (developed more fully in section 2) is a non-negative function (of time) which can be thought of as reflecting the change in an individual's probability of death in the immediate future, given that the individual has survived up to the current time. In humans, the probability of death is higher immediately after birth than it is when the newborn becomes more mature; consequently, the hazard function decreases with age through the maturation process. Subsequent to maturation, there is a long period where an individual's probability of death (in the immediate future) is relatively low and changing very little with the passage of time; here, the hazard function is rather flat and closer to zero. In the final stages of life, this probability increases with increasing age, resulting in an increase in the hazard function.
This pattern of change gives rise to a U-shaped (or "bathtub shaped") hazard function for all cause mortality in humans. The hazard is sometimes referred to as the "force of mortality" or the "conditional failure rate".
Mathematically, the hazard function is simply a re-expression of the survival function, in that specification of either one of these uniquely determines the other. The hazard function, however, has more visual appeal in that it directly displays the time periods over which changes to the risk of death in the immediate future are occurring. To identify these periods from the survival function (rather than the hazard function), the analyst would have to look for sharp drops and flatter sections of the survival curve. The hazard function displays the information more directly.
The Cox model relates the hazard function of an individual at time t, with a vector X = (X 1 , X 2 ,..., X p ) of p covariates (explanatory or predictor variables), to a baseline hazard function h 0 (t) via a log-linear function: where β = (β 1 , β 2 ,..., β p ) is a vector of coefficients. An important consequence of this formulation, and the reason for the name "Proportional Hazards Model", is that the hazard ratio function (HRF) for two individuals with covariates X and X* does not depend on time; h(t; X) is a constant multiple of h(t; X*). It is also important to note that the effects of the covariates on the hazard are assumed to be constant over time. Inference about the regression parameter (β) is possible without making assumptions about the form of the baseline hazard function, h 0 (t); the hypothesis of no association of one or more of the p independent variables with survival can be tested by the likelihood ratio test (LRT) [4]. Figure 1 illustrates the difference between proportional hazards and non-proportional hazards. In Figure 1a the two hazard functions are proportional, and their corresponding HRF (and log hazard ratio function, LHRF) is constant over time as shown in Figure 1b. On the other hand, if the hazard functions are not proportional, they might start at the same value and then diverge, or converge to some common value (Figure 1c), or cross and diverge again, and the corresponding LHRF will not be constant (Figure 1d). Many possibilities arise due to this non-proportionality. For example, the resulting LHRFs may be linear, non linear but monotonic (e.g. logarithmic), or non-monotonic (e.g. quadratic) (Ohno-Machado [5]).
In most applications of the Cox model in cancer research, the goal is to compare survival characteristics between two or more treatment groups. This is accomplished by letting one (or more) of the covariates serve as group indicator variable(s). For example, in comparing a treated group with a control group, we might assign X 1 = 0 for control group cases and X 1 = 1 for treated group cases, and compare the fitted survival curves. In this setting, it follows from the constancy of the HRF that the survival curves for different groups do not cross. In practice, however, the HRF may change over time leading to incorrect inferences (O' Quigley and Pessione [6] and Hess [7]) if the Cox model were applied. Although there are several tests available to check for PH violations (Lin et al [8]), they are rarely used because after rejecting the assumption it may not be known how to model the effect of the predictors (Quantin et al [9]). This article focuses on a review of (a) the Cox model and interpretation of its parameters, (b) assessment of the validity of the PH assumption, (c) the use of time-varying coefficients, and (d) accommodating nonproportional hazards using covariate stratification, partitioning of the time axis, and modeling tune dependence of the regression coefficients.

Proportional Hazards Model
We describe the models in this paper using a synthetic colon cancer survival (SCCS) data set on 100 individuals. The explanatory variables considered are Treatment taking values 1 or 2, Race taking values W and B, and Stage taking values 1,2, and 3. The survival time in months is recorded (Time) and whether the observation was censored (Event = 0). Data for the first five individuals appear in Table 1. For example, subject 1 in this table died after 3 months of followup whereas subject 3 was still alive after 60 months of followup. The number of censored events is 51 and the median of the variable Time is 24. The estimated median survival time will be higher than the sample median because of the censored observations. The explanatory For an individual selected at random from the population, the survival function can be interpreted as the probability this individual survives beyond time t.

Hazard Functions
For real data the survival function is not known but can be estimated from a sample. If everyone in the sample were observed until death the survival function could be estimated by Typically, not all individuals are observed until death so that some of the data are censored. One method for dealing with censored data is to estimate the survival function with the Kaplan-Meier estimator. The Kaplan-Meier estimate for the data in SCCS is plotted in Figure 2. This estimate can be described as a decreasing step function where the steps occur when a death has been observed. The smooth function is the true survival function which we know in this case because the data have been simulated. This population survival function is for a population with the following distributions: Treatment (50% Treatment 1, 50% Treatment 2), Race (80% White, 20% Black), and Stage (40% Stage 1, 30% Stage 2, 30% Stage3). The survival function shows the relationship between survival and time; that is, between the variables Time and Event in the SCCS data set. To describe the effect of other variables, such as Treatment. Race, and Stage, on survival it is convenient to use the hazard function.
The hazard function addresses the question: What is the probability that an individual who has survived to time t dies in the interval t to t + Δt?
As the interval Δt gets close to zero so does the probability of death and the hazard function h(t) is the rate at which the probability goes to zero: The hazard function h(t) is not a probability so it can be larger than one and its value will depend on the units used for time (e.g., days or months). The hazard function is used to give an approximate probability, namely h(t)Δt, for the event that an individual dies in interval t to t + Δt. This approximation is very good for small Δt. The hazard function is also related to the survival function. In fact, the hazard function is an equivalent formulation for the relationship between survival time and the event of death. The proportional hazards model is a simple model to describe the survival between two groups having hazard functions h 1 (t) and h 2 (t). The proportional hazards assumption is that ratio of these hazard functions does not depend on time: where ψ is called the proportionality constant. If ψ > 1 Individuals in group 2 have a greater risk of dying than individuals in group 1, regardless of the time t.
While ψ does not depend on time, it does depend on the explanatory variables in the model, namely Treatment, Race, and Stage. A convenient and readily interpretable linear expression for the logarithm of ψ (which can be any real number, since ψ > 0) is obtained by the proper use of indicator variables according to the following scheme: For Treatment: For Race: For Stage: x Stg2 = 0 and x Stg3 = 0 if Stage is 1 With this coding of the explanatory variables, and appropriate substitution into equation 1, it can be shown that (2) where the β's have been subscripted to reflect their role in the hypothesized causal mechanism, and the subscript on x indicates when the variable takes the value "1" rather than "0". For example. x Trt2 , x RaceW , and x Stg2 are all 1 and x Stg3 is 0 when the individual receives treatment 2, is white and has stage 2 cancer. An individual who received treatment 1, was black, and had stage 1 cancer would have each of these explanatory variables equal to zero. If β Trt2 > 0, individuals receiving treatment 2 have a larger log hazard function than those receiving treatment 1. Expressed in terms of the hazard function, exp(β Trt2 ) > 1 indicates that the hazard function for treatment 2 is proportionally greater than that of treatment 1 and the proportionality is constant for all time t. The generic expression of the hazards proportionality constant is Fitting the proportional hazards model in (2) gives the estimates in Table 2. This model indicates that the effect of receiving treatment 2 rather than 1 is to decrease the  Since the validity of inferences based on the Cox model depends on the proportional hazards assumption, it is desirable to have diagnostic methods for checking this assumption. Many tools are available for checking the PH assumption. These include:

(i) Plots of log(-log S(t))
Suppose the hazards for 2 groups are proportional, say h 1 (t) = ψh 2 (t). It can be shown that this results in the relationship log(-log S 1 (t)) = log ψ + log(-log S 2 (t)); the log(log S(t)) curves for the two groups differ by a constant. Plotting the estimated log(-log S(t)) curves for the two groups, evaluated at the covariate mean values, provides a visual check of the PH assumption. A clear departure from parallelism of these two curves would be consistent with violation of the PH assumption.

(ii) Interactions with Time
Interaction of a covariate x with time can be modeled by including in the model a product term x × (time), resulting in a log hazard function of the form The coefficient β 2 reflects x's dependence on time; if this effect is statistically significant in the fitted model we have evidence for non-proportionality.

(iii) Schoenfeld Partial Residuals
For each covariate, a Schoenfeld residual can be calculated for each case that was not censored. A plot of these residuals against time should be "approximately flat" if the PH assumption holds. We illustrate a check on the proportionality assumption for the SCSS dataset using a Schoenfeld residual plot with a smooth curve fit to these residuals (Grambsch and Therneau [10]).

Dealing with Nonproportional Hazards
In the SCCS example from the previous section we encountered a model with a nonpreportional hazard function where one of the coefficients varied with time t.
In this section we address some ways to cope with time varying coefficients β(t) in the Cox model. Before describing these methods we list the following considerations and caveats.
• Detecting the true time dependence of the parameter can be difficult. The Schoenfeld plot for Treatment in Figure 3 shows that the parameter is not constant but does not indicate the proper functional form. Therneau and Grambsch [11] give examples where the data cannot distinguish two very different functions of time.
• While there are tests to check for nonproportionality (see for example [10]), these need not be effective against some alternatives. In particular, tests based on the slope of β(t) need not be sensitive to quadratic or other nonlinear functions of time.
• Even when a test for nonproportionality is statistically significant, this does not mean the nonproportionality is of practical importance or significance. This potential difficulty arises in any significance test, especially if the sample is large.
• Time varying coefficients can be due to other inadequacies in the model. For example, one or more covariates are not included, the functional form of the covariates is incorrect, or another survival function can more effectively model the data.β β We consider three methods for dealing with time-varying coefficients: stratification of variables with time varying coefficients, partitioning of the time axis, and modeling the dependence of β(t) on time. We discuss each of these using the SCCS example. Other examples of these approaches can be found on pages 145-147 in [11].

Covariate Stratification
Nonproportionality in Treatment can be addressed by fitting a separate baseline hazard function for each level of Treatment. The results for the SCCS data are given in Table  3. In this model, the effects of covariates are the same on each stratum so there is only one set of coefficients that apply to all strata. A disadvantage of this approach is that we cannot test the effect of treatment 1 versus treatment 2. For a variable that can be controlled this would make this approach unacceptable. It could, however, be used if a covariate such as race had a time varying coefficient.

Modeling Time Dependence of β(t)
There are two basic approaches to modeling the dependence of β(t) on t. One is to specify a known functional form for this relationship. For example, β Trt2 (t) = β 0 exp(-ρt) (3) so that the treatment effect diminishes over time. Another is to allow the data to select the functional form of the time dependence. This can be done by using splines. The basic idea is to replace each of the dashed line segments in Figure 5 with a curve such that the curves are connected at time t = 30. The time axis in Figure 5 is partitioned into just two intervals but by increasing the number of intervals a close approximation can be obtained.
We do not use this approach for the SCCS data. One reason is that software for this approach is not readily available. The SAS procedure phreg can be used when the known functional form is linear; however, this will not work for the function in equation (3). A more important reason is that allowing the data to choose the functional form increases the chance of over-fitting the data, and one Schoenfeld residual plot for β RaceW  must be cautious about interpreting the p-values and confidence intervals for the parameters.

Other Methods
The methods given above for dealing with nonproportional hazards all involve modification of the Cox model. Another approach is to simply use a different model. Therneau and Grambsch [11] suggest accelerated failure time or additive hazard models. In the Cox model, the covariates act multiplicatively on the baseline hazard function (1). The additive hazard model: allows the covariates to modify the baseline hazard in an additive way; the effect of the covariate may also vary with time. The additive model measures the additional risk due to the effect of a covariate in absolute terms, whereas the Cox model measures it in relative terms.
Artificial Neural Networks (ANNs) are another class of models that have been used to model cancer data. These are sometimes referred to as "black box" models because of the complex, nonlinear relationships used in the fitting procedure. In the Cox model, the form of the relationship between the covariates and survival is specified; only the parameters of the model are estimated. In contrast to this, with ANNs one tries to estimate both the functional form of the relationships between variables and the parameters that describe those relationships. This limits the ability of these models to identify causal relationships, because it is usually difficult to relate the model parameters to the biological context that generated the data (Ahmed [12]).
Tree Structured Survival Analysis (TSSA) provides another alternative for the study of risk factors (Segal [13]). It is an extension of the Classification And Regression Tree (CART) algorithm developed by Breiman et al [14]. TSSA is an exploratory, nonparametric method that requires no assumptions about the relationship between survival and the potential risk factors (covariates). The primary output of a TSSA algorithm is a "tree structure", whose branches and nodes identify the successive splits of the cases into risk groups with similar survival profiles. TSSA evaluates the relationship between risk factors and survival through recursive partitioning of patients according to their risk factors, with a comparison of survival patterns among the subgroups of patients resulting from each partition. The method not only identifies a set of significant risk factors, but also provides a simple procedure to identify subgroups of participants with an estimate of their associated risk.

Application of Modified Cox Models in Colon Cancer Research
Moreau et al [15] proposed a crude survival model that accounts for the nonproportionality of hazards by modeling the hazard ratio as a step function of the follow-up time (divided into predefined intervals). The application of the piecewise model requires determining both the number and the boundaries of these intervals. Clinicians have conveniently suggested six intervals with the following upper boundaries being 1 month, 6 months, 12 months, 24 months, 60 months, and 120 months (Bolard et al [16]), and employed the 2L program of BMDP Statistical Software analysis (Dixon et al [17]).
Esteve et al [18] fit the baseline hazard function and its dependence on time by partitioning the time axis. However, the coefficients of the covariates were not allowed to depend on time. In contrast, Moreau et al [15] allow the coefficient on age to depend on time and finds the effect of age continues to decline between 24 and 120 months. Moreau et al [15] also allow cancer stage to depend on   time. These considerable differences between models that allow covariates to depend on time and those that do not demonstrate the ability of models with time varying coefficients to provide new insights about these changes.
Esteve et al [18] investigate the time-dependent logarithm of the hazard ratio for each covariate by modeling the data using B-splines (de Boor [19] and Stone and Koo [20]). Using this B-spline model, they find a significantly higher risk of post surgical mortality than the Cox model during the first six months, and by 12 months the reverse happens. Moreover, the Cox model overestimates the impact of age on colon cancer-specific mortality after the first 6 months of follow-up, whereas the impact of age is near zero during Estimates for the dependence of β Trt2 on time   this time period when the B-spline model is used. Clinically, the risk of cancer related death for elderly patients is higher only during the initial few months corresponding to post-surgical period. The estimated effect of later periods of diagnosis shows that there may be some benefits of new treatments in reducing post-surgical mortality, but they do not appear to affect the long-term survival (Giorgi et al [21]).

Software
Software routines for fitting Cox models are available in all of the popular statistical software packages. Here we provide only a brief summary for some of those that are more widely used. It is worth noting that many of these programs have active and well-organized user groups whose members are continually writing add-on functions and macros, usually available at no cost. is a language and environment for statistical computing and graphics. Unlike the other packages mentioned here, R can be downloaded at no cost from the R website. It is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies). Much code written for S runs unaltered under R. Analysis is accomplished via calls to functions; the arguments to these functions include datasets and model specifications, and the function returns appropriate statistics and graphical output. An R add-on package called "survival" (again, available at no cost) contains over 100 files consisting of functions and datasets for survival analysis.

SAS
SAS [24] software has six procedures that perform survival analysis computations. LIFETEST produces life tables and graphs of survival curves. LIFEREG estimates regression models with censored data, but does not allow for timedependent covariates. PHREG fits Cox models, handling both discrete-time and continuous-time data, as well as time-dependent covariates. Three other procedures (LOGISTIC, PROBIT, and GENMOD), while not designed specifically for survival analysis, are effective for estimating survival models in certain settings.

S-PLUS
Marketed by Insightful Corporation, S-PLUS [25] contains a complete array of survival analysis tools, including frailty models, smoothing splines, penalized survival models, parametric survival regression, Kaplan-Meier curves, and Cox proportional hazards models.

SPSS
SPSS [26] is a popular menu-driven general statistical analysis package. It contains menu-driven routines for constructing Life Tables, plotting Kaplan-Meier survival curves, and fitting Cox models with time-varying covariates.

Conclusion
The Cox model, also known as the proportional hazards model, often provides a very good approximation to the survival function and its dependence on covariates. For those situations where the proportional hazards model is inadequate we have described three simple modifications that address nonproportional hazards. One solution is to stratify covariates. A disadvantage of this approach is that it does allow modeling the effect of the covariates used in the stratification. Another solution is to partition the time axis into intervals and require proportional hazards only within intervals. A disadvantage here is determining the number intervals that are necessary. The final approach is to model the time dependence of the coefficients. Interpretation should be tempered by the fact that these models tend to over-fit the data.