First I was afraid, I was petrified: survival modelling in health technology assessment


Gianluca Baio

Department of Statistical Science   |   University College London

g.baio@ucl.ac.uk


https://gianluca.statistica.it

https://egon.stats.ucl.ac.uk/research/statistics-health-economics

https://github.com/giabaio   https://github.com/StatisticsHealthEconomics  

@gianlubaio@mas.to     @gianlubaio    


Graham Dunn Seminar, University of Manchester

7 May 2025

Check out our departmental podcast “Random Talks” on Soundcloud!

Follow our departmental social media accounts + magazine “Sample Space”

Disclaimer…

… Just so you know what you’re about to get yourself into… 😉

Health technology assessment (HTA)

Objective

  • Combine costs and benefits of a given intervention into a rational scheme for allocating resources

Health technology assessment (HTA) is a method of evidence synthesis that considers evidence regarding clinical effectiveness, safety, cost-effectiveness and, when broadly applied, includes social, ethical, and legal aspects of the use of health technologies. The precise balance of these inputs depends on the purpose of each individual HTA. A major use of HTAs is in informing reimbursement and coverage decisions, in which case HTAs should include benefit-harm assessment and economic evaluation. Luce et al, 2010

(Quote stolen from a brilliant presentation by Cynthia Iglesias)

A relatively new discipline

  • Basically becomes “a thing” in the 1970s
  • Arguably, a historical accident
    • Economists take the lead in developing the main theory \(\Rightarrow\) Health Economics
    • But there’s so much more to it (more on this later…)

(Truly…) World-beating Britain


(The problems with) Survival analysis in HTA

  • Time-to-event data constitute the main outcome in a large number of HTAs (e.g. for cancer drugs) \(\Rightarrow\) very relevant!
    • Between 2018 and 2022, 56% of all NICE appraisals for cancer drugs conducted on immature data (Gibbons and Latimer 2024)
  • BUT:…

  1. We may (or may not!) access individual level data for “our” trial, but not for the competitors’
    • “Partition survival modelling” / Digitised data
  2. Often the data are manipulated by the stats team within the sponsor and the economic modellers only get summaries/estimates
    • May lead to inefficiencies/confusion in the modelling…
  3. Crucially, the trial data have a very limited follow up
    • This is often OK(-ish!) for “medical stats” analysis. But HORRIBLE for economic evaluation! \(\Rightarrow\) Extrapolation

Survival analysis

“Standard” analysis is generally focused on the observed data and the median survival time

  • Can use flexible/non-parametric methods (e.g. ubiquitous Cox model)
  • Still needs assumptions (e.g. proportional hazards)

Survival analysis in HTA

Focus on decision-making, so we need the mean survival time: \(\class{ubuntublue}{\displaystyle\int_0^\infty S(t)dt} \Rightarrow\) generally requires extrapolation/parametric modelling!

  • E.g. in the Weibull model
    • \(\bm\theta=(\txt{scale}=\mu; \txt{shape}=\alpha)\)
    • \(S(t\mid \bm\theta)=\exp\left(-(t/\mu)^{\alpha}\right)\) (see the sketch below)
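
A minimal numerical sketch of why this matters (Python, with made-up parameter values): the median only needs the middle of the observed curve, while the mean is the whole area under \(S(t)\), which is what drives the economic model.

```python
import numpy as np
from scipy.special import gamma
from scipy.integrate import quad

# Hypothetical Weibull parameters: scale mu, shape alpha (time in months)
mu, alpha = 20.0, 1.3

S = lambda t: np.exp(-(t / mu) ** alpha)      # survival function S(t | theta)

median = mu * np.log(2) ** (1 / alpha)        # median survival: S(median) = 0.5
mean_closed = mu * gamma(1 + 1 / alpha)       # mean survival: mu * Gamma(1 + 1/alpha)
mean_numeric, _ = quad(S, 0, np.inf)          # same quantity, as the area under S(t)

print(median, mean_closed, mean_numeric)      # roughly 15.1, 18.5, 18.5 months
```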

Survival analysis in HTA

General parametric modelling structure

\[\class{myblue}{t \sim f(\mu(\bm{x}),\alpha(\bm{x})), \qquad t\geq 0}\]

  • \(\bm{x}=\) vector of covariates (potentially influencing survival)

  • \(\mu(\bm{x})=\) location parameter

    • Scale or mean — usually main objective of the (biostats!) analysis
    • Typically depends on the covariates \(\bm{x}\)
  • \(\alpha(\bm{x})=\) ancillary parameters

    • Shape, variances, etc
    • May depend on \(\bm{x}\), but often assume they don’t (see NICE TSD 14)
  • NB: \(S(t)\) and \(h(t)\) are functions of \(\mu(\bm{x}), \alpha(\bm{x})\)

  • Typically use generalised linear model

\[\class{myblue}{g(\mu_i)=\beta_0 + \sum_{j=1}^J \beta_j x_{ij} [+ \ldots]}\]

and since \(t>0\), usually \(g(\cdot) = \log\)

  • In a Bayesian setting, complete by putting suitable priors on \(\bm\beta\) and \(\alpha\)
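
As a small illustration of this structure (just a sketch, with hypothetical coefficients rather than a fitted model): a Weibull time-to-event with a log link on the scale parameter and a single binary covariate.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200
x = rng.binomial(1, 0.5, size=n)              # e.g. a treatment-arm indicator
beta0, beta1 = np.log(20.0), 0.3              # hypothetical regression coefficients
alpha = 1.3                                   # ancillary (shape) parameter, no covariates

mu = np.exp(beta0 + beta1 * x)                # log link: g(mu_i) = beta0 + beta1 * x_i
t = mu * rng.weibull(alpha, size=n)           # simulated times: Weibull with scale mu_i, shape alpha

# In a Bayesian analysis, priors would then be placed on beta0, beta1 and alpha
```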

(The problems with) Survival analysis in HTA

  • Time-to-event data constitute the main outcome in a large number of HTAs (e.g. for cancer drugs) \(\Rightarrow\) very relevant!
    • Between 2018 and 2022, 56% of all NICE appraisals for cancer drugs conducted on immature data (Gibbons and Latimer 2024)
  • BUT:…

  1. Which model is the “best fit”?
    • How to judge that?
  2. Is modelling even enough?
    • How to make the most of external data and information
  3. Should you be Bayesian about this?
    • (Spoiler alert: the answer is always Yes!…)

Extrapolation

A recipe for disaster?…

  • NB: Any *IC (AIC, BIC, DIC, …) can only tell us about model fit to the observed data!
  • Extrapolation (like missing data) is based on (virtually) untestable assumptions

Extrapolation

Why does it matter?

  • Intrinsic/pathological uncertainty in the output of the (time-to-event) statistical modelling does carry through the entire process, all the way to the decision-making
  • It is not impossible (especially for new, innovative immuno-oncology drugs) that the observed data are extremely sparse and subject to heavy censoring
    • In the case depicted here, the best-fitting model responds by extrapolating a survival curve that implies Pr(alive at 15 years) \(>\) 0.5
    • This may be obviously wrong/against expert or clinical opinion!

We need to formally and quantitatively consider what the implications of this uncertainty are on the decision-making process!

Integrate different sources of data (including “Real World Evidence”) — fundamentally, a Bayesian operation!

What do people do?…

  1. Long-term survival estimated by modelling all data sources jointly
  2. (More or less formal) Piecewise adjustment
    • Restricted mean survival time
    • Elicitation of long-term survival constraints
  3. Others
    • Mixture cure models (Chaudhary et al, 2022; Paly et al, 2022) — some survivors are eventually assumed to have zero or negligible risk of some type of event in the long term
    • Background mortality/anchoring (see the sketch after this list)
  • Want/need models that are as flexible as possible and combine observed (limited) data with reasonable extrapolation
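
A minimal sketch of the “background mortality/anchoring” idea (all hazards below are made-up placeholders, not real life-table values): the all-cause hazard is constrained to be at least the general-population hazard, so the extrapolated curve can never “beat” general-population survival.

```python
import numpy as np

times = np.arange(0, 40)                      # whole years from baseline

# Hypothetical hazards: a disease-specific (excess) hazard that the fitted model
# extrapolates towards zero, and a general-population (background) hazard that
# increases with age (in practice taken from national life tables)
h_excess = 0.10 * np.exp(-0.15 * times)
h_pop = 0.005 * np.exp(0.09 * times)

# Additive ("relative survival") anchoring: all-cause hazard = background + excess
h_all = h_pop + h_excess
S_all = np.exp(-np.cumsum(h_all))             # crude yearly approximation of S(t) = exp(-H(t))
S_pop = np.exp(-np.cumsum(h_pop))             # general-population survival

assert np.all(S_all <= S_pop)                 # extrapolation never beats the general population
```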

1. “Blended” survival curves

Consider two separate processes

  1. Driven exclusively by the observed data
    • Similar to a “standard” HTA analysis – use this to estimate \(S_{obs}(t\mid\bm\theta_{obs})\)
    • Main objective: produce the best fit possible to the observed information
    • NB: Unlike in a “standard” modelling exercise, where the issue of overfitting is potentially critical, achieving a very close approximation to the observed dynamics is much less of a concern in the case of blending
  2. “External” process
    • Used to derive a separate survival curve, \(S_{ext}(t\mid \bm\theta_{ext})\), to describe the long-term estimate for the survival probabilities
    • Could use “hard” evidence (eg RWE/registries/cohort studies/etc)…
    • …Or purely subjective knowledge elicited from experts (or both!)

NB: Most likely need to use suitable statistical methods to “de-bias” the RWE (e.g. propensity scores, g-computation, …)

1. “Blended” survival curves

Modelling

Combine the two processes to obtain \[\begin{align} \class{myblue}{S_{ble}(t\mid\bm\theta) = S_{obs}(t\mid\bm\theta_{obs})^{1-\pi(t; \alpha, \beta, a, b)}\times S_{ext}(t\mid\bm\theta_{ext})^{\pi(t;\alpha, \beta, a, b)}} \end{align}\] where:

  • \(\class{myblue}{\bm \theta = \{\bm \theta_{obs}, \bm \theta_{ext}, \alpha, \beta, a, b\}}\) is the vector of model parameters
  • \(\displaystyle \class{myblue}{\pi(t;\alpha,\beta,a,b) = \Pr\left(T\leq \frac{t-a}{b-a}\mid \alpha, \beta\right) = F_{\text{Beta}}\left (\frac{t-a}{b-a}\mid \alpha, \beta \right)}\) is a weight function controlling the extent to which \(S_{obs}(\cdot)\) and \(S_{ext}(\cdot)\) are blended together
  • \(t \in [0,T^*]\), with \([0,T^*]\) the interval of times over which we want to perform our evaluation
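
A minimal sketch of the blending formula in code (Python; the two component curves and the weight parameters \(a,b,\alpha,\beta\) below are hypothetical placeholders, not fitted values):

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.integrate import trapezoid

def blended_survival(t, S_obs, S_ext, a, b, alpha_w, beta_w):
    """S_ble(t) = S_obs(t)^(1 - pi(t)) * S_ext(t)^pi(t),
    with pi(t) = F_Beta((t - a) / (b - a) | alpha_w, beta_w)."""
    t = np.asarray(t, dtype=float)
    u = np.clip((t - a) / (b - a), 0.0, 1.0)        # pi = 0 before a, pi = 1 after b
    pi = beta_dist.cdf(u, alpha_w, beta_w)
    return S_obs(t) ** (1 - pi) * S_ext(t) ** pi

# Hypothetical components: a short-term (data-driven) Weibull fit and a
# long-term (external/expert-based) exponential curve; time in months
S_obs = lambda t: np.exp(-(t / 30.0) ** 1.2)
S_ext = lambda t: np.exp(-t / 80.0)

times = np.linspace(0.0, 180.0, 181)                # evaluation grid over [0, T*]
S_ble = blended_survival(times, S_obs, S_ext, a=24, b=72, alpha_w=3, beta_w=3)
rmst = trapezoid(S_ble, times)                      # restricted mean: area under the blended curve
```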

Important

NB: This is not the same as a “mixture cure model”!

  • In MCM, additive on \(S(t)\): the population is explicitly divided into “cured” and “uncured”, with a constant cure fraction and one mixed survival curve to model heterogeneity in the population
  • In BSC, additive on \(\log\left(S(t)\right)\): the short- vs long-term processes are modelled explicitly, with a smooth transition and the main objective of extrapolation (so it is possible that no-one is “cured”)
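
Spelling out the contrast (using the standard mixture cure form, with \(\pi_{cure}\) the cure fraction and \(S_u(t)\) the survival of the “uncured”; the BSC line follows directly by taking logs of the blending formula above, writing \(\pi(t)\) for \(\pi(t;\alpha,\beta,a,b)\)):

\[
\begin{aligned}
\text{MCM:}\quad & S(t) = \pi_{cure} + (1-\pi_{cure})\,S_u(t\mid\bm\theta_u)\\
\text{BSC:}\quad & \log S_{ble}(t\mid\bm\theta) = \left[1-\pi(t)\right]\log S_{obs}(t\mid\bm\theta_{obs}) + \pi(t)\log S_{ext}(t\mid\bm\theta_{ext})
\end{aligned}
\]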

1. “Blended” survival curves

Graphical representation

1. “Blended” survival curves

Weights

1. “Blended” survival curves

What do the weights do?…

1. “Blended” survival curves

  • The main point of the “blending” procedure is to recognise that, sometimes (often…), the observed data are just not good enough to simultaneously
    1. Provide the best fit to the observed data
    2. Provide a reasonable extrapolation for the long-term survival
  • Instead, we let the observed data tell us about the short-term survival and some external information tell us something about the long-term survival
  • When external data/RWE are available, they should be leveraged
    • BSCs allow us to do this in a relatively straightforward way – but we need to make sure the RWE are exchangeable/unbiased (as much as we possibly can…)
    • The “heavy-lifting” is done by the weight function that determines how the sources are blended together
    • This is based on (possibly untestable, but certainly open/upfront!) assumptions
  • This combination of different sources of evidence is naturally Bayesian
    • Ultimately, we don’t really care about the two components – rather, we want to fully characterise the uncertainty in the blended curve
    • …And getting that is simple algebra: combine the posterior distributions for \(S_{obs}(t\mid\bm\theta_{obs})\) and \(S_{ext}(t\mid\bm\theta_{ext})\) (see the sketch below)
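
For instance, a sketch of that “simple algebra”, assuming we already have posterior draws of the two component curves stored as (draws \(\times\) times) matrices; the fake draws below are only there to show the shapes involved.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def blend_posterior(S_obs_draws, S_ext_draws, t, a, b, alpha_w, beta_w):
    """Turn posterior draws of S_obs(t) and S_ext(t) into draws of S_ble(t)."""
    u = np.clip((t - a) / (b - a), 0.0, 1.0)
    pi = beta_dist.cdf(u, alpha_w, beta_w)              # weight function, shape (n_times,)
    return S_obs_draws ** (1 - pi) * S_ext_draws ** pi  # broadcasts over the posterior draws

# Hypothetical posterior draws (n_draws x n_times), purely illustrative
rng = np.random.default_rng(10)
times = np.linspace(0.0, 180.0, 181)
scale = rng.normal(30.0, 2.0, size=(500, 1))            # e.g. posterior of a Weibull scale
rate = rng.normal(1 / 80.0, 0.002, size=(500, 1))       # e.g. posterior of an exponential rate
S_obs_draws = np.exp(-(times / scale) ** 1.2)
S_ext_draws = np.exp(-rate * times)

S_ble_draws = blend_posterior(S_obs_draws, S_ext_draws, times, a=24, b=72, alpha_w=3, beta_w=3)
low, med, upp = np.quantile(S_ble_draws, [0.025, 0.5, 0.975], axis=0)  # pointwise posterior summaries
```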

2. M-Splines

Directly model the hazard function \[ \class{myblue}{h(t\mid\bm\theta) = \phi\sum_{k=1}^K \omega_k b_k(t)} \]

where

  • \(\phi>0\) is the baseline hazard scale parameter
  • \(\bm\omega=(\omega_1,\ldots,\omega_K)\) is a vector of coefficients for the M-spline basis terms
  • \(b_k(t)\) are deterministic functions of time \(t\), known as “basis functions”

In the M-Splines model

  • The splines represent potential changes in the hazard trajectory at any time
  • The model adapts to fit the observed data, while allowing for larger uncertainty when the information is sparse
  • Reduces the reliance on direct extrapolation
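
A minimal sketch of this hazard construction (Python; one way to build an M-spline basis is to rescale scipy B-splines so that each basis term integrates to 1 — the knots, weights and scale below are hypothetical):

```python
import numpy as np
from scipy.interpolate import BSpline

def mspline_basis(t, knots, degree=3):
    """M-spline basis: B-splines rescaled so that each basis term integrates to 1."""
    t = np.asarray(t, dtype=float)
    # Clamped knot sequence: repeat the boundary knots degree + 1 times
    aug = np.concatenate([np.repeat(knots[0], degree + 1),
                          knots[1:-1],
                          np.repeat(knots[-1], degree + 1)])
    K = len(aug) - degree - 1                       # number of basis terms b_k(t)
    basis = np.empty((len(t), K))
    for k in range(K):
        coef = np.zeros(K)
        coef[k] = 1.0
        b = BSpline(aug, coef, degree, extrapolate=False)(t)
        # M-spline rescaling: order / (width of the support of the k-th basis term)
        basis[:, k] = np.nan_to_num(b) * (degree + 1) / (aug[k + degree + 1] - aug[k])
    return basis

knots = np.array([0.0, 6.0, 12.0, 24.0, 48.0])      # hypothetical knot locations (months)
times = np.linspace(0.0, 47.9, 200)
B = mspline_basis(times, knots)                     # basis functions b_k(t)

phi = 0.02                                          # baseline hazard scale parameter
omega = np.random.default_rng(3).dirichlet(np.ones(B.shape[1]))  # illustrative weights, sum to 1
hazard = phi * B @ omega                            # h(t) = phi * sum_k omega_k * b_k(t)
```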

2. M-Splines

Bayesian modelling

  • \(\log\phi \sim \dnorm(0,\text{sd}=20)\)
    • Can also include dependence on \(C\) covariates \(\bm{x}=(\bm{x}_1,\ldots,\bm{x}_C)\) by assuming a PH model in which \(\phi^*(\bm{x})=\phi\exp(\bm\beta^\top \bm{x})\), with a default prior \(\beta_1,\ldots,\beta_C \stackrel{iid}{\sim}\dnorm(0,\text{sd}=2.5)\)
  • \(\gamma_k=\log\left(\frac{\omega_k}{\omega_1}\right)\sim\txt{Logistic}(\mu_k,\sigma)\)
    • Assumes \(\sum_{k=1}^K \omega_k =1\) and \(\gamma_1=\mu_1=0\)
    • Partial pooling to ensure global smoothing
  • Location parameters \(\bm{\mu}=(0,\mu_2,\ldots,\mu_K)\)
    • Selected using a data-driven procedure based on the location of consecutive knots for the basis functions
    • \(\mu_k\) set so that, a priori, the weights \(\bm\omega\) correspond to a constant hazard \(h(t)\)
    • Can also relax PH and include covariates by modelling \(\mu_k+\bm{\delta}_k^\top \bm{x}\) (see Jackson, 2023)
  • Scale parameter \(\sigma\sim \dgamma(2,1)\)
    • Controls the level of smoothness of the fitted hazard curve
    • If \(\sigma\rightarrow 0\), then the hazard is closer to constant
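
A sketch of one prior draw of the weights under this construction (the \(\mu_k\) below are arbitrary placeholders rather than the data-driven defaults; the normalisation \(\omega_k=\exp(\gamma_k)/\sum_l\exp(\gamma_l)\) follows from \(\gamma_k=\log(\omega_k/\omega_1)\) and \(\sum_k\omega_k=1\)):

```python
import numpy as np

rng = np.random.default_rng(7)

K = 7
mu = np.concatenate([[0.0], rng.normal(0.0, 1.0, K - 1)])  # placeholder locations, mu_1 = 0
sigma = rng.gamma(shape=2.0, scale=1.0)                    # smoothness: sigma ~ Gamma(2, 1)

# gamma_k = log(omega_k / omega_1) ~ Logistic(mu_k, sigma), with gamma_1 = 0
gamma = rng.logistic(loc=mu, scale=sigma)
gamma[0] = 0.0

omega = np.exp(gamma) / np.exp(gamma).sum()                # multinomial-logit: sum_k omega_k = 1
```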

2. M-Splines

Pseudo-data

  • We can include information about the long-term survival behaviour using pseudo-data in the form of Binomial counts
  • This information “anchors” the curves to more meaningful (if subjective…) extrapolation
  Start   Stop   \(\delta\)   \(n\)   \(r\)   Treatment
  100     120    20           100     8       FC
  120     150    30           100     6       FC
  150     180    30           100     2       FC

  • \(n=\) people alive at Start
  • \(r=\) people who die during the interval (between Start and Stop)
  • \(\delta=\) duration of the interval (Stop \(-\) Start)

\[ \begin{aligned} r_i & \sim\dbin(\phi_i,n_i) \\ \phi_i & = 1-\exp(-h_i\delta_i) \end{aligned} \]
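
A sketch of the likelihood contribution of these pseudo-data (Python; the candidate interval hazards passed in are arbitrary illustrative values):

```python
import numpy as np
from scipy.stats import binom

# Pseudo-data from the table above (FC arm)
delta = np.array([20.0, 30.0, 30.0])          # interval lengths
n = np.array([100, 100, 100])                 # at risk at the start of each interval
r = np.array([8, 6, 2])                       # events over each interval

def pseudo_loglik(h):
    """Binomial log-likelihood of the pseudo-data for candidate interval hazards h."""
    phi = 1.0 - np.exp(-h * delta)            # phi_i = 1 - exp(-h_i * delta_i)
    return binom.logpmf(r, n, phi).sum()

# A gently decreasing long-term hazard is far more consistent with these
# pseudo-data than a high constant one
print(pseudo_loglik(np.array([0.004, 0.002, 0.0007])))
print(pseudo_loglik(np.array([0.05, 0.05, 0.05])))
```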

3. Diffusion Piecewise Exponential Model

Consider

\[ \class{myblue}{\log\left(h(t\mid \bm\theta)\right)= \sum_{j=1}^J \alpha_j \unicode{x1D7D9}\left(t \in (s_{j-1},s_j]\right)} \]

  • \(\{\alpha_j\}_{j=0}^J\) are a sequence of local log-hazards
  • \(\{s_j\}_{j=0}^J\) are a sequence of knot locations with \(s_0 = 0\)
  • \(\{\alpha_j\}_{j=1}^J\sim \txt{Discretised Diffusion Process}(\cdot)\)
    • Drift function used to encode strong prior information about the long-term behaviour of the hazard function
    • Restrictions on the form of the drift are minimal, allowing for a range of prior information to be encoded into the model
  • \(\{s_j\}_{j=0}^J\sim \txt{Poisson Point Process}(\cdot)\)
    • Allows the intensity of changes in the hazard during the extrapolation period to be informed by the changes observed on \((0,t_+)\)
    • Reduces sensitivity of extrapolation to knots locations
  • Use efficient MCMC algorithm based on Piecewise Deterministic Markov Processes (PDMPs — Fearnhead et al, 2018; Fearnhead et al, 2024)
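
A minimal prior-predictive sketch of the idea (not the PDMP sampler itself): a piecewise-constant log-hazard whose increments follow a discretised diffusion with a mean-reverting drift towards a hypothetical long-term level; all numerical values are made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Knot locations s_0 < s_1 < ... < s_J (months); beyond the observed data these
# would be generated by the Poisson point process, here they are fixed for illustration
s = np.array([0.0, 6.0, 12.0, 24.0, 36.0, 48.0, 72.0, 96.0, 120.0])
J = len(s) - 1

# Local log-hazards alpha_j on (s_{j-1}, s_j], following a discretised (Euler-type)
# diffusion with a mean-reverting drift towards a hypothetical long-term level
alpha = np.empty(J)
alpha[0] = np.log(0.02)                            # initial log-hazard
target = np.log(0.005)                             # long-term level encoded in the drift
kappa, sigma = 0.03, 0.05                          # drift strength and diffusion scale
for j in range(1, J):
    dt = s[j + 1] - s[j]                           # interval length used as the diffusion time step
    drift = kappa * (target - alpha[j - 1])        # mean-reverting drift function
    alpha[j] = alpha[j - 1] + drift * dt + sigma * np.sqrt(dt) * rng.normal()

def hazard(t):
    """Piecewise-constant hazard: exp(alpha_j) for t in (s_{j-1}, s_j]."""
    j = np.clip(np.searchsorted(s, t, side="left"), 1, J) - 1
    return np.exp(alpha[j])
```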

3. Diffusion Piecewise Exponential Model

Very promising

  • Inference driven by the data in the observed time horizon
  • Extrapolation relies on information encoded in the model through the priors
    • Inevitable sensitivity to specification of the drift
  • Compared with M-splines, tends to provide more precise estimates for the extrapolation
    • Potential artificial increase in overall mean survival for M-splines — important for HTA modelling!
  • Computational effort optimised by PDMPs
    • Very flexible structure and easy-ish to include genuine information in the priors
    • Does require bespoke implementation (soon to come as R package)

Conclusions

  • Survival modelling in HTA is a complex but vital research area
    • Extrapolation is weird, but we need it for HTA modelling
    • Important decisions depend on sparse and immature data
    • No model can do this well on both the observed and extrapolated time, without relying on external data/assumptions!…
  • Being Bayesians is a good thing here (isn’t it always though?… 😉)
    • Include genuine information/explicit assumptions through priors and modelling structure
    • Openly present results based on large uncertainty
  • Analysis of the Value of Information (for another edition of the Graham Dunn seminar…)
  • Conditional reimbursement