Mind the Gaps: Managing Missing PRO Data in the Era of JCA for HTA Submissions

Some methodological issues


Gianluca Baio

Department of Statistical Science   |   University College London

g.baio@ucl.ac.uk


https://gianluca.statistica.it

https://egon.stats.ucl.ac.uk/research/statistics-health-economics

https://github.com/giabaio   https://github.com/StatisticsHealthEconomics  

@gianlubaio@mas.to     @gianlubaio    


Issue Panel

ISPOR Europe 2025, Glasgow

11 November 2025

Check out our departmental podcast “Random Talks” on Soundcloud!

Follow our departmental social media accounts + magazine “Sample Space”

Objectives (of this talk…)

  • Expanding the focus on multiple imputation and linking it up with Bayesian modelling

  • Is it more efficient, particularly in HTA?\(^{*}\)

    • Non-normality
    • Skewness
    • Longitudinal vs cross-sectional data
    • Structural values
    • Links to decision-making and PSA
  • Methodological approaches to handle situations where a substantial amount of missingness is present, with particular relevance to PRO data, accounting for the longitudinal nature of the data

  • An example…

\(^{*}\)Yes it is. 😉

Background

The problems with missing data…

  • We plan to observe \(n_{\rm{planned}}\) data points, but end up with a (much) lower number of observations \(n_{\rm{observed}}\)

    • What is the proportion of missing data?
    • Does it matter?…
  • We typically don’t know why the unobserved points are missing, or what their values might have been
    • Missingness can be differential in treatment/exposure groups

  • … Basically, there is not very much we can do about it!
    • Any modelling is based on at least some untestable assumptions
    • We cannot check the model fit to unobserved data
    • We have to accept the inherent uncertainty in our analysis!

Missing data mechanisms

[Figure 1: (a) MCAR, (b) MAR, (c) MNAR]
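To make the distinction concrete, here is a minimal simulation sketch (all numbers hypothetical, not from the talk): the chance that \(y\) is missing depends on nothing (MCAR), on a fully observed covariate \(x\) (MAR), or on the unobserved \(y\) itself (MNAR), and the complete-case mean of \(y\) can be trusted only in the first case.

```python
# Hypothetical simulated data: y depends on a fully observed covariate x
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

p_mcar = np.full(n, 0.3)                         # constant probability
p_mar  = 1 / (1 + np.exp(-(-1.0 + 1.5 * x)))     # depends on observed x only
p_mnar = 1 / (1 + np.exp(-(-1.0 + 1.5 * y)))     # depends on the missing y

for label, p in [("MCAR", p_mcar), ("MAR", p_mar), ("MNAR", p_mnar)]:
    missing = rng.random(n) < p
    # Complete-case mean of y: unbiased for E[y] = 1 only under MCAR
    print(f"{label}: complete-case mean = {y[~missing].mean():.2f}")
```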

Missing data analysis methods

  • Complete Case Analysis
    • Elimination of partially observed cases
    • Simple, but reduces efficiency and can give biased parameter estimates
  • Inverse probability weighting
    • Weigh the original data (subject to missingness) to account for the fact that the actual sample size is smaller than originally planned
    • Weight up (down) units that have a low (high) chance of actually being observed (see the sketch after this list)
  • Single (deterministic) imputation
    • Imputation of missing data with a single value (mean, median, LVCF)
    • Does not account for the uncertainty in the imputation process
  • Multiple (stochastic) imputation (MI)

  • “Full Bayesian”
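To illustrate the contrast between the first two approaches, here is a minimal sketch on hypothetical simulated data, using a logistic regression for the observation model (none of the names or numbers below come from the talk):

```python
# Complete-case analysis vs inverse probability weighting under MAR
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

# Missingness depends only on the observed x (MAR): higher x, more observed
obs = rng.random(n) < 1 / (1 + np.exp(-(0.2 + 1.0 * x)))

# Complete-case estimate of E[y]: biased here, because x drives both y
# and the chance of being observed
mean_cc = y[obs].mean()

# IPW: model Pr(observed | x), then weight each observed unit by the
# inverse of its estimated observation probability
fit = LogisticRegression().fit(x.reshape(-1, 1), obs)
w = 1 / fit.predict_proba(x[obs].reshape(-1, 1))[:, 1]
mean_ipw = np.sum(w * y[obs]) / np.sum(w)

print(f"complete-case: {mean_cc:.2f}, IPW: {mean_ipw:.2f}, truth: 1.00")
```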

Multiple imputation

  • Suppose we observe a continuous variable \(y\) on \(n_{\rm{obs}} < n\) individuals
  • Also, we completely observe an additional variable \(x\) on the \(n\) individuals in the sample
  • The simplest way in which we can describe the relationship between \(y\) and \(x\) is through linear regression

\[\class{myblue}{y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad \varepsilon_i \sim \dnorm(0,\sigma^2)}\]

  • Using the model-based estimates for the parameters \(\hat{\bm\theta}=(\hat{\beta}_0,\hat{\beta}_1,\hat\sigma)\), we can also obtain a prediction

\[\class{myblue}{y_i^{\rm{mis}}\sim \dnorm(\hat{\beta}_0 + \hat{\beta}_1 x_i,\hat\sigma^2)}\]

  • To account for sampling variance and uncertainty in the estimates of the model parameters, replicate this a (small-ish) number \(R\) of times…
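A minimal sketch of one way to implement this, on hypothetical simulated data and assuming flat priors (in practice, packages such as mice in R do this for you):

```python
# "Proper" stochastic imputation for the linear model above: for each of
# R imputations, draw (beta, sigma) from their posterior given the
# observed cases, then draw the missing y from the predictive distribution
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.8, size=n)
y[rng.random(n) < 0.3] = np.nan                 # ~30% missing (MCAR here)

obs = ~np.isnan(y)
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
resid = y[obs] - X[obs] @ beta_hat
XtX_inv = np.linalg.inv(X[obs].T @ X[obs])

R = 20
imputed = []
for _ in range(R):
    # sigma^2 from its (scaled inverse chi-squared) posterior, then
    # beta | sigma^2 from a normal around the least-squares estimate
    sigma2 = resid @ resid / rng.chisquare(obs.sum() - 2)
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # finally, the missing y from the predictive distribution
    y_r = y.copy()
    y_r[~obs] = X[~obs] @ beta + rng.normal(scale=np.sqrt(sigma2),
                                            size=(~obs).sum())
    imputed.append(y_r)
```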

Multiple imputation

Analysis

  • Can analyse every imputed full dataset using standard methods
    • In each imputation, there are no missing data — so no problems…
  • This amounts to computing \(R\) estimates for the mean and sd of the estimand, e.g. \(\bar{y}_r\) and \(s_r\)

Pooling (“Rubin’s rule”)

  • Combine the \(R\) analysis results into a single final result

\[\class{myblue}{\hat{\mu}_{\txt{MI}} = \frac{1}{R}\sum_{r=1}^R \bar{y}_r}\]

with variance

\[\class{myblue}{\hat{\sigma}^2_{\text{MI}} = \underbrace{\left(1 + \frac{1}{R}\right)}_{\substack{\txt{finite sampling}\\ \txt{correction}}}\underbrace{\left[\frac{1}{R-1}\sum_{r=1}^R \left(\bar{y}_r - \hat{\mu}_{\txt{MI}}\right)^2\right]}_{\txt{between imputation}} + \underbrace{\left[ \frac{1}{R}\sum_{r=1}^R s^2_r \right]}_{\txt{within imputation}}}\]
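In code, the pooling step is only a few lines; here is a self-contained sketch with placeholder numbers (hypothetical, matching the formula above):

```python
# Rubin's rules, assuming we already have R per-imputation estimates
# (means) and their standard errors
import numpy as np

means = np.array([1.02, 0.97, 1.05, 0.99, 1.01])    # \bar{y}_r, r = 1..R
ses   = np.array([0.11, 0.12, 0.11, 0.13, 0.12])    # s_r
R = len(means)

mu_mi = means.mean()                     # pooled point estimate
between = means.var(ddof=1)              # between-imputation variance
within = np.mean(ses ** 2)               # within-imputation variance
var_mi = (1 + 1 / R) * between + within  # total variance, as in the formula
print(f"pooled estimate {mu_mi:.3f} (se {np.sqrt(var_mi):.3f})")
```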

Multiple imputation

Advantages

  • (Generally) valid under MCAR and MAR assumptions

  • Makes use of the whole dataset

  • Can be extended to MNAR, although models become more complex and untestable assumptions are necessary

  • “Vanilla” implementations based on underlying Normality assumption

Disadvantages

  • Leads to biased results if the imputation model is completely mis-specified (“congeniality”)
    • This needs careful consideration, because the modelling is performed in “two stages” (imputation vs analysis)…
  • Can (still) be computationally intensive

Think like a Bayesian/do like a frequentist…

  • In reality, when Rubin invented this framework, he had in mind a very Bayesian setup
    • “Imputations” are draws from some suitable posterior predictive distribution
  • But: back in the 1970s, he didn’t have MCMC and powerful computers…
    • And so he settled for a compromise – MI!


  • MI can be seen as an approximation to full Bayesian modelling, in which the missingness mechanism is formally modelled (if necessary, e.g. under MNAR)

  • In any case, as the underlying model complexity increases, the Bayesian machinery becomes comparatively less complicated…

  • The MCMC structure does not change dramatically once it is in place, and it can handle the idiosyncrasies of the data

    • Longitudinal vs cross-sectional
    • Non-Normality
    • “Structural values”

Bayesian modelling of missing data

MCAR/MAR

  • Build the model of analysis and simply run it (e.g. using JAGS or Stan…)
  • Missing values are automatically “imputed” from the posterior predictive distribution

\[\class{myblue}{p(y^{\text{mis}}\mid y^{\text{obs}},\bm{x}) = \int p(y^{\text{mis}}\mid \bm\theta,\bm{x}) p(\bm\theta\mid y^{\text{obs}},\bm{x}) d\bm\theta}\]

  • MCMC algorithms give this “for free” – no extra computation required…
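A minimal hand-rolled sketch of why this comes “for free” (hypothetical data, flat priors; this is not the author's code):

```python
# Data-augmentation Gibbs sampler for the linear model: alternate draws
# of the parameters given the completed data with draws of the missing
# y from the predictive distribution
import numpy as np

rng = np.random.default_rng(1)

def gibbs_with_imputation(y, X, n_iter=2000):
    miss = np.isnan(y)
    y = y.copy()
    y[miss] = np.nanmean(y)                       # crude starting values
    n, p = X.shape
    draws = []
    for _ in range(n_iter):
        # (1) (beta, sigma^2) | completed y  -- flat-prior posterior
        XtX_inv = np.linalg.inv(X.T @ X)
        beta_hat = XtX_inv @ X.T @ y
        resid = y - X @ beta_hat
        sigma2 = resid @ resid / rng.chisquare(n - p)
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # (2) y_mis | beta, sigma^2 -- the predictive step; iterating it
        # marginalises over theta, so the retained draws of y[miss] come
        # from the posterior predictive distribution above
        y[miss] = X[miss] @ beta + rng.normal(scale=np.sqrt(sigma2),
                                              size=miss.sum())
        draws.append(np.concatenate([beta, [sigma2]]))
    return np.array(draws)
```

In JAGS, simply passing \(y\) with NA entries as data achieves the same thing: the missing components are treated as unknowns and sampled at each MCMC iteration.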

MNAR

  • Build the model of analysis and the model of missingness jointly
  • Use (potentially subjective) information to “anchor” the impact of the partially observed outcome on the missingness probability
  • Missing values are again automatically drawn from the posterior predictive distribution

Either way, the modelling happens “in one go” and uncertainty is fully propagated!

  • This has important implications for decision modelling and uncertainty analysis (PSA)…
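As a small illustration of that link, here is a sketch with made-up posterior summaries standing in for real MCMC output:

```python
# Each posterior draw of the mean effects and costs per arm is one PSA
# simulation, so no separate bootstrap or resampling layer is needed.
# All numbers are hypothetical placeholders
import numpy as np

rng = np.random.default_rng(3)
S = 4000                                    # posterior sample size

# Stand-ins for posterior draws of population-average QALYs and costs
e0, e1 = rng.normal(0.70, 0.02, S), rng.normal(0.73, 0.02, S)
c0, c1 = rng.normal(400, 50, S), rng.normal(550, 60, S)

for k in (10_000, 20_000, 30_000):          # willingness-to-pay values
    inb = k * (e1 - e0) - (c1 - c0)         # incremental net benefit per draw
    print(f"k = {k}: Pr(cost-effective) = {(inb > 0).mean():.2f}")
```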

MenSS study

Time             Type of outcome       Control (n = 75)   Intervention (n = 84)
                                       observed (%)       observed (%)
Baseline         utilities             72 (96%)           72 (86%)
3 months         utilities and costs   34 (45%)           23 (27%)
6 months         utilities and costs   35 (47%)           23 (27%)
12 months        utilities and costs   43 (57%)           36 (43%)
Complete cases   utilities and costs   27 (44%)           19 (23%)

Gabrio et al (2018). https://doi.org/10.1002/sim.8045
