Mind the Gaps: Managing Missing PRO Data in the Era of JCA for HTA Submissions

Some methodological issues


Gianluca Baio

Department of Statistical Science   |   University College London

g.baio@ucl.ac.uk


https://gianluca.statistica.it

https://egon.stats.ucl.ac.uk/research/statistics-health-economics

https://github.com/giabaio   https://github.com/StatisticsHealthEconomics  

@gianlubaio@mas.to     @gianlubaio    


Issue Panel

ISPOR Europe 2025, Glasgow

11 November 2025

Check out our departmental podcast “Random Talks” on Soundcloud!

Follow our departmental social media accounts + magazine “Sample Space”

Objectives (of this talk…)

  • Expanding the focus on multiple imputation and linking it up with Bayesian modelling

  • Is it more efficient, particularly in HTA?\(^{*}\)

    • Non-normality
    • Skewness
    • Longitudinal vs cross-sectional data
    • Structural values
    • Links to decision-making and PSA
  • Methodological approaches to handle situations where a substantial amount of missingness is present, with particular relevance to PRO data, accounting for the longitudinal nature of the data

  • An example…

\(^{*}\)Yes it is. 😉

Background

The problems with missing data…

  • We plan to observe \(n_{\rm{planned}}\) data points, but end up with a (much) lower number of observations \(n_{\rm{observed}}\)

    • What is the proportion of missing data?
    • Does it matter?…
  • We typically don’t know why the unobserved points are missing, or what their values might have been
    • Missingness can be differential in treatment/exposure groups

  • … Basically, there is not very much we can do about it!
    • Any modelling is based on at least some untestable assumptions
    • We cannot check the model fit to unobserved data
    • We have to accept the inherent uncertainty in our analysis!

Missing data mechanisms

[Figure 1: (a) MCAR, (b) MAR, (c) MNAR]
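To make the distinction concrete, here is a minimal simulation sketch (all numbers hypothetical, not from the talk): the chance that \(y\) is missing depends on nothing (MCAR), on a fully observed covariate \(x\) (MAR), or on the unobserved \(y\) itself (MNAR), and the complete-case mean of \(y\) can be trusted only in the first case.

```python
# Hypothetical simulated data: y depends on a fully observed covariate x
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

p_mcar = np.full(n, 0.3)                         # constant probability
p_mar  = 1 / (1 + np.exp(-(-1.0 + 1.5 * x)))     # depends on observed x only
p_mnar = 1 / (1 + np.exp(-(-1.0 + 1.5 * y)))     # depends on the missing y

for label, p in [("MCAR", p_mcar), ("MAR", p_mar), ("MNAR", p_mnar)]:
    missing = rng.random(n) < p
    # Complete-case mean of y: unbiased for E[y] = 1 only under MCAR
    print(f"{label}: complete-case mean = {y[~missing].mean():.2f}")
```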

Missing data analysis methods

  • Complete Case Analysis
    • Elimination of partially observed cases
    • Simple, but reduces efficiency and can give biased parameter estimates
  • Inverse probability weighting
    • Weigh the original data (subject to missingness) to account for the fact that the actual sample size is smaller than originally planned
    • Weight up (down) units that have a low (high) chance of actually being observed (see the sketch after this list)
  • Single (deterministic) imputation
    • Imputation of missing data with a single value (mean, median, LVCF)
    • Does not account for the uncertainty in the imputation process
  • Multiple (stochastic) imputation (MI)

  • “Full Bayesian”
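To illustrate the contrast between the first two approaches, here is a minimal sketch on hypothetical simulated data, using a logistic regression for the observation model (none of the names or numbers below come from the talk):

```python
# Complete-case analysis vs inverse probability weighting under MAR
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

# Missingness depends only on the observed x (MAR): higher x, more observed
obs = rng.random(n) < 1 / (1 + np.exp(-(0.2 + 1.0 * x)))

# Complete-case estimate of E[y]: biased here, because x drives both y
# and the chance of being observed
mean_cc = y[obs].mean()

# IPW: model Pr(observed | x), then weight each observed unit by the
# inverse of its estimated observation probability
fit = LogisticRegression().fit(x.reshape(-1, 1), obs)
w = 1 / fit.predict_proba(x[obs].reshape(-1, 1))[:, 1]
mean_ipw = np.sum(w * y[obs]) / np.sum(w)

print(f"complete-case: {mean_cc:.2f}, IPW: {mean_ipw:.2f}, truth: 1.00")
```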

Multiple imputation

  • Suppose we observe a continuous variable \(y\) on \(n_{\rm{obs}} < n\) individuals
  • Also, we completely observe an additional variable \(x\) on the \(n\) individuals in the sample
  • The simplest way in which we can describe the relationship between \(y\) and \(x\) is through linear regression

\[\class{myblue}{y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad \varepsilon_i \sim \dnorm(0,\sigma^2)}\]

  • Using the model-based estimates for the parameters \(\hat{\bm\theta}=(\hat{\beta}_0,\hat{\beta}_1,\hat\sigma)\), we can also obtain a prediction

\[\class{myblue}{y_i^{\rm{mis}}\sim \dnorm(\hat{\beta}_0 + \hat{\beta}_1 x_i,\hat\sigma^2)}\]

  • To account for sampling variance and uncertainty in the estimates of the model parameters, replicate this a (small-ish) number \(R\) of times…
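A minimal sketch of one way to implement this, on hypothetical simulated data and assuming flat priors (in practice, packages such as mice in R do this for you):

```python
# "Proper" stochastic imputation for the linear model above: for each of
# R imputations, draw (beta, sigma) from their posterior given the
# observed cases, then draw the missing y from the predictive distribution
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.8, size=n)
y[rng.random(n) < 0.3] = np.nan                 # ~30% missing (MCAR here)

obs = ~np.isnan(y)
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
resid = y[obs] - X[obs] @ beta_hat
XtX_inv = np.linalg.inv(X[obs].T @ X[obs])

R = 20
imputed = []
for _ in range(R):
    # sigma^2 from its (scaled inverse chi-squared) posterior, then
    # beta | sigma^2 from a normal around the least-squares estimate
    sigma2 = resid @ resid / rng.chisquare(obs.sum() - 2)
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # finally, the missing y from the predictive distribution
    y_r = y.copy()
    y_r[~obs] = X[~obs] @ beta + rng.normal(scale=np.sqrt(sigma2),
                                            size=(~obs).sum())
    imputed.append(y_r)
```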

Multiple imputation

Analysis

  • Can analyse every imputed full dataset using standard methods
    • In each imputation, there are no missing data — so no problems…
  • This amounts to computing \(R\) estimates for the mean and sd of the estimand, e.g. \(\bar{y}_r\) and \(s_r\)

Pooling (“Rubin’s rule”)

  • Combine the \(R\) analysis results into a single final result

\[\class{myblue}{\hat{\mu}_{\txt{MI}} = \frac{1}{R}\sum_{r=1}^R \bar{y}_r}\]

with variance

\[\class{myblue}{\hat{\sigma}^2_{\text{MI}} = \underbrace{\left(1 + \frac{1}{R}\right)}_{\substack{\txt{finite sampling}\\ \txt{correction}}}\underbrace{\left[\frac{1}{R-1}\sum_{r=1}^R \left(\bar{y}_r - \hat{\mu}_{\txt{MI}}\right)^2\right]}_{\txt{between imputation}} + \underbrace{\left[ \frac{1}{R}\sum_{r=1}^R s^2_r \right]}_{\txt{within imputation}}}\]
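In code, the pooling step is only a few lines; here is a self-contained sketch with placeholder numbers (hypothetical, matching the formula above):

```python
# Rubin's rules, assuming we already have R per-imputation estimates
# (means) and their standard errors
import numpy as np

means = np.array([1.02, 0.97, 1.05, 0.99, 1.01])    # \bar{y}_r, r = 1..R
ses   = np.array([0.11, 0.12, 0.11, 0.13, 0.12])    # s_r
R = len(means)

mu_mi = means.mean()                     # pooled point estimate
between = means.var(ddof=1)              # between-imputation variance
within = np.mean(ses ** 2)               # within-imputation variance
var_mi = (1 + 1 / R) * between + within  # total variance, as in the formula
print(f"pooled estimate {mu_mi:.3f} (se {np.sqrt(var_mi):.3f})")
```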

Multiple imputation

Advantages

  • (Generally) valid under MCAR and MAR assumptions

  • Makes use of the whole dataset

  • Can be extended to MNAR, although models become more complex and untestable assumptions are necessary

  • “Vanilla” implementations based on underlying Normality assumption

Disadvantages

  • Leads to biased results if the imputation model is completely mis-specified (“congeniality”)
    • This needs careful consideration, because the modelling is performed in “two stages” (imputation vs analysis)…
  • Can (still) be computationally intensive

Think like a Bayesian/do like a frequentist…

  • In reality, when Rubin invented this framework, he had in mind a very Bayesian setup
    • “Imputations” are draws from some suitable posterior predictive distribution
  • But: back in the 1970s, he didn’t have MCMC and powerful computers…
    • And so he settled for a compromise – MI!


  • MI can be seen as an approximation to full Bayesian modelling, in which the missingness mechanism is formally modelled (if necessary, e.g. under MNAR)

  • In any case, as the underlying model complexity increases, the Bayesian machinery becomes comparatively less complicated…

  • The MCMC structure does not change dramatically once it is in place, and it can handle the idiosyncrasies of the data

    • Longitudinal vs cross-sectional
    • Non-Normality
    • “Structural values”

Bayesian modelling of missing data

MCAR/MAR

  • Build the model of analysis and simply run it (e.g. using JAGS or Stan…)
  • Missing values are automatically “imputed” from the posterior predictive distribution

\[\class{myblue}{p(y^{\text{mis}}\mid y^{\text{obs}},\bm{x}) = \int p(y^{\text{mis}}\mid \bm\theta,\bm{x}) p(\bm\theta\mid y^{\text{obs}},\bm{x}) d\bm\theta}\]

  • MCMC algorithms give this “for free” – no extra computation required…
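A minimal hand-rolled sketch of why this comes “for free” (hypothetical data, flat priors; this is not the author's code):

```python
# Data-augmentation Gibbs sampler for the linear model: alternate draws
# of the parameters given the completed data with draws of the missing
# y from the predictive distribution
import numpy as np

rng = np.random.default_rng(1)

def gibbs_with_imputation(y, X, n_iter=2000):
    miss = np.isnan(y)
    y = y.copy()
    y[miss] = np.nanmean(y)                       # crude starting values
    n, p = X.shape
    draws = []
    for _ in range(n_iter):
        # (1) (beta, sigma^2) | completed y  -- flat-prior posterior
        XtX_inv = np.linalg.inv(X.T @ X)
        beta_hat = XtX_inv @ X.T @ y
        resid = y - X @ beta_hat
        sigma2 = resid @ resid / rng.chisquare(n - p)
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # (2) y_mis | beta, sigma^2 -- the predictive step; iterating it
        # marginalises over theta, so the retained draws of y[miss] come
        # from the posterior predictive distribution above
        y[miss] = X[miss] @ beta + rng.normal(scale=np.sqrt(sigma2),
                                              size=miss.sum())
        draws.append(np.concatenate([beta, [sigma2]]))
    return np.array(draws)
```

In JAGS, simply passing \(y\) with NA entries as data achieves the same thing: the missing components are treated as unknowns and sampled at each MCMC iteration.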

MNAR

  • Build the model of analysis and the model of missingness jointly
  • Use (potentially subjective) information to “anchor” the impact of the partially observed outcome on the missingness probability
  • Missing values are again automatically drawn from the posterior predictive distribution

Either way, the modelling happens “in one go” and uncertainty is fully propagated!

  • This has important implications for decision modelling and uncertainty analysis (PSA)…
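As a small illustration of that link, here is a sketch with made-up posterior summaries standing in for real MCMC output:

```python
# Each posterior draw of the mean effects and costs per arm is one PSA
# simulation, so no separate bootstrap or resampling layer is needed.
# All numbers are hypothetical placeholders
import numpy as np

rng = np.random.default_rng(3)
S = 4000                                    # posterior sample size

# Stand-ins for posterior draws of population-average QALYs and costs
e0, e1 = rng.normal(0.70, 0.02, S), rng.normal(0.73, 0.02, S)
c0, c1 = rng.normal(400, 50, S), rng.normal(550, 60, S)

for k in (10_000, 20_000, 30_000):          # willingness-to-pay values
    inb = k * (e1 - e0) - (c1 - c0)         # incremental net benefit per draw
    print(f"k = {k}: Pr(cost-effective) = {(inb > 0).mean():.2f}")
```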

MenSS study

Time             Type of outcome       Control (n = 75)   Intervention (n = 84)
                                       observed (%)       observed (%)
Baseline         utilities             72 (96%)           72 (86%)
3 months         utilities and costs   34 (45%)           23 (27%)
6 months         utilities and costs   35 (47%)           23 (27%)
12 months        utilities and costs   43 (57%)           36 (43%)
Complete cases   utilities and costs   27 (44%)           19 (23%)

Gabrio et al (2018). https://doi.org/10.1002/sim.8045
