3 Why `R`? A Low- and Middle-Income Countries Perspective

Joshua Soboil
Cogentia Healthcare Consulting, UK

Federico Rodriguez Cairoli
Triangulate Health, UK

Antoinette Buhle Ndweni
University of Cape Town, South Africa

3.1 Introduction

If you are perusing this book, chances are you’re already pondering, “Why R?” and more specifically, “Why R for Health Technology Assessment (HTA)?” Other chapters delve into comprehensive explanations and technical justifications for this question. This chapter, however, brings into focus the advantages of using R for HTA within Low- and Middle-Income Countries (LMICs).

Health analysts who have built de novo (new) HTA models in Microsoft Excel will attest that this is a cumbersome task: naming cells, assigning those names to the correct parameters, and making sure the right formulae are pulled correctly across the spreadsheets in the workbook, all while hoping and praying that the model computes and produces statistically sound results. Although Visual Basic for Applications (VBA) gives Excel a coding tool, its statistical functionality is limited in comparison with R. And while organisations in LMICs may be conscious of implementing cost-cutting measures to stay within budget, even in their health economic decision modelling and data analysis work, there is no reason this should come at the expense of producing statistically accurate health outcomes.

3.2 Some basic motivations: why open-source programming languages matter

The most obvious motivation for using R for HTA in LMICs, rather than, for example, Excel, TreeAge, Stata, or Simul8, is that it is free and open-source software. If you have internet access, you can download it without having to whip out your credit card. This significantly reduces the financial burden of software licences, making R more accessible to LMIC analysts. In comparison, TreeAge, for instance, carries a price tag of US$2,250; Excel costs range from US$160 to US$400 depending on the package; basic Stata costs US$840 per year; and Simul8 sets you back a hefty US$4,995 annually. That is quite a lot of money, and it looks even more costly when you consider the marginal gains. What can point-and-click or spreadsheet software do that R cannot? Not much. In fact, the reverse is more often the case: R can do a great deal that the traditional software used for health economic modelling cannot. Hence, the allure of R lies predominantly in the analytic freedom it provides, rather than merely in the fact that it is free.

TreeAge and Simul8 may be easier to learn because of their point-and-click interfaces and visual mock-ups. However, these same features can seriously limit an analyst with more demanding modelling needs: implementing highly specialised or complex modelling approaches that such software does not directly support can be challenging. Similarly, while Excel is ubiquitous in health economic cost-effectiveness analysis, it has its own limitations in flexibility. For example, version control can become an issue when multiple users collaborate on a single Excel file, leading to inconsistencies and difficulties in tracking changes over time.

Excel also lacks built-in features for documenting and reproducing analyses, which are essential for technical transparency in HTA. A small but notable example is the difficulty of setting a seed in Excel models, that is, the starting point for generating a sequence of random numbers. Fixing the seed keeps simulation results consistent from one run to the next, which is what makes a probabilistic model reproducible. More generally, tracking changes, documenting assumptions, and generating clear audit trails can be cumbersome, and often impossible, in Excel.
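In R, fixing the seed takes a single call to `set.seed()`. The minimal sketch below uses a hypothetical helper, `draw_psa_inputs()`, with arbitrary Beta and Gamma parameters that are not taken from any real model. It draws probabilistic sensitivity analysis (PSA) inputs twice with the same seed and checks that the two runs are identical.

```r
# Illustrative PSA draws; the distributions and parameters are arbitrary placeholders
n_sim <- 1000

# Hypothetical helper: draws one set of PSA inputs per simulation
draw_psa_inputs <- function(n) {
  data.frame(
    p_death = rbeta(n, shape1 = 20, shape2 = 80),  # a probability, Beta distributed
    c_treat = rgamma(n, shape = 100, scale = 50)   # a cost, Gamma distributed
  )
}

set.seed(2024)                        # fix the state of the random number generator
psa_run_1 <- draw_psa_inputs(n_sim)

set.seed(2024)                        # same seed, so the same sequence of random numbers
psa_run_2 <- draw_psa_inputs(n_sim)

identical(psa_run_1, psa_run_2)
# [1] TRUE
```

Because the seed is written into the script itself, anyone re-running the model, including a reviewer, obtains exactly the same simulated values.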

Of course, Excel does allow you to enter formulas freely. But even some of the built-in functions used in HTA models have been found wanting in newer versions of Excel, requiring tweaks to make them fit for purpose. And while Excel may enable you to enter equations that reproduce specific parametric survival distributions, this can be a tedious and error-prone process. For instance, to estimate extrapolated survival from a generalised gamma parametric model in Excel, you can use the following formula:

```
=IF(
   Q < 0,
   GAMMADIST(Q^-2 * EXP(Q * (LN(time) - mu) / sigma), Q^-2, 1, TRUE),
   1 - GAMMADIST(Q^-2 * EXP(Q * (LN(time) - mu) / sigma), Q^-2, 1, TRUE)
)
```

There are a lot of moving parts in that formula. It needs separate branches for negative and positive values of Q (and would need yet another for the Q = 0, log-normal case), and most parameter values would typically be entered directly into the live cell formula rather than assigned to named objects in VBA. And even if you do use VBA for such analyses, it is inefficient and slow. For comparison, using some arbitrary parameter values, the same calculation can be coded in R as follows:

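```r
library(flexsurv)  # provides pgengamma()

# Arbitrary, illustrative parameter values
time  <- 1:1000  # time points at which survival is evaluated
mu    <- 0.5     # location parameter
sigma <- 1       # scale parameter
Q     <- 0.01    # shape parameter

# Survival, S(t) = 1 - F(t), from the generalised gamma distribution
v_surv_gengamma <- 1 - pgengamma(q = time, mu = mu, sigma = sigma, Q = Q)
```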

While Excel and TreeAge also allow free data entry, their limited statistical capabilities often mean that results must be imported manually from other software. This is especially common in health economic modelling, where several different statistical analyses are often fed into a “final” economic model. For instance, a full survival analysis, including model validation and inspection, is often performed in external software such as Stata or SAS. The parameter estimates are then imported into Excel or TreeAge, and only then is the extrapolation generated for the health economic model. As a result, the validation of the survival analysis is lost from the model itself and analytical flexibility is reduced, which increases the risk of errors and constrains the choice of parametric model. In other words, traditional health economic modelling software produces an “artificial separation between parameter estimation based on the clinical evidence and model simulation.”
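To contrast with that workflow, the sketch below keeps estimation, model comparison, and extrapolation in a single R script. It assumes the flexsurv package is installed and uses its bundled `bc` breast cancer dataset purely as example data; the set of candidate distributions and the 20-year horizon are arbitrary choices for illustration.

```r
library(survival)  # Surv()
library(flexsurv)  # flexsurvreg(), pgengamma(), and the example 'bc' dataset

# Fit several candidate parametric distributions to the same data in one script
dists <- c("exp", "weibull", "lnorm", "llogis", "gengamma")
fits <- lapply(dists, function(d) {
  flexsurvreg(Surv(recyrs, censrec) ~ 1, data = bc, dist = d)
})
names(fits) <- dists

# Compare goodness of fit (AIC is one criterion; visual inspection still matters)
aic <- sapply(fits, function(f) f$AIC)
print(sort(aic))

# Extrapolate beyond the observed follow-up using the fitted generalised gamma estimates
est <- fits[["gengamma"]]$res[, "est"]  # mu, sigma and Q on their natural scale
t_horizon <- seq(0, 20, by = 0.5)       # years; illustrative time horizon
surv_extrap <- 1 - pgengamma(t_horizon, mu = est["mu"], sigma = est["sigma"], Q = est["Q"])
```

Because the extrapolation reads its parameters directly from the fitted object, re-running the script after a data update regenerates both the fit statistics and the survival curve, with no manual copying between programs.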

Again, the counter-argument may be that TreeAge or Excel is simply easier to learn than a statistical programming language like R, which might be attractive for one-off modelling tasks. However, as most analysts will build multiple models over time, it is worth considering whether the up-front investment in learning a programming language is outweighed by its benefits in the longer term. Compared with the more traditional alternatives, statistical programming languages can substantially reduce the time required to run model simulations, offer greater flexibility in statistical methods, can generally produce fully interactive models, and support more reliable reproducibility and closer collaboration between analysts. Model technical reports can be fully automated via scripts using RMarkdown (Xie et al., 2018) or Quarto (Allaire et al., 2024). By using an application programming interface (API) to access data, model code can be shared freely without the risk of inadvertently including confidential data. Decision-makers can fully interact with a model developed in Shiny, for instance (see Chapter 14 for more details). The list goes on.

So, despite a steeper learning curve, a purpose-built open-source programming language is, from a technical perspective, clearly favourable for LMIC analysts in the long term. Such languages are cheaper, faster, better suited to building clinically valid models, and facilitate better reproducibility and collaboration. Given these motivations, you might now ask, “Why not use any open-source programming language, like Python or Julia?” Well, these languages offer further alternatives and, we believe, are also superior to the traditional software used for health economic modelling. Nevertheless, we prefer R because it has the largest and most active health economics user community. In any case, the choice of software often reflects the statistical sophistication of the modeller. Any “proper” and “fit-for-purpose” language is ideal, but a common denominator with the largest support base provides a very welcome safety net.

3.3 The use-case of `R` for HTA in LMICs

R offers extensive functionality for statistical analysis, data manipulation, visualisation, and modelling. More importantly, R has a vast ecosystem of packages and resources specifically tailored for health economic modelling and HTA. Analysts in LMICs can leverage numerous freely available tutorials, packages, and online courses to learn and implement advanced HTA modelling techniques (including this book!).

Ideally, LMIC analysts could, for example, develop interactive web-based models that could be shared and used across several LMIC jurisdictions tackling the same decision problem. Such possibilities and resources promote transparency, cooperation, and resource sharing, all of which are hugely beneficial in LMICs. This is especially relevant as these countries tackle the increasing prevalence of communicable and non-communicable diseases (NCDs) and strive towards achieving Universal Health Coverage (UHC). Models that require more advanced statistical methods will be a growing necessity.
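To illustrate how one model could serve several jurisdictions, the sketch below writes a simple three-state Markov cohort model once, as a function, and re-runs it with country-specific inputs. The function name `run_markov_model()`, the model structure, and all numbers are hypothetical and chosen only for illustration; a real model would use locally sourced parameters, include outcomes such as QALYs, and consider refinements such as half-cycle correction.

```r
# Hypothetical three-state Markov cohort model (Healthy, Sick, Dead).
# All inputs are illustrative placeholders, not real country data.
run_markov_model <- function(p_hs, p_sd, p_hd, c_healthy, c_sick,
                             n_cycles = 20, disc_rate = 0.03) {
  states <- c("Healthy", "Sick", "Dead")
  # Transition probability matrix: rows = from, columns = to
  m_trans <- matrix(
    c(1 - p_hs - p_hd, p_hs,     p_hd,
      0,               1 - p_sd, p_sd,
      0,               0,        1),
    nrow = 3, byrow = TRUE, dimnames = list(states, states)
  )
  # Cohort trace: share of the cohort in each state at each cycle
  m_trace <- matrix(0, nrow = n_cycles + 1, ncol = 3, dimnames = list(NULL, states))
  m_trace[1, ] <- c(1, 0, 0)                      # whole cohort starts Healthy
  for (t in 1:n_cycles) {
    m_trace[t + 1, ] <- m_trace[t, ] %*% m_trans  # move the cohort one cycle forward
  }
  v_cost <- m_trace %*% c(c_healthy, c_sick, 0)   # expected cost in each cycle
  v_life <- m_trace %*% c(1, 1, 0)                # expected life-years in each cycle
  v_disc <- 1 / (1 + disc_rate)^(0:n_cycles)      # discount factors (no half-cycle correction)
  c(cost = sum(v_cost * v_disc), life_years = sum(v_life * v_disc))
}

# The same model code, re-run with hypothetical country-specific inputs
country_inputs <- list(
  country_A = list(p_hs = 0.10, p_sd = 0.20, p_hd = 0.01, c_healthy = 50, c_sick = 400),
  country_B = list(p_hs = 0.15, p_sd = 0.25, p_hd = 0.02, c_healthy = 30, c_sick = 250)
)
results <- t(sapply(country_inputs, function(x) do.call(run_markov_model, x)))
results
```

Keeping the model logic in one function and the local inputs in a separate list means that adapting the model to a new jurisdiction only requires a new set of inputs, while the shared code can be reviewed once.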

3.4 Addressing the Learning Curve

So far, this all sounds great on paper. R is technically superior, and it will be able to meet the growing demand for more sophisticated health economic modelling techniques in the long term. However, looking at and interpreting any programming language can be daunting. Like the languages we speak to communicate with each other, a programming language has its own syntax, logic, conditional statements, and rules of consistency that must be followed to make sensible statements, and you also have to understand these rules to interpret the statements of others. Though R is technically more transparent and reliable than traditional health economic modelling software, transparency is in the eye of the beholder… Confidently declaring to someone unfamiliar with R, “Look at my code, it’s just so much easier to understand!”, with a straight face is unlikely to elicit the desired response.
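For readers coming from spreadsheets, one way into R's syntax is to map it onto familiar Excel logic. The sketch below writes the same conditional decision rule both ways. The ICER values, the threshold, and the cell references are hypothetical, and the rule is deliberately simplified (it ignores, for example, dominated options and negative ICERs).

```r
# Hypothetical ICERs (cost per QALY gained) and an illustrative threshold
v_icer    <- c(1200, 4800, 950)
threshold <- 3000

# Excel (one cell, then dragged down a column):
#   =IF(B2 <= $E$1, "Cost-effective", "Not cost-effective")
# R (applied to the whole vector at once with the vectorised ifelse()):
v_decision <- ifelse(v_icer <= threshold, "Cost-effective", "Not cost-effective")
print(v_decision)
```

The R version applies the rule to every element of `v_icer` in one line, whereas the Excel formula must be copied down a column, one cell per value.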

The reality is that many reviewers, analysts, and researchers working in government, academia, industry, or HTA agencies in LMICs still do not know how to use or read programming code. This can be a significant barrier to fully implementing R in HTA processes in this setting. How will these decision-makers advocate for HTA models built in R if they cannot comprehend them? Often, these individuals are more familiar with software like Excel, which, despite its limitations, requires less technical knowledge to use. The more advanced programming knowledge that R requires may deter many reviewers and analysts from adopting it as their primary tool for health technology assessment. This is why we urge those with networks within government and reimbursement agencies to encourage their colleagues to attend the R-HTA workshops. Should the need arise, they can also contact us about arranging bespoke training workshops.

This is also why the R-HTA in LMICs chapter (https://r-hta-in-lmics.github.io/) was created within the wider R-HTA network (https://r-hta.org/): to introduce HTA stakeholders in LMICs to the basics of R, to reading programming code, and to developing cost-effectiveness models, both with existing packages from the R community and from scratch.

While our workshops are aimed at participants from LMICs, we reserve a few places for attendees from High-Income Countries (HICs). A total of four half-day workshops have been held so far, with an average attendance of over 50 HTA analysts from LMICs. One workshop hosted by the chapter in its first year drew participants from about 30 countries worldwide, from both LMICs and HICs. Interest has been keen across all levels of experience, from participants with little or no programming experience to those with intermediate and advanced R skills. Analysts from Latin America, Africa, and Asia have actively participated in these workshops, and the chapter has been fortunate to offer breakout sessions for group work in both English and Spanish. Attendees have included analysts from Ministries of Health (including departments focused on HTA), consulting firms, industry, and academia. These workshops have provided an excellent opportunity to learn about ongoing research projects and to foster collaboration among analysts from the Global South.

3.5 Looking Ahead

We hope this chapter has given you some insight into “Why R for HTA in LMICs?” and convinced you that you R on the right track. As a young chapter, we continue to grow and to expand the impact of R for HTA in LMICs, but our vision extends beyond a single chapter. We aspire to cultivate a global network of R-HTA chapters, each dedicated to empowering analysts and fostering R for HTA collaboration within its region. Ultimately, our goal is to establish dedicated R-HTA chapters in diverse geographic regions and communities worldwide.

Regional R-HTA chapters would provide hubs of knowledge exchange, capacity building, and innovation, tailored to the specific needs and challenges faced by analysts and stakeholders within each region. In addition to holding workshops in English, the chapter is working towards offering breakout sessions in a region's most widely spoken indigenous language where possible, and towards translating key R resources for health economic modelling into several languages. However, this will require all health analysts with a keen interest to commit to championing the use of R for HTA in industry, academia, and government agencies. If you are up for the challenge of starting a chapter in your region, rest assured that you will not be alone: we, the R-HTA in LMICs chapter, will provide the guidance and support you need to embark on this journey.
