# Loads the relevant packages
library(readr)
library(readxl)
library(dplyr)
# Cache raw data
read_xlsx(path = "data.xlsx",
sheet = "output") %>%
write_csv("data-raw.csv")
xls2BCEA
I [Gianluca] have already mentioned this: Nathan (who’s actually written this post) has started doing work on BCEA — here’s another very cool idea he’s had. We’re planning to implement this functionality fully in the next release.
The BCEA package is basically agnostic as to where the cost and effectiveness samples come from. Of course, we strongly suggest performing a full Bayesian analysis (in R or a MCMC sampler, e.g. WinBUGS), where possible but these could equally be a simple Monte Carlo sample generated externally, such as in Microsoft Excel.
This post demontrates how to easily import the output of a PSA from Excel in to R to use with BCEA.
Input Data
In this particular example, I am using some some dummy data I created in Excel using the RAND()
and GAMMA.INV()
functions. I sampled \(n = 100\) for 4 different interventions and recorded them in a single sheet for both cost and QALYs.
A screenshot of the workbook is below.
Our task is to correctly pull this into R for bcea()
.
The package to do this is [readxl](https://readxl.tidyverse.org/)
from the tidyverse. This is making the assumption that the data are in tabular form i.e. a nice, flat rectagular array. If they are more troublesome then we can look in to using something like [tidyxl](https://cran.r-project.org/web/packages/tidyxl/vignettes/tidyxl.html)
.
The readxl
people suggest alway caching the Excel data in a csv file first for prosperity so we’ll do this too.
Note the use of the “pipe” %>%
to concatenate operations in R.
Now we can start proper. read_xlsx
has arguments to specify which sheet to import from and the range of cells to find the table in, as well as some more familiar ones like what type to give each column.
For the cost data we read it in as follows.
read_xlsx(path = "data.xlsx",
sheet = "output",
range = "C4:F101",
col_names = TRUE,
col_types = "numeric")
# A tibble: 97 × 4
`status-quo` interv_1 interv_2 interv_3
<dbl> <dbl> <dbl> <dbl>
1 3.69 9.01 2.37 2.78
2 3.95 5.66 2.45 5.35
3 0.329 0.635 2.04 4.32
4 0.426 0.832 1.47 0.626
5 0.657 1.99 3.14 0.409
6 2.71 5.87 3.46 3.76
7 4.13 7.90 3.23 1.82
8 5.81 2.46 3.45 4.06
9 5.09 0.867 6.42 0.968
10 5.16 1.48 2.15 2.30
# … with 87 more rows
We could alternatively just give the columns using cell_cols()
and read_xlsx()
which work out which rows to read.
read_xlsx(path = "data.xlsx",
sheet = "output",
range = cell_cols("C:F"),
col_names = TRUE,
col_types = "numeric",
skip = 1)
Because we want to read in two tables, we’ll create a function.
<- function(path,
xls2ec
sheet_e,sheet_c = NA,
range_e,
range_c,
...) {
if (is.na(sheet_c)) sheet_c <- sheet_e
<-
eff read_xlsx(path = path,
sheet = sheet_e,
range = range_e,
...)
<-
cost read_xlsx(path = path,
sheet = sheet_c,
range = range_c,
...)
check_xls2ec(eff, cost)
list(eff = eff,
cost = cost)
}
If the two tables are in the same sheet then only provide a sheet name to sheet_e
. Notice that we included a simple check on the validity of the read tables. This could check for matching dimensions and labeling.
= function(eff, cost) {
check_xls2ec
if (nrow(eff) != nrow(cost))
warning("Different number of samples.")
if (ncol(eff) != ncol(cost))
warning("Different number of interventions")
if (!identical(sort(names(eff)), sort(names(cost))))
warning("Different names of interventions")
}
Now, we can read in all of the data.
xls2ec(path = "data.xlsx",
sheet_e = "output",
range_e = cell_cols("C:F"),
range_c = cell_cols("I:L"),
col_names = TRUE,
col_types = "numeric",
skip = 1)
$eff
# A tibble: 99 × 4
`status-quo` interv_1 interv_2 interv_3
<dbl> <dbl> <dbl> <dbl>
1 3.69 9.01 2.37 2.78
2 3.95 5.66 2.45 5.35
3 0.329 0.635 2.04 4.32
4 0.426 0.832 1.47 0.626
5 0.657 1.99 3.14 0.409
6 2.71 5.87 3.46 3.76
7 4.13 7.90 3.23 1.82
8 5.81 2.46 3.45 4.06
9 5.09 0.867 6.42 0.968
10 5.16 1.48 2.15 2.30
# … with 89 more rows
$cost
# A tibble: 99 × 4
`status-quo` interv_1 interv_2 interv_3
<dbl> <dbl> <dbl> <dbl>
1 23.5 51.1 26.0 20.3
2 14.5 29.5 46.4 31.3
3 17.8 18.6 39.7 20.0
4 17.3 30.0 45.6 20.8
5 16.9 29.2 43.7 15.6
6 16.1 30.5 31.0 11.0
7 23.3 41.9 37.0 30.1
8 12.9 19.0 35.3 20.3
9 27.6 14.4 37.7 21.0
10 22.5 46.1 30.7 16.9
# … with 89 more rows
We can now use these data with BCEA via the bcea()
function. However, if we’re feel particularly lazy, we may want to do this all in a single step.
One option is something like the function below.
<- function(path,
xls2bcea
sheet_e,sheet_c = NA,
range_e,
range_c,
...) {
<- list(...)
ext_args
<-
keep_args_xlsx names(ext_args) %in% names(formals(read_xlsx))]
ext_args[
<-
keep_args_bcea names(ext_args) %in% names(formals(bcea))]
ext_args[
<- do.call(xls2ec, c(path = path,
ec sheet_e = sheet_e,
sheet_c = sheet_c,
range_e = range_e,
range_c = range_c,
keep_args_xlsx))
do.call(bcea, c(list(e = as.matrix(ec$eff),
c = as.matrix(ec$cost)),
keep_args_bcea)) }
The key elements are that arguments passed in with dot-dot-dot could correspond to either xls2ec()
or bcea()
and so we filter them beforehand. Also, range_e
and range_c
have to be of the format C2:D7
, rather than the cell_cols()
input because it cannot simply be passed onwards to xls2ec()
in do.call()
. I tried unsuccessfully using the cellranger package or some NSE tricks I could remember.
library(BCEA)
<-
he xls2bcea(path = "data.xlsx",
sheet_e = "output",
range_e = "C4:F101",
range_c = "I4:L101",
col_names = TRUE,
col_types = "numeric",
skip = 1)
plot.bcea(he)