1 Overview

MERIAM (Modelling the Economics of Respiratory tract Infections and AMr) is a model built to assess the long-term health-economic effects of improved diagnostics for community-acquired acute respiratory tract infections at the first point of care. The model is developed within the VALUE-Dx project.

Image: model overview

MERIAM has three compartments: the demographic model, used to model the population over a long time horizon; the consultation model, used to model patients going to care with an acute respiratory tract infection and the antimicrobial resistance (AMR) model, used to forecast AMR levels and AMR-related mortality and costs.

The demographic model contains a representative sample of the modelled country. The consultation model uses incidence data to simulate the care-seeking behaviour for community-acquired respiratory tract infections of a subset of individuals from the demographic model and their outcomes, including costs, quality-adjusted life years (QALYs) and antibiotic consumption. The AMR model uses antibiotic consumption data to forecast AMR levels.

The time framework of this model is as follows:

  • Individuals may seek care for respiratory complaints, with associated costs and health outcomes. Currently, only weeks during the influenza season are included (week 40 - 20).
  • After the influenza season (after week 20), demographic changes are applied (background mortality, babies born, migration etc.)
  • Predict antimicrobial resistance (AMR) - not yet implemented
  • Repeat for the requested number of years

Below are instructions to run the model and a description of the technical aspects of the model.

2 Setting up the R environment

The current R version in use for MERIAM is R version 4.1.0 (2021-05-18). The Checkpoint package is used to be able to use packages in a reproducible manner, this forces all packages to use the version as published on CRAN on 20 July, 2021.

An overview of the included packages is displayed below:

*: indicates that the package is only used to display results in this Documentation, and not to compute the main model.

3 Input data

3.1 VALUE-Dx data

3.1.1 PPAS

The VALUE-Dx point-prevalence audit survey (PPAS) contains patient-level data for many European countries regarding patients seeking care at the general practitioner for acute respiratory complaints. Used in MERIAM are data on:

  • Patient characteristics

    • Proportion of patients that are under 5 years old
  • Number and types of diagnostic tests

  • Antibiotics

    • Number of prescriptions
    • Delayed prescriptions
    • Type of antibiotic

3.2 Automated data sources

The script datagen.R will automatically download all required population data from external sources and save it in the data_input folder. In principle, these data should be available for all EU countries.

3.2.1 Population data

Population data is used to populate the model with a realistic sample of a country.

3.3 Financial data

3.3.1 Methods to convert to common currency

The harmonized index of consumer prices is used to convert all costs to a single reference year. This is published by Eurostat (dataset id: “prc_hicp_midx”).

To convert all country currencies to a common currency, Purchasing Power Parities (PPP) for GDP are used. This process corrects the price level differences between countries. Euro for the 27 EU countries is used as the reference. More information can be found on the Eurostat website.

The function convert_currency() is used to automate this process, see an example below, which uses the costs of a GP consult as an example. This function also allows for currency conversions (using Eurostat dataset: “ert_bil_eur_m” and OECD dataset: “SNA_TABLE4”) and the output in PPP corrected US Dollars.

3.3.2 Labour costs

For employed persons, average daily labour costs are estimated using two Eurostat tables: total annual labour costs (“lc_ncost_r2”) and average annual hours worked per employee (“lc_nnum2_r2”). Averages are used for each country, using the aggregate consisting of “Industry, construction and services (except public administration, defense, compulsory social security)” and including all employers with 10 employees or more. Two methods are used to estimate productivity losses: the friction period method and the human capital method (Krol and Brouwer 2014). For the human capital approach, losses are mainly considered from the perspective of the worker, while the friction cost method considers losses due to lost productivity for the employer (Krol and Brouwer 2014; Koopmanschap et al. 1995). For the friction cost method, we did not consider the possible elasticity between labour and production (Krol and Brouwer 2014).

For unpaid work, we assumed this to be valued at the hourly wages at the minimum wage level (if available). Hourly cost of unpaid work are calculated using the “minimum wage as a proportion of average monthly earnings” dataset (“earn_mw_avgr2”), which is related to the hourly earnings for employed persons (using the datasets described above). Please note that the minimum wages data is not available for all countries. This results in the following data (converted to 2019 euros, corrected for PPP):

3.4 Data from other sources

Some data used in the model are derived from literature.

3.4.1 Utilities

EuroQol-5D (EQ-5D) population norms are used, stratified by different age groups and sex (Szende, Janssen, and Cabases 2014). Three different value sets are included in the model, depending on their availability, the following value sets can be selected:

  • Country-specific time-trade-off (TTO)

  • Visual Analogue Scale (VAS)

    • Country-specific
    • European value set

Children and adolescents(under 18) are assumed to have a utility of 1. The utilities are assumed to follow a beta distribution.

Data from: Szende, Janssen, and Cabases (2014)

For the consultation model, the following utility values from literature are used (Oppong et al. 2018).

After a hospitalization in the consultation model, a utility decrement of 0.1 is applied for individuals aged 65 and over (Mangen et al. 2017).

3.4.2 Probabilities consultation model

Currently, many probabilities in the consultation model are derived from literature. An overview is provided later in this document

3.4.3 Costs consults

The following consult costs are used in the model (presented in 2019 euros). The outpatient care costs are per consult or prescription and inpatient care per day.

4 Model details

4.1 Automatically load model inputs

The function create_data() will automatically load all required data and generate the individuals in the model (this process is further here). Before running this function all required data should be available in the ‘data_input’ folder.

The function requires the following inputs:

  • Population size (e.g. 1000, or 1000000)
  • Country code (e.g. “nl” or “de,” this can be either a single string or an array)
  • The strategies that are being compared (eg. “base,” “crp,”this can be either a single string or an array)
  • Start year
  • Number of years the simulation should run
  • An array containing the weeks that include the influenza season
  • The currency year
  • The method to include productivity (either “friction,” “hc” (human capital) or “none”)
  • Used value set for the utilities (either EUVAS, VAS or TTO)
  • Should the mortality be corrected (TRUE/FALSE)
  • Should the model use stochastic methods (TRUE/FALSE)
  • ID of model iteration (in case a single integer is provided, or an array containing the iterations that need to be run, e.g. 1:100 for 100 iterations)

This function automatically converts costs to the same currency year.

Below is a raw output of the contents of a data object for the model.

## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   set = col_character(),
##   geo = col_character(),
##   pair = col_character()
## )
## i Use `spec()` for the full column specifications.
## 
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   set = col_character(),
##   geo = col_character(),
##   pair = col_character()
## )
## i Use `spec()` for the full column specifications.
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   set = col_character(),
##   geo = col_character(),
##   pair = col_character(),
##   code = col_character()
## )
## i Use `spec()` for the full column specifications.
## 
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   set = col_character(),
##   geo = col_character(),
##   pair = col_character(),
##   code = col_character()
## )
## i Use `spec()` for the full column specifications.
## Rows: 2
## Columns: 29
## $ country                <chr> "nl", "nl"
## $ strategy               <chr> "base", "crp"
## $ iteration              <dbl> 1, 1
## $ nodes                  <list> [<tbl_df[10000 x 8]>], [<tbl_df[10000 x 8]>]
## $ incuded_weeks          <list> <1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,~
## $ utility_valueset       <chr> "EUVAS", "EUVAS"
## $ pop_start_year         <list> [<tbl_df[306 x 8]>], [<tbl_df[306 x 8]>]
## $ fertility              <list> [<tbl_df[82 x 2]>], [<tbl_df[82 x 2]>]
## $ mortality              <list> [<grouped_df[16400 x 6]>], [<grouped_df[16400 x~
## $ migration              <list> [<tbl_df[16400 x 5]>], [<tbl_df[16400 x 5]>]
## $ employment             <list> [<tbl_df[10 x 4]>], [<tbl_df[10 x 4]>]
## $ vaccination            <list> [<tbl_df[1 x 5]>], [<tbl_df[1 x 5]>]
## $ utility_data           <list> [<rowwise_df[200 x 3]>], [<rowwise_df[200 x 3]>~
## $ amr_elas               <named list> [0.01474217, 0.01551005, -0.038454281, -0.0367~
## $ incidence              <list> [<tbl_df[832 x 7]>], [<tbl_df[832 x 7]>]
## $ rollout                <list> [<tbl_df[1 x 2]>], [<tbl_df[1 x 2]>]
## $ amr                    <list> [<grouped_df[8 x 4]>], [<grouped_df[8 x 4]>]
## $ abx_con                <list> [<tbl_df[192 x 3]>], [<tbl_df[192 x 3]>]
## $ microsim_duration      <dbl> 28, 28
## $ eurostat_financialdata <list> [[<data.frame[872 x 4]>], [<data.frame[5586 x ~
## $ utilities              <list> [<0.72, 0.70, 0.30, NA, 0.00>, [[-0.08, "loosel~
## $ longterm               <list> [[[65, 120, -0.13, "Mangen 2017"]], [[65, 120,~
## $ labour                 <list> [148.4545, 113.448, 8.842606, 12, 1, 1, 0.5, 6~
## $ consult_costdata       <list> [<tbl_df[4 x 3]>], [<tbl_df[4 x 3]>]
## $ antibiotic_types       <list> [<data.frame[7 x 7]>], [<data.frame[7 x 7]>]
## $ tests                  <list> [<tbl_df[1 x 3]>], [<tbl_df[1 x 3]>]
## $ currency               <chr> "EUR", "EUR"
## $ antibiotic_prescribing <list> [<data.frame[3 x 3]>], [<data.frame[3 x 3]>]
## $ probabilities          <list> [[TRUE, "lognormal", function (n_cycles, sex_i~

4.2 Demographic model

Within the model, nodes are created to represent individuals. Populations are mainly based on demographic data from Eurostat and can be made as large as needed. The nodes are stored in a tibble, where every line represents one individual. Below is an example of 10 nodes:

Where the id is an unique identifier of each node; active indicated whether an individual is alive, sex is 0 for male and 1 for female, age is age (0-99), empl indicates the employment status for each node (note that we did not set that in this case, so it defaulted to “employed”), uti_base for the base utility value (explained below) and uti_dec for any utility decrement (0 by default). The methods used to create a representative sample of the population based on the demographics of a country, are described below.

4.2.1 Baseline population

Before running the model, the starting population needs to be created: age distribution: the age_dist() function samples ages based on the Eurostat population projections from the starting year. Example for 50 age values:

##  [1]  4  8  8  9 16 17 17 18 20 21 21 25 25 25 26 27 27 27 28 29 29 30 31 31 31
## [26] 32 40 43 43 44 45 46 48 48 49 50 51 53 54 58 58 59 60 63 68 70 70 78 89 92

The probability of being female is sampled using the sexdist() function, using the probability of being female from the Eurostat population projections (at each age). Example:

##  [1] 0 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 1
## [39] 1 1 0 1 0 1 1 0 0 1 1 1

The employment status is created using the calc_employment() function. This function uses the most recent data on unemployment, education and pensions from Eurostat and is dependent on the age.

##  [1] none       school     school     school     school     employed  
##  [7] employed   school     school     unemployed employed   employed  
## [13] employed   employed   employed   employed   employed   employed  
## [19] employed   employed   employed   employed   employed   employed  
## [25] employed   employed   employed   employed   employed   employed  
## [31] employed   employed   employed   employed   employed   employed  
## [37] employed   employed   employed   retired    employed   retired   
## [43] employed   retired    employed   retired    retired    retired   
## [49] retired    retired   
## Levels: employed none retired school unemployed

The utility values are set using the run_base_utility() function, which sets the base utility values (uti_base) to those described above. If required a utility decrement is applied to any node, and this is registered in the uri_dec column.

Combining this all leads to the following 50 nodes:

Doing this for more (e.g. 10000) nodes can be used to create a population pyramid:

4.2.1.1 Vaccination

Influenza vaccination data for persons aged >= 65 from Eurostat are used. Before running the model, the population coverage is sampled. This is repeated every year (to simulate elderly getting vaccinated anually). The function run_influenza_vaccination is used for this. Below is an example to vaccinate the elderly in the Netherlands (vaccination rate: 63%), using the population that was just created for the population pyramid.

4.2.2 Annual demographic changes

Every year the population is updated to reflect the Eurostat projections; this is done in the following order:

  • Mortality

  • Ageing

  • Fertility

  • Migration

  • Employment status

  • Vaccination coverage

    • Influenza vaccinations for the elderly (>= 65 years old)
  • Utilities

4.2.2.1 Mortality

Two types of mortality are included: base mortality and excess mortality. The base mortality is based on the Eurostat mortality probability projections, corrected to exclude mortality for influenza and pneumonia. In the run_base_mortality() function, the mortality probability is sampled for the active nodes of all age groups. This returns a list of node IDs that are going to be deactivated. A major assumption in the model is that all nodes over 99 will be deactivated. We do not include centennials in the model. This is mainly due to data availability for this group. Below is an example of running the base_mort() function within a population of 10000 nodes, it returns the IDs of the 66 nodes that will be deactivated, using the mortality projections from 2021:

##  [1] 62074493627 64049340552 65052259226 66034459481 67079181200 68077054749
##  [7] 69000720885 70074858693 71007703157 72074500537 73044530546 74082316348
## [13] 75086446459 76026005431 77021364870 78001036645 79043612111 80051365852
## [19] 81001799401 82008777156 83046342611 84083600034 84041023063 85021790192
## [25] 85054875743 86054985708 87061207982 88096248580 88058627349 89022999550
## [31] 90048653955 91015266968 92098285796 68119645701 69116291973 70139840439
## [37] 71133776338 72106679835 73193259456 74184586268 75182983063 76176961097
## [43] 77146989654 78178636934 79153693229 80118003577 81157547798 82185958708
## [49] 83190916302 84160949196 85144544999 86110269840 86190556178 87193388713
## [55] 88130803396 89141958143 90108419325 90155794464 91166394571 92149579143
## [61] 92134446298 93101686812 94161420421 96139628260 97173583832 98119799913

65+ year olds have an increased risk of dying after being hospitalized for community-acquired pneumonia (Mangen et al. 2017) and this is calculated as excess mortality using the run_excess_mortality() function. This function also returns a list of node IDs.

After the nodes are collected that should be deactivated, this is executed using the node_remove() function, which will inactivate the nodes (i.e. set active to 0):

4.2.2.2 Ageing

Ageing is straightforward in that it increases the age with 1. The function used in increase_age().

4.2.2.3 Fertility

Annual births are added using the births() function. Input data is used from the Eurostat population projections. The number of babies born is related to the population aged 15-45. The births() function takes the node_list as an input and returns the node_list with newly-created nodes (babies). See an example here:

4.2.2.4 Migration

The model accounts for migration by using the Eurostat projections. The Eurostat projections provide total numbers of immigration (positive number) and emigration (negative number) for 2019-2100. In MERIAM this is related to the total population and converted to a rate. This rate is then used to calculate the total number of immigrants and emigrants using the run_migration() function.

4.2.2.5 Employment status

The employment status is used to calculate productivity losses. As this model assumes no effect of disease status on the employment of individuals, this is randomly assigned every year using the calc_employment() function. This function distinguishes five states of employment:“none,” “school,” “employed,” “unemployed,” “retired.” Where “none” is used for children too young to follow education. All children are assumed to go to school at age 5. At age 5-14 all children go to school. Between the ages of 15 and 24 school-going individuals gradually move to being either employed or unemployed, based on the Eurostat education and productivity data. Between the ages of 25-54 all individuals are either employed or unemployed. From the age of 55 individuals gradually retire (based on the Eurostat retirement data). From the age of 75, everyone is assumed to be retired.

Var1 Freq
employed 5198
none 497
retired 2143
school 1974
unemployed 188

4.2.2.6 Vaccination coverage

The function run_influenza_vaccination is run every year, randomly selecting individuals aged 65 or over to be vaccinated with an influenza vaccine.

4.2.2.7 Base utility

The utility values are set every year using the run_base_utility() function, which sets the base utility values (uti_base) to those described above.

4.3 Consultation model

Each week, a subset of nodes will seek care. These nodes are selected based on real-world incidence data.

4.3.1 Incidence

Incidence is modeled using the Incidence package. Data is from [WHO/ECDC to be determined] Both Acute Respiratory Infection (ARI) and Influenza-Like Illness (ILI) are modeled (if data is available). Incidence data is stratified by the following age groups:

  • 0-4 years

  • 5-14 years

  • 15-64 years

  • 65 years and over

Incidence data is read into R and then converted into an incidence object from the Incidence package. Two exponential models will be created for each year, one where the number of cases increases over time and one where the number cases decrease. This way, an annual peak is created in the influenza season. The function fit_optim_split() from the incidence package is used to automatically determine the peak of the influenza season.

At the start of each model run, the exact incidence is calculated using the model for all weeks. See an example below for 4 weeks in a model run with 10,000 nodes for the Netherlands.

Table 4.1: Example of incidence in model (per 10,000 inhabitants), for Influenza-like ilness (ILI) and Acute respiratory-tract infection (ARI), per week
Week ILI ARI
2020-01 42 317
2020-02 45 303
2020-03 49 291
2020-04 53 279

The incidence is used as input for the consult_gp() function. This function samples nodes that visit a doctor from the node list (both ari or ili). It returns a node list with only the nodes seeking care and simulates the initial consult, including tests, treatment and associated costs (see below).

Below is an example for 4 weeks (weeks 40-43, right at the start of the flu season). Note that ili incidence is also added. Costs are €0, as these will be filled in later.

4.3.2 Index consultation

During the index consultation, a clinician will perform tests, prescribe antibiotics etc. This is modeled for all nodes seeking care. Additionally, costs will be recorded. For all nodes seeking care (as described above), tests and antibiotic prescriptions are sampled.

As far as the tests are not part of the intervention, they are sampled using the PPAS data.

Antibiotics are also sampled using the PPAS data, including the probability of delayed prescription. Both the proportion of antibiotic prescriptions and the delayed prescriptions are country-specific; the proportion of antibiotic prescriptions is stratified by age (two categories: younger than 60 and 60 and older). In case of a delayed prescription, patients will not start taking antibiotics if they are cured within two days.

4.3.3 Entry into the healthcare system

For now, the entry point into the health system is the General Practitioner - emergency rooms and long-term care facilities will follow later.

4.3.4 28-day follow up

After the initial consultation, the patients are modelled for 28 days using a microsimulation. This is a microsimulation based on earlier work by Krijkamp et al., including the very useful samplev() function to sample vectorized data (Krijkamp et al. 2018).

The probabilities are imported in to the model using a yaml file, which is implemented in such a way that all strategies can have their own probabilities.

The transition probabilities from ‘death’ and ‘healthy’ are not shown here, as we assume that all patients remain either death or cured for the remaining days in the microsimulation.

4.3.4.1 Illness duration

Illness duration is taken from (Vos et al. 2020), which is for lower respiratory infections. The probability of being cured changes by day, based on a median illness duration of 6 days and a standard error of 1.97 days. Using (Briggs, Claxton, and Sculpher 2006, pp 52-53) to convert this to daily probabilities:

4.3.4.2 Hospital length-of-stay

Hospital length-of-stay data is published by Eurostat, specific for country, sex, age and indication. For now, we made the assumption that all patients with an acute respiratory infection will be hospitalized for pneumonia and all patients with influenza-like illness for an upper respiratory tract infection (urti) / influenza. We assume a linear function for the length-of-stay. Another assumption is that patients are healthy when they are discharged. This translates to the following daily probabilities from hospitalized to healthy:

4.3.4.3 Running the microsim

The microsim can be run using the microsim() function. This function handles running the microsim, as well as aggregating the results. It is able to run in parallel using the Future package.

See here the results of a microsim with 205 nodes seeking care.

This leads to the following aggregated results:

With a low number of nodes, this process can be quite random. However, this should be solved if the number of random draws increases (i.e. the number of nodes is increased). Let’s run a total of 821 nodes through the simulation:

4.3.5 Implementation of long-term effects

The outcomes of the microsimulation are implemented in the population model in a variety of ways.

  • Direct mortality: nodes that reach the “death” state are deactivated in the model, using the node_remove() function.
  • Excess mortality: following a hospitalization for CAP, patients older than 65 have a relative risk of mortality of 6.7 (sd: 0.3) (Mangen et al. 2017). This effect is categorized under excess mortality, and applied annually when overall mortality is applied (see Mortality). We assume this effect lasts for only one year.
  • Utilities: a utility decrement of -0.13, is applied to hospitalized patients older than 65 (Mangen et al. 2017).

4.3.6 Productivity losses

Productivity losses are typically incorporated using two methods: the friction cost method or the human capital method (Krol and Brouwer 2014). The friction period is assumed to be 12 weeks in all countries (Krol and Brouwer 2014). For the data sources see above.

During the 28-day follow-up, the employment/education status of individuals are considered to calculate the productivity losses as follows:

4.3.6.1 Productivity losses due to mortality

For patients that die within the 28-day follow-up period, productivity losses are estimated beyond the 28 simulated days. For the friction cost method, the productivity losses for the complete friction period (12 weeks) are counted (Krol and Brouwer 2014). For the human capital method, we calculate production for the whole (alive) population in the demographic model and subtract from that the productivity losses calculated within the consultation models. A deceased patient will not be counted anymore, hence will not contribute to the total production.

4.4 AMR module

The AMR model uses a two-step approach. First, the baseline AMR projections are generated, using an ensemble model. This is a data-driven approach where current trends are used to forecast future AMR rates. These baseline projections are then used as for the current-care scenario, where we assume current patterns in AMR will continue in the future. The second step is to incorporate the impact on antibiotic consumption from the diagnostic strategies, in the baseline AMR projections. This uses a more mechanistically-driven approach. The steps are described in more details below.

Within the VALUE-Dx project, we aim to assess the long-term effects of rapid diagnostics on antimicrobial resistance (AMR). The first step in this process is to forecast AMR rates when the status quo is preserved, i.e. current AMR policies remain, but no additional measures are taken. Predicting antimicrobial resistance (AMR) is a challenging task, as the development and subsequent spread of resistance genes is highly uncertain. Two methods of modelling AMR in the population over time have been identified (Rothery et al. 2018):

  • Mechanistic dynamic transmission models, which models the transmission of resistant pathogens through populations, requiring information on the mechanisms of spread of resistant pathogens.

  • Statistical forecasting methods, which is a data-driven approach where the underlying mechanisms of resistance is not considered: past trends are used to forecast future AMR rates.

Additionally, expert elicitation is a viable method to forecast AMR, which can be combined with these modeling approaches (Colson et al. 2019) .The mechanisms to attain and retain resistance may differ between various pathogens. As we aim to assess the impact of diagnostics for all community-acquired respiratory-tract infections in the population, which can be caused by various pathogens (Ieven et al. 2018) , we considered a mechanistic dynamic transmission model not to be a viable strategy. A statistical forecasting method, comparable to the methods used by Hashiguchi et al. was used for this study (Hashiguchi et al. 2019) .

Several methods are available for time series forecasting (Hyndman and Athanasopoulos 2021; Galicia et al. 2019), but selecting a single ‘best’ model is challenging. Ensemble methods are an often-used technique to improve forecasts: instead of picking one model, several models are used simultaneously and then combined to provide an average. We developed an ensemble model, averaging three models:

  • An exponential smoothing (ETS) model, which forecasts future data using weighted averages of past observations. (Hyndman and Athanasopoulos 2021)

  • A random forest, which aggregates many regression trees to estimate the outcome of interest (AMR rates in our case) (Breiman 2001). Bagging (bootstrapping and aggregrating) is used, where each decision tree is informed by a random sample, with only a subset of the available regressors, of the original data set. The different trees are grown in parallel, i.e. new trees are not informed by previous trees.

  • An XGBoost model, which also combines many regression trees to estimate the outcome of interest, however, as opposed to random forests, a sequential tree growing algorithm (boosting) is used, where each new tree informs the creation of the next tree (Chen and Guestrin 2016).

4.4.1 Missing data

The European consumption and AMR data had some missing data. These were imputed using the Amelia algorithm (Honaker, King, and Blackwell 2011) which allows for time-series-cross-sectional data to be imputed. To incorporate uncertainty in the various forecasts, the imputation algorithm was run 2000 times to incorporate uncertainty.

4.4.2 Forecasts of antibiotic consumption

Antibiotic consumption of broad-spectrum penicillins was forecast using an ETS model.

There are different ETS methods. As we considered annual data, we did not consider seasonal components. The trend can be either none, additive, additive damped or multiplicative. Multiplicative trends tend to produce poor forecasts and additive trends can overestimate the trend on the long term (Hyndman and Athanasopoulos 2021), hence we considered an additive damped trend. The consumption data were box-cox transformed so that the data resembled a normal distribution. An example is displayed in figure 4.1.

Antibiotic consumption forecast

Figure 4.1: Antibiotic consumption forecast

4.4.3 AMR forecasts

For the antimicrobial resistance forecasts the dataset was split into a training and a testing set (training: 2005-2014, testing: 2015-2018), to be able to measure the performance of the forecasts. After fitting the different models to the training set, the prediction of the testing set was assessed. Then the models were refit to the full dataset to forecast the AMR rates up to 2050.

Although we only assessed resistance of Streptococcus pneumoniae to broad-spectrum penicillins in the Netherlands in this paper, we incorporated data from other bug-drug combinations and European countries as regressors in the random forest and XGBoost models.

4.4.3.1 Exponential smoothing model

The exponential smoothing model uses a similar approach as described for the consumption forecasts, hence an additive, damped, trend.

4.4.3.2 Random forest model

The random forest model uses the following regressors to predict the AMR rate:

  • Antibiotic consumption

  • GDP forecasts (corrected for purchasing power parities)

  • Forecasts proportion population aged < 15 years

  • Forecasts proportion population aged > 64 years

  • Forecasts healthcare expenditure (% of GDP)

  • Forecasts out-of-pocket spending on health (% of total spending on health)

The ranger R package was used to build the model (Wright and Ziegler 2017). The model was tuned to minimize the root mean square error (RMSE), which resulted in a mtry (number of variables included in each bootstrapped sample of 25 and a min.node.size (minimum number of observations in terminal nodes) of 3 (Probst, Wright, and Boulesteix 2019).

4.4.3.3 XGBoost model

The XGBoost (Chen and Guestrin 2016) model uses the same dependent variables as the random forest model. After tuning the following hyperparameters were chosen:

  • min_child_weight: 3 (minimum number of instances in child node)

  • max_depth: 11 (maximum depth of each tree)

  • eta 0.00920 (learning rate)

  • gamma: 0.00158 (mimumum loss reduction to make a further partition on a lead node of the tree)

Below are the feature importance plot, an example of a tree used in the XGBoost model and an example of a forecast:

4.4.3.4 Accuracy of predictions

The accuracy of the different models is calculated on the testing set, using the models trained only on the training set. Figure 4.2 shows an example of the calibration of one model iteration.

AMR forecast calibration

Figure 4.2: AMR forecast calibration

The performance of time-series forecasts are often represented using the root mean squared error (RMSE), which is calculated using the following formula (Hyndman and Athanasopoulos 2021):

\[ RMSE = \sqrt{mean(e^2_t)} \]

Where \(e_t\) is the forecast error of values from the testing set.

The values differ within the probabilistic analysis, table 4.2 gives an overview:

Table 4.2: Overview accuracy metrics AMR predictions
.model_desc mae mape mase rmse rsq smape
RANGER 0.07 [0.04 - 0.11] 2.8 [1.67 - 4.3] 0.68 [0.41 - 1.03] 0.08 [0.05 - 0.12] 0.6 [0.01 - 0.97] 2.77 [1.66 - 4.23]
XGBOOST 0.06 [0.03 - 0.11] 2.34 [1.06 - 4.41] 0.58 [0.26 - 1.08] 0.07 [0.03 - 0.12] 0.88 [0.66 - 0.98] 2.32 [1.06 - 4.41]
ETS 0.14 [0.12 - 0.17] 5.57 [4.91 - 6.51] 1.34 [1.17 - 1.52] 0.16 [0.15 - 0.19] 0.76 [0.76 - 0.78] 5.36 [4.74 - 6.23]
Note:
mae: mean absolute error; mape: mean absolute percentage error; mase: mean absolute scaled error; root mean square error; rsq: r-squared; smape: symmetric mean absolute percentage error

4.4.3.5 Forecasts of individual models

Figure 4.3 gives the AMR forecasts of the individual models.

Antimicrobial resistance forecasts

Figure 4.3: Antimicrobial resistance forecasts

4.4.3.6 Ensemble

The ensemble model is created by averaging (with equal weights) the predicted values across the models. An example is provided in figure 4.4.

Ensemble model forecasts

Figure 4.4: Ensemble model forecasts

4.4.3.7 Incorporating uncertainty

The previously described forecasting methods generate point forecasts, that is, a mean is forecast, but no statistical distribution. To incorporate uncertainty in the AMR forecasting model, the following input parameters are varied and the models are fitted for 2000 iterations:

  • A different imputed data set is used for both the historical AMR data and antibiotic consumption

  • Forecasts healthcare expenditure (% of GDP) are varied for the model replications

  • Forecasts out-of-pocket spending on health (% of total spending on health) are varied for the model replications

Consequently, all model replications use slightly different AMR projections. However, we have not quantified all uncertainty associated with the projections, i.e. not all possible future AMR rates are included in the modeling.

4.4.4 Incremental effects of diagnostic strategies

As has been described elsewhere, there is a clear relationship between antibiotic consumption and national AMR rates(Goossens et al. 2005; Cecchini and Lee 2017). We use this relationship to relate the change in antibiotic consumption, as estimated in MERIAM, to future AMR levels (projected as described above). The following formula is used:

\[ p_{Test,t}^{Ab,B} = p_{Base,t}^{Ab,B} \times \frac{C_{Test,t-1}^{Ab}}{C_{Base,t-1}^{Ab}} \times \epsilon^{Ab,B} \]

Where \(p_{Test,t}^{Ab,B}\) is the proportion of resistance of bacterium \(B\) to antibiotic \(Ab\) under the testing scenario in the year \(t\); \(p_{Base,t}^{Ab,B}\) the proportion of resistance of bacterium \(B\) to antibiotic \(Ab\) under the base case scenario in the year \(t\); \(C_{Test,t-1}^{Ab}\) the antibiotic consumption of antibiotic \(Ab\) in the year \(t-1\) in the testing scenario; \(C_{Base,t-1}^{Ab}\) the antibiotic consumption of antibiotic \(Ab\) in the year \(t-1\) in the base case scenario and \(\epsilon\) the elasticity between antibiotic consumption of antibiotic \(Ab\) and the development of resistance in bacterium \(B\).

4.4.4.1 Estimating elasticity

The elasticity \(\epsilon\) is given by the following formula:

\[ \epsilon = \frac{\% \; change \; in \; resistance }{\% \; change \; in \; consumption} \]

We calculate the elasticity from the historical antibiotic consumption and resistance data from the ECDC across all countries included in the dataset (European Centre for Disease Prevention and Control 2021). Using ordinary least squares regression on the historical (non-missing) data, a linear function is estimated. See below for the example of the proportion of resistant pneumococci and the consumption of broad-spectrum penicillins:

Historical consumption of broad spectrum penicillins and resistance of pneumococci

This function is then used to estimate the elasticity using the midpoint method, which uses the average percent change of both resistance proportions (\(p\)) and antibiotic consumption (\(C\)) between two points on the linear function:

\[ \epsilon = \frac{\frac{p_2 - p_1}{(p_2 + p_1) / 2} \times 100}{\frac{C_2 - C_1}{(C_2 + C_1) / 2} \times 100} \]

The elasticity is not constant, it varies based on the location on the line. To give two examples, a drop from 3 ddd to 2 ddd (daily per 1000 population), a 33% drop, corresponds to an elasticity of 0.72, resulting in a decline of AMR levels by 24%. However, a drop from 9 ddd to 6 ddd (also a 33% drop), corresponds to an elasticity of 0.89, causing an AMR rate decline of 29%. This also matches prior beliefs, as we would expect a larger influence of antibiotic consumption reductions in countries with a high consumption, compared to countries with a lower consumption. Although antibiotic consumption is the only parameter used here to estimate AMR levels, in reality this is not the only parameter. This is also clear from the graph above, the R2 is 0.28, so the correlation is by no means perfect.

4.4.5 Overview data sources

The input data were used based on literature (Hashiguchi et al. 2019) and export opinion, see table 4.3 for an overview.

Table 4.3: Overview data sources AMR model
Data Database Notes Reference
Antimicrobial resistance Surveillance Atlas for Infectious Disease (European Centre for Disease Prevention and Control 2021)
Antibiotic consumption ECAC-Net (European Centre for Disease Prevention and Control n.d.)
Population projections Eurostat (Eurostat 2021a)
Historical demographic data (Eurostat 2021b)
GDP projections OECD Used for OECD countries (OECD 2018)
GDP per capita World Bank Used for non-OECD countries (The World Bank n.d.)
Health expenditure projections Literature (Chang et al. 2019)
Out-of-pocket healthcare payments projections Literature (Chang et al. 2019)

5 Reporting of results

TBA

6 R session info

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22000)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 
## [2] LC_CTYPE=English_United Kingdom.1252   
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] compiler  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] aweek_1.0.2       kableExtra_1.3.4  patchwork_1.1.1   rmarkdown_2.9    
##  [5] gt_0.3.0          reactable_0.2.3   scales_1.1.1      incidence_1.7.3  
##  [9] AMR_1.7.1         ggplot2_3.3.5     truncnorm_1.0-8   doRNG_1.8.2      
## [13] rngtools_1.5      doFuture_0.12.0   foreach_1.5.1     lubridate_1.7.10 
## [17] countrycode_1.3.0 readr_1.4.0       readxl_1.3.1      stringr_1.4.0    
## [21] forcats_0.5.1     furrr_0.2.3       future_1.21.0     purrr_0.3.4      
## [25] tidyr_1.1.3       dplyr_1.0.7       tibble_3.1.2      magrittr_2.0.1   
## [29] yaml_2.2.1        knitr_1.33       
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.7        svglite_2.0.0     listenv_0.8.0     digest_0.6.27    
##  [5] utf8_1.2.1        parallelly_1.26.1 reactR_0.4.4      R6_2.5.0         
##  [9] cellranger_1.1.0  evaluate_0.14     highr_0.9         httr_1.4.2       
## [13] pillar_1.6.1      rlang_0.4.11      rstudioapi_0.13   labeling_0.4.2   
## [17] webshot_0.5.2     htmlwidgets_1.5.3 munsell_0.5.0     xfun_0.24        
## [21] systemfonts_1.0.2 pkgconfig_2.0.3   globals_0.14.0    htmltools_0.5.1.1
## [25] tidyselect_1.1.1  bookdown_0.24     codetools_0.2-18  viridisLite_0.4.0
## [29] fansi_0.5.0       crayon_1.4.1      withr_2.4.2       grid_4.1.0       
## [33] checkpoint_1.0.0  jsonlite_1.7.2    gtable_0.3.0      lifecycle_1.0.0  
## [37] cli_3.0.1         stringi_1.7.3     farver_2.1.0      xml2_1.3.2       
## [41] ellipsis_0.3.2    generics_0.1.0    vctrs_0.3.8       iterators_1.0.13 
## [45] tools_4.1.0       glue_1.4.2        crosstalk_1.1.1   hms_1.1.0        
## [49] parallel_4.1.0    colorspace_2.0-2  rvest_1.0.0

References

Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.
Briggs, Andrew H., Karl Claxton, and Mark J. Sculpher. 2006. Decision Modelling for Health Economic Evaluation. Oxford Handbooks in Health Economic Evaluation. Oxford: Oxford University Press.
Cecchini, Michele, and Sherry Lee. 2017. “Low-Value Health Care with High Stakes: Promoting the Rational Use of Antimicrobials,” January, 115–58. https://doi.org/10.1787/9789264266414-6-en.
Chang, Angela Y., Krycia Cowling, Angela E. Micah, Abigail Chapin, Catherine S. Chen, Gloria Ikilezi, Nafis Sadat, et al. 2019. “Past, Present, and Future of Global Health Financing: A Review of Development Assistance, Government, Out-of-Pocket, and Other Private Spending on Health for 195 Countries, 1995–2050.” The Lancet 393 (10187): 2233–60. https://doi.org/10.1016/S0140-6736(19)30841-4.
Chen, Tianqi, and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August, 785–94. https://doi.org/10.1145/2939672.2939785.
Colson, Abigail R., Itamar Megiddo, Gerardo Alvarez-Uria, Sumanth Gandra, Tim Bedford, Alec Morton, Roger M. Cooke, and Ramanan Laxminarayan. 2019. “Quantifying Uncertainty about Future Antimicrobial Resistance: Comparing Structured Expert Judgment and Statistical Forecasting Methods.” PLOS ONE 14 (7): e0219190. https://doi.org/10.1371/journal.pone.0219190.
European Centre for Disease Prevention and Control. 2021. “Antimicrobial Resistance.” Surveillance Atlas of Infectious Diseases. March 19, 2021. https://atlas.ecdc.europa.eu/public/index.aspx?Dataset=27&HealthTopic=4.
———. n.d. “Antimicrobial Consumption Database (ESAC-Net).” European Centre for Disease Prevention and Control. Accessed April 15, 2021. https://www.ecdc.europa.eu/en/antimicrobial-consumption/surveillance-and-disease-data/database.
Eurostat. 2021a. “Population on 1st January by Age, Sex and Type of Projection.” Eurostat. February 8, 2021. https://ec.europa.eu/eurostat/databrowser/view/proj_19np/default/table?lang=en.
———. 2021b. “Population on 1 January by Age and Sex.” Eurostat. March 12, 2021. https://ec.europa.eu/eurostat/databrowser/view/demo_pjan/default/table?lang=en.
Galicia, A., R. Talavera-Llames, A. Troncoso, I. Koprinska, and F. Martínez-Álvarez. 2019. “Multi-Step Forecasting for Big Data Time Series Based on Ensemble Learning.” Knowledge-Based Systems 163 (January): 830–41. https://doi.org/10.1016/j.knosys.2018.10.009.
Goossens, Herman, Matus Ferech, Robert Vander Stichele, and Monique Elseviers. 2005. “Outpatient Antibiotic Use in Europe and Association with Resistance: A Cross-National Database Study.” The Lancet 365 (9459): 579–87. https://doi.org/10.1016/S0140-6736(05)17907-0.
Hashiguchi, Tiago Cravo Oliveira, Driss Ait Ouakrim, Michael Padget, Alessandro Cassini, and Michele Cecchini. 2019. “Resistance Proportions for Eight Priority Antibiotic-Bacterium Combinations in OECD, EU/EEA and G20 Countries 2000 to 2030: A Modelling Study.” Eurosurveillance 24 (20): 1800445. https://doi.org/10.2807/1560-7917.ES.2019.24.20.1800445.
Honaker, James, Gary King, and Matthew Blackwell. 2011. “Amelia II: A Program for Missing Data.” Journal of Statistical Software 45 (1, 1): 1–47. https://doi.org/10.18637/jss.v045.i07.
Hyndman, R. J., and G Athanasopoulos. 2021. Forecasting: Principles and Practice (3rd Ed). 3rd edition. Melbourne, Australia: OTexts. https://Otexts.com/fpp3/.
Ieven, M., S. Coenen, K. Loens, C. Lammens, F. Coenjaerts, A. Vanderstraeten, B. Henriques-Normark, et al. 2018. “Aetiology of Lower Respiratory Tract Infection in Adults in Primary Care: A Prospective Study in 11 European Countries.” Clinical Microbiology and Infection 24 (11): 1158–63. https://doi.org/10.1016/j.cmi.2018.02.004.
Koopmanschap, Marc A., Frans F. H. Rutten, B. Martin van Ineveld, and Leona van Roijen. 1995. “The Friction Cost Method for Measuring Indirect Costs of Disease.” Journal of Health Economics 14 (2): 171–89. https://doi.org/10.1016/0167-6296(94)00044-5.
Krijkamp, Eline M., Fernando Alarid-Escudero, Eva A. Enns, Hawre J. Jalal, M. G. Myriam Hunink, and Petros Pechlivanoglou. 2018. “Microsimulation Modeling for Health Decision Sciences Using R: A Tutorial.” Medical Decision Making 38 (3): 400–422. https://doi.org/10.1177/0272989X18754513.
Krol, Marieke, and Werner Brouwer. 2014. “How to Estimate Productivity Costs in Economic Evaluations.” PharmacoEconomics 32 (4): 335–44. https://doi.org/10.1007/s40273-014-0132-3.
Mangen, Marie-Josée J., Susanne M. Huijts, Marc J. M. Bonten, and G. Ardine de Wit. 2017. “The Impact of Community-Acquired Pneumonia on the Health-Related Quality-of-Life in Elderly.” BMC Infectious Diseases 17 (1): 208. https://doi.org/10.1186/s12879-017-2302-3.
OECD. 2018. “Long-Term Baseline Projections, No. 103.” OECD. https://doi.org/10.1787/68465614-en.
Oppong, Raymond, Richard D. Smith, Paul Little, Theo Verheij, Christopher C. Butler, Herman Goossens, Samuel Coenen, et al. 2018. “Cost-Effectiveness of Internet-Based Training for Primary Care Clinicians on Antibiotic Prescribing for Acute Respiratory Tract Infections in Europe.” Journal of Antimicrobial Chemotherapy 73 (11): 3189–98. https://doi.org/10.1093/jac/dky309.
Probst, Philipp, Marvin Wright, and Anne-Laure Boulesteix. 2019. “Hyperparameters and Tuning Strategies for Random Forest.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (3). https://doi.org/10.1002/widm.1301.
Rothery, Claire, Beth Woods, Laetitia Schmitt, Karl Claxton, Stephen Palmer, and Mark Sculpher. 2018. “Framework for Value Assessment of New Antimicrobials.” Policy Research Unit in Economic Evaluations of Health & Care Interventions: New York, NY, USA.
Szende, Agota, Bas Janssen, and Juan Cabases, eds. 2014. Self-Reported Population Health: An International Perspective Based on EQ-5d. Springer Netherlands. https://doi.org/10.1007/978-94-007-7596-1.
The World Bank. n.d. GDP Per Capita, PPP (Constant 2017 International $).” The World Bank Data. Accessed March 19, 2021. https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.KD.
Vos, L. M., R. Bruyndonckx, N. P. A. Zuithoff, P. Little, J. J. Oosterheert, B. D. L. Broekhuizen, C. Lammens, et al. 2020. “Lower Respiratory Tract Infection in the Community: Associations Between Viral Aetiology and Illness Course.” Clinical Microbiology and Infection 0 (0). https://doi.org/10.1016/j.cmi.2020.03.023.
Wright, Marvin N., and Andreas Ziegler. 2017. “Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software 77 (1, 1): 1–17. https://doi.org/10.18637/jss.v077.i01.