Reports standardized differences in means between the treated and control groups before and after a subset of controls is chosen. The differences are reported both across strata and within strata. The function can also generate Love plots of the same quantities.

check_balance(
  z,
  X,
  st,
  selected,
  treated = 1,
  control = 0,
  denom_variance = "treated",
  plot = FALSE,
  message = TRUE
)

Arguments

z

a factor with the ith entry equal to the treatment of unit i.

X

a data frame containing the covariates in the columns over which balance is desired. The number of rows should equal the length of z.

st

a stratum vector with the ith entry equal to the stratum of unit i. This should have the same order of units and length as z.

selected

a boolean vector indicating whether each unit was selected as part of the treated and control groups for analysis. It should have the same length as z and typically comes from the results of optimize_controls().

treated

the treatment value that identifies the treated units. This must be one of the values of z.

control

the treatment value that identifies the control units. This must be one of the values of z.

denom_variance

character string stating which variance to use in the standardization. The default, "treated", uses the variance of the treated group (across all strata), where the treated group is the one declared in the treated argument. The alternative, "pooled", uses the average of the variances of the treated and control groups.

plot

a boolean denoting whether to generate Love plots for the standardized differences.

message

a boolean denoting whether to print a message about the level of balance achieved.
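
The two denom_variance options described above can be illustrated with a minimal base R sketch. The helper std_diff() and the toy data are hypothetical and written only for illustration; they are not part of the package, and the internal computation in check_balance() may differ in its details.

```r
# Hypothetical helper (not part of the package): standardized difference in
# means for a single covariate x, given treatment labels z.
std_diff <- function(x, z, treated = 1, control = 0,
                     denom_variance = c("treated", "pooled")) {
  denom_variance <- match.arg(denom_variance)
  x_t <- x[z == treated]
  x_c <- x[z == control]
  v <- if (denom_variance == "treated") {
    var(x_t)                      # variance of the treated group only
  } else {
    (var(x_t) + var(x_c)) / 2     # average of the two group variances
  }
  (mean(x_t) - mean(x_c)) / sqrt(v)
}

# Toy data, made up for illustration
z <- factor(c(1, 1, 1, 0, 0, 0, 0))
x <- c(5, 7, 9, 1, 2, 3, 10)

d_treated <- std_diff(x, z, denom_variance = "treated")
d_pooled  <- std_diff(x, z, denom_variance = "pooled")
```

In this toy data the control group is more variable than the treated group, so the "pooled" denominator is larger and the resulting standardized difference is smaller in magnitude.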

Value

List containing:

sd_across

matrix with one row per covariate and two columns: one for the standardized difference before a subset of controls was selected and one for after.

sd_strata

matrix similar to sd_across, but with separate standardized differences for each stratum for each covariate.

sd_strata_avg

matrix similar to sd_across, but taking the average of the standardized differences within the strata, weighted by stratum size.

plot_across

ggplot object plotting sd_across. Only exists if plot = TRUE.

plot_strata

a named list of ggplot objects plotting sd_strata, one for each stratum. Only exists if plot = TRUE.

plot_strata_avg

ggplot object plotting sd_strata_avg. Only exists if plot = TRUE.

plot_pair

ggplot object with two facets displaying sd_across and sd_strata_avg with a shared y-axis. Only exists if plot = TRUE.
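
As a rough illustration of how the sd_strata and sd_strata_avg quantities relate, the following base R sketch computes within-stratum standardized differences for one covariate and then averages them. The toy data are made up, the denominator follows the "treated" option (treated variance across all strata), and "stratum size" is assumed here to mean the number of units in each stratum; the package may define these details differently.

```r
# Toy data, made up for illustration: one covariate, two strata
z  <- c(1, 1, 0, 0, 0, 1, 1, 0, 0, 0)
st <- factor(rep(c("a", "b"), each = 5))
x  <- c(4, 6, 3, 5, 7, 10, 14, 9, 11, 13)

# Denominator: standard deviation of the treated units across all strata
sd_t <- sd(x[z == 1])

# Standardized difference in means within each stratum
sd_by_stratum <- sapply(levels(st), function(s) {
  in_s <- st == s
  (mean(x[in_s & z == 1]) - mean(x[in_s & z == 0])) / sd_t
})

# Average across strata, weighted by stratum size
# (assumption: size = number of units in the stratum)
stratum_sizes <- as.numeric(table(st)[levels(st)])
sd_avg <- weighted.mean(sd_by_stratum, stratum_sizes)
```

Here stratum "a" is exactly balanced while stratum "b" is not, so the weighted average sits between the two within-stratum values.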

Examples


# Load the package (assumed here to be natstrat) and the example data
library(natstrat)
data('nh0506')

# Create strata by crossing age category with sex
age_cat <- cut(nh0506$age,
               breaks = c(19, 39, 50, 85),
               labels = c('< 40 years', '40 - 50 years', '> 50 years'))
strata <- age_cat : nh0506$sex

# Balance age, race, education, poverty ratio, and bmi both across and within the levels of strata
constraints <- generate_constraints(
                 balance_formulas = list(age + race + education + povertyr + bmi ~ 1 + strata),
                 z = nh0506$z,
                 data = nh0506)

# Choose one control for every treated unit in each stratum,
# balancing the covariates as described by the constraints
results <- optimize_controls(z = nh0506$z,
                             X = constraints$X,
                             st = strata,
                             importances = constraints$importances,
                             ratio = 1)

cov_data <- nh0506[, c('sex', 'age', 'race', 'education', 'povertyr', 'bmi')]

# Check balance
stand_diffs <- check_balance(z = nh0506$z,
                             X = cov_data,
                             st = strata,
                             selected = results$selected,
                             plot = TRUE)