weightflow computes weights and also estimates their variances. This vignette shows two ways to obtain standard errors from a weightflow recipe, and how they relate.
Throughout, is the population and the sample; is the final weight of unit ; and a population total is written , estimated by . The sample is drawn in clusters: primary sampling units (PSUs) nested in strata.
Why the adjustments matter for variance
A weighting recipe rarely stops at the design weight. It redistributes unknown eligibility, drops out-of-scope units, adjusts for nonresponse and calibrates to known totals. Each of those stages is estimated from the sample, so each one adds (or, for calibration, often removes) variability.
A linearization that takes the final weights as fixed and applies the ultimate-cluster formula ignores that the nonresponse and calibration steps were themselves estimated. The cleanest way to account for them is to re-run the whole recipe on each replicate, so the replicate weights carry the variability of every stage.
Method 1: a PSU bootstrap that re-applies the recipe
bootstrap_weights() resamples primary sampling units
(PSUs) with replacement within strata and re-runs the recipe on each
replicate. Pass the inert recipe (do not call
prep() first): the bootstrap preps it once per
replicate.
dat <- sample_one
dat$age_grp <- cut(dat$age, c(0, 30, 45, 60, Inf),
labels = c("18-30", "31-45", "46-60", "60+"))
spec <- weighting_spec(dat, base_weights = pw) |>
step_unknown_eligibility(unknown = unknown_elig, by = "region") |>
step_drop_ineligible(ineligible = ineligible) |>
step_nonresponse(respondent = hh_responded, method = "weighting_class",
by = "region") |>
step_select_within(prob = p_within) |>
step_nonresponse(respondent = responded, method = "weighting_class",
by = c("region", "sex", "age_grp")) |>
step_calibrate(method = "raking",
margins = list(region = c(table(population$region)),
sex = c(table(population$sex))))
boot <- bootstrap_weights(spec, replicates = 200, strata = "region",
psu = "psu", seed = 2024, progress = FALSE)
boot
#> <weightflow bootstrap>
#> replicates : 200
#> units : 417 (active: 209)
#> strata : region
#> psu : psuThe multiplier is the Rao-Wu rescaling bootstrap. Consider a stratum with PSUs, from which are drawn with replacement (by default ). Let be the number of times PSU is selected in a replicate. Every unit in that PSU has its weight rescaled by
so the replicate weight is . The factor has expectation one over the resampling, , which keeps each replicate design-unbiased, and the construction never turns it negative, so the recipe can be re-prepped on every replicate without invalid weights. Whole PSUs are kept together (every unit in a drawn PSU is retained), as the design’s clustering requires.
Estimates with bootstrap standard errors
Writing for the point estimate and for its value on replicate (each computed from the re-prepped replicate weights), the bootstrap variance is the average squared deviation across the replicates,
boot_mean(boot, "income") # mean income
#> estimate se ci_lower ci_upper
#> 1 21615.21 872.7788 19904.59 23325.82
boot_total(boot, "employed") # total employed
#> estimate se ci_lower ci_upper
#> 1 1927.219 140.9421 1650.978 2203.461
boot_mean(boot, "employed") # employment rate
#> estimate se ci_lower ci_upper
#> 1 0.4287473 0.03102821 0.3679331 0.4895615For any other statistic, pass a function of the weights and the data
to bootstrap_estimate():
bootstrap_estimate(boot, function(w, d) {
ok <- !is.na(d$income) & w > 0
stats::median(rep(d$income[ok], times = round(w[ok]))) # weighted median (approx.)
})
#> estimate se ci_lower ci_upper
#> 1 18136 930.15 16312.94 19959.06Method 2: hand the weights to the survey package
as_svydesign() builds an ultimate-cluster linearization
design from a prepped recipe. It is fast, but treats the calibration as
fixed.
fitted <- prep(spec)
des <- as_svydesign(fitted, ids = "psu", strata = "region")
survey::svymean(~income, des, na.rm = TRUE)
#> mean SE
#> income 21615 989.34To keep the recipe’s adjustments in the variance while still using survey, feed it the bootstrap replicate weights from method 1:
rep_des <- as_svrepdesign(boot)
survey::svymean(~income, rep_des, na.rm = TRUE)
#> mean SE
#> income 21615 872.78This matches boot_mean(boot, "income") exactly, because
as_svrepdesign() sets scale = 1 / B,
rscales = 1 and mse = TRUE.
Replicate weights for a tidyverse workflow
collect_replicate_weights() attaches the point weight
(.weight) and the replicate weights (rep_1 …
rep_B) to the active respondents, ready for srvyr.
df <- collect_replicate_weights(boot)
d_rep <- srvyr::as_survey_rep(df, weights = .weight,
repweights = dplyr::starts_with("rep_"),
type = "bootstrap", combined.weights = TRUE,
scale = 1 / attr(df, "R"), rscales = 1, mse = TRUE)
srvyr::summarise(d_rep, mean_income = srvyr::survey_mean(income, na.rm = TRUE))
#> # A tibble: 1 × 2
#> mean_income mean_income_se
#> <dbl> <dbl>
#> 1 21615. 873.Which one to use
Use the recipe-aware bootstrap (method 1, in any of its three forms) when the nonresponse and calibration steps are a meaningful part of the design and you want their uncertainty reflected; it is the more honest variance. Use the linearization (method 2) for a quick, well-understood standard error when the adjustments are minor or you only need the design-and-clustering part.
A few practical notes. More replicates give a more stable bootstrap
SE; 200 is fine for exploration, 500-1000 for final figures. Each
stratum needs at least two PSUs to be resampled (single-PSU strata are
left untouched, with a warning). If a replicate leaves a calibration or
weighting-class cell empty it is dropped with a warning; coarser
by cells make the bootstrap more robust.
