
Validation against the survey package
Source: vignettes/validation-against-survey.Rmd
weightflow’s calibration is meant to reproduce the established
results of the survey package on the methods they share:
raking, post-stratification and linear (GREG) calibration. On top of
these it adds the staged cascade (eligibility, nonresponse, selection)
and a recipe-aware bootstrap. This vignette checks that agreement
directly: on the same starting weights and the same control totals, the
two packages return the same weights up to numerical tolerance
(differences of about 1e-9 or smaller in the checks below).
To make every unit comparable one-to-one, the recipes below use only the calibration step (no dropping or nonresponse), so no rows are removed.
library(weightflow)  # provides weighting_spec(), step_calibrate(), prep()
d <- sample_survey
N <- nrow(population)
Post-stratification
Post-stratifying to the population counts of region:
each region’s weights are rescaled so the weighted count matches the
known total.
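The rescaling itself is a one-line ratio adjustment. The following is a minimal base-R sketch on toy data, not either package's implementation; the data frame `toy`, its weights and the counts in `pop_counts` are made up for illustration.

```r
# Toy sample: hypothetical values, not the vignette's sample_survey
toy <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  pw     = c(10, 20, 5, 5, 10)
)
pop_counts <- c(north = 60, south = 50)  # assumed known population counts

# Weighted sample count of the region each unit belongs to
wtd_count <- ave(toy$pw, toy$region, FUN = sum)

# Post-stratified weight = base weight * (known total / weighted count)
toy$w_ps <- toy$pw * unname(pop_counts[toy$region]) / wtd_count

# The weighted counts now reproduce the population counts
tapply(toy$w_ps, toy$region, sum)
```

Every unit in a region receives the same factor, so relative weights within a region are unchanged.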
library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#>
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#>
#> dotchart
# weightflow
wf <- weighting_spec(d, base_weights = pw) |>
  step_calibrate(method = "poststratify",
                 margins = list(region = c(table(population$region)))) |>
  prep()
w_wf <- wf$final_weight
# survey
des <- svydesign(ids = ~1, weights = ~pw, data = d)
pr <- data.frame(region = names(table(population$region)),
                 Freq = as.numeric(table(population$region)))
des_ps <- postStratify(des, ~region, pr)
w_sv <- weights(des_ps)
c(max_abs_weight_diff = max(abs(w_wf - w_sv)))
#> max_abs_weight_diff
#> 0
Raking
Raking (iterative proportional fitting) to the region
and sex margins. We tighten survey’s
convergence tolerance (epsilon) so that both packages iterate to comparable precision.
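Iterative proportional fitting applies the post-stratification ratio to one margin at a time and cycles until the weights stop changing. The loop below is a minimal base-R sketch on toy data, not the algorithm as implemented in either package; `toy`, `margins` and the stopping rule are assumptions chosen for illustration.

```r
# Toy sample and margins: hypothetical values for illustration only
toy <- data.frame(
  region = c("north", "north", "south", "south", "south", "north"),
  sex    = c("f", "m", "f", "m", "f", "m"),
  pw     = c(10, 20, 5, 5, 10, 10)
)
margins <- list(region = c(north = 60, south = 40),
                sex    = c(f = 55, m = 45))   # both sum to 100

w <- toy$pw
for (iter in 1:100) {
  w_old <- w
  for (v in names(margins)) {
    # Weighted count of the category each unit falls in
    current <- ave(w, toy[[v]], FUN = sum)
    # Ratio-adjust so this margin is matched exactly
    w <- w * unname(margins[[v]][toy[[v]]]) / current
  }
  # Stop once a full cycle no longer moves the weights
  if (max(abs(w - w_old)) < 1e-10) break
}

tapply(w, toy$region, sum)
tapply(w, toy$sex, sum)
```

Because each pass disturbs the margin fitted before it, two implementations agree only to the tolerance at which they stop, which is why the stopping tolerance matters in a comparison like this one.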
# weightflow
wf <- weighting_spec(d, base_weights = pw) |>
  step_calibrate(method = "raking",
                 margins = list(region = c(table(population$region)),
                                sex = c(table(population$sex)))) |>
  prep()
w_wf <- wf$final_weight
# survey (tight epsilon so it fully converges, like weightflow)
des <- svydesign(ids = ~1, weights = ~pw, data = d)
ps <- data.frame(sex = names(table(population$sex)),
                 Freq = as.numeric(table(population$sex)))
des_rk <- rake(des, list(~region, ~sex), list(pr, ps),
               control = list(epsilon = 1e-10, maxit = 100))
w_sv <- weights(des_rk)
c(max_abs_weight_diff = max(abs(w_wf - w_sv)))
#> max_abs_weight_diff
#> 1.110972e-09
Linear (GREG) calibration
Linear calibration to the totals of the design matrix of
~ region + sex, including the intercept (the population
size N).
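Unbounded linear calibration has a closed form: with base weights d and design matrix X, the calibrated weights are w = d * (1 + X λ), where λ solves (X' D X) λ = T - X' d and T is the vector of control totals. The following is a minimal base-R sketch of that formula on toy data, not either package's code; `toy` and `totals` are made-up values.

```r
# Toy sample: hypothetical values for illustration only
toy <- data.frame(
  region = factor(c("north", "north", "south", "south", "south", "north")),
  sex    = factor(c("f", "m", "f", "m", "f", "m")),
  pw     = c(10, 20, 5, 5, 10, 10)
)
X <- model.matrix(~ region + sex, toy)

# Assumed control totals: population size, count in south, count of males
totals <- c("(Intercept)" = 100, regionsouth = 40, sexm = 45)
totals <- totals[colnames(X)]  # align with the design-matrix columns

d <- toy$pw
# Solve (X' D X) lambda = T - X' d
lambda <- solve(crossprod(X, d * X), totals - colSums(d * X))
# Calibrated weights w = d * (1 + x' lambda)
w <- d * as.numeric(1 + X %*% lambda)

colSums(w * X)  # reproduces totals
```

Since this is a direct linear solve rather than an iteration, two implementations should differ only by floating-point rounding.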
totals <- colSums(model.matrix(~ region + sex, population))
# weightflow
wf <- weighting_spec(d, base_weights = pw) |>
  step_calibrate(method = "linear", formula = ~ region + sex, totals = totals) |>
  prep()
w_wf <- wf$final_weight
# survey
des <- svydesign(ids = ~1, weights = ~pw, data = d)
des_cal <- calibrate(des, ~ region + sex, population = totals, calfun = "linear")
w_sv <- weights(des_cal)
c(max_abs_weight_diff = max(abs(w_wf - w_sv)))
#> max_abs_weight_diff
#> 3.552714e-15
Same estimates
The agreement carries over to estimates. A calibrated total of a survey outcome matches between the two packages:
# weightflow
wf <- weighting_spec(d, base_weights = pw) |>
  step_calibrate(method = "raking",
                 margins = list(region = c(table(population$region)),
                                sex = c(table(population$sex)))) |>
  prep()
total_wf <- sum(wf$final_weight * d$employed, na.rm = TRUE)
# survey
des <- svydesign(ids = ~1, weights = ~pw, data = d)
des_rk <- rake(des, list(~region, ~sex), list(pr, ps),
               control = list(epsilon = 1e-10, maxit = 100))
total_sv <- as.numeric(svytotal(~employed, des_rk, na.rm = TRUE))
c(weightflow = total_wf, survey = total_sv, difference = total_wf - total_sv)
#> weightflow survey difference
#> 1.084772e+03 1.084772e+03 3.450396e-09
What weightflow adds
The point of agreement is trust: where the methods overlap,
weightflow returns the same weights as survey to within numerical
tolerance. On top of that shared core, weightflow contributes the
staged cascade (unknown eligibility, ineligible dropping,
within-household selection, and person- or household-level nonresponse,
each as a pipeable step with diagnostics) and a bootstrap that
re-applies the whole recipe on each replicate, so the variance reflects
every adjustment (see the Variance estimation article). For
design-based inference you can always export the final weights back to
survey/srvyr.
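Exporting amounts to attaching the final weights as a column and declaring a design on that column. The sketch below uses toy data; `w_final` stands in for a vector such as `wf$final_weight`, and `ids = ~1` is a placeholder that you would replace with the real cluster and strata variables of your design.

```r
library(survey)

# Toy data: hypothetical values; w_final plays the role of exported final weights
toy <- data.frame(
  employed = c(1, 0, 1, 1, 0),
  w_final  = c(20, 40, 12.5, 12.5, 25)
)

# Declare a design that uses the exported weights
des_final <- svydesign(ids = ~1, weights = ~w_final, data = toy)

# Point estimates use the exported weights directly
est <- svytotal(~employed, des_final)
est
```

The standard error reported by a design built this way treats the weights as fixed, so it does not account for the adjustment steps that produced them.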