
Nonresponse: weighting classes and propensities
Source: vignettes/nonresponse-propensities.Rmd
Nonresponse adjustment inflates the weights of respondents so they
also represent the nonrespondents. step_nonresponse()
offers two routes: weighting classes and response propensity models.
This vignette explains both, when each is preferable, and how they are
estimated.
Throughout, only active units (weight > 0) take part, so cases already dropped earlier in the recipe (unknown eligibility, ineligible) are excluded automatically. We write $w_i$ for the weight entering the step, $R$ for the set of respondents, $x_i$ for the auxiliaries known for unit $i$, and $w_i^{*}$ for the weight after the adjustment.
Both routes rest on the same assumption: response is ignorable given the auxiliaries (missing at random). That is, conditional on $x_i$, responding is independent of the survey outcome $y_i$,
$$\Pr(i \in R \mid x_i, y_i) = \Pr(i \in R \mid x_i) = \phi_i.$$
Under this assumption the respondents, reweighted by the inverse of their response propensity $\phi_i$, represent the nonrespondents without bias. Choosing auxiliaries that are related both to responding and to the outcomes is therefore what makes the adjustment work.
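A small simulation can make the ignorability argument concrete. The sketch below uses base R only; the population, the propensities 0.8 and 0.4, and every variable name are invented for illustration and are unrelated to sample_survey or to weightflow internals. Response depends on the auxiliary alone, so weighting respondents by the inverse of the (here known) propensity should bring the respondent mean back close to the full-sample mean.

```r
set.seed(42)
n <- 20000

x   <- rbinom(n, 1, 0.5)            # auxiliary, known for every unit
y   <- 10 + 5 * x + rnorm(n)        # survey outcome, related to x
phi <- ifelse(x == 1, 0.8, 0.4)     # true response propensity, depends on x only
resp <- rbinom(n, 1, phi) == 1      # TRUE for respondents
w   <- rep(1, n)                    # weight entering the step (equal here)

full_mean  <- weighted.mean(y, w)                          # target: all units
naive_mean <- weighted.mean(y[resp], w[resp])              # respondents, no adjustment
ipw_mean   <- weighted.mean(y[resp], w[resp] / phi[resp])  # inverse-propensity weights

round(c(full = full_mean, naive = naive_mean, ipw = ipw_mean), 2)
```

The unadjusted respondent mean overrepresents units with x = 1, because they respond twice as often; the inverse-propensity mean does not. In practice the propensity is unknown and has to be estimated, which is what the two routes below do.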
Weighting classes
Units are partitioned into cells (the weighting classes) according to one or more categorical auxiliaries, and within each cell the respondents absorb the weight of the nonrespondents. The method rests on a homogeneity assumption: every unit in a cell is taken to have the same response probability, so that within the cell the respondents are a random subsample of the active units (response is MCAR within the cell, MAR across cells). It can also be justified by a model in which the expected outcome is the same for respondents and nonrespondents of the same cell; the adjustment removes bias to the extent that this within-cell equality holds. Cells should therefore be chosen so that response rates differ between cells while the units inside a cell are homogeneous (i.e., similar in their propensity to respond and, ideally, in the survey outcomes).
This adjustment is the natural choice when nothing is known about the nonrespondents beyond what the sampling frame already carries (e.g., strata, primary sampling units, region, and other design variables available for sampled respondents and nonrespondents alike). When the auxiliaries are known for the whole population rather than only the sample, the same arithmetic becomes post-stratification.
The adjustment factor in a cell $c$ is the total weight of the active units over the weight of the respondents in that cell,
$$f_c = \frac{\sum_{i \in c} w_i}{\sum_{i \in c \cap R} w_i}.$$
Each respondent’s weight is multiplied by $f_c$ and nonrespondents go to zero, so $w_i^{*} = f_c \, w_i$ for $i \in c \cap R$ and $w_i^{*} = 0$ otherwise. This is the special case of a propensity model in which $\phi_i$ is estimated by the (weighted) response rate within the cell, $\hat\phi_c = 1 / f_c$: a single estimated propensity shared by every unit of the cell.
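The arithmetic of the cell factor can be followed by hand on a toy data set. This is a minimal base R sketch, not the package's implementation; the data frame toy and its columns are hypothetical.

```r
# Five active units in two cells; weights and response status are made up.
toy <- data.frame(
  cell      = c("A", "A", "A", "B", "B"),
  w         = c(10, 10, 10, 20, 20),
  responded = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Total weight of all active units and of respondents, per cell,
# returned row by row so they line up with the data.
total_w <- ave(toy$w, toy$cell, FUN = sum)
resp_w  <- ave(toy$w * toy$responded, toy$cell, FUN = sum)

toy$factor <- total_w / resp_w                            # A: 30/20, B: 40/20
toy$w_adj  <- ifelse(toy$responded, toy$w * toy$factor, 0)

toy
tapply(toy$w,     toy$cell, sum)   # weight per cell before
tapply(toy$w_adj, toy$cell, sum)   # weight per cell after (unchanged)
```

Cell A has factor 1.5 and cell B factor 2, and the per-cell totals (30 and 40) are the same before and after the adjustment.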
In step_nonresponse() the cells are specified through
the by argument, which names the categorical variables that
define them (here, region):
wf <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class",
                   by = "region") |>
  prep()
summary(wf)
#>
#> == Weighting specification (weightflow) ==
#> Data : 467 cases
#> Base wts: pw
#> Steps :
#> 1. nonresponse (weighting class)
#> Status : estimated (prep)
#>
#> Stage summary:
#> stage n_active sum_wts cv_wts deff_kish n_eff
#> base 467 4371 0.236 1.056 442
#> stage_1_step_nonresponse 270 4371 0.144 1.021 265
#>
#> deff_kish = 1 + CV^2 (Kish design effect from unequal weighting);
#> n_eff = n_active / deff_kish. Both worsen with each adjustment and
#> improve with trimming.
#>
#> --- Step 1: nonresponse (weighting class) ---
#> cell n_respondents n_nonresponse factor
#> East 52 44 1.846154
#> North 78 41 1.525641
#> South 72 49 1.680556
#> West 68 63 1.926471
#> Kish deff: 1.056 -> 1.021 | n_eff: 442 -> 265
Validation. By construction the total weight is preserved within each cell (the nonrespondents’ weight is moved to the respondents, not lost). So the weighted total per region after the step equals the base-weight total before it:
before <- tapply(sample_survey$pw, sample_survey$region, sum)
after <- tapply(wf$final_weight, sample_survey$region, sum)
round(cbind(before, after, diff = after - before), 6)
#>          before     after diff
#> North 1487.5000 1487.5000    0
#> South 1210.0000 1210.0000    0
#> East   800.0000  800.0000    0
#> West   873.3333  873.3333    0
The differences are zero: weighting classes redistribute, they do not create or destroy weight.
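The stage summary reports deff_kish = 1 + CV^2 and n_eff = n_active / deff_kish. A minimal base R sketch of those two quantities follows. The helper names kish_deff() and n_eff() are hypothetical, and the sketch assumes the CV is taken over active units with the population (divide-by-n) variance, which gives the usual Kish form n * sum(w^2) / sum(w)^2; whether weightflow uses exactly this convention is not stated here.

```r
# Kish design effect from unequal weighting, computed on active units only.
kish_deff <- function(w) {
  w <- w[w > 0]
  length(w) * sum(w^2) / sum(w)^2
}

# Effective sample size: number of active units deflated by the design effect.
n_eff <- function(w) {
  sum(w > 0) / kish_deff(w)
}

w <- c(10, 10, 20, 20, 0)   # made-up weights; the last unit is inactive
kish_deff(w)                # 4 * 1000 / 60^2
n_eff(w)
```

Equal weights give a design effect of exactly 1, so any value above 1 measures the precision lost to weight variability alone.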
Response propensities
Instead of cells, the probability of responding is modelled from auxiliaries known for respondents and nonrespondents alike,
$$\phi_i = \Pr(i \in R \mid x_i),$$
and estimated by $\hat\phi_i$.
The model is fitted on the active units, weighted by the current
weights, and two routes follow. With num_classes = NULL,
each respondent is weighted by the inverse propensity,
$w_i^{*} = w_i / \hat\phi_i$. With an integer
num_classes, units are grouped into that many classes
formed from quantiles of $\hat\phi_i$,
and a weighting-class adjustment is applied within each, which is more
robust to a misspecified model.
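The two routes can be sketched in base R on simulated data. Everything here is an assumption made for illustration (the data, the variable names, the use of quantile() and cut() to form classes); it is not a description of how step_nonresponse() is implemented. The family quasibinomial() is used only because it gives the same coefficients as binomial() without the non-integer warning.

```r
set.seed(1)
n    <- 2000
age  <- runif(n, 18, 80)                              # auxiliary known for everyone
w    <- runif(n, 5, 15)                               # weights entering the step
resp <- rbinom(n, 1, plogis(-1 + 0.03 * age)) == 1    # response indicator

# Weighted logistic model for the response propensity.
fit     <- glm(resp ~ age, family = quasibinomial(), weights = w)
phi_hat <- fitted(fit)

# Route 1: direct inverse-propensity weights.
w_direct <- ifelse(resp, w / phi_hat, 0)

# Route 2: five classes from quantiles of phi_hat, then a
# weighting-class factor within each class.
breaks  <- quantile(phi_hat, probs = seq(0, 1, length.out = 6))
cls     <- cut(phi_hat, breaks = breaks, include.lowest = TRUE)
f_class <- ave(w, cls, FUN = sum) / ave(w * resp, cls, FUN = sum)
w_class <- ifelse(resp, w * f_class, 0)

c(entering = sum(w), direct = sum(w_direct), classes = sum(w_class))
```

The class route reproduces the entering weight total exactly, because each class redistributes its own weight; the direct route only does so approximately, since it depends on the fitted values unit by unit.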
Logistic regression
wf <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "propensity",
                   formula = ~ region + sex + age, engine = "logit",
                   num_classes = 5) |>
  prep()
summary(wf)
#>
#> == Weighting specification (weightflow) ==
#> Data : 467 cases
#> Base wts: pw
#> Steps :
#> 1. nonresponse (propensity: logit, 5 classes)
#> Status : estimated (prep)
#>
#> Stage summary:
#> stage n_active sum_wts cv_wts deff_kish n_eff
#> base 467 4371 0.236 1.056 442
#> stage_1_step_nonresponse 270 4371 0.155 1.024 264
#>
#> deff_kish = 1 + CV^2 (Kish design effect from unequal weighting);
#> n_eff = n_active / deff_kish. Both worsen with each adjustment and
#> improve with trimming.
#>
#> --- Step 1: nonresponse (propensity: logit, 5 classes) ---
#> propensity_class n mean_prop factor
#> [0.499,0.524] 94 0.5122335 2.224138
#> (0.524,0.547] 93 0.5341818 1.837719
#> (0.547,0.594] 101 0.5721654 1.602273
#> (0.594,0.643] 86 0.6162161 1.574468
#> (0.643,0.684] 93 0.6600845 1.576271
#> Kish deff: 1.056 -> 1.024 | n_eff: 442 -> 264
Because the model is fitted with survey weights, a logistic fit may print a “non-integer #successes” message: that is expected for a weighted binomial fit and does not affect the estimated propensities.
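The harmlessness of that message can be checked directly with stats::glm(). The sketch below uses simulated data with invented names: it captures the warning from a weighted binomial() fit, then compares the coefficients with a quasibinomial() fit, which uses the same link and variance function and does not warn.

```r
set.seed(3)
n <- 300
x <- rnorm(n)
r <- rbinom(n, 1, plogis(0.3 + 0.5 * x))   # 0/1 response indicator
w <- runif(n, 0.5, 2.5)                    # non-integer weights

# Capture the warning text raised by the weighted binomial fit.
msg <- tryCatch(
  { glm(r ~ x, family = binomial(), weights = w); NA_character_ },
  warning = function(cond) conditionMessage(cond)
)
msg

# Same model with and without the warning: the point estimates agree.
fit_bin   <- suppressWarnings(glm(r ~ x, family = binomial(), weights = w))
fit_quasi <- glm(r ~ x, family = quasibinomial(), weights = w)
all.equal(coef(fit_bin), coef(fit_quasi))
```

Only the reported standard errors differ between the two families, and those are not used for the weight adjustment.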
Trees, forests and boosting
The same propensity can be estimated with a regression tree
(engine = "tree", package rpart), a random
forest (engine = "forest", package ranger), or
gradient boosting (engine = "boost", package
xgboost), which capture nonlinearities and interactions
without specifying them. More flexibility is not free, though: a very
flexible model can overfit the response and produce more dispersed
adjustment factors, which raises the variance of the weights (a higher
design effect). Compare the deff after each engine below: the forest and
boosting typically yield the largest, the weighting classes the
smallest.
wf <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "propensity",
                   formula = ~ region + sex + age, engine = "tree",
                   num_classes = 5) |>
  prep()
design_effect(wf$final_weight)$deff
#> [1] 1.055763
wf <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "propensity",
                   formula = ~ region + sex + age, engine = "forest",
                   num_classes = 5) |>
  prep()
design_effect(wf$final_weight)$deff
#> [1] 1.12651
Flexibility, overfitting, and cross-fitting
The reason flexibility is not free deserves a closer look. A very flexible model can fit the noise of the particular sample in addition to the signal (overfitting). When the propensity is then predicted for the very units the model was trained on, the estimates are pulled toward the observed responses: some respondents receive artificially low propensities, and since the adjustment is $1 / \hat\phi_i$, those units get extreme weights that inflate the variance. The model is not bad at prediction as such; rather, it predicts too well in-sample and worse out of sample.
The remedy is cross-fitting: estimate each unit’s
propensity with a model trained on other units (held-out
folds), so the prediction is out-of-sample and free of this optimism.
weightflow provides it through the crossfit argument:
wf <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "propensity",
                   formula = ~ region + sex + age, engine = "forest",
                   num_classes = 5, crossfit = 5, crossfit_seed = 1) |>
  prep()
design_effect(wf$final_weight)$deff
#> [1] 1.050402
The “Machine learning, cross-fitting and robust calibration” article develops the boosting engine and cross-fitting in full, with a worked comparison of the design effect with and without cross-fitting.
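The idea behind cross-fitting can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code behind the crossfit argument: the data are simulated, the names dat, fold and phi_cf are hypothetical, and a weighted logistic model stands in for whichever engine is used. Each unit's propensity is predicted by a model that never saw that unit.

```r
set.seed(1)
n   <- 1000
dat <- data.frame(age = runif(n, 18, 80), sex = rbinom(n, 1, 0.5))
dat$resp <- rbinom(n, 1, plogis(-1 + 0.03 * dat$age))   # 0/1 response indicator
dat$w    <- 10                                          # weights entering the step

K    <- 5
fold <- sample(rep(seq_len(K), length.out = n))  # random assignment to K folds
phi_cf <- numeric(n)

for (k in seq_len(K)) {
  held_out <- fold == k
  # Fit on the other K - 1 folds only.
  fit <- glm(resp ~ age + sex, family = quasibinomial(),
             data = dat[!held_out, ], weights = w)
  # Predict out-of-sample for the held-out fold.
  phi_cf[held_out] <- predict(fit, newdata = dat[held_out, ],
                              type = "response")
}

summary(phi_cf)
```

With a simple logistic model the in-sample and cross-fitted propensities will be close; the gap, and the benefit, grows with the flexibility of the engine.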
Person or household level
Nonresponse can occur at the person level (within a reached
household) or at the household level (the whole household is not
reached). The cluster argument moves the adjustment to the
household: each household counts once with its weight, and the
redistribution (or the propensity model) is done over households, then
assigned to their members.
wf <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class",
                   by = "region", cluster = "household_id") |>
  prep()
design_effect(wf$final_weight)$deff
#> [1] 1.06383
The level is dictated by what is known about the nonrespondents:
household auxiliaries and a whole-household outcome call for
cluster; person-level auxiliaries within reached households
do not. Note that the effective sample size drops more at the household
level, since households (not persons) are the independent units being
adjusted.
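A toy example shows what moving the adjustment to the household changes. The sketch below is base R with made-up data and hypothetical names, and it assumes that weight, region and response status are constant within a household; it illustrates the description above rather than weightflow's actual code. Households are collapsed to one row each, the weighting-class factor is computed over households, and the factor is then copied back to the members.

```r
persons <- data.frame(
  household_id = c(1, 1, 2, 3, 3, 3, 4, 5),
  region       = c("N", "N", "N", "S", "S", "S", "S", "S"),
  w            = c(10, 10, 10, 20, 20, 20, 20, 20),
  responded    = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
)

# One row per household: each household counts once with its weight.
hh <- unique(persons[, c("household_id", "region", "w", "responded")])

# Weighting-class factor by region, computed over households.
hh$factor <- ave(hh$w, hh$region, FUN = sum) /
  ave(hh$w * hh$responded, hh$region, FUN = sum)

# Assign the household factor to every member.
persons$factor <- hh$factor[match(persons$household_id, hh$household_id)]
persons$w_adj  <- ifelse(persons$responded, persons$w * persons$factor, 0)

hh
persons
```

Here the household-level factors are 2 in region N and 1.5 in region S. Computed over persons instead, the same data would give 1.5 and 1.25, because large responding households would count once per member.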
Which to use
Weighting classes need categorical auxiliaries and enough respondents
per cell; they are simple and transparent. Propensity models handle
continuous predictors and many auxiliaries at once, and the tree/forest
engines relax functional-form assumptions. Using propensity classes
(num_classes) rather than the direct $1 / \hat\phi_i$
keeps the adjustment stable when the model is imperfect, at the cost of
some efficiency. In all cases, model the response on auxiliaries that
are both predictive of responding and related to the survey
outcomes.