A realistic multistage design (stratum -> PSU -> household, then one selected
person per household). Addresses with unknown eligibility and ineligible
addresses appear as single rows with no roster. Resolved eligible households
are either reached (a roster is obtained) or counted as household nonresponse.
In each reached household, one person is selected with an unequal
within-household probability and may or may not respond. The data support the
full household pipeline: household-level eligibility (cluster), dropping
ineligibles, household and person nonresponse, and step_select_within.
Generated by data-raw/weightflow_data.R.
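The case types described above can be sketched with a small toy data frame. This is an illustration only: the rows are hypothetical (not the shipped data), the column names follow the Format section, and the `disposition` column and its labels are invented here for the example.

```r
# Hypothetical toy rows, one per case type (not the shipped data).
toy <- data.frame(
  household_id = 1:5,
  status       = c("unknown", "ineligible", "eligible", "eligible", "eligible"),
  hh_responded = c(NA, NA, 0, 1, 1),    # NA for non-eligible addresses
  responded    = c(NA, NA, NA, 1, 0),   # NA on non-roster rows
  stringsAsFactors = FALSE
)

# Collapse the three columns into one label per row.
# 'disposition' and its labels are assumptions made for this sketch.
toy$disposition <- with(toy, ifelse(
  status != "eligible", status,
  ifelse(hh_responded == 0, "hh_nonresponse",
         ifelse(responded == 1, "respondent", "person_nonresponse"))
))

toy[, c("household_id", "disposition")]
```

Each row falls into exactly one of the five labels, which mirrors the order in which the pipeline would treat them: eligibility first, then household response, then person response.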
Format
A data frame with one row per sampled household (the selected person, or a single placeholder row for non-roster cases):
- person_id, household_id, psu
identifiers
- region
stratum
- sex, age
selected person's attributes (NA on non-roster rows)
- pw
design base weight (inverse of the product of the stage selection probabilities)
- status
"eligible", "ineligible" or "unknown"
- unknown_elig
1 if eligibility is unknown (no roster)
- ineligible
1 if the address is out of scope (no roster)
- hh_responded
1 if the household was reached (roster obtained), 0 for household nonresponse, NA if the address is not a resolved eligible household
- responded
1 if the selected person responded (NA on non-roster rows)
- n_elig
number of eligible persons in the household (NA on non-roster rows)
- p_within
within-household selection probability of the selected person
- income, employed
survey outcomes; NA unless the person responded
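As a rough sketch of how pw and p_within might combine, the example below divides the base weight by the within-household selection probability to get a person-level design weight. It assumes that pw covers the stages up to the household only and that p_within is NA on non-roster rows; neither is stated above, and this is not necessarily what step_select_within does. The rows and the `person_weight` column are hypothetical.

```r
# Hypothetical toy rows (not the shipped data).
toy <- data.frame(
  household_id = 1:3,
  pw           = c(100, 100, 250),
  n_elig       = c(1, 3, NA),     # NA: non-roster row
  p_within     = c(1, 0.5, NA)    # assumed NA when there is no roster
)

# Assumption: pw stops at the household stage, so the selected person's
# design weight is pw divided by the within-household probability.
# Non-roster rows stay NA.
toy$person_weight <- toy$pw / toy$p_within

toy$person_weight
```

A household with a single eligible person keeps its base weight, while a person selected with probability 0.5 carries twice the household's base weight.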
