from balance import load_data
from balance.weighting_methods.poststratify import poststratify
import pandas as pd

target_df, sample_df = load_data()
target_df.head()

INFO (2026-01-08 21:58:59,178) [__init__/<module> (line 72)]: Using balance version 0.14.x

balance (Version 0.14.x) loaded:
    📖 Documentation: https://import-balance.org/
    🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
    📄 Citation:
        Sarig, T., Galili, T., & Eilat, R. (2023).
        balance - a Python package for balancing biased data samples.
        https://arxiv.org/abs/2307.06024

    Tip: You can view this message anytime with balance.help()

# Drop rows with a missing gender so every remaining row falls in a defined cell.
sample_gender = sample_df.dropna(subset=["gender"])
target_gender = target_df.dropna(subset=["gender"])

gender_result = poststratify(
    sample_df=sample_gender[["gender"]],
    sample_weights=pd.Series(1, index=sample_gender.index),
    target_df=target_gender[["gender"]],
    target_weights=pd.Series(1, index=target_gender.index),
)

gender_weights = sample_gender.assign(weight=gender_result["weight"])
gender_summary = pd.concat(
    [
        gender_weights.groupby("gender")["weight"].sum().rename("weighted_sample"),
        target_gender.groupby("gender").size().rename("target_population"),
    ],
    axis=1,
)
gender_summary

INFO (2026-01-08 21:58:59,216) [adjustment/apply_transformations (line 470)]: Adding the variables: []

INFO (2026-01-08 21:58:59,217) [adjustment/apply_transformations (line 471)]: Transforming the variables: ['gender']

INFO (2026-01-08 21:58:59,221) [adjustment/apply_transformations (line 506)]: Final variables in output: ['gender']
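The single-variable adjustment above can be reproduced by hand, which may help show why the weighted sample totals match the target totals. The sketch below uses only pandas and a small made-up data set (the `sample` and `target` frames are hypothetical, not the simulated data from `load_data`). It assumes unit design weights on both sides and is not a description of how `poststratify` works internally.

```python
import pandas as pd

# Hypothetical toy data: one categorical covariate, unit design weights assumed.
sample = pd.DataFrame({"gender": ["Female", "Male", "Male", "Male"]})
target = pd.DataFrame({"gender": ["Female"] * 6 + ["Male"] * 4})

# Cell totals in the sample and in the target.
sample_totals = sample.groupby("gender").size()
target_totals = target.groupby("gender").size()

# Every sample row in a cell gets the ratio target total / sample total.
cell_weight = target_totals / sample_totals
manual_weights = sample["gender"].map(cell_weight)

# Summing the weights within each cell recovers the target totals.
weighted_totals = manual_weights.groupby(sample["gender"]).sum()
print(weighted_totals)
```

With these toy numbers the single female row carries a weight of 6 and each male row a weight of 4/3, so the weighted totals equal the target counts of 6 and 4.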

# Cells are now the gender x age_group combinations; drop rows missing either value.
covariates = ["gender", "age_group"]
sample_cells = sample_df.dropna(subset=covariates)
target_cells = target_df.dropna(subset=covariates)

joint_result = poststratify(
    sample_df=sample_cells[covariates],
    sample_weights=pd.Series(1, index=sample_cells.index),
    target_df=target_cells[covariates],
    target_weights=pd.Series(1, index=target_cells.index),
)

joint_weights = sample_cells.assign(weight=joint_result["weight"])
joint_summary = pd.concat(
    [
        joint_weights.groupby(covariates)["weight"].sum().rename("weighted_sample"),
        target_cells.groupby(covariates).size().rename("target_population"),
    ],
    axis=1,
)
joint_summary

INFO (2026-01-08 21:58:59,248) [adjustment/apply_transformations (line 470)]: Adding the variables: []

INFO (2026-01-08 21:58:59,249) [adjustment/apply_transformations (line 471)]: Transforming the variables: ['gender', 'age_group']

INFO (2026-01-08 21:58:59,256) [adjustment/apply_transformations (line 506)]: Final variables in output: ['gender', 'age_group']
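Joint cells get sparse quickly as covariates are added, so it can be worth checking cell coverage before post-stratifying. The following is a minimal sketch with pandas alone on made-up data (the frames and the `coverage` table are hypothetical names). It only lists target cells that have no sample rows. How `poststratify` itself treats such cells is not shown in this notebook and should be checked in the balance documentation.

```python
import pandas as pd

covariates = ["gender", "age_group"]

# Hypothetical toy data: the sample has no Female / 45+ respondents.
sample = pd.DataFrame(
    {"gender": ["Female", "Male", "Male"], "age_group": ["18-24", "18-24", "45+"]}
)
target = pd.DataFrame(
    {
        "gender": ["Female", "Female", "Male", "Male"],
        "age_group": ["18-24", "45+", "18-24", "45+"],
    }
)

# Count rows per joint cell on each side.
n_sample = sample.groupby(covariates).size().rename("n_sample")
n_target = target.groupby(covariates).size().rename("n_target")

# Outer-align the two counts; a NaN in n_sample marks an uncovered target cell.
coverage = pd.concat([n_sample, n_target], axis=1)
missing_in_sample = coverage[coverage["n_sample"].isna()].index.tolist()
print(missing_in_sample)
```

A target cell with no sample rows cannot be represented by any weight, so a common response is to collapse categories (for example, merging adjacent age groups) until every cell is covered.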

       id gender age_group     income  happiness
0  100000   Male       45+  10.183951  61.706333
1  100001   Male       45+   6.036858  79.123670
2  100002   Male     35-44   5.226629  44.206949
3  100003    NaN       45+   5.752147  83.985716
4  100004    NaN     25-34   4.837484  49.339713

        weighted_sample  target_population
gender
Female           4551.0               4551
Male             4551.0               4551

                  weighted_sample  target_population
gender age_group
Female 18-24                876.0                876
       25-34               1360.0               1360
       35-44               1370.0               1370
       45+                  945.0                945
Male   18-24                905.0                905
       25-34               1355.0               1355
       35-44               1347.0               1347
       45+                  944.0                944

balance Quickstart (post-stratify): Matching known cell totals

1. Load simulated data

2. Post-stratify on a single variable

3. Post-stratify on the joint distribution of two variables