balance Quickstart (raking): Analyzing and adjusting the bias on a simulated toy dataset¶

The raking method is an advanced technique that extends post-stratification. It is well suited for situations where we have the marginal distributions of multiple covariates but do not know their joint distribution. Raking works by post-stratifying the data on the first covariate, using the resulting weights as input for adjustment on the second covariate, and so forth. Once all covariates have been used for adjustment, the process repeats until a specified level of convergence is attained.
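The iterative procedure described above is known as iterative proportional fitting (IPF). The following is a minimal, self-contained sketch of the idea using only numpy and pandas; it is an illustration of the algorithm, not the implementation used inside balance, and the data and target proportions are made up for the example:

```python
# Sketch of raking / iterative proportional fitting (IPF):
# post-stratify on one covariate at a time, reusing the current weights,
# and repeat until the weighted marginals converge to the targets.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], size=1000, p=[0.3, 0.7]),
    "age_group": rng.choice(["18-34", "35+"], size=1000, p=[0.6, 0.4]),
})
# Known *marginal* targets (the joint distribution is not needed):
targets = {
    "gender": {"Female": 0.5, "Male": 0.5},
    "age_group": {"18-34": 0.5, "35+": 0.5},
}

w = np.ones(len(df))
for _ in range(50):  # iterate until (approximate) convergence
    for covar, marg in targets.items():
        # current weighted marginal for this covariate
        cur = pd.Series(w).groupby(df[covar].values).sum() / w.sum()
        # multiply each row's weight by target/current for its level
        factors = {lvl: marg[lvl] / cur[lvl] for lvl in marg}
        w = w * df[covar].map(factors).to_numpy()

# The weighted marginals now match the targets for every covariate:
for covar in targets:
    fitted = pd.Series(w).groupby(df[covar].values).sum() / w.sum()
    print(covar, fitted.round(3).to_dict())
```

Each inner step is exactly a post-stratification on a single covariate; cycling through the covariates is what lets raking fit several sets of marginals at once.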

One of the main advantages of raking is its ability to work with user-level data while also utilizing marginal distributions that lack user-level granularity. Another benefit is its capacity to fit these marginal distributions very closely, depending on the convergence achieved. This is in contrast to techniques such as inverse probability weighting (IPW) and covariate balancing propensity score (CBPS), which may only approximate the target distributions and can fail to fit them even at the marginal level.

This notebook demonstrates how to use the raking method and showcases the high degree of fit it can provide.

Load the data¶

In [1]:
%matplotlib inline

import plotly.offline as offline
offline.init_notebook_mode()

from balance import load_data
INFO (2026-02-21 04:46:08,150) [__init__/<module> (line 72)]: Using balance version 0.16.1
balance (Version 0.16.1) loaded:
    📖 Documentation: https://import-balance.org/
    🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
    📄 Citation:
        Sarig, T., Galili, T., & Eilat, R. (2023).
        balance - a Python package for balancing biased data samples.
        https://arxiv.org/abs/2307.06024

    Tip: You can view this message anytime with balance.help()

In [2]:
target_df, sample_df = load_data()

print("target_df: \n", target_df.head())
print("sample_df: \n", sample_df.head())
target_df: 
        id gender age_group     income  happiness
0  100000   Male       45+  10.183951  61.706333
1  100001   Male       45+   6.036858  79.123670
2  100002   Male     35-44   5.226629  44.206949
3  100003    NaN       45+   5.752147  83.985716
4  100004    NaN     25-34   4.837484  49.339713
sample_df: 
   id  gender age_group     income  happiness
0  0    Male     25-34   6.428659  26.043029
1  1  Female     18-24   9.940280  66.885485
2  2    Male     18-24   2.673623  37.091922
3  3     NaN     18-24  10.550308  49.394050
4  4     NaN     18-24   2.689994  72.304208
In [3]:
from balance import Sample

Raking can also work with numerical variables, since they are automatically bucketed. But for simplicity of the discussion, we'll focus only on age and gender.
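To illustrate what bucketing a numeric variable means, here is a small sketch using `pd.qcut` as an illustrative stand-in (the exact bucketing strategy balance applies internally may differ):

```python
# Sketch: turning a numeric covariate into categorical buckets so it can
# be raked like any other categorical variable. pd.qcut builds
# quantile-based buckets with roughly equal counts per bucket.
import pandas as pd

income = pd.Series([1.2, 3.4, 5.1, 6.0, 7.7, 9.9, 12.3, 15.0])
income_buckets = pd.qcut(income, q=4)  # 4 quartile-based buckets
print(income_buckets.value_counts().sort_index())
```

Once bucketed, the variable's bucket frequencies can be matched to target marginals exactly like `age_group` above.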

In [4]:
sample = Sample.from_frame(sample_df[['id', 'gender', 'age_group', 'happiness']], outcome_columns=["happiness"])
target = Sample.from_frame(target_df[['id', 'gender', 'age_group', 'happiness']], outcome_columns=["happiness"])
sample_with_target = sample.set_target(target)
WARNING (2026-02-21 04:46:08,355) [input_validation/guess_id_column (line 337)]: Guessed id column name id for the data
WARNING (2026-02-21 04:46:08,367) [sample_class/from_frame (line 549)]: No weights passed. Adding a 'weight' column and setting all values to 1
WARNING (2026-02-21 04:46:08,379) [input_validation/guess_id_column (line 337)]: Guessed id column name id for the data
WARNING (2026-02-21 04:46:08,392) [sample_class/from_frame (line 549)]: No weights passed. Adding a 'weight' column and setting all values to 1

Fit models using ipw and rake¶

Fit an ipw model:

In [5]:
adjusted_ipw = sample_with_target.adjust(method="ipw")
INFO (2026-02-21 04:46:08,407) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-02-21 04:46:08,409) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-02-21 04:46:08,410) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group']
INFO (2026-02-21 04:46:08,415) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group']
INFO (2026-02-21 04:46:08,420) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-02-21 04:46:08,499) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['gender + age_group + _is_na_gender']
INFO (2026-02-21 04:46:08,500) [ipw/ipw (line 767)]: The number of columns in the model matrix: 7
INFO (2026-02-21 04:46:08,500) [ipw/ipw (line 768)]: The number of rows in the model matrix: 11000
INFO (2026-02-21 04:46:23,547) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-02-21 04:46:23,548) [ipw/ipw (line 992)]: max_de: None
INFO (2026-02-21 04:46:23,549) [ipw/ipw (line 1014)]: Starting model selection
INFO (2026-02-21 04:46:23,552) [ipw/ipw (line 1047)]: Chosen lambda: 0.041158338186664825
INFO (2026-02-21 04:46:23,553) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.11579381555381918

Fit a raking model (on the user level data as input):

In [6]:
adjusted_rake = sample_with_target.adjust(method="rake")
INFO (2026-02-21 04:46:23,571) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-02-21 04:46:23,571) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group']
INFO (2026-02-21 04:46:23,577) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group']
INFO (2026-02-21 04:46:23,586) [rake/rake (line 274)]: Final covariates and levels that will be used in raking: {'age_group': ['18-24', '25-34', '35-44', '45+'], 'gender': ['Female', 'Male', '__NaN__']}.

Comparing the results of ipw and rake, we can see that rake has a larger design effect, but provides a perfect fit to the target, whereas ipw gives only a partial fit.

This is visible in the ASMD values as well as in the bar plots below.
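The design effect quantifies the variance inflation caused by unequal weights. The standard definition is the Kish design effect, deff = n · Σw² / (Σw)²; whether balance uses exactly this formula is an assumption here, but it is the conventional one and matches the pattern in the summaries (more uneven weights → larger deff, smaller effective sample size):

```python
# Kish design effect: deff = n * sum(w^2) / (sum(w))^2.
# Uniform weights give deff = 1; unequal weights inflate it.
import numpy as np

def design_effect(w):
    w = np.asarray(w, dtype=float)
    return len(w) * np.sum(w**2) / np.sum(w) ** 2

print(design_effect([1, 1, 1, 1]))  # -> 1.0 (uniform weights)
print(design_effect([1, 1, 1, 5]))  # -> 1.75 (unequal weights)
```

The effective sample size is then ESS = n / deff, which is how a deff of ~2.1 on 1000 respondents yields the ESS of ~476 reported for rake below.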

In [7]:
print(adjusted_ipw.summary())
Adjustment details:
    method: ipw
    weight trimming mean ratio: 20
Covariate diagnostics:
    Covar ASMD reduction: 77.6%
    Covar ASMD (6 variables): 0.243 -> 0.054
    Covar mean KLD reduction: 92.2%
    Covar mean KLD (2 variables): 0.179 -> 0.014
Weight diagnostics:
    design effect (Deff): 1.527
    effective sample size proportion (ESSP): 0.655
    effective sample size (ESS): 654.8
Outcome weighted means:
            happiness
source               
self           53.889
target         56.278
unadjusted     48.559
Model performance: Model proportion deviance explained: 0.116
In [8]:
print(adjusted_rake.summary())
Adjustment details:
    method: rake
Covariate diagnostics:
    Covar ASMD reduction: 100.0%
    Covar ASMD (6 variables): 0.243 -> 0.000
    Covar mean KLD reduction: 100.0%
    Covar mean KLD (2 variables): 0.179 -> 0.000
Weight diagnostics:
    design effect (Deff): 2.103
    effective sample size proportion (ESSP): 0.476
    effective sample size (ESS): 475.6
Outcome weighted means:
            happiness
source               
self           55.484
target         56.278
unadjusted     48.559
In [9]:
adjusted_ipw.covars().plot()
In [10]:
adjusted_rake.covars().plot()

Outcome analysis¶

In [11]:
print(adjusted_ipw.outcomes().summary())
adjusted_ipw.outcomes().plot()
1 outcomes: ['happiness']
Mean outcomes (with 95% confidence intervals):
source       self  target  unadjusted           self_ci         target_ci     unadjusted_ci
happiness  53.889  56.278      48.559  (52.736, 55.042)  (55.961, 56.595)  (47.669, 49.449)

Weights impact on outcomes (t_test):
           mean_yw0  mean_yw1  mean_diff  diff_ci_lower  diff_ci_upper  t_stat  p_value       n
outcome                                                                                        
happiness    48.559    53.889       5.33           2.58          8.081   3.803      0.0  1000.0

Response rates (relative to number of respondents in sample):
   happiness
n     1000.0
%      100.0
Response rates (relative to notnull rows in the target):
    happiness
n     1000.0
%       10.0
Response rates (in the target):
    happiness
n    10000.0
%      100.0

The above shows the estimated mean happiness for our sample: unadjusted, IPW-adjusted, and target values. The following shows the corresponding happiness outcomes after raking:

In [12]:
print(adjusted_rake.outcomes().summary())
adjusted_rake.outcomes().plot()
1 outcomes: ['happiness']
Mean outcomes (with 95% confidence intervals):
source       self  target  unadjusted           self_ci         target_ci     unadjusted_ci
happiness  55.484  56.278      48.559  (54.173, 56.796)  (55.961, 56.595)  (47.669, 49.449)

Weights impact on outcomes (t_test):
           mean_yw0  mean_yw1  mean_diff  diff_ci_lower  diff_ci_upper  t_stat  p_value       n
outcome                                                                                        
happiness    48.559    55.484      6.926          2.827         11.024   3.316    0.001  1000.0

Response rates (relative to number of respondents in sample):
   happiness
n     1000.0
%      100.0
Response rates (relative to notnull rows in the target):
    happiness
n     1000.0
%       10.0
Response rates (in the target):
    happiness
n    10000.0
%      100.0

As we can see, both IPW and raking impact the outcome estimate. Raking achieves exact balance on the target marginals (as shown by the perfect ASMD scores earlier), which can be important when precise matching to known population distributions is required.
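The adjusted outcome means above are simply weighted means of the observed outcome under the fitted weights. A minimal sketch with made-up numbers, using `np.average`:

```python
# The "unadjusted" mean is the plain average of the outcome; the adjusted
# ("self") mean re-weights each respondent by their fitted weight.
import numpy as np

happiness = np.array([40.0, 50.0, 60.0])
weights = np.array([1.0, 2.0, 3.0])  # hypothetical fitted weights

print(np.average(happiness))                   # unadjusted mean -> 50.0
print(np.average(happiness, weights=weights))  # weighted mean -> ~53.33
```

Raking's weights pull the estimate further toward the target here because its fit to the covariate marginals is exact, at the cost of a larger design effect.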

Using marginal distribution with rake¶

A key benefit of rake is that we can define a target population directly from marginal distributions and fit towards it. The function to use for this purpose is prepare_marginal_dist_for_raking.

To demonstrate this point, let us assume we have another target population in mind, with different proportions. Since its marginal distributions are known, we can create a target dataframe that realizes them from a dict of marginal distributions (prepare_marginal_dist_for_raking is built on top of the realize_dicts_of_proportions function).

In [13]:
from balance.weighting_methods.rake import prepare_marginal_dist_for_raking
import numpy as np

a_dict_with_marginal_distributions = {
    "gender": {"Female": 0.1, "Male": 0.85, np.nan: 0.05},
    "age_group": {"18-24": 0.25, "25-34": 0.25, "35-44": 0.25, "45+": 0.25},
}

target_df_from_marginals = prepare_marginal_dist_for_raking(a_dict_with_marginal_distributions)
In [14]:
target_df_from_marginals
Out[14]:
gender age_group id
0 Female 18-24 0
1 Female 25-34 1
2 Male 35-44 2
3 Male 45+ 3
4 Male 18-24 4
5 Male 25-34 5
6 Male 35-44 6
7 Male 45+ 7
8 Male 18-24 8
9 Male 25-34 9
10 Male 35-44 10
11 Male 45+ 11
12 Male 18-24 12
13 Male 25-34 13
14 Male 35-44 14
15 Male 45+ 15
16 Male 18-24 16
17 Male 25-34 17
18 Male 35-44 18
19 NaN 45+ 19
In [15]:
target_df_from_marginals.info()
<class 'pandas.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   gender     19 non-null     str  
 1   age_group  20 non-null     str  
 2   id         20 non-null     int64
dtypes: int64(1), str(2)
memory usage: 612.0 bytes

With the new target_df_from_marginals object ready, we can use it as a target. Notice that this makes sense ONLY for the raking method. This should NOT be used for any other method.

In [16]:
target_from_marginals = Sample.from_frame(target_df_from_marginals)
sample_with_target_2 = sample.set_target(target_from_marginals)
WARNING (2026-02-21 04:46:25,038) [input_validation/guess_id_column (line 337)]: Guessed id column name id for the data
WARNING (2026-02-21 04:46:25,039) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-02-21 04:46:25,049) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-02-21 04:46:25,050) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-02-21 04:46:25,051) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
id    int64
dtype: object
WARNING (2026-02-21 04:46:25,051) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-02-21 04:46:25,052) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
id    str
dtype: object
WARNING (2026-02-21 04:46:25,053) [sample_class/from_frame (line 549)]: No weights passed. Adding a 'weight' column and setting all values to 1

And fit a raking model:

In [17]:
adjusted_rake_2 = sample_with_target_2.adjust(method="rake")
INFO (2026-02-21 04:46:25,064) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-02-21 04:46:25,065) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group']
INFO (2026-02-21 04:46:25,068) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group']
INFO (2026-02-21 04:46:25,072) [rake/rake (line 274)]: Final covariates and levels that will be used in raking: {'age_group': ['18-24', '25-34', '35-44', '45+'], 'gender': ['Female', 'Male', '__NaN__']}.

As the following output shows, the weighted sample now has a perfect fit to the marginal distributions defined for age and gender.

In [18]:
print(adjusted_rake_2.summary())
Adjustment details:
    method: rake
Covariate diagnostics:
    Covar ASMD reduction: 100.0%
    Covar ASMD (6 variables): 0.341 -> 0.000
    Covar mean KLD reduction: 100.0%
    Covar mean KLD (2 variables): 0.183 -> 0.000
Weight diagnostics:
    design effect (Deff): 2.176
    effective sample size proportion (ESSP): 0.460
    effective sample size (ESS): 459.6
Outcome weighted means:
            happiness
source               
self           49.383
unadjusted     48.559
In [19]:
adjusted_rake_2.covars().plot()
In [ ]: