balance Quickstart: New API (SampleFrame / BalanceFrame)¶

This tutorial demonstrates the new SampleFrame + BalanceFrame API introduced in balance 0.18.0. It mirrors the original balance_quickstart.ipynb step-by-step, but uses only the new classes — no Sample object is needed.

Why a new API?¶

Old API (Sample)                     | New API (SampleFrame / BalanceFrame)
-------------------------------------|--------------------------------------------------------------
Column roles inferred by exclusion   | Column roles declared explicitly
Mutable .set_target() / .adjust()    | Immutable: adjust() returns a new object
One class does everything            | Clear separation: data container vs. adjustment orchestrator
Weight provenance not tracked        | Weight metadata recorded per-column

The old Sample API still works and is fully supported; this notebook simply shows how to do the same analysis with the new classes.

Analysis¶

There are four main steps to analysis with the new API:

  1. Load data into pandas DataFrames
  2. Create SampleFrame objects with explicit column roles
  3. Build a BalanceFrame, adjust, and inspect diagnostics
  4. Output results (CSV, download)

Example dataset¶

The following is a toy simulated dataset (same data used in the original quickstart).

In [1]:
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

from balance import load_data

target_df, sample_df = load_data()

print("target_df: \n", target_df.head())
print("sample_df: \n", sample_df.head())
INFO (2026-04-09 17:19:51,344) [__init__/<module> (line 75)]: Using balance version 0.19.0
INFO (2026-04-09 17:19:51,344) [__init__/<module> (line 80)]: 
balance (Version 0.19.0) loaded:
    📖 Documentation: https://import-balance.org/
    🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
    📄 Citation:
        Sarig, T., Galili, T., & Eilat, R. (2023).
        balance - a Python package for balancing biased data samples.
        https://arxiv.org/abs/2307.06024

    Tip: You can view this message anytime with balance.help()

target_df: 
        id gender age_group     income  happiness
0  100000   Male       45+  10.183951  61.706333
1  100001   Male       45+   6.036858  79.123670
2  100002   Male     35-44   5.226629  44.206949
3  100003    NaN       45+   5.752147  83.985716
4  100004    NaN     25-34   4.837484  49.339713
sample_df: 
   id  gender age_group     income  happiness
0  0    Male     25-34   6.428659  26.043029
1  1  Female     18-24   9.940280  66.885485
2  2    Male     18-24   2.673623  37.091922
3  3     NaN     18-24  10.550308  49.394050
4  4     NaN     18-24   2.689994  72.304208

In practice, you can use pandas.read_csv() (or any pandas loader) to import your own data. The new API also provides SampleFrame.from_csv() for a one-step shortcut.
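For example, a DataFrame prepared with plain pandas can be handed to SampleFrame.from_frame() as-is. The CSV content below is made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real survey export
csv_text = """id,gender,age_group,income,happiness
0,Male,25-34,6.43,26.04
1,Female,18-24,9.94,66.89
"""

# Any pandas loader works; the result is passed to SampleFrame.from_frame(...)
my_sample_df = pd.read_csv(io.StringIO(csv_text))
print(my_sample_df.shape)
```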

Load data into SampleFrame objects¶

With the old API you would call Sample.from_frame(df). The new API uses SampleFrame.from_frame(), where you explicitly declare which columns are covariates, outcomes, etc. If you omit these arguments, the factory auto-detects roles the same way Sample does: every column other than the id and weight columns is treated as a covariate.

In [2]:
from balance import SampleFrame, BalanceFrame
In [3]:
sample_sf = SampleFrame.from_frame(
    sample_df,
    outcome_columns=["happiness"],
)
# Often we don't have the outcome for the target.
# In this case we've added it just to validate later that
# the weights indeed help us reduce the bias.
target_sf = SampleFrame.from_frame(
    target_df,
    outcome_columns=["happiness"],
)
WARNING (2026-04-09 17:19:51,395) [input_validation/guess_id_column (line 336)]: Guessed id column name id for the data
WARNING (2026-04-09 17:19:51,407) [sample_frame/from_frame (line 326)]: No weights passed. Adding a 'weight' column and setting all values to 1
WARNING (2026-04-09 17:19:51,409) [input_validation/guess_id_column (line 336)]: Guessed id column name id for the data
WARNING (2026-04-09 17:19:51,424) [sample_frame/from_frame (line 326)]: No weights passed. Adding a 'weight' column and setting all values to 1

Inspecting a SampleFrame¶

You can inspect the column roles and data shape at any time. Unlike Sample.df, each role is a separate property — no magic "everything-that's-left" inference.

In [4]:
print(f"Covariates:  {sample_sf.covar_columns}")
print(f"Outcomes:    {sample_sf.outcome_columns}")
print(f"Weight cols: {sample_sf.weight_columns_all}")
print(f"Active wt:   {sample_sf.weight_column}")
print(f"ID column:   {sample_sf.id_column_name}")
print(f"Rows:        {len(sample_sf)}")
Covariates:  ['gender', 'age_group', 'income']
Outcomes:    ['happiness']
Weight cols: ['weight']
Active wt:   weight
ID column:   id
Rows:        1000
In [5]:
sample_sf.df.info()
<class 'pandas.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   id         1000 non-null   str    
 1   gender     912 non-null    str    
 2   age_group  1000 non-null   str    
 3   income     1000 non-null   float64
 4   happiness  1000 non-null   float64
 5   weight     1000 non-null   float64
dtypes: float64(3), str(3)
memory usage: 47.0 KB
In [6]:
print(sample_sf)
print(target_sf)
SampleFrame: 1000 observations x 3 covariates: gender,age_group,income
  id_column: id, weight_columns_all: ['weight'], outcome_columns: happiness
SampleFrame: 10000 observations x 3 covariates: gender,age_group,income
  id_column: id, weight_columns_all: ['weight'], outcome_columns: happiness

Create a BalanceFrame¶

With the old API you would call sample.set_target(target). The new API constructs a BalanceFrame directly from two SampleFrame objects.

A BalanceFrame is immutable — adjust() returns a new BalanceFrame rather than mutating the existing one.
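The copy-on-adjust pattern can be sketched with a frozen dataclass. This is a toy stand-in to illustrate the design choice, not balance's actual implementation:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TinyFrame:
    """Toy stand-in for BalanceFrame's immutable, copy-on-adjust style."""

    weights: tuple
    is_adjusted: bool = False

    def adjust(self, factor: float) -> "TinyFrame":
        # Return a NEW object; `self` is never mutated.
        new_w = tuple(w * factor for w in self.weights)
        return replace(self, weights=new_w, is_adjusted=True)


tf = TinyFrame(weights=(1.0, 1.0, 1.0))
tf2 = tf.adjust(2.0)
print(tf.is_adjusted, tf2.is_adjusted)  # the original stays unadjusted
```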

In [7]:
bf = BalanceFrame(sample=sample_sf, target=target_sf)
print(bf)
        balance Sample object with target set
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
            target:
                 SampleFrame: 10000 observations x 3 covariates: gender,age_group,income
	  id_column: id, weight_columns_all: ['weight'], outcome_columns: happiness
            3 common variables: gender,age_group,income
            

Pre-Adjustment Diagnostics¶

The .covars(), .weights(), and .outcomes() methods return the same BalanceDFCovars / BalanceDFWeights / BalanceDFOutcomes objects as the old API. All of .mean(), .asmd(), .plot(), etc. work identically.
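The ASMD (absolute standardized mean difference) shown below compares sample and target means in units of the covariate's standard deviation; balance one-hot expands categorical levels first. A minimal numeric-covariate sketch (using the target standard deviation as the denominator, which is an assumption for illustration):

```python
import numpy as np


def asmd(sample: np.ndarray, target: np.ndarray) -> float:
    # |mean difference| scaled by the target's standard deviation
    return abs(sample.mean() - target.mean()) / target.std(ddof=1)


rng = np.random.default_rng(0)
s = rng.normal(0.0, 1.0, 500)    # toy "sample" covariate
t = rng.normal(0.5, 1.0, 5000)   # toy "target" covariate, shifted mean
print(round(asmd(s, t), 3))
```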

In [8]:
print(bf.covars().mean().T)
source                     self     target
_is_na_gender[T.True]  0.088000   0.089800
age_group[T.25-34]     0.300000   0.297400
age_group[T.35-44]     0.156000   0.299200
age_group[T.45+]       0.053000   0.206300
gender[Female]         0.268000   0.455100
gender[Male]           0.644000   0.455100
gender[_NA]            0.088000   0.089800
income                 6.297302  12.737608
In [9]:
print(bf.covars().asmd().T)
source                  self
age_group[T.25-34]  0.005688
age_group[T.35-44]  0.312711
age_group[T.45+]    0.378828
gender[Female]      0.375699
gender[Male]        0.379314
gender[_NA]         0.006296
income              0.494217
mean(asmd)          0.326799
In [10]:
print(bf.covars().asmd(aggregate_by_main_covar=True).T)
source          self
age_group   0.232409
gender      0.253769
income      0.494217
mean(asmd)  0.326799

Visualizing the unadjusted comparison¶

In [11]:
bf.covars().plot()

Adjusting Sample to Population¶

The default method is 'ipw' (inverse probability/propensity weights via logistic regression with lasso regularization).
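The core idea behind IPW can be sketched without the library: classify sample vs. target units, then weight each sample unit by its odds of being a target unit. This toy version uses hand-rolled gradient-descent logistic regression on a single covariate; balance's ipw uses sklearn with lasso regularization and lambda selection, as the logs below show:

```python
import numpy as np


def ipw_weights(x_sample, x_target, steps=2000, lr=0.1):
    # Fit P(target | x) with plain logistic regression, then weight
    # each sample unit by the odds p / (1 - p).
    x = np.concatenate([x_sample, x_target])
    y = np.concatenate([np.zeros(len(x_sample)), np.ones(len(x_target))])
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta -= lr * (X.T @ (p - y)) / len(y)
    Xs = np.column_stack([np.ones_like(x_sample), x_sample])
    ps = 1.0 / (1.0 + np.exp(-Xs @ beta))
    w = ps / (1.0 - ps)          # odds of belonging to the target
    return w * len(w) / w.sum()  # normalize to mean 1


rng = np.random.default_rng(1)
x_s = rng.normal(0.0, 1.0, 400)  # sample skews low on x
x_t = rng.normal(1.0, 1.0, 400)  # target is centered higher
w = ipw_weights(x_s, x_t)
weighted_mean = np.average(x_s, weights=w)
print(round(weighted_mean, 2))   # pulled toward the target mean
```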

Key difference from the old API: adjust() returns a new BalanceFrame — the original bf is unchanged.

In [12]:
adjusted = bf.adjust()
print(adjusted)
INFO (2026-04-09 17:19:57,466) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-04-09 17:19:57,468) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-04-09 17:19:57,469) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-04-09 17:19:57,480) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-04-09 17:19:57,488) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-04-09 17:19:57,601) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-04-09 17:19:57,602) [ipw/ipw (line 767)]: The number of columns in the model matrix: 16
INFO (2026-04-09 17:19:57,602) [ipw/ipw (line 768)]: The number of rows in the model matrix: 11000
INFO (2026-04-09 17:20:15,411) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-04-09 17:20:15,412) [ipw/ipw (line 992)]: max_de: None
INFO (2026-04-09 17:20:15,413) [ipw/ipw (line 1014)]: Starting model selection
INFO (2026-04-09 17:20:15,416) [ipw/ipw (line 1047)]: Chosen lambda: 0.041158338186664825
INFO (2026-04-09 17:20:15,417) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.172637976731583
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income,weight_pre_adjust,weight_adjusted_1
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 20
            design effect (Deff): 1.880
            effective sample size proportion (ESSP): 0.532
            effective sample size (ESS): 531.9
                
            target:
                 SampleFrame: 10000 observations x 3 covariates: gender,age_group,income
	  id_column: id, weight_columns_all: ['weight'], outcome_columns: happiness
            3 common variables: gender,age_group,income
            
In [13]:
# The original is still unadjusted:
print(f"bf.is_adjusted        = {bf.is_adjusted}")
print(f"adjusted.is_adjusted  = {adjusted.is_adjusted}")
bf.is_adjusted        = False
adjusted.is_adjusted  = True

Evaluation of the Results¶

In [14]:
print(adjusted.summary())
Adjustment details:
    method: ipw
    weight trimming mean ratio: 20
Covariate diagnostics:
    Covar ASMD reduction: 63.4%
    Covar ASMD (7 variables): 0.327 -> 0.120
    Covar mean KLD reduction: 92.3%
    Covar mean KLD (3 variables): 0.157 -> 0.012
Weight diagnostics:
    design effect (Deff): 1.880
    effective sample size proportion (ESSP): 0.532
    effective sample size (ESS): 531.9
Outcome weighted means:
            happiness
source               
self           53.295
target         56.278
unadjusted     48.559
Model performance: Model proportion deviance explained: 0.173
In [15]:
print(adjusted.covars().mean().T)
source                      self     target  unadjusted
_is_na_gender[T.True]   0.086776   0.089800    0.088000
age_group[T.25-34]      0.307355   0.297400    0.300000
age_group[T.35-44]      0.273609   0.299200    0.156000
age_group[T.45+]        0.137581   0.206300    0.053000
gender[Female]          0.406337   0.455100    0.268000
gender[Male]            0.506887   0.455100    0.644000
gender[_NA]             0.086776   0.089800    0.088000
income                 10.060068  12.737608    6.297302

We see an improvement in the average ASMD. Detailed per-variable ASMD:

In [16]:
print(adjusted.covars().asmd().T)
source                  self  unadjusted  unadjusted - self
age_group[T.25-34]  0.021777    0.005688          -0.016090
age_group[T.35-44]  0.055884    0.312711           0.256827
age_group[T.45+]    0.169816    0.378828           0.209013
gender[Female]      0.097916    0.375699           0.277783
gender[Male]        0.103989    0.379314           0.275324
gender[_NA]         0.010578    0.006296          -0.004282
income              0.205469    0.494217           0.288748
mean(asmd)          0.119597    0.326799           0.207202

Covariate plots¶

In [17]:
adjusted.covars().plot()
In [18]:
# Seaborn KDE density plots
adjusted.covars().plot(library="seaborn", dist_type="kde")

ASCII plots¶

Use library="balance" for a text-based comparison of unadjusted, adjusted, and target — useful in terminals or logging contexts.
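The bars themselves are easy to emulate. A minimal text-bar helper (an illustration of the idea, not the library's renderer):

```python
def bar(share: float, width: int = 30, ch: str = "█") -> str:
    # Bar length proportional to the weighted frequency, label appended.
    return ch * round(share * width) + f" ({share:.1%})"


print(bar(0.50))
print(bar(0.294, ch="▐"))
```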

In [19]:
adjusted.covars().plot(library="balance", bar_width=30);
=== gender (categorical) ===

Category | population  adjusted  sample
         |
Female   | █████████████████████ (50.0%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (44.5%)
         | ▐▐▐▐▐▐▐▐▐▐▐▐ (29.4%)

Male     | █████████████████████ (50.0%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (55.5%)
         | ▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐ (70.6%)

Legend: █ population  ▒ adjusted  ▐ sample
Bar lengths are proportional to weighted frequency within each dataset.

=== age_group (categorical) ===

Category | population  adjusted  sample
         |
18-24    | ████████████ (19.7%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (28.1%)
         | ▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐ (49.1%)

25-34    | ██████████████████ (29.7%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (30.7%)
         | ▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐ (30.0%)

35-44    | ██████████████████ (29.9%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (27.4%)
         | ▐▐▐▐▐▐▐▐▐▐ (15.6%)

45+      | █████████████ (20.6%)
         | ▒▒▒▒▒▒▒▒ (13.8%)
         | ▐▐▐ (5.3%)

Legend: █ population  ▒ adjusted  ▐ sample
Bar lengths are proportional to weighted frequency within each dataset.

=== income (numeric, comparative) ===

Range            | population (%)            | adjusted (%)                | sample (%)                         
----------------------------------------------------------------------------------------------------------------
[0.00, 8.57)     | ████████████████████ 49.0 | ████████████████████▒▒ 54.8 | ████████████████████▒▒▒▒▒▒▒▒▒▒ 73.2
[8.57, 17.14)    | █████████ 23.1            | █████████▒▒ 26.3            | ████████] 19.2                     
[17.14, 25.71)   | █████ 13.2                | █████ 12.3                  | ██  ] 5.3                          
[25.71, 34.28)   | ███ 7.3                   | ██] 3.9                     | █ ] 1.6                            
[34.28, 42.85)   | ██ 3.9                    | █] 1.5                      |  ] 0.4                             
[42.85, 51.41)   | █ 1.8                     | ] 0.2                       | ] 0.1                              
[51.41, 59.98)   | 0.9                       | 1.0                         | 0.2                                
[59.98, 68.55)   | 0.4                       | 0.0                         | 0.0                                
[68.55, 77.12)   | 0.2                       | 0.0                         | 0.0                                
[77.12, 85.69)   | 0.1                       | 0.0                         | 0.0                                
[85.69, 94.26)   | 0.0                       | 0.0                         | 0.0                                
[94.26, 102.83)  | 0.0                       | 0.0                         | 0.0                                
[102.83, 111.40) | 0.0                       | 0.0                         | 0.0                                
[111.40, 119.97) | 0.0                       | 0.0                         | 0.0                                
[119.97, 128.54] | 0.0                       | 0.0                         | 0.0                                
----------------------------------------------------------------------------------------------------------------
Total            | 100.0                     | 100.0                       | 100.0                              

Key: █ = shared with population, ▒ = excess,    ] = deficit

Understanding the weights¶

In [20]:
adjusted.weights().plot()
In [21]:
print(adjusted.weights().summary().round(2))
                                var       val
0                     design_effect      1.88
1       effective_sample_proportion      0.53
2             effective_sample_size    531.92
3                               sum  10000.00
4                    describe_count   1000.00
5                     describe_mean      1.00
6                      describe_std      0.94
7                      describe_min      0.30
8                      describe_25%      0.45
9                      describe_50%      0.65
10                     describe_75%      1.17
11                     describe_max     11.36
12                    prop(w < 0.1)      0.00
13                    prop(w < 0.2)      0.00
14                  prop(w < 0.333)      0.11
15                    prop(w < 0.5)      0.32
16                      prop(w < 1)      0.67
17                     prop(w >= 1)      0.33
18                     prop(w >= 2)      0.10
19                     prop(w >= 3)      0.03
20                     prop(w >= 5)      0.01
21                    prop(w >= 10)      0.00
22               nonparametric_skew      0.37
23  weighted_median_breakdown_point      0.21

Design effect and effective sample size¶

The new API exposes design effect diagnostics through the weights view: adjusted.weights().design_effect() and adjusted.weights().design_effect_prop().
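These diagnostics follow Kish's formulas: the design effect is n · Σw² / (Σw)², and the effective sample size is (Σw)² / Σw² (equivalently n / Deff). A self-contained sketch:

```python
import numpy as np


def design_effect(w: np.ndarray) -> float:
    # Kish's design effect: n * sum(w^2) / sum(w)^2
    return len(w) * (w ** 2).sum() / w.sum() ** 2


def effective_sample_size(w: np.ndarray) -> float:
    # ESS = sum(w)^2 / sum(w^2) = n / design_effect
    return w.sum() ** 2 / (w ** 2).sum()


w = np.array([1.0, 1.0, 1.0, 1.0])   # uniform weights: no penalty
print(design_effect(w), effective_sample_size(w))

w2 = np.array([4.0, 1.0, 1.0, 1.0])  # unequal weights shrink the ESS
print(round(design_effect(w2), 3), round(effective_sample_size(w2), 2))
```

With the design effect of 1.880 reported in the summary above, this gives ESSP = 1/1.880 ≈ 0.532 and ESS ≈ 532, matching the printed diagnostics.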

In [22]:
print(f"Design effect:           {adjusted.weights().design_effect():.4f}")
print(f"Effective sample size %: {adjusted.weights().design_effect_prop():.2%}")
Design effect:           1.8800
Effective sample size %: 53.19%

Outcome analysis¶

In [23]:
print(adjusted.outcomes().summary())
1 outcomes: ['happiness']
Mean outcomes (with 95% confidence intervals):
source       self  target  unadjusted           self_ci         target_ci     unadjusted_ci
happiness  53.295  56.278      48.559  (52.096, 54.495)  (55.961, 56.595)  (47.669, 49.449)

Weights impact on outcomes (t_test):
           mean_yw0  mean_yw1  mean_diff  diff_ci_lower  diff_ci_upper  t_stat  p_value       n
outcome                                                                                        
happiness    48.559    53.295      4.736          1.312          8.161   2.714    0.007  1000.0

Response rates (relative to number of respondents in sample):
   happiness
n     1000.0
%      100.0
Response rates (relative to notnull rows in the target):
    happiness
n     1000.0
%       10.0
Response rates (in the target):
    happiness
n    10000.0
%      100.0

In [24]:
adjusted.outcomes().plot()

Analytics via BalanceDF views¶

The new API accesses analytics through the .covars(), .weights(), and .outcomes() views rather than top-level convenience methods:

In [25]:
print("Covariate means (unadjusted / adjusted / target):")
print(adjusted.covars().mean().T)
Covariate means (unadjusted / adjusted / target):
source                      self     target  unadjusted
_is_na_gender[T.True]   0.086776   0.089800    0.088000
age_group[T.25-34]      0.307355   0.297400    0.300000
age_group[T.35-44]      0.273609   0.299200    0.156000
age_group[T.45+]        0.137581   0.206300    0.053000
gender[Female]          0.406337   0.455100    0.268000
gender[Male]            0.506887   0.455100    0.644000
gender[_NA]             0.086776   0.089800    0.088000
income                 10.060068  12.737608    6.297302
In [26]:
print("Outcome SD proportional change:")
print(adjusted.outcomes().outcome_sd_prop())
print()
print("Outcome variance ratio (adjusted / unadjusted):")
print(adjusted.outcomes().outcome_variance_ratio())
Outcome SD proportional change:
happiness    0.013516
dtype: float64

Outcome variance ratio (adjusted / unadjusted):
happiness    1.027215
dtype: float64

Other Adjustment Methods¶

BalanceFrame supports all the same methods as Sample:

  • "ipw" — inverse propensity weighting (default)
  • "cbps" — covariate balancing propensity score
  • "rake" — iterative proportional fitting (raking)
  • "poststratify" — post-stratification

Each returns a new BalanceFrame — the original stays unchanged.
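Raking (iterative proportional fitting) is simple enough to sketch directly: cycle over the variables, each time rescaling the weights so the weighted category shares match the target margins. This toy version with made-up margins illustrates the idea only; it is not balance's rake implementation:

```python
import numpy as np
import pandas as pd


def rake(df, margins, n_iter=50):
    # Iterative proportional fitting over the given margins.
    w = np.ones(len(df))
    for _ in range(n_iter):
        for var, target_share in margins.items():
            shares = pd.Series(w).groupby(df[var].values).sum()
            shares = shares / shares.sum()
            # Rescale each row by (target share / current share) of its level
            w = w * df[var].map(target_share / shares).values
    return w / w.mean()


df = pd.DataFrame({
    "gender": ["M", "M", "M", "F"],
    "age":    ["young", "old", "young", "young"],
})
margins = {
    "gender": pd.Series({"M": 0.5, "F": 0.5}),
    "age":    pd.Series({"young": 0.6, "old": 0.4}),
}
w = rake(df, margins)
shares = pd.Series(w).groupby(df["gender"].values).sum() / w.sum()
print(shares.round(3).to_dict())
```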

In [27]:
adjusted_cbps = bf.adjust(method="cbps")
print(f"CBPS design effect: {adjusted_cbps.weights().design_effect():.4f}")
print(adjusted_cbps.covars().asmd(aggregate_by_main_covar=True).T)
INFO (2026-04-09 17:20:18,381) [cbps/cbps (line 538)]: Starting cbps function
INFO (2026-04-09 17:20:18,383) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-04-09 17:20:18,384) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-04-09 17:20:18,392) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-04-09 17:20:18,511) [cbps/cbps (line 589)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-04-09 17:20:18,513) [cbps/cbps (line 600)]: The number of columns in the model matrix: 16
INFO (2026-04-09 17:20:18,513) [cbps/cbps (line 601)]: The number of rows in the model matrix: 11000
INFO (2026-04-09 17:20:18,520) [cbps/cbps (line 670)]: Finding initial estimator for GMM optimization
INFO (2026-04-09 17:20:18,676) [cbps/cbps (line 697)]: Finding initial estimator for GMM optimization that minimizes the balance loss
INFO (2026-04-09 17:20:20,129) [cbps/cbps (line 733)]: Running GMM optimization
INFO (2026-04-09 17:20:21,701) [cbps/cbps (line 860)]: Done cbps function
CBPS design effect: 2.7543
source          self  unadjusted  unadjusted - self
age_group   0.064140    0.232409           0.168269
gender      0.044220    0.253769           0.209549
income      0.113018    0.494217           0.381199
mean(asmd)  0.073793    0.326799           0.253006

Compound / Sequential Adjustments¶

adjust() can be called multiple times — each call uses the previously adjusted weights as design weights, so adjustments compound. This is useful for multi-step workflows, e.g., IPW for broad correction followed by raking for fine-tuning on specific variables.

The original unadjusted baseline is always preserved:

  • _sf_sample_pre_adjust points to the original SampleFrame
  • _links["unadjusted"] points to the original BalanceFrame
  • asmd_improvement() shows total improvement across all steps
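Conceptually, each step layers a further correction on top of the weights already in place. A toy sketch, assuming multiplicative composition of step weights (an illustration of the compounding idea, not the library's exact bookkeeping):

```python
import numpy as np

w_design = np.ones(6)                                  # starting weights
w_step1 = np.array([2.0, 0.5, 1.0, 1.0, 1.5, 0.5])    # e.g. from an ipw step
w_step2 = np.array([1.2, 1.2, 0.8, 0.8, 1.0, 1.0])    # e.g. from a rake step

# Each adjustment multiplies onto the previous weights...
w_final = w_design * w_step1 * w_step2
# ...and the result is renormalized to mean 1.
w_final = w_final * len(w_final) / w_final.sum()
print(w_final.round(3))
```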
In [28]:
# Step 1: broad IPW correction across all covariates
adjusted_ipw = bf.adjust(method="ipw", max_de=2)

# Step 2: fine-tune with raking on gender and age_group
adjusted_final = adjusted_ipw.adjust(method="rake", variables=["gender", "age_group"])

print("After IPW only:")
print(adjusted_ipw.covars().asmd(aggregate_by_main_covar=True).T)
print("\nAfter IPW + rake on gender & age_group:")
print(adjusted_final.covars().asmd(aggregate_by_main_covar=True).T)
print(f"\nTotal ASMD improvement (vs original): {adjusted_final.covars().asmd_improvement():.2%}")

# The original BalanceFrame is unchanged (immutable pattern)
print(f"\nbf.is_adjusted = {bf.is_adjusted}")
print(f"adjusted_final.is_adjusted = {adjusted_final.is_adjusted}")
INFO (2026-04-09 17:20:21,858) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-04-09 17:20:21,860) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-04-09 17:20:21,861) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-04-09 17:20:21,869) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-04-09 17:20:21,877) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-04-09 17:20:21,987) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-04-09 17:20:21,988) [ipw/ipw (line 767)]: The number of columns in the model matrix: 16
INFO (2026-04-09 17:20:21,989) [ipw/ipw (line 768)]: The number of rows in the model matrix: 11000
INFO (2026-04-09 17:20:23,569) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-04-09 17:20:23,570) [ipw/ipw (line 992)]: max_de: 2
INFO (2026-04-09 17:20:23,571) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters
INFO (2026-04-09 17:20:32,462) [ipw/choose_regularization (line 454)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement     asmd
9  0.009726      125   5.0       1.998665          0.711052  0.05646
INFO (2026-04-09 17:20:32,465) [ipw/ipw (line 1047)]: Chosen lambda: 0.009726392859944848
INFO (2026-04-09 17:20:32,466) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.18189302029172694
INFO (2026-04-09 17:20:32,475) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-04-09 17:20:32,476) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group']
INFO (2026-04-09 17:20:32,481) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group']
INFO (2026-04-09 17:20:32,487) [rake/rake (line 279)]: Final covariates and levels that will be used in raking: {'age_group': ['18-24', '25-34', '35-44', '45+'], 'gender': ['Female', 'Male', '__NaN__']}.
After IPW only:
source          self  unadjusted  unadjusted - self
age_group   0.071234    0.232409           0.161175
gender      0.059885    0.253769           0.193884
income      0.189626    0.494217           0.304591
mean(asmd)  0.106915    0.326799           0.219883

After IPW + rake on gender & age_group:
source          self  unadjusted  unadjusted - self
age_group   0.196823    0.232409           0.035586
gender      0.204462    0.253769           0.049307
income      0.494080    0.494217           0.000138
mean(asmd)  0.298455    0.326799           0.028344

Total ASMD improvement (vs original): 8.67%

bf.is_adjusted = False
adjusted_final.is_adjusted = True

Transformations¶

Transformations (one-hot encoding, interaction terms, etc.) are applied automatically during adjust(). You can also pass custom transformations via kwargs. To inspect the transformed design matrix, use model_matrix().
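A rough stand-in for the one-hot step using pandas.get_dummies; balance builds its matrix from a formula (as the INFO logs above show), so this is only an approximation of the encoding, not the library's mechanism:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["Male", "Female", None],
    "age_group": ["25-34", "18-24", "45+"],
    "income": [6.4, 9.9, 2.7],
})
# One-hot encode categoricals (with explicit NA indicator columns),
# keeping numeric columns as-is.
mm = pd.get_dummies(df, columns=["gender", "age_group"], dummy_na=True)
print(mm.shape)
```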

In [29]:
# Inspect the model matrix (transformed covariates) used during adjustment
print("Transformed covariates columns:")
print(adjusted.model_matrix().columns.tolist())
Transformed covariates columns:
['_is_na_gender[T.True]', 'age_group[T.25-34]', 'age_group[T.35-44]', 'age_group[T.45+]', 'gender[Female]', 'gender[Male]', 'gender[_NA]', 'income']

Diagnostics¶

diagnostics() returns a long-format DataFrame of evaluation metrics: sample and target sizes, weight diagnostics, outcome impact, and model information including coefficients.

In [30]:
print(adjusted.diagnostics().to_string())
INFO (2026-04-09 17:20:32,927) [balance_frame/diagnostics (line 1412)]: Starting computation of diagnostics of the fitting
INFO (2026-04-09 17:20:33,234) [balance_frame/diagnostics (line 1438)]: Done computing diagnostics
                                      metric           val                                                             var
0                                       size   1000.000000                                                      sample_obs
1                                       size      3.000000                                                   sample_covars
2                                       size  10000.000000                                                      target_obs
3                                       size      3.000000                                                   target_covars
4                        weights_diagnostics      1.879989                                                   design_effect
5                        weights_diagnostics      0.531918                                     effective_sample_proportion
6                        weights_diagnostics    531.918146                                           effective_sample_size
7                        weights_diagnostics  10000.000000                                                             sum
8                        weights_diagnostics   1000.000000                                                  describe_count
9                        weights_diagnostics      1.000000                                                   describe_mean
10                       weights_diagnostics      0.938546                                                    describe_std
11                       weights_diagnostics      0.304163                                                    describe_min
12                       weights_diagnostics      0.445457                                                    describe_25%
13                       weights_diagnostics      0.653173                                                    describe_50%
14                       weights_diagnostics      1.166355                                                    describe_75%
15                       weights_diagnostics     11.355142                                                    describe_max
16                       weights_diagnostics      0.000000                                                   prop(w < 0.1)
17                       weights_diagnostics      0.000000                                                   prop(w < 0.2)
18                       weights_diagnostics      0.106000                                                 prop(w < 0.333)
19                       weights_diagnostics      0.323000                                                   prop(w < 0.5)
20                       weights_diagnostics      0.668000                                                     prop(w < 1)
21                       weights_diagnostics      0.332000                                                    prop(w >= 1)
22                       weights_diagnostics      0.096000                                                    prop(w >= 2)
23                       weights_diagnostics      0.030000                                                    prop(w >= 3)
24                       weights_diagnostics      0.011000                                                    prop(w >= 5)
25                       weights_diagnostics      0.001000                                                   prop(w >= 10)
26                       weights_diagnostics      0.369537                                              nonparametric_skew
27                       weights_diagnostics      0.214000                                 weighted_median_breakdown_point
28        weights_impact_on_outcome_mean_yw0     48.558814                                                       happiness
29        weights_impact_on_outcome_mean_yw1     53.295272                                                       happiness
30       weights_impact_on_outcome_mean_diff      4.736458                                                       happiness
31   weights_impact_on_outcome_diff_ci_lower      1.312255                                                       happiness
32   weights_impact_on_outcome_diff_ci_upper      8.160661                                                       happiness
33          weights_impact_on_outcome_t_stat      2.714368                                                       happiness
34         weights_impact_on_outcome_p_value      0.006755                                                       happiness
35               weights_impact_on_outcome_n   1000.000000                                                       happiness
36                         adjustment_method      0.000000                                                             ipw
37                          ipw_model_glance      9.000000                                                         n_iter_
38                          ipw_model_glance      0.138619                                                      intercept_
39                               ipw_penalty      0.000000                                                      deprecated
40                                ipw_solver      0.000000                                                           lbfgs
41                              model_glance      0.000100                                                             tol
42                              model_glance      0.000000                                                        l1_ratio
43                           ipw_multi_class      0.000000                                                            auto
44                              model_glance      0.041158                                                          lambda
45                              model_glance      1.386294                                                   null_deviance
46                              model_glance      1.146967                                                        deviance
47                              model_glance      0.172638                                              prop_dev_explained
48                              model_glance      1.155558                                                     cv_dev_mean
49                              model_glance      0.002568                                                      lambda_min
50                              model_glance      1.141446                                                 min_cv_dev_mean
51                              model_glance      0.014287                                                   min_cv_dev_sd
52                                model_coef      0.138619                                                       intercept
53                                model_coef      0.043944                                           _is_na_gender[T.True]
54                                model_coef     -0.203732                                              age_group[T.25-34]
55                                model_coef     -0.428683                                              age_group[T.35-44]
56                                model_coef     -0.529556                                                age_group[T.45+]
57                                model_coef      0.332490                                                  gender[T.Male]
58                                model_coef      0.043944                                                   gender[T._NA]
59                                model_coef      0.169578  income[Interval(-0.0009997440000000001, 0.44, closed='right')]
60                                model_coef      0.154197                   income[Interval(0.44, 1.664, closed='right')]
61                                model_coef      0.111212                  income[Interval(1.664, 3.472, closed='right')]
62                                model_coef     -0.041457                income[Interval(11.312, 15.139, closed='right')]
63                                model_coef     -0.161148                income[Interval(15.139, 20.567, closed='right')]
64                                model_coef     -0.211197                income[Interval(20.567, 29.504, closed='right')]
65                                model_coef     -0.357491               income[Interval(29.504, 128.536, closed='right')]
66                                model_coef      0.093738                  income[Interval(3.472, 5.663, closed='right')]
67                                model_coef      0.072936                  income[Interval(5.663, 8.211, closed='right')]
68                                model_coef      0.005787                 income[Interval(8.211, 11.312, closed='right')]
69                       covar_asmd_adjusted      0.021777                                              age_group[T.25-34]
70                       covar_asmd_adjusted      0.055884                                              age_group[T.35-44]
71                       covar_asmd_adjusted      0.169816                                                age_group[T.45+]
72                       covar_asmd_adjusted      0.097916                                                  gender[Female]
73                       covar_asmd_adjusted      0.103989                                                    gender[Male]
74                       covar_asmd_adjusted      0.010578                                                     gender[_NA]
75                       covar_asmd_adjusted      0.205469                                                          income
76                       covar_asmd_adjusted      0.119597                                                      mean(asmd)
77                     covar_asmd_unadjusted      0.005688                                              age_group[T.25-34]
78                     covar_asmd_unadjusted      0.312711                                              age_group[T.35-44]
79                     covar_asmd_unadjusted      0.378828                                                age_group[T.45+]
80                     covar_asmd_unadjusted      0.375699                                                  gender[Female]
81                     covar_asmd_unadjusted      0.379314                                                    gender[Male]
82                     covar_asmd_unadjusted      0.006296                                                     gender[_NA]
83                     covar_asmd_unadjusted      0.494217                                                          income
84                     covar_asmd_unadjusted      0.326799                                                      mean(asmd)
85                    covar_asmd_improvement     -0.016090                                              age_group[T.25-34]
86                    covar_asmd_improvement      0.256827                                              age_group[T.35-44]
87                    covar_asmd_improvement      0.209013                                                age_group[T.45+]
88                    covar_asmd_improvement      0.277783                                                  gender[Female]
89                    covar_asmd_improvement      0.275324                                                    gender[Male]
90                    covar_asmd_improvement     -0.004282                                                     gender[_NA]
91                    covar_asmd_improvement      0.288748                                                          income
92                    covar_asmd_improvement      0.207202                                                      mean(asmd)
93                  covar_main_asmd_adjusted      0.082492                                                       age_group
94                covar_main_asmd_unadjusted      0.232409                                                       age_group
95               covar_main_asmd_improvement      0.149917                                                       age_group
96                  covar_main_asmd_adjusted      0.070828                                                          gender
97                covar_main_asmd_unadjusted      0.253769                                                          gender
98               covar_main_asmd_improvement      0.182942                                                          gender
99                  covar_main_asmd_adjusted      0.205469                                                          income
100               covar_main_asmd_unadjusted      0.494217                                                          income
101              covar_main_asmd_improvement      0.288748                                                          income
102                 covar_main_asmd_adjusted      0.119597                                                      mean(asmd)
103               covar_main_asmd_unadjusted      0.326799                                                      mean(asmd)
104              covar_main_asmd_improvement      0.207202                                                      mean(asmd)
105                       adjustment_failure      0.000000                                                             NaN
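The diagnostics output above is one long-format DataFrame: a metric name, a value, and the variable it refers to. To work with a slice of it, plain pandas filtering is enough. A minimal sketch using a few toy rows copied from the table above (the column names `metric`, `val`, and `var` are an assumption about how the diagnostics frame labels its columns):

```python
import pandas as pd

# Toy frame mimicking a few rows of the long-format diagnostics output.
diag = pd.DataFrame(
    {
        "metric": [
            "covar_main_asmd_unadjusted",
            "covar_main_asmd_adjusted",
            "covar_main_asmd_improvement",
            "model_coef",
        ],
        "val": [0.494217, 0.205469, 0.288748, 0.332490],
        "var": ["income", "income", "income", "gender[T.Male]"],
    }
)

# Keep only the per-covariate ASMD rows and pivot them into a tidy table.
asmd = diag[diag["metric"].str.startswith("covar_main_asmd")]
wide = asmd.pivot(index="var", columns="metric", values="val")
print(wide)
```

The same pivot applied to the real diagnostics frame gives one row per covariate with unadjusted, adjusted, and improvement ASMD side by side.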

Exporting results¶

The .df property returns the responder DataFrame with id, covariates, outcomes, weights, and any ignored columns. Use .to_csv() to export the adjusted data.

In [31]:
print("Adjusted DataFrame columns:", adjusted.df.columns.tolist())
print(adjusted.df.head())
Adjusted DataFrame columns: ['id', 'gender', 'age_group', 'income', 'happiness', 'weight']
  id  gender age_group     income  happiness    weight
0  0    Male     25-34   6.428659  26.043029  6.531728
1  1  Female     18-24   9.940280  66.885485  9.617159
2  2    Male     18-24   2.673623  37.091922  3.562973
3  3     NaN     18-24  10.550308  49.394050  6.952117
4  4     NaN     18-24   2.689994  72.304208  5.129230
In [32]:
# Export to CSV (showing first 500 characters)
print(adjusted.to_csv()[:500])
id,gender,age_group,income,happiness,weight
0,Male,25-34,6.428659499046228,26.043028759747298,6.531727983159214
1,Female,18-24,9.940280228116047,66.88548460632677,9.617159404461365
2,Male,18-24,2.6736231547518043,37.091921916683006,3.562973405562926
3,,18-24,10.550307519418066,49.39405003271002,6.952116676608549
4,,18-24,2.689993854299385,72.30420755038209,5.1292302114666075
5,,35-44,5.995497722733131,57.28281646341816,16.424761754946537
6,,18-24,12.63469573898972,31.663293445944596,8.1911333259
In [33]:
adjusted.to_download()
Out[33]:
Click here to download: /tmp/tmp_balance_out_6c08992f-85de-4c59-8c87-c816fb678759.csv
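Because `.df` is a plain pandas DataFrame, you can also export with pandas directly instead of the `to_csv()` helper. A minimal sketch, with toy rows standing in for `adjusted.df` (the file name is arbitrary):

```python
import pandas as pd

# Toy stand-in for adjusted.df; the real frame has these same columns.
df = pd.DataFrame(
    {
        "id": [0, 1],
        "gender": ["Male", "Female"],
        "age_group": ["25-34", "18-24"],
        "income": [6.428659, 9.940280],
        "happiness": [26.043029, 66.885485],
        "weight": [6.531728, 9.617159],
    }
)

# index=False drops pandas' positional index, matching the CSV shown above.
df.to_csv("adjusted_sample.csv", index=False)

# Round-trip check: the file reads back with the same columns.
print(pd.read_csv("adjusted_sample.csv").columns.tolist())
```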

Filtering rows/columns¶

keep_only_some_rows_columns() returns a new BalanceFrame with the filtered data; the original object remains unchanged (the same immutable pattern as adjust()). The rows_to_keep argument is a pandas query()-style expression string.

In [34]:
filtered = adjusted.keep_only_some_rows_columns(
    rows_to_keep="gender == 'Female'",
    columns_to_keep=["gender", "age", "income"],
)
print(f"Original rows: {len(adjusted.responders)}")
print(f"Filtered rows: {len(filtered.responders)}")
print(filtered.df.head())
INFO (2026-04-09 17:20:33,282) [balance_frame/_filter_sf (line 1645)]: (rows_filtered/total_rows) = (268/1000)
INFO (2026-04-09 17:20:33,285) [balance_frame/_filter_sf (line 1645)]: (rows_filtered/total_rows) = (4551/10000)
INFO (2026-04-09 17:20:33,288) [balance_frame/_filter_sf (line 1645)]: (rows_filtered/total_rows) = (268/1000)
INFO (2026-04-09 17:20:33,291) [balance_frame/_filter_sf (line 1645)]: (rows_filtered/total_rows) = (268/268)
INFO (2026-04-09 17:20:33,293) [balance_frame/_filter_sf (line 1645)]: (rows_filtered/total_rows) = (4551/4551)
INFO (2026-04-09 17:20:33,295) [balance_frame/_filter_sf (line 1645)]: (rows_filtered/total_rows) = (268/268)
Original rows: 1000
Filtered rows: 268
   id  gender     income  happiness     weight
0   1  Female   9.940280  66.885485   9.617159
1  92  Female   0.185097  84.464522  17.392266
2  94  Female   1.183696  65.742184  17.794007
3  95  Female   3.716007  67.624539   7.283279
4  98  Female  16.751931  44.868651  48.725241
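The filtering above is the same pattern as pandas `query()` plus column selection: a new object comes back and the original is untouched. A toy sketch of that underlying behavior (the data are made-up rows, not the tutorial's dataset):

```python
import pandas as pd

# Toy responders frame standing in for adjusted.df.
df = pd.DataFrame(
    {
        "id": [1, 92, 94],
        "gender": ["Female", "Female", "Male"],
        "income": [9.940280, 0.185097, 1.183696],
    }
)

# query() returns a NEW DataFrame; df itself keeps all rows, which is
# the immutable pattern keep_only_some_rows_columns() follows.
females = df.query("gender == 'Female'")[["gender", "income"]]
print(len(df), len(females))
```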

Summary: Old vs New API side-by-side¶

Step Old API (Sample) New API (SampleFrame / BalanceFrame)
Load data s = Sample.from_frame(df) sf = SampleFrame.from_frame(df)
Pair sample + target s.set_target(target) bf = BalanceFrame(sample=sf, target=tf)
Adjust adjusted = s.adjust() (mutates s) adjusted = bf.adjust() (bf unchanged)
Summary adjusted.summary() adjusted.summary()
Diagnostics adjusted.diagnostics() adjusted.diagnostics()
Covariates adjusted.covars().mean() adjusted.covars().mean()
Design effect adjusted.design_effect() adjusted.weights().design_effect()
CSV export adjusted.to_csv() adjusted.to_csv()
Filter (not available) adjusted.keep_only_some_rows_columns(...)
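Whichever API you use, the end product is the weight column: downstream estimates are weighted by it. As a closing sketch, here is the weighted outcome mean (the quantity behind `weights_impact_on_outcome_mean_yw1` in the diagnostics) computed with plain numpy on toy numbers, not the tutorial's dataset:

```python
import numpy as np

# Toy outcome and weights standing in for adjusted.df["happiness"] / ["weight"].
happiness = np.array([26.0, 66.9, 37.1, 49.4])
weight = np.array([6.53, 9.62, 3.56, 6.95])

unweighted = happiness.mean()
# Weighted mean: sum(w * y) / sum(w).
weighted = np.average(happiness, weights=weight)
print(round(unweighted, 2), round(weighted, 2))
```

The gap between the two numbers is exactly what the `weights_impact_on_outcome_mean_diff` diagnostic reports for the real data.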