%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

from balance import load_data

target_df, sample_df = load_data()

print("target_df: \n", target_df.head())
print("sample_df: \n", sample_df.head())

INFO (2026-07-07 14:19:29,544) [__init__/<module> (line 76)]: Using balance version 0.21.0

INFO (2026-07-07 14:19:29,545) [__init__/<module> (line 81)]: 
balance (Version 0.21.0) loaded:
    📖 Documentation: https://import-balance.org/
    🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
    📄 Citation:
        Sarig, T., Galili, T., & Eilat, R. (2023).
        balance - a Python package for balancing biased data samples.
        https://arxiv.org/abs/2307.06024

    Tip: You can view this message anytime with balance.help()

target_df: 
        id gender age_group     income  happiness
0  100000   Male       45+  10.183951  61.706333
1  100001   Male       45+   6.036858  79.123670
2  100002   Male     35-44   5.226629  44.206949
3  100003    NaN       45+   5.752147  83.985716
4  100004    NaN     25-34   4.837484  49.339713
sample_df: 
   id  gender age_group     income  happiness
0  0    Male     25-34   6.428659  26.043029
1  1  Female     18-24   9.940280  66.885485
2  2    Male     18-24   2.673623  37.091922
3  3     NaN     18-24  10.550308  49.394050
4  4     NaN     18-24   2.689994  72.304208

from balance import SampleFrame, BalanceFrame

sample_sf = SampleFrame.from_frame(
    sample_df,
    outcome_columns=["happiness"],
)
# Often times we don't have the outcome for the target.
# In this case we've added it just to validate later that
# the weights indeed help us reduce the bias.
target_sf = SampleFrame.from_frame(
    target_df,
    outcome_columns=["happiness"],
)

WARNING (2026-07-07 14:19:29,600) [input_validation/guess_id_column (line 336)]: Guessed id column name id for the data

WARNING (2026-07-07 14:19:29,611) [sample_frame/from_frame (line 326)]: No weights passed. Adding a 'weight' column and setting all values to 1

WARNING (2026-07-07 14:19:29,613) [input_validation/guess_id_column (line 336)]: Guessed id column name id for the data

WARNING (2026-07-07 14:19:29,627) [sample_frame/from_frame (line 326)]: No weights passed. Adding a 'weight' column and setting all values to 1

print(f"Covariates:  {sample_sf.covar_columns}")
print(f"Outcomes:    {sample_sf.outcome_columns}")
print(f"Weight cols: {sample_sf.weight_columns_all}")
print(f"Active wt:   {sample_sf.weight_column}")
print(f"ID column:   {sample_sf._id_column_name}")
print(f"Rows:        {len(sample_sf)}")

Covariates:  ['gender', 'age_group', 'income']
Outcomes:    ['happiness']
Weight cols: ['weight']
Active wt:   weight
ID column:   id
Rows:        1000

sample_sf.df.info()

<class 'pandas.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   id         1000 non-null   str    
 1   gender     912 non-null    str    
 2   age_group  1000 non-null   str    
 3   income     1000 non-null   float64
 4   happiness  1000 non-null   float64
 5   weight     1000 non-null   float64
dtypes: float64(3), str(3)
memory usage: 47.0 KB

print(sample_sf)
print(target_sf)

SampleFrame: 1000 observations x 3 covariates: gender,age_group,income
  id_column: id, weight_columns_all: ['weight'], outcome_columns: happiness
SampleFrame: 10000 observations x 3 covariates: gender,age_group,income
  id_column: id, weight_columns_all: ['weight'], outcome_columns: happiness

bf = BalanceFrame(sample=sample_sf, target=target_sf)
print(bf)

        balance Sample object with target set
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
            target:
                 SampleFrame: 10000 observations x 3 covariates: gender,age_group,income
	  id_column: id, weight_columns_all: ['weight'], outcome_columns: happiness
            3 common variables: gender,age_group,income

print(bf.covars().mean().T)

source                     self     target
_is_na_gender[T.True]  0.088000   0.089800
age_group[T.25-34]     0.300000   0.297400
age_group[T.35-44]     0.156000   0.299200
age_group[T.45+]       0.053000   0.206300
gender[Female]         0.268000   0.455100
gender[Male]           0.644000   0.455100
gender[_NA]            0.088000   0.089800
income                 6.297302  12.737608

print(bf.covars().asmd().T)

source                  self
age_group[T.25-34]  0.005688
age_group[T.35-44]  0.312711
age_group[T.45+]    0.378828
gender[Female]      0.375699
gender[Male]        0.379314
gender[_NA]         0.006296
income              0.494217
mean(asmd)          0.326799

print(bf.covars().asmd(aggregate_by_main_covar=True).T)

source          self
age_group   0.232409
gender      0.253769
income      0.494217
mean(asmd)  0.326799

bf.covars().plot()

adjusted = bf.adjust()
print(adjusted)

INFO (2026-07-07 14:19:33,203) [ipw/ipw (line 735)]: Starting ipw function

INFO (2026-07-07 14:19:33,209) [adjustment/apply_transformations (line 435)]: Adding the variables: []

INFO (2026-07-07 14:19:33,210) [adjustment/apply_transformations (line 436)]: Transforming the variables: ['gender', 'age_group', 'income']

INFO (2026-07-07 14:19:33,221) [adjustment/apply_transformations (line 472)]: Final variables in output: ['gender', 'age_group', 'income']

INFO (2026-07-07 14:19:33,341) [ipw/ipw (line 813)]: Building model matrix

INFO (2026-07-07 14:19:33,342) [ipw/ipw (line 814)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']

INFO (2026-07-07 14:19:33,342) [ipw/ipw (line 815)]: The number of columns in the model matrix: 16

INFO (2026-07-07 14:19:33,343) [ipw/ipw (line 816)]: The number of rows in the model matrix: 11000

INFO (2026-07-07 14:19:53,147) [ipw/ipw (line 1012)]: Done with sklearn

INFO (2026-07-07 14:19:53,150) [ipw/ipw (line 1014)]: max_de: None

# The original is still unadjusted:
print(f"bf.is_adjusted        = {bf.is_adjusted}")
print(f"adjusted.is_adjusted  = {adjusted.is_adjusted}")

bf.is_adjusted        = False
adjusted.is_adjusted  = True

import numpy as np

fitted = bf.fit(method="ipw", inplace=False)
x_sample, x_target = fitted.design_matrix(on="both")
p_target = fitted.predict_proba(on="target", output="probability")
w_reproduced = fitted.predict_weights()

print("Sample matrix shape:", x_sample.shape)
print("Target matrix shape:", x_target.shape)
print("Target propensity rows:", p_target.shape[0])
print(
    "Reproduced weights match fitted weights:",
    np.allclose(w_reproduced.to_numpy(), fitted.weight_series.to_numpy()),
)

INFO (2026-07-07 14:19:53,198) [ipw/ipw (line 735)]: Starting ipw function

INFO (2026-07-07 14:19:53,204) [adjustment/apply_transformations (line 435)]: Adding the variables: []

INFO (2026-07-07 14:19:53,206) [adjustment/apply_transformations (line 436)]: Transforming the variables: ['gender', 'age_group', 'income']

INFO (2026-07-07 14:19:53,217) [adjustment/apply_transformations (line 472)]: Final variables in output: ['gender', 'age_group', 'income']

INFO (2026-07-07 14:19:53,348) [ipw/ipw (line 813)]: Building model matrix

INFO (2026-07-07 14:19:53,349) [ipw/ipw (line 814)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']

INFO (2026-07-07 14:19:53,349) [ipw/ipw (line 815)]: The number of columns in the model matrix: 16

INFO (2026-07-07 14:19:53,350) [ipw/ipw (line 816)]: The number of rows in the model matrix: 11000

INFO (2026-07-07 14:20:13,221) [ipw/ipw (line 1012)]: Done with sklearn

INFO (2026-07-07 14:20:13,221) [ipw/ipw (line 1014)]: max_de: None

train_n = 700

train_sample_sf = SampleFrame.from_frame(sample_df.iloc[:train_n].copy())
holdout_sample_sf = SampleFrame.from_frame(sample_df.iloc[train_n:].copy())
train_target_sf = SampleFrame.from_frame(target_df.iloc[:7000].copy())
holdout_target_sf = SampleFrame.from_frame(target_df.iloc[7000:].copy())

train_bf = BalanceFrame(sample=train_sample_sf, target=train_target_sf)
holdout_bf = BalanceFrame(sample=holdout_sample_sf, target=holdout_target_sf)

fitted_train = train_bf.fit(method="ipw", inplace=False)
scored_holdout = holdout_bf.set_fitted_model(fitted_train)

holdout_matrix = scored_holdout.design_matrix(on="sample")
holdout_propensity = scored_holdout.predict_proba(on="sample", output="probability")
holdout_weights = scored_holdout.predict_weights()

print("Holdout matrix shape:", holdout_matrix.shape)
print("Holdout propensity rows:", holdout_propensity.shape[0])
print("Holdout weights rows:", holdout_weights.shape[0])

WARNING (2026-07-07 14:20:13,253) [input_validation/guess_id_column (line 336)]: Guessed id column name id for the data

WARNING (2026-07-07 14:20:13,268) [sample_frame/from_frame (line 326)]: No weights passed. Adding a 'weight' column and setting all values to 1

WARNING (2026-07-07 14:20:13,271) [input_validation/guess_id_column (line 336)]: Guessed id column name id for the data

WARNING (2026-07-07 14:20:13,285) [sample_frame/from_frame (line 326)]: No weights passed. Adding a 'weight' column and setting all values to 1

WARNING (2026-07-07 14:20:13,287) [input_validation/guess_id_column (line 336)]: Guessed id column name id for the data

WARNING (2026-07-07 14:20:13,306) [sample_frame/from_frame (line 326)]: No weights passed. Adding a 'weight' column and setting all values to 1

WARNING (2026-07-07 14:20:13,310) [input_validation/guess_id_column (line 336)]: Guessed id column name id for the data

WARNING (2026-07-07 14:20:13,326) [sample_frame/from_frame (line 326)]: No weights passed. Adding a 'weight' column and setting all values to 1

INFO (2026-07-07 14:20:13,334) [ipw/ipw (line 735)]: Starting ipw function

INFO (2026-07-07 14:20:13,338) [adjustment/apply_transformations (line 435)]: Adding the variables: []

print(adjusted.summary())

Adjustment details:
    method: ipw
    weight trimming mean ratio: 20
Covariate diagnostics:
    Covar ASMD reduction: 63.4%
    Covar ASMD (7 variables): 0.327 -> 0.120
    Covar mean KLD reduction: 92.3%
    Covar mean KLD (3 variables): 0.157 -> 0.012
Weight diagnostics:
    design effect (Deff): 1.880
    effective sample size proportion (ESSP): 0.532
    effective sample size (ESS): 531.9
Outcome weighted means:
            happiness
source               
self           53.295
target         56.278
unadjusted     48.559
Model performance: Model proportion deviance explained: 0.173

print(adjusted.covars().mean().T)

source                      self     target  unadjusted
_is_na_gender[T.True]   0.086776   0.089800    0.088000
age_group[T.25-34]      0.307355   0.297400    0.300000
age_group[T.35-44]      0.273609   0.299200    0.156000
age_group[T.45+]        0.137581   0.206300    0.053000
gender[Female]          0.406337   0.455100    0.268000
gender[Male]            0.506887   0.455100    0.644000
gender[_NA]             0.086776   0.089800    0.088000
income                 10.060068  12.737608    6.297302

print(adjusted.covars().asmd().T)

source                  self  unadjusted  unadjusted - self
age_group[T.25-34]  0.021777    0.005688          -0.016090
age_group[T.35-44]  0.055884    0.312711           0.256827
age_group[T.45+]    0.169816    0.378828           0.209013
gender[Female]      0.097916    0.375699           0.277783
gender[Male]        0.103989    0.379314           0.275324
gender[_NA]         0.010578    0.006296          -0.004282
income              0.205469    0.494217           0.288748
mean(asmd)          0.119597    0.326799           0.207202

adjusted.covars().plot()

# Seaborn KDE density plots
adjusted.covars().plot(library="seaborn", dist_type="kde")

adjusted.covars().plot(library="balance", bar_width=30);

=== gender (categorical) ===

Category | population  adjusted  sample
         |
Female   | █████████████████████ (50.0%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (44.5%)
         | ▐▐▐▐▐▐▐▐▐▐▐▐ (29.4%)

Male     | █████████████████████ (50.0%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (55.5%)
         | ▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐ (70.6%)

Legend: █ population  ▒ adjusted  ▐ sample
Bar lengths are proportional to weighted frequency within each dataset.

=== age_group (categorical) ===

Category | population  adjusted  sample
         |
18-24    | ████████████ (19.7%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (28.1%)
         | ▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐ (49.1%)

25-34    | ██████████████████ (29.7%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (30.7%)
         | ▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐▐ (30.0%)

35-44    | ██████████████████ (29.9%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (27.4%)
         | ▐▐▐▐▐▐▐▐▐▐ (15.6%)

45+      | █████████████ (20.6%)
         | ▒▒▒▒▒▒▒▒ (13.8%)
         | ▐▐▐ (5.3%)

Legend: █ population  ▒ adjusted  ▐ sample
Bar lengths are proportional to weighted frequency within each dataset.

=== income (numeric, comparative) ===

Range            | population (%)            | adjusted (%)                | sample (%)                         
----------------------------------------------------------------------------------------------------------------
[0.00, 8.57)     | ████████████████████ 49.0 | ████████████████████▒▒ 54.8 | ████████████████████▒▒▒▒▒▒▒▒▒▒ 73.2
[8.57, 17.14)    | █████████ 23.1            | █████████▒▒ 26.3            | ████████] 19.2                     
[17.14, 25.71)   | █████ 13.2                | █████ 12.3                  | ██  ] 5.3                          
[25.71, 34.28)   | ███ 7.3                   | ██] 3.9                     | █ ] 1.6                            
[34.28, 42.85)   | ██ 3.9                    | █] 1.5                      |  ] 0.4                             
[42.85, 51.41)   | █ 1.8                     | ] 0.2                       | ] 0.1                              
[51.41, 59.98)   | 0.9                       | 1.0                         | 0.2                                
[59.98, 68.55)   | 0.4                       | 0.0                         | 0.0                                
[68.55, 77.12)   | 0.2                       | 0.0                         | 0.0                                
[77.12, 85.69)   | 0.1                       | 0.0                         | 0.0                                
[85.69, 94.26)   | 0.0                       | 0.0                         | 0.0                                
[94.26, 102.83)  | 0.0                       | 0.0                         | 0.0                                
[102.83, 111.40) | 0.0                       | 0.0                         | 0.0                                
[111.40, 119.97) | 0.0                       | 0.0                         | 0.0                                
[119.97, 128.54] | 0.0                       | 0.0                         | 0.0                                
----------------------------------------------------------------------------------------------------------------
Total            | 100.0                     | 100.0                       | 100.0                              

Key: █ = shared with population, ▒ = excess,    ] = deficit

adjusted.weights().plot()

print(adjusted.weights().summary().round(2))

                                var       val
0                     design_effect      1.88
1       effective_sample_proportion      0.53
2             effective_sample_size    531.92
3                               sum  10000.00
4                    describe_count   1000.00
5                     describe_mean      1.00
6                      describe_std      0.94
7                      describe_min      0.30
8                      describe_25%      0.45
9                      describe_50%      0.65
10                     describe_75%      1.17
11                     describe_max     11.36
12                    prop(w < 0.1)      0.00
13                    prop(w < 0.2)      0.00
14                  prop(w < 0.333)      0.11
15                    prop(w < 0.5)      0.32
16                      prop(w < 1)      0.67
17                     prop(w >= 1)      0.33
18                     prop(w >= 2)      0.10
19                     prop(w >= 3)      0.03
20                     prop(w >= 5)      0.01
21                    prop(w >= 10)      0.00
22               nonparametric_skew      0.37
23  weighted_median_breakdown_point      0.21

print(f"Design effect:           {adjusted.weights().design_effect():.4f}")
print(f"Effective sample size %: {adjusted.weights().design_effect_prop():.2%}")

Design effect:           1.8800
Effective sample size %: 88.00%

print(adjusted.outcomes().summary())

1 outcomes: ['happiness']
Mean outcomes (with 95% confidence intervals):
source       self  target  unadjusted           self_ci         target_ci     unadjusted_ci
happiness  53.295  56.278      48.559  (52.096, 54.495)  (55.961, 56.595)  (47.669, 49.449)

Weights impact on outcomes (t_test):
           mean_yw0  mean_yw1  mean_diff  diff_ci_lower  diff_ci_upper  t_stat  p_value       n
outcome                                                                                        
happiness    48.559    53.295      4.736          1.312          8.161   2.714    0.007  1000.0

Response rates (relative to number of respondents in sample):
   happiness
n     1000.0
%      100.0
Response rates (relative to notnull rows in the target):
    happiness
n     1000.0
%       10.0
Response rates (in the target):
    happiness
n    10000.0
%      100.0

adjusted.outcomes().plot()

print("Covariate means (unadjusted / adjusted / target):")
print(adjusted.covars().mean().T)

Covariate means (unadjusted / adjusted / target):

source                      self     target  unadjusted
_is_na_gender[T.True]   0.086776   0.089800    0.088000
age_group[T.25-34]      0.307355   0.297400    0.300000
age_group[T.35-44]      0.273609   0.299200    0.156000
age_group[T.45+]        0.137581   0.206300    0.053000
gender[Female]          0.406337   0.455100    0.268000
gender[Male]            0.506887   0.455100    0.644000
gender[_NA]             0.086776   0.089800    0.088000
income                 10.060068  12.737608    6.297302

print("Outcome SD proportional change:")
print(adjusted.outcomes().outcome_sd_prop())
print()
print("Outcome variance ratio (adjusted / unadjusted):")
print(adjusted.outcomes().outcome_variance_ratio())

Outcome SD proportional change:
happiness    0.013516
dtype: float64

Outcome variance ratio (adjusted / unadjusted):
happiness    1.027215
dtype: float64

adjusted_cbps = bf.adjust(method="cbps")
print(f"CBPS design effect: {adjusted_cbps.weights().design_effect():.4f}")
print(adjusted_cbps.covars().asmd(aggregate_by_main_covar=True).T)

INFO (2026-07-07 14:20:35,767) [cbps/cbps (line 535)]: Starting cbps function

INFO (2026-07-07 14:20:35,769) [adjustment/apply_transformations (line 435)]: Adding the variables: []

INFO (2026-07-07 14:20:35,770) [adjustment/apply_transformations (line 436)]: Transforming the variables: ['gender', 'age_group', 'income']

INFO (2026-07-07 14:20:35,778) [adjustment/apply_transformations (line 472)]: Final variables in output: ['gender', 'age_group', 'income']

INFO (2026-07-07 14:20:35,899) [cbps/cbps (line 587)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']

INFO (2026-07-07 14:20:35,901) [cbps/cbps (line 598)]: The number of columns in the model matrix: 16

INFO (2026-07-07 14:20:35,901) [cbps/cbps (line 599)]: The number of rows in the model matrix: 11000

INFO (2026-07-07 14:20:35,913) [cbps/cbps (line 667)]: Finding initial estimator for GMM optimization

INFO (2026-07-07 14:20:36,078) [cbps/cbps (line 694)]: Finding initial estimator for GMM optimization that minimizes the balance loss

INFO (2026-07-07 14:20:37,606) [cbps/cbps (line 730)]: Running GMM optimization

# Step 1: broad IPW correction across all covariates
adjusted_ipw = bf.adjust(method="ipw", max_de=2)

# Step 2: fine-tune with raking on gender and age_group
adjusted_final = adjusted_ipw.adjust(method="rake", variables=["gender", "age_group"])

print("After IPW only:")
print(adjusted_ipw.covars().asmd(aggregate_by_main_covar=True).T)
print("\nAfter IPW + rake on gender & age_group:")
print(adjusted_final.covars().asmd(aggregate_by_main_covar=True).T)
print(f"\nTotal ASMD improvement (vs original): {adjusted_final.covars().asmd_improvement():.2%}")

# The original BalanceFrame is unchanged (immutable pattern)
print(f"\nbf.is_adjusted = {bf.is_adjusted}")
print(f"adjusted_final.is_adjusted = {adjusted_final.is_adjusted}")

INFO (2026-07-07 14:20:39,485) [ipw/ipw (line 735)]: Starting ipw function

INFO (2026-07-07 14:20:39,490) [adjustment/apply_transformations (line 435)]: Adding the variables: []

INFO (2026-07-07 14:20:39,491) [adjustment/apply_transformations (line 436)]: Transforming the variables: ['gender', 'age_group', 'income']

INFO (2026-07-07 14:20:39,499) [adjustment/apply_transformations (line 472)]: Final variables in output: ['gender', 'age_group', 'income']

INFO (2026-07-07 14:20:39,618) [ipw/ipw (line 813)]: Building model matrix

INFO (2026-07-07 14:20:39,618) [ipw/ipw (line 814)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']

INFO (2026-07-07 14:20:39,619) [ipw/ipw (line 815)]: The number of columns in the model matrix: 16

INFO (2026-07-07 14:20:39,620) [ipw/ipw (line 816)]: The number of rows in the model matrix: 11000

INFO (2026-07-07 14:20:41,446) [ipw/ipw (line 1012)]: Done with sklearn

INFO (2026-07-07 14:20:41,447) [ipw/ipw (line 1014)]: max_de: 2

# Inspect the model matrix (transformed covariates) used during adjustment
print("Transformed covariates columns:")
print(adjusted.model_matrix().columns.tolist())

Transformed covariates columns:
['_is_na_gender[T.True]', 'age_group[T.25-34]', 'age_group[T.35-44]', 'age_group[T.45+]', 'gender[Female]', 'gender[Male]', 'gender[_NA]', 'income']

print(adjusted.diagnostics().to_string())

INFO (2026-07-07 14:20:50,701) [balance_frame/diagnostics (line 3177)]: Starting computation of diagnostics of the fitting

INFO (2026-07-07 14:20:51,006) [balance_frame/diagnostics (line 3204)]: Done computing diagnostics

                                      metric           val                                                             var
0                                       size   1000.000000                                                      sample_obs
1                                       size      3.000000                                                   sample_covars
2                                       size  10000.000000                                                      target_obs
3                                       size      3.000000                                                   target_covars
4                        weights_diagnostics      1.879989                                                   design_effect
5                        weights_diagnostics      0.531918                                     effective_sample_proportion
6                        weights_diagnostics    531.918146                                           effective_sample_size
7                        weights_diagnostics  10000.000000                                                             sum
8                        weights_diagnostics   1000.000000                                                  describe_count
9                        weights_diagnostics      1.000000                                                   describe_mean
10                       weights_diagnostics      0.938546                                                    describe_std
11                       weights_diagnostics      0.304163                                                    describe_min
12                       weights_diagnostics      0.445457                                                    describe_25%
13                       weights_diagnostics      0.653173                                                    describe_50%
14                       weights_diagnostics      1.166355                                                    describe_75%
15                       weights_diagnostics     11.355142                                                    describe_max
16                       weights_diagnostics      0.000000                                                   prop(w < 0.1)
17                       weights_diagnostics      0.000000                                                   prop(w < 0.2)
18                       weights_diagnostics      0.106000                                                 prop(w < 0.333)
19                       weights_diagnostics      0.323000                                                   prop(w < 0.5)
20                       weights_diagnostics      0.668000                                                     prop(w < 1)
21                       weights_diagnostics      0.332000                                                    prop(w >= 1)
22                       weights_diagnostics      0.096000                                                    prop(w >= 2)
23                       weights_diagnostics      0.030000                                                    prop(w >= 3)
24                       weights_diagnostics      0.011000                                                    prop(w >= 5)
25                       weights_diagnostics      0.001000                                                   prop(w >= 10)
26                       weights_diagnostics      0.369537                                              nonparametric_skew
27                       weights_diagnostics      0.214000                                 weighted_median_breakdown_point
28        weights_impact_on_outcome_mean_yw0     48.558814                                                       happiness
29        weights_impact_on_outcome_mean_yw1     53.295272                                                       happiness
30       weights_impact_on_outcome_mean_diff      4.736458                                                       happiness
31   weights_impact_on_outcome_diff_ci_lower      1.312255                                                       happiness
32   weights_impact_on_outcome_diff_ci_upper      8.160661                                                       happiness
33          weights_impact_on_outcome_t_stat      2.714368                                                       happiness
34         weights_impact_on_outcome_p_value      0.006755                                                       happiness
35               weights_impact_on_outcome_n   1000.000000                                                       happiness
36                         adjustment_method      0.000000                                                             ipw
37                          ipw_model_glance      9.000000                                                         n_iter_
38                          ipw_model_glance      0.138619                                                      intercept_
39                               ipw_penalty      0.000000                                                      deprecated
40                                ipw_solver      0.000000                                                           lbfgs
41                              model_glance      0.000100                                                             tol
42                              model_glance      0.000000                                                        l1_ratio
43                           ipw_multi_class      0.000000                                                            auto
44                              model_glance      0.041158                                                          lambda
45                              model_glance      1.386294                                                   null_deviance
46                              model_glance      1.146967                                                        deviance
47                              model_glance      0.172638                                              prop_dev_explained
48                              model_glance      1.155558                                                     cv_dev_mean
49                              model_glance      0.002568                                                      lambda_min
50                              model_glance      1.141446                                                 min_cv_dev_mean
51                              model_glance      0.014287                                                   min_cv_dev_sd
52                                model_coef      0.138619                                                       intercept
53                                model_coef      0.043944                                           _is_na_gender[T.True]
54                                model_coef     -0.203732                                              age_group[T.25-34]
55                                model_coef     -0.428683                                              age_group[T.35-44]
56                                model_coef     -0.529556                                                age_group[T.45+]
57                                model_coef      0.332490                                                  gender[T.Male]
58                                model_coef      0.043944                                                   gender[T._NA]
59                                model_coef      0.169578  income[Interval(-0.0009997440000000001, 0.44, closed='right')]
60                                model_coef      0.154197                   income[Interval(0.44, 1.664, closed='right')]
61                                model_coef      0.111212                  income[Interval(1.664, 3.472, closed='right')]
62                                model_coef     -0.041457                income[Interval(11.312, 15.139, closed='right')]
63                                model_coef     -0.161148                income[Interval(15.139, 20.567, closed='right')]
64                                model_coef     -0.211197                income[Interval(20.567, 29.504, closed='right')]
65                                model_coef     -0.357491               income[Interval(29.504, 128.536, closed='right')]
66                                model_coef      0.093738                  income[Interval(3.472, 5.663, closed='right')]
67                                model_coef      0.072936                  income[Interval(5.663, 8.211, closed='right')]
68                                model_coef      0.005787                 income[Interval(8.211, 11.312, closed='right')]
69                       covar_asmd_adjusted      0.021777                                              age_group[T.25-34]
70                       covar_asmd_adjusted      0.055884                                              age_group[T.35-44]
71                       covar_asmd_adjusted      0.169816                                                age_group[T.45+]
72                       covar_asmd_adjusted      0.097916                                                  gender[Female]
73                       covar_asmd_adjusted      0.103989                                                    gender[Male]
74                       covar_asmd_adjusted      0.010578                                                     gender[_NA]
75                       covar_asmd_adjusted      0.205469                                                          income
76                       covar_asmd_adjusted      0.119597                                                      mean(asmd)
77                     covar_asmd_unadjusted      0.005688                                              age_group[T.25-34]
78                     covar_asmd_unadjusted      0.312711                                              age_group[T.35-44]
79                     covar_asmd_unadjusted      0.378828                                                age_group[T.45+]
80                     covar_asmd_unadjusted      0.375699                                                  gender[Female]
81                     covar_asmd_unadjusted      0.379314                                                    gender[Male]
82                     covar_asmd_unadjusted      0.006296                                                     gender[_NA]
83                     covar_asmd_unadjusted      0.494217                                                          income
84                     covar_asmd_unadjusted      0.326799                                                      mean(asmd)
85                    covar_asmd_improvement     -0.016090                                              age_group[T.25-34]
86                    covar_asmd_improvement      0.256827                                              age_group[T.35-44]
87                    covar_asmd_improvement      0.209013                                                age_group[T.45+]
88                    covar_asmd_improvement      0.277783                                                  gender[Female]
89                    covar_asmd_improvement      0.275324                                                    gender[Male]
90                    covar_asmd_improvement     -0.004282                                                     gender[_NA]
91                    covar_asmd_improvement      0.288748                                                          income
92                    covar_asmd_improvement      0.207202                                                      mean(asmd)
93                  covar_main_asmd_adjusted      0.082492                                                       age_group
94                covar_main_asmd_unadjusted      0.232409                                                       age_group
95               covar_main_asmd_improvement      0.149917                                                       age_group
96                  covar_main_asmd_adjusted      0.070828                                                          gender
97                covar_main_asmd_unadjusted      0.253769                                                          gender
98               covar_main_asmd_improvement      0.182942                                                          gender
99                  covar_main_asmd_adjusted      0.205469                                                          income
100               covar_main_asmd_unadjusted      0.494217                                                          income
101              covar_main_asmd_improvement      0.288748                                                          income
102                 covar_main_asmd_adjusted      0.119597                                                      mean(asmd)
103               covar_main_asmd_unadjusted      0.326799                                                      mean(asmd)
104              covar_main_asmd_improvement      0.207202                                                      mean(asmd)
105                       adjustment_failure      0.000000                                                            None

print("Adjusted DataFrame columns:", adjusted.df.columns.tolist())
print(adjusted.df.head())

Adjusted DataFrame columns: ['id', 'gender', 'age_group', 'income', 'happiness', 'weight']
  id  gender age_group     income  happiness    weight
0  0    Male     25-34   6.428659  26.043029  6.531728
1  1  Female     18-24   9.940280  66.885485  9.617159
2  2    Male     18-24   2.673623  37.091922  3.562973
3  3     NaN     18-24  10.550308  49.394050  6.952117
4  4     NaN     18-24   2.689994  72.304208  5.129230

# Export to CSV (showing first 500 characters)
print(adjusted.to_csv()[:500])

id,gender,age_group,income,happiness,weight
0,Male,25-34,6.428659499046228,26.043028759747298,6.531727983159214
1,Female,18-24,9.940280228116047,66.88548460632677,9.617159404461365
2,Male,18-24,2.6736231547518043,37.091921916683006,3.562973405562926
3,,18-24,10.550307519418066,49.39405003271002,6.952116676608549
4,,18-24,2.689993854299385,72.30420755038209,5.1292302114666075
5,,35-44,5.995497722733131,57.28281646341816,16.424761754946537
6,,18-24,12.63469573898972,31.663293445944596,8.1911333259

adjusted.to_download()

filtered = adjusted.keep_only_some_rows_columns(
    rows_to_keep="gender == 'Female'",
    columns_to_keep=["gender", "age", "income"],
)
print(f"Original rows: {len(adjusted.responders)}")
print(f"Filtered rows: {len(filtered.responders)}")
print(filtered.df.head())

INFO (2026-07-07 14:20:51,055) [balance_frame/_filter_sf (line 3412)]: (rows_filtered/total_rows) = (268/1000)

INFO (2026-07-07 14:20:51,058) [balance_frame/_filter_sf (line 3412)]: (rows_filtered/total_rows) = (4551/10000)

INFO (2026-07-07 14:20:51,061) [balance_frame/_filter_sf (line 3412)]: (rows_filtered/total_rows) = (268/1000)

INFO (2026-07-07 14:20:51,063) [balance_frame/_filter_sf (line 3412)]: (rows_filtered/total_rows) = (268/268)

INFO (2026-07-07 14:20:51,066) [balance_frame/_filter_sf (line 3412)]: (rows_filtered/total_rows) = (4551/4551)

INFO (2026-07-07 14:20:51,068) [balance_frame/_filter_sf (line 3412)]: (rows_filtered/total_rows) = (268/268)

Original rows: 1000
Filtered rows: 268
   id  gender     income  happiness     weight
0   1  Female   9.940280  66.885485   9.617159
1  92  Female   0.185097  84.464522  17.392266
2  94  Female   1.183696  65.742184  17.794007
3  95  Female   3.716007  67.624539   7.283279
4  98  Female  16.751931  44.868651  48.725241

Old API (`Sample`)	New API (`SampleFrame` / `BalanceFrame`)
Column roles inferred by exclusion	Column roles declared explicitly
Mutable `.set_target()` / `.adjust()`	Immutable — `adjust()` returns a new object
One class does everything	Clear separation: data container vs. adjustment orchestrator
Weight provenance not tracked	Weight metadata recorded per-column

Step	Old API (`Sample`)	New API (`SampleFrame` / `BalanceFrame`)
Load data	`s = Sample.from_frame(df)`	`sf = SampleFrame.from_frame(df)`
Pair sample + target	`s.set_target(target)`	`bf = BalanceFrame(sample=sf, target=target_sf)`
Adjust	`adjusted = s.adjust()` (mutates s)	`adjusted = bf.adjust()` (bf unchanged)
Summary	`adjusted.summary()`	`adjusted.summary()`
Diagnostics	`adjusted.diagnostics()`	`adjusted.diagnostics()`
Covariates	`adjusted.covars().mean()`	`adjusted.covars().mean()`
Design effect	`adjusted.design_effect()`	`adjusted.weights().design_effect()`
CSV export	`adjusted.to_csv()`	`adjusted.to_csv()`
Filter	(not available)	`adjusted.keep_only_some_rows_columns(...)`

balance Quickstart: New API (SampleFrame / BalanceFrame)¶

Why a new API?¶

Analysis¶

Example dataset¶

Load data into SampleFrame objects¶

Inspecting a SampleFrame¶

Create a BalanceFrame¶

Pre-Adjustment Diagnostics¶

Visualizing the unadjusted comparison¶

Adjusting Sample to Population¶

Sklearn-style fit / design_matrix / predict_proba workflow¶

Fit on a subset, apply on the rest¶

Evaluation of the Results¶

Covariate plots¶

ASCII plots¶

Understanding the weights¶

Design effect and effective sample size¶

Outcome analysis¶

Analytics via BalanceDF views¶

Other Adjustment Methods¶

Compound / Sequential Adjustments¶

Transformations¶

Diagnostics¶

Exporting results¶

Filtering rows/columns¶

Summary: Old vs New API side-by-side¶