CLI tutorial¶

This tutorial walks through using the balance command-line interface (CLI) to adjust a sample dataset to a target. We will build a small demo dataset, run the CLI, and inspect the outputs.

A CLI is useful because it slots into automation and data workflows. The same command can be invoked from shell scripts, scheduled with cron, embedded in CI/CD pipelines, or orchestrated through tools like Airflow. You can chain balance runs with other command-line tools, process batches of files in a loop, or trigger analyses from events, and the command line itself records exactly what was run, which gives you an audit trail. Because a CLI returns a non-zero exit code on failure, automated systems can halt a pipeline or send an alert when something goes wrong. In short, the CLI turns balance from an interactive tool into a building block for reproducible workflows.

Prerequisites¶

Make sure balance is installed and the balance CLI is on your PATH. You can also run the CLI via python -m balance.cli from a checkout of the repository.
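
If you are unsure which of the two invocations is available in your environment, a small pre-flight check can pick one. The sketch below is only an illustration: find_balance_cli is a hypothetical helper, and it assumes the installed executable is named balance, which may differ in your installation.

```python
import importlib.util
import shutil
import sys


def find_balance_cli(executable_name="balance"):
    """Return an argv prefix for the balance CLI, or None if neither form is available.

    `executable_name` is an assumption; adjust it if your install uses another name.
    """
    exe_path = shutil.which(executable_name)
    if exe_path is not None:
        # An executable with that name was found on PATH.
        return [exe_path]
    if importlib.util.find_spec("balance") is not None:
        # The package is importable, so fall back to the module form.
        return [sys.executable, "-m", "balance.cli"]
    return None


prefix = find_balance_cli()
print("balance CLI not found" if prefix is None else "Use: " + " ".join(prefix))
```

The returned list can be used as the start of the cmd lists built later in this tutorial.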

In [1]:
import os
import subprocess
import tempfile

import pandas as pd

from balance import load_data
INFO (2026-03-12 17:27:14,077) [__init__/<module> (line 72)]: Using balance version 0.16.1
balance (Version 0.16.1) loaded:
    📖 Documentation: https://import-balance.org/
    🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
    📄 Citation:
        Sarig, T., Galili, T., & Eilat, R. (2023).
        balance - a Python package for balancing biased data samples.
        https://arxiv.org/abs/2307.06024

    Tip: You can view this message anytime with balance.help()

Use the bundled demo data¶

balance ships with a small demo dataset, available via load_data(). To build the CLI input, add a sample indicator column and a weight column to each frame, then concatenate the sample and target frames.

In [2]:
target_df, sample_df = load_data()

sample_df = sample_df.copy()
target_df = target_df.copy()
sample_df["is_respondent"] = 1
target_df["is_respondent"] = 0
sample_df["weight"] = 1.0
target_df["weight"] = 1.0

load_data_input_df = pd.concat([sample_df, target_df], ignore_index=True)
load_data_input_df.head()
Out[2]:
   id  gender  age_group     income  happiness  is_respondent  weight
0   0    Male      25-34   6.428659  26.043029              1     1.0
1   1  Female      18-24   9.940280  66.885485              1     1.0
2   2    Male      18-24   2.673623  37.091922              1     1.0
3   3     NaN      18-24  10.550308  49.394050              1     1.0
4   4     NaN      18-24   2.689994  72.304208              1     1.0
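
Before writing the file, it can help to confirm that the combined frame has the columns the CLI will look for. This is a minimal sketch, assuming the CLI's default column names are id, is_respondent, and weight (the names this tutorial relies on without passing --id_column, --sample_column, or --weight_column). missing_cli_columns is a hypothetical helper, not part of balance.

```python
def missing_cli_columns(
    columns,
    covariates,
    id_column="id",
    sample_column="is_respondent",
    weight_column="weight",
):
    """Return the required column names that are absent from `columns`, in order."""
    required = [id_column, sample_column, weight_column, *covariates]
    present = set(columns)
    return [name for name in required if name not in present]


# Column names of the combined demo frame built in the cell above.
demo_columns = ["id", "gender", "age_group", "income", "happiness", "is_respondent", "weight"]
print(missing_cli_columns(demo_columns, ["gender", "age_group", "income"]))  # → []

# A frame missing the indicator, the weight, and one covariate.
print(missing_cli_columns(["id", "gender"], ["gender", "income"]))
```

With the DataFrame from the cell above you would pass load_data_input_df.columns as the first argument.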

Run the CLI¶

We'll write the input dataset to disk, then call the CLI to compute weights and diagnostics.

In [3]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_out.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_out.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file",
        input_path,
        "--output_file",
        output_path,
        "--diagnostics_output_file",
        diagnostics_path,
        "--covariate_columns",
        "gender,age_group,income",
        "--method",
        "ipw",
        "--weights_impact_on_outcome_method",
        "t_test",
    ]

    print("CLI command:", " ".join(cmd))
    subprocess.check_call(cmd)

    load_data_adjusted_df = pd.read_csv(output_path)
    load_data_diagnostics_df = pd.read_csv(diagnostics_path)

load_data_adjusted_df.head()
CLI command: python -m balance.cli --input_file /tmp/tmpzj_gpg8u/input.csv --output_file /tmp/tmpzj_gpg8u/weights_out.csv --diagnostics_output_file /tmp/tmpzj_gpg8u/diagnostics_out.csv --covariate_columns gender,age_group,income --method ipw --weights_impact_on_outcome_method t_test
INFO (2026-03-12 17:27:16,236) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:27:16,238) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:27:16,238) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:27:16,249) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:27:16,249) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:27:16,401) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:16,412) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:16,412) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:16,413) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-03-12 17:27:16,413) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:16,414) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
id                   str
is_respondent    float64
dtype: object
INFO (2026-03-12 17:27:16,415) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:27:16,423) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:16,436) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:16,437) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:16,437) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-03-12 17:27:16,437) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:16,438) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
id                   str
is_respondent    float64
dtype: object
INFO (2026-03-12 17:27:16,441) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:27:16,445) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:16,447) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:27:16,447) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:27:16,454) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:27:16,461) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:16,570) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-03-12 17:27:16,570) [ipw/ipw (line 767)]: The number of columns in the model matrix: 18
INFO (2026-03-12 17:27:16,570) [ipw/ipw (line 768)]: The number of rows in the model matrix: 11000
INFO (2026-03-12 17:27:18,039) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:18,039) [ipw/ipw (line 992)]: max_de: 1.5
INFO (2026-03-12 17:27:18,039) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters
INFO (2026-03-12 17:27:26,575) [ipw/choose_regularization (line 454)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
9  0.064155       91   2.5        1.49551          0.535725  0.090719
INFO (2026-03-12 17:27:26,577) [ipw/ipw (line 1047)]: Chosen lambda: 0.06415476458273757
INFO (2026-03-12 17:27:26,577) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.17450914016991492
INFO (2026-03-12 17:27:26,581) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:27:26,583) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 2.5
            design effect (Deff): 1.496
            effective sample size proportion (ESSP): 0.669
            effective sample size (ESS): 668.7
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:27:26,583) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:27:26,583) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:27:26,583) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting
INFO (2026-03-12 17:27:26,859) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:27:26,864) [cli/process_batch (line 799)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      1.495510  design_effect
..                          ...           ...            ...
91  covar_main_asmd_improvement      0.182907         income
92     covar_main_asmd_adjusted      0.173301     mean(asmd)
93   covar_main_asmd_unadjusted      0.326799     mean(asmd)
94  covar_main_asmd_improvement      0.153497     mean(asmd)
95           adjustment_failure      0.000000            NaN

[96 rows x 3 columns]
INFO (2026-03-12 17:27:26,866) [cli/main (line 1184)]: Done fitting the model, writing output

Out[3]:
   id  gender  age_group     income    weight  happiness  is_respondent
0   0    Male      25-34   6.428659  7.602714  26.043029            1.0
1   1  Female      18-24   9.940280  9.397964  66.885485            1.0
2   2    Male      18-24   2.673623  3.433402  37.091922            1.0
3   3     NaN      18-24  10.550308  6.491919  49.394050            1.0
4   4     NaN      18-24   2.689994  4.887119  72.304208            1.0
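
The cell above uses subprocess.check_call, which raises CalledProcessError when the command exits with a non-zero status; that is what lets a pipeline stop on failure. If you would rather inspect the exit code and the end of the captured stderr yourself, something like the following sketch could work. run_and_report is a hypothetical helper, and the failing command is a stand-in so that the example runs without balance installed.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd`, returning (exit_code, last_stderr_line) without raising."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    stderr_lines = completed.stderr.strip().splitlines()
    last_line = stderr_lines[-1] if stderr_lines else ""
    return completed.returncode, last_line


# Stand-in command that fails on purpose; replace it with the balance cmd list.
failing_cmd = [sys.executable, "-c", "import sys; print('boom', file=sys.stderr); sys.exit(3)"]
code, message = run_and_report(failing_cmd)
if code != 0:
    print(f"command failed with exit code {code}: {message}")
```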

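Filter the diagnostics for the weights-impact-on-outcome metrics. The result is empty for this run, presumably because the command passed no --outcome_columns (the log above reports outcome_columns: None), so there was no outcome to test: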

In [4]:
load_data_diagnostics_df[
    load_data_diagnostics_df["metric"].str.startswith("weights_impact_on_outcome_")
]
Out[4]:
metric val var
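
The filter returns no rows in the run above, where outcome_columns is None. To populate it you would presumably need to declare outcome columns. The sketch below only builds the argument list and has not been run here. It assumes that the t_test diagnostics are emitted once --outcome_columns is set and that the flag takes a comma-separated list like --covariate_columns does; the flag itself appears in the --help output. with_outcome_columns is a hypothetical helper and the file paths are placeholders.

```python
def with_outcome_columns(cmd, outcome_columns):
    """Return a copy of `cmd` with --outcome_columns set to a comma-separated list."""
    return [*cmd, "--outcome_columns", ",".join(outcome_columns)]


base_cmd = [
    "python", "-m", "balance.cli",
    "--input_file", "input.csv",  # placeholder path
    "--output_file", "weights_out.csv",  # placeholder path
    "--diagnostics_output_file", "diagnostics_out.csv",  # placeholder path
    "--covariate_columns", "gender,age_group,income",
    "--method", "ipw",
    "--weights_impact_on_outcome_method", "t_test",
]

# happiness is the non-covariate measurement column in the demo data.
cmd_with_outcome = with_outcome_columns(base_cmd, ["happiness"])
print(" ".join(cmd_with_outcome))
```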

Inspect diagnostics¶

The diagnostics output is a flat table that includes adjustment metadata and balance metrics. The metric column identifies the type of diagnostic, while var indicates the variable (or NaN for overall summaries). It is most useful to inspect var in the context of the metric it belongs to. The cells below use the diagnostics from the previous CLI run (load_data_diagnostics_df).

In [5]:
(
    load_data_diagnostics_df.groupby("metric")["var"]
    .apply(lambda col: sorted(col.dropna().unique()))
    .sort_index()
)
Out[5]:
metric
adjustment_failure                                                            []
adjustment_method                                                          [ipw]
covar_asmd_adjusted            [age_group[T.25-34], age_group[T.35-44], age_g...
covar_asmd_improvement         [age_group[T.25-34], age_group[T.35-44], age_g...
covar_asmd_unadjusted          [age_group[T.25-34], age_group[T.35-44], age_g...
covar_main_asmd_adjusted                 [age_group, gender, income, mean(asmd)]
covar_main_asmd_improvement              [age_group, gender, income, mean(asmd)]
covar_main_asmd_unadjusted               [age_group, gender, income, mean(asmd)]
ipw_model_glance                                           [intercept_, n_iter_]
ipw_multi_class                                                           [auto]
ipw_penalty                                                         [deprecated]
ipw_solver                                                               [lbfgs]
model_coef                     [C(_is_na_gender, one_hot_encoding_greater_2)[...
model_glance                   [deviance, l1_ratio, lambda, null_deviance, pr...
size                           [sample_covars, sample_obs, target_covars, tar...
weights_diagnostics            [describe_25%, describe_50%, describe_75%, des...
Name: var, dtype: object
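
The three covar_main_asmd_* metrics are related: in the log of the first CLI run, the mean(asmd) rows are consistent with improvement = unadjusted - adjusted. The sketch below checks that on values copied from that log. It works on plain dictionaries (for example, the result of load_data_diagnostics_df.to_dict("records")); metric_value is a hypothetical helper.

```python
def metric_value(rows, metric, var):
    """Return `val` for the single row matching `metric` and `var`."""
    matches = [row["val"] for row in rows if row["metric"] == metric and row["var"] == var]
    if len(matches) != 1:
        raise LookupError(f"expected one row for {metric}/{var}, found {len(matches)}")
    return matches[0]


# Rows copied from the diagnostics printed in the first CLI run.
rows = [
    {"metric": "covar_main_asmd_adjusted", "val": 0.173301, "var": "mean(asmd)"},
    {"metric": "covar_main_asmd_unadjusted", "val": 0.326799, "var": "mean(asmd)"},
    {"metric": "covar_main_asmd_improvement", "val": 0.153497, "var": "mean(asmd)"},
]

unadjusted = metric_value(rows, "covar_main_asmd_unadjusted", "mean(asmd)")
adjusted = metric_value(rows, "covar_main_asmd_adjusted", "mean(asmd)")
improvement = metric_value(rows, "covar_main_asmd_improvement", "mean(asmd)")

# The difference matches the reported improvement up to rounding in the printed log.
print(unadjusted - adjusted, improvement)
```
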
In [6]:
load_data_diagnostics_df.query("metric == 'adjustment_method'")
Out[6]:
               metric  val  var
28  adjustment_method  0.0  ipw
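
The weight diagnostics reported in the adjustment log are tied together by simple arithmetic: the printed figures are consistent with ESSP = 1 / Deff and ESS = n / Deff, where n is the number of sample observations. This is Kish's effective sample size relation. Whether balance computes it exactly this way internally is an assumption, but the sketch below reproduces the numbers from the first run.

```python
def effective_sample_size(n_obs, design_effect):
    """Return (ESSP, ESS) under the Kish relation ESS = n / Deff."""
    if design_effect <= 0:
        raise ValueError("design_effect must be positive")
    essp = 1.0 / design_effect
    return essp, n_obs * essp


# Values from the first CLI run: 1000 sample observations, design_effect 1.495510.
essp, ess = effective_sample_size(1000, 1.495510)
print(round(essp, 3), round(ess, 1))  # → 0.669 668.7
```

A higher design effect therefore costs effective sample size directly, which is the trade-off that --max_de caps.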

CLI Help and Arguments¶

You can list every available CLI argument with --help. Because the full output is long, the snippet below prints only its first 40 lines.

In [7]:
# Print a shorter CLI help snippet
help_output = subprocess.run(
    ["python", "-m", "balance.cli", "--help"],
    check=False,
    capture_output=True,
    text=True,
).stdout
print("\n".join(help_output.splitlines()[:40]))
balance (Version 0.16.1) loaded:
    📖 Documentation: https://import-balance.org/
    🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
    📄 Citation:
        Sarig, T., Galili, T., & Eilat, R. (2023).
        balance - a Python package for balancing biased data samples.
        https://arxiv.org/abs/2307.06024

    Tip: You can view this message anytime with balance.help()

usage: cli.py [-h] --input_file INPUT_FILE --output_file OUTPUT_FILE
              [--diagnostics_output_file DIAGNOSTICS_OUTPUT_FILE]
              [--method METHOD] [--sample_column SAMPLE_COLUMN]
              [--id_column ID_COLUMN] [--weight_column WEIGHT_COLUMN]
              --covariate_columns COVARIATE_COLUMNS
              [--outcome_columns OUTCOME_COLUMNS]
              [--covariate_columns_for_diagnostics COVARIATE_COLUMNS_FOR_DIAGNOSTICS]
              [--rows_to_keep_for_diagnostics ROWS_TO_KEEP_FOR_DIAGNOSTICS]
              [--weights_impact_on_outcome_method WEIGHTS_IMPACT_ON_OUTCOME_METHOD]
              [--batch_columns BATCH_COLUMNS] [--keep_columns KEEP_COLUMNS]
              [--keep_row_column KEEP_ROW_COLUMN]
              [--sep_input_file SEP_INPUT_FILE]
              [--sep_output_file SEP_OUTPUT_FILE]
              [--sep_diagnostics_output_file SEP_DIAGNOSTICS_OUTPUT_FILE]
              [--no_output_header] [--succeed_on_weighting_failure]
              [--max_de MAX_DE] [--lambda_min LAMBDA_MIN]
              [--lambda_max LAMBDA_MAX] [--num_lambdas NUM_LAMBDAS]
              [--ipw_logistic_regression_kwargs IPW_LOGISTIC_REGRESSION_KWARGS]
              [--weight_trimming_mean_ratio WEIGHT_TRIMMING_MEAN_RATIO]
              [--one_hot_encoding ONE_HOT_ENCODING]
              [--transformations TRANSFORMATIONS] [--formula FORMULA]
              [--return_df_with_original_dtypes]
              [--standardize_types STANDARDIZE_TYPES]

options:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE
                        Path to input sample/target
  --output_file OUTPUT_FILE
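
If you only need the flag names rather than the full help text, you can pull them out of the captured output with a regular expression. This is a rough sketch: list_cli_flags is a hypothetical helper, and it simply collects every token that starts with two dashes.

```python
import re


def list_cli_flags(help_text):
    """Return the distinct --flags mentioned in `help_text`, in first-seen order."""
    seen = {}
    for flag in re.findall(r"--[A-Za-z][A-Za-z0-9_]*", help_text):
        seen.setdefault(flag, None)
    return list(seen)


# A short excerpt of the usage text printed above.
usage_excerpt = """
usage: cli.py [-h] --input_file INPUT_FILE --output_file OUTPUT_FILE
              [--diagnostics_output_file DIAGNOSTICS_OUTPUT_FILE]
              [--method METHOD] [--sample_column SAMPLE_COLUMN]
"""
print(list_cli_flags(usage_excerpt))
```

Calling list_cli_flags(help_output) on the variable from the cell above would list the flags of the installed version.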

Key CLI Arguments Summary¶

Here are the most commonly used arguments:

Argument                          Default  Description
--method                          ipw      Adjustment method: ipw, cbps, or rake
--max_de                          1.5      Maximum design effect. Set to None to use lambda_1se instead
--lambda_min                      1e-05    Lower bound for L1 penalty (IPW only)
--lambda_max                      10       Upper bound for L1 penalty (IPW only)
--num_lambdas                     250      Number of lambda values to search (IPW only)
--weight_trimming_mean_ratio      20.0     Trim weights above mean(weights) * ratio
--transformations                 default  Covariate transformations. Use None to disable
--formula                         None     Custom model formula (e.g., "gender + income")
--one_hot_encoding                True     One-hot encode categorical features
--batch_columns                   None     Columns to group by for batch processing
--keep_columns                    None     Subset of columns to include in output
--outcome_columns                 None     Columns treated as outcomes (not covariates)
--ipw_logistic_regression_kwargs  None     JSON string of kwargs for sklearn LogisticRegression
--succeed_on_weighting_failure    False    Return null weights instead of failing on errors
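
When several of these arguments vary between runs, it can be convenient to build the command from a dictionary instead of editing a list by hand. The sketch below is one way to do that. build_balance_cmd is a hypothetical helper, the paths are placeholders, and value-less switches (shown without a metavar in the usage text, such as --succeed_on_weighting_failure) are passed separately from valued options.

```python
import sys


def build_balance_cmd(options, switches=()):
    """Build an argv list for `python -m balance.cli`.

    `options` maps flag names (without leading dashes) to values; None values are skipped.
    `switches` lists value-less flags such as succeed_on_weighting_failure.
    """
    cmd = [sys.executable, "-m", "balance.cli"]
    for name, value in options.items():
        if value is None:
            continue
        cmd += [f"--{name}", str(value)]
    for name in switches:
        cmd.append(f"--{name}")
    return cmd


cmd = build_balance_cmd(
    {
        "input_file": "input.csv",  # placeholder path
        "output_file": "weights_out.csv",  # placeholder path
        "covariate_columns": "gender,age_group,income",
        "method": "ipw",
        "max_de": 2.0,
        "formula": None,  # skipped, so the CLI default applies
    },
    switches=["succeed_on_weighting_failure"],
)
print(cmd[3:])
```

Skipping None values keeps the CLI defaults listed in the table in effect for anything you do not set.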

Example: Tuning IPW parameters¶

Below we run the CLI with custom regularization settings and a custom logistic regression solver:

In [8]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_tuned.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_tuned.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "ipw",
        # Tuning parameters
        "--max_de", "2.0",
        "--lambda_min", "1e-06",
        "--lambda_max", "100",
        "--num_lambdas", "500",
        "--weight_trimming_mean_ratio", "10.0",
        # Custom logistic regression settings
        "--ipw_logistic_regression_kwargs", '{"solver": "liblinear", "max_iter": 500}',
    ]

    print("CLI command:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    tuned_adjusted_df = pd.read_csv(output_path)

tuned_adjusted_df.head()
CLI command:
python -m balance.cli --input_file /tmp/tmp__4to2tm/input.csv --output_file /tmp/tmp__4to2tm/weights_tuned.csv --diagnostics_output_file /tmp/tmp__4to2tm/diagnostics_tuned.csv --covariate_columns gender,age_group,income --method ipw --max_de 2.0 --lambda_min 1e-06 --lambda_max 100 --num_lambdas 500 --weight_trimming_mean_ratio 10.0 --ipw_logistic_regression_kwargs {"solver": "liblinear", "max_iter": 500}
INFO (2026-03-12 17:27:31,704) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:27:31,706) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:27:31,706) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 2.0, 'lambda_min': 1e-06, 'lambda_max': 100.0, 'num_lambdas': 500, 'weight_trimming_mean_ratio': 10.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:27:31,717) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:27:31,717) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:27:31,870) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:31,881) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:31,881) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:31,882) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:31,882) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:31,883) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:31,884) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:27:31,892) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:31,905) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:31,906) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:31,906) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:31,907) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:31,907) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:31,909) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:27:31,914) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:31,916) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:27:31,916) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:27:31,923) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:27:31,930) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:32,039) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-03-12 17:27:32,039) [ipw/ipw (line 767)]: The number of columns in the model matrix: 18
INFO (2026-03-12 17:27:32,039) [ipw/ipw (line 768)]: The number of rows in the model matrix: 11000
INFO (2026-03-12 17:27:32,066) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:32,066) [ipw/ipw (line 992)]: max_de: 2.0
INFO (2026-03-12 17:27:32,066) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters
INFO (2026-03-12 17:27:36,743) [ipw/choose_regularization (line 454)]: Best regularisation: 
     s  s_index  trim  design_effect  asmd_improvement      asmd
6 NaN        0   2.5       1.714071          0.634917  0.071337
INFO (2026-03-12 17:27:36,745) [ipw/ipw (line 1047)]: Chosen lambda: nan
INFO (2026-03-12 17:27:36,745) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.18280833369391136
INFO (2026-03-12 17:27:36,748) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:27:36,750) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 2.5
            design effect (Deff): 1.714
            effective sample size proportion (ESSP): 0.583
            effective sample size (ESS): 583.4
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:27:36,750) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:27:36,750) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:27:36,750) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting
INFO (2026-03-12 17:27:37,027) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:27:37,031) [cli/process_batch (line 799)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      1.714071  design_effect
..                          ...           ...            ...
91  covar_main_asmd_improvement      0.225463         income
92     covar_main_asmd_adjusted      0.143344     mean(asmd)
93   covar_main_asmd_unadjusted      0.326799     mean(asmd)
94  covar_main_asmd_improvement      0.183455     mean(asmd)
95           adjustment_failure      0.000000            NaN

[96 rows x 3 columns]
INFO (2026-03-12 17:27:37,034) [cli/main (line 1184)]: Done fitting the model, writing output

Out[8]:
   id  gender  age_group     income    weight  happiness  is_respondent
0   0    Male      25-34   6.428659  6.714531  26.043029            1.0
1   1  Female      18-24   9.940280  8.721215  66.885485            1.0
2   2    Male      18-24   2.673623  2.537674  37.091922            1.0
3   3     NaN      18-24  10.550308  5.587013  49.394050            1.0
4   4     NaN      18-24   2.689994  3.883128  72.304208            1.0
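
Note that the command printed in this example loses the quotes around the JSON value, because " ".join does not quote arguments; the printed line cannot be pasted into a shell as-is. A sketch of a copy-pasteable alternative uses the standard library's shlex.join (Python 3.8+) for printing and json.dumps for building the kwargs string. The paths are placeholders.

```python
import json
import shlex

lr_kwargs = {"solver": "liblinear", "max_iter": 500}
cmd = [
    "python", "-m", "balance.cli",
    "--input_file", "input.csv",  # placeholder path
    "--output_file", "weights_tuned.csv",  # placeholder path
    "--covariate_columns", "gender,age_group,income",
    "--ipw_logistic_regression_kwargs", json.dumps(lr_kwargs),
]

# shlex.join quotes any argument that contains shell-special characters.
printable = shlex.join(cmd)
print(printable)

# Splitting the printed string with shell rules recovers the original list.
assert shlex.split(printable) == cmd
```

The list passed to subprocess is unchanged; only the printed form differs.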

Example: Using a Custom Formula¶

The --formula argument allows you to specify a custom model formula, including interaction terms. When using --formula, you should typically also set --transformations=None to prevent automatic transformations from interfering with your custom formula.

The formula uses patsy/R-style syntax:

  • gender + income: additive terms (no interaction)
  • gender * income: equivalent to gender + income + gender:income (main effects + interaction)
  • gender:income: only the interaction term
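
To make the * shorthand concrete, the sketch below expands a product of plain variable names into its main effects and interactions. It is an illustration of the rule in the list above, not patsy's implementation, and expand_star is a hypothetical helper that handles nothing beyond names joined by *.

```python
from itertools import combinations


def expand_star(formula):
    """Expand 'a*b*...' into main effects plus all interactions, ordered by degree.

    Illustration only: handles plain variable names joined by '*', nothing else.
    """
    names = [name.strip() for name in formula.split("*")]
    terms = []
    for degree in range(1, len(names) + 1):
        for combo in combinations(names, degree):
            terms.append(":".join(combo))
    return terms


print(" + ".join(expand_star("gender*income")))  # → gender + income + gender:income
```
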
In [9]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_formula.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_formula.csv")

    # Use the demo data for the formula example
    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "ipw",
        # Disable transformations to use raw covariates in formula
        "--transformations", "None",
        # Use a formula with interaction term
        "--formula", "gender*income",
    ]

    print("CLI command with custom formula:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    formula_diagnostics_df = pd.read_csv(diagnostics_path)

# Check model coefficients to verify formula was applied
print("\nModel coefficients (showing interaction term):")
print(formula_diagnostics_df.query("metric == 'model_coef'")[["var", "val"]])
CLI command with custom formula:
python -m balance.cli --input_file /tmp/tmpzk20yf2u/input.csv --output_file /tmp/tmpzk20yf2u/weights_formula.csv --diagnostics_output_file /tmp/tmpzk20yf2u/diagnostics_formula.csv --covariate_columns gender,age_group,income --method ipw --transformations None --formula gender*income
INFO (2026-03-12 17:27:39,494) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:27:39,496) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:27:39,496) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': None, 'formula': 'gender*income', 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:27:39,507) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:27:39,507) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:27:39,660) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:39,671) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:39,671) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:39,672) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:39,672) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:39,673) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:39,674) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:27:39,681) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:39,694) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:39,695) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:39,695) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:39,695) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:39,696) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:39,699) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:27:39,704) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:39,705) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:39,753) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['gender*income']
INFO (2026-03-12 17:27:39,753) [ipw/ipw (line 767)]: The number of columns in the model matrix: 7
INFO (2026-03-12 17:27:39,753) [ipw/ipw (line 768)]: The number of rows in the model matrix: 11000
INFO (2026-03-12 17:27:41,213) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:41,213) [ipw/ipw (line 992)]: max_de: 1.5
INFO (2026-03-12 17:27:41,213) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters
INFO (2026-03-12 17:27:47,871) [ipw/choose_regularization (line 454)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
9  0.043507       98   5.0       1.495849          0.517118  0.157805
INFO (2026-03-12 17:27:47,873) [ipw/ipw (line 1047)]: Chosen lambda: 0.043506507030756265
INFO (2026-03-12 17:27:47,873) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.09595118841373662
WARNING (2026-03-12 17:27:47,873) [ipw/ipw (line 1073)]: The propensity model has low fraction null deviance explained (0.09595118841373662). Results may not be accurate
INFO (2026-03-12 17:27:47,876) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:27:47,878) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 5.0
            design effect (Deff): 1.496
            effective sample size proportion (ESSP): 0.669
            effective sample size (ESS): 668.5
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:27:47,878) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:27:47,878) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:27:47,878) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting
INFO (2026-03-12 17:27:48,157) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:27:48,162) [cli/process_batch (line 799)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      1.495849  design_effect
..                          ...           ...            ...
80  covar_main_asmd_improvement      0.301760         income
81     covar_main_asmd_adjusted      0.157805     mean(asmd)
82   covar_main_asmd_unadjusted      0.326799     mean(asmd)
83  covar_main_asmd_improvement      0.168993     mean(asmd)
84           adjustment_failure      0.000000            NaN

[85 rows x 3 columns]
INFO (2026-03-12 17:27:48,164) [cli/main (line 1184)]: Done fitting the model, writing output

Model coefficients (showing interaction term):
                                                  var       val
40                                          intercept  0.452742
41      C(gender, one_hot_encoding_greater_2)[Female] -0.186488
42  C(gender, one_hot_encoding_greater_2)[Female]:... -0.224843
43        C(gender, one_hot_encoding_greater_2)[Male]  0.181382
44  C(gender, one_hot_encoding_greater_2)[Male]:in... -0.198422
45         C(gender, one_hot_encoding_greater_2)[_NA]  0.008414
46  C(gender, one_hot_encoding_greater_2)[_NA]:income -0.091654
47                                             income -0.372709
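
The two `asmd_improvement` figures in the log above are on different scales, which is easy to misread. The numbers printed by this run are consistent with the diagnostics table reporting the absolute drop in mean ASMD (unadjusted minus adjusted), while the "Best regularisation" line reports that drop relative to the unadjusted value. The sketch below only redoes that arithmetic with values copied from the log; the helper names are illustrative and not part of balance.

```python
# Mean ASMD values copied from the diagnostics rows logged above.
ASMD_UNADJUSTED = 0.326799
ASMD_ADJUSTED = 0.157805


def absolute_improvement(unadjusted: float, adjusted: float) -> float:
    """Drop in mean ASMD, in ASMD units."""
    return unadjusted - adjusted


def relative_improvement(unadjusted: float, adjusted: float) -> float:
    """Drop in mean ASMD as a fraction of the unadjusted value."""
    return (unadjusted - adjusted) / unadjusted


abs_gain = absolute_improvement(ASMD_UNADJUSTED, ASMD_ADJUSTED)
rel_gain = relative_improvement(ASMD_UNADJUSTED, ASMD_ADJUSTED)
print(f"absolute: {abs_gain:.4f}")  # close to the 0.168993 diagnostics row
print(f"relative: {rel_gain:.4f}")  # close to the 0.517118 regularisation row
```

Read this way, the adjustment removed roughly half of the initial imbalance, even though the diagnostics row shows a value near 0.17.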

Batch Processing Example¶

The --batch_columns argument allows you to run separate adjustments for each unique combination of values in the specified columns. This is useful when you want to compute weights independently for different subgroups (e.g., by gender or region).
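
Conceptually, batching amounts to splitting the input by the batch key and adjusting each piece on its own. The sketch below illustrates only the splitting step on a few made-up rows, using tuple keys like the `('Female',)` labels the CLI prints in its log. `split_into_batches` is a hypothetical helper written for this illustration, not the CLI's implementation.

```python
from collections import defaultdict


def split_into_batches(rows, batch_columns):
    """Group row dicts by the tuple of their values in batch_columns."""
    batches = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in batch_columns)
        batches[key].append(row)
    return dict(batches)


# Made-up rows for illustration only.
rows = [
    {"id": 1, "gender": "Female", "is_respondent": 1},
    {"id": 2, "gender": "Male", "is_respondent": 1},
    {"id": 3, "gender": "Female", "is_respondent": 0},
    {"id": 4, "gender": "Male", "is_respondent": 0},
    {"id": 5, "gender": "Male", "is_respondent": 0},
]

batches = split_into_batches(rows, ["gender"])
for key, members in sorted(batches.items()):
    n_sample = sum(r["is_respondent"] == 1 for r in members)
    n_target = len(members) - n_sample
    print(key, f"sample={n_sample}", f"target={n_target}")
```

Because each batch is adjusted separately, every batch needs both sample rows and target rows of its own.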

In [10]:
# Create a dataset with a batch column for gender
batch_input_df = load_data_input_df.copy()

# The 'gender' column has values like 'Female', 'Male', and possibly NA
# Filter to only rows with non-null gender for this example
batch_input_df = batch_input_df[batch_input_df["gender"].notna()].copy()
print(f"Rows after filtering: {len(batch_input_df)}")
print(f"Gender distribution:\n{batch_input_df['gender'].value_counts()}")
Rows after filtering: 10014
Gender distribution:
gender
Male      5195
Female    4819
Name: count, dtype: int64
In [11]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input_batch.csv")
    output_path = os.path.join(tmpdir, "weights_batch.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_batch.csv")

    batch_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "age_group,income",  # Note: gender is now used as batch column
        "--outcome_columns", "happiness",
        "--batch_columns", "gender",  # Process each gender separately
        "--method", "ipw",
    ]

    print("CLI command with batch processing:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    batch_adjusted_df = pd.read_csv(output_path)
    batch_diagnostics_df = pd.read_csv(diagnostics_path)

print(f"\nOutput rows: {len(batch_adjusted_df)}")
batch_adjusted_df.head()
CLI command with batch processing:
python -m balance.cli --input_file /tmp/tmppjmjw6pb/input_batch.csv --output_file /tmp/tmppjmjw6pb/weights_batch.csv --diagnostics_output_file /tmp/tmppjmjw6pb/diagnostics_batch.csv --covariate_columns age_group,income --outcome_columns happiness --batch_columns gender --method ipw
INFO (2026-03-12 17:27:50,670) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:27:50,672) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:27:50,672) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:27:50,682) [cli/load_and_check_input (line 926)]: Number of rows in input file: 10014
INFO (2026-03-12 17:27:50,682) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
INFO (2026-03-12 17:27:50,685) [cli/main (line 1141)]: Running weighting for batch = ('Female',) 
WARNING (2026-03-12 17:27:50,838) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:50,849) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:50,849) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:50,850) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:50,850) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:50,851) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:50,852) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        268 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
WARNING (2026-03-12 17:27:50,859) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:50,870) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:50,870) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:50,871) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:50,871) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:50,872) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:50,874) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        4551 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
INFO (2026-03-12 17:27:50,877) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:50,878) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:27:50,878) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['age_group', 'income']
INFO (2026-03-12 17:27:50,883) [adjustment/apply_transformations (line 469)]: Final variables in output: ['age_group', 'income']
INFO (2026-03-12 17:27:50,887) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:50,921) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + age_group']
INFO (2026-03-12 17:27:50,921) [ipw/ipw (line 767)]: The number of columns in the model matrix: 14
INFO (2026-03-12 17:27:50,921) [ipw/ipw (line 768)]: The number of rows in the model matrix: 4819
INFO (2026-03-12 17:27:51,811) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:51,812) [ipw/ipw (line 992)]: max_de: 1.5
INFO (2026-03-12 17:27:51,812) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters
INFO (2026-03-12 17:27:55,882) [ipw/choose_regularization (line 454)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement     asmd
6  0.105705       82   5.0       1.489687           0.49424  0.09868
INFO (2026-03-12 17:27:55,884) [ipw/ipw (line 1047)]: Chosen lambda: 0.10570520810009826
INFO (2026-03-12 17:27:55,884) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.14888521519197495
INFO (2026-03-12 17:27:55,888) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:27:55,890) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        268 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 5.0
            design effect (Deff): 1.490
            effective sample size proportion (ESSP): 0.671
            effective sample size (ESS): 179.9
                
            target:
                 
	        balance Sample object
	        4551 observations x 2 variables: age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness
	        
            2 common variables: age_group,income
            
INFO (2026-03-12 17:27:55,890) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:27:55,890) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:27:55,890) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting
INFO (2026-03-12 17:27:56,041) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:27:56,046) [cli/process_batch (line 799)]: balance diagnostics object:                          metric          val            var
0                          size   268.000000     sample_obs
1                          size     2.000000  sample_covars
2                          size  4551.000000     target_obs
3                          size     2.000000  target_covars
4           weights_diagnostics     1.489687  design_effect
..                          ...          ...            ...
86  covar_main_asmd_improvement     0.185596         income
87     covar_main_asmd_adjusted     0.220366     mean(asmd)
88   covar_main_asmd_unadjusted     0.422500     mean(asmd)
89  covar_main_asmd_improvement     0.202135     mean(asmd)
90           adjustment_failure     0.000000            NaN

[91 rows x 3 columns]
INFO (2026-03-12 17:27:56,048) [cli/main (line 1158)]: Done processing batch ('Female',)
INFO (2026-03-12 17:27:56,049) [cli/main (line 1141)]: Running weighting for batch = ('Male',) 
WARNING (2026-03-12 17:27:56,058) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:56,068) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:56,068) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:56,069) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:56,069) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:56,070) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:56,071) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        644 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
WARNING (2026-03-12 17:27:56,078) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:56,089) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:56,089) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:56,090) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:56,090) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:56,090) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:56,092) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        4551 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
INFO (2026-03-12 17:27:56,095) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:56,097) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:27:56,097) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['age_group', 'income']
INFO (2026-03-12 17:27:56,101) [adjustment/apply_transformations (line 469)]: Final variables in output: ['age_group', 'income']
INFO (2026-03-12 17:27:56,105) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:56,139) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + age_group']
INFO (2026-03-12 17:27:56,139) [ipw/ipw (line 767)]: The number of columns in the model matrix: 14
INFO (2026-03-12 17:27:56,139) [ipw/ipw (line 768)]: The number of rows in the model matrix: 5195
INFO (2026-03-12 17:27:56,947) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:56,948) [ipw/ipw (line 992)]: max_de: 1.5
INFO (2026-03-12 17:27:56,948) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters
INFO (2026-03-12 17:28:01,056) [ipw/choose_regularization (line 454)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
9  0.111736       81   5.0       1.495967          0.566287  0.087357
INFO (2026-03-12 17:28:01,057) [ipw/ipw (line 1047)]: Chosen lambda: 0.11173591019485084
INFO (2026-03-12 17:28:01,058) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.1426717197478461
INFO (2026-03-12 17:28:01,061) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:28:01,063) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        644 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 5.0
            design effect (Deff): 1.496
            effective sample size proportion (ESSP): 0.668
            effective sample size (ESS): 430.5
                
            target:
                 
	        balance Sample object
	        4551 observations x 2 variables: age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness
	        
            2 common variables: age_group,income
            
INFO (2026-03-12 17:28:01,063) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:28:01,063) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:28:01,063) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting
INFO (2026-03-12 17:28:01,219) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:28:01,224) [cli/process_batch (line 799)]: balance diagnostics object:                          metric          val            var
0                          size   644.000000     sample_obs
1                          size     2.000000  sample_covars
2                          size  4551.000000     target_obs
3                          size     2.000000  target_covars
4           weights_diagnostics     1.495967  design_effect
..                          ...          ...            ...
86  covar_main_asmd_improvement     0.235830         income
87     covar_main_asmd_adjusted     0.192214     mean(asmd)
88   covar_main_asmd_unadjusted     0.430017     mean(asmd)
89  covar_main_asmd_improvement     0.237804     mean(asmd)
90           adjustment_failure     0.000000            NaN

[91 rows x 3 columns]
INFO (2026-03-12 17:28:01,226) [cli/main (line 1158)]: Done processing batch ('Male',)
INFO (2026-03-12 17:28:01,227) [cli/main (line 1184)]: Done fitting the model, writing output

Output rows: 912
Out[11]:
   id age_group     income  happiness     weight  gender  is_respondent
0   1     18-24   9.940280  66.885485  10.379924  Female            1.0
1  92     35-44   0.185097  84.464522  18.176360  Female            1.0
2  94     35-44   1.183696  65.742184  20.852704  Female            1.0
3  95     18-24   3.716007  67.624539  10.522336  Female            1.0
4  98     35-44  16.751931  44.868651  40.377325  Female            1.0
In [12]:
# Inspect weights by gender - each group was adjusted independently
print("Weight statistics by gender (sample only):")
sample_only = batch_adjusted_df[batch_adjusted_df["is_respondent"] == 1]
print(sample_only.groupby("gender")["weight"].describe().round(3))
Weight statistics by gender (sample only):
        count    mean     std    min    25%     50%     75%     max
gender                                                             
Female  268.0  16.981  11.905  6.785  9.566  13.702  19.158  85.648
Male    644.0   7.067   4.981  2.913  3.260   5.776   9.235  35.370
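
The design effect and effective sample size that the CLI logged for each batch can be sanity-checked from this summary table. The logged values are consistent with Kish's formula, Deff = n * sum(w^2) / (sum(w))^2, which equals 1 plus the squared coefficient of variation of the weights, with ESS = n / Deff. The sketch below assumes that definition. The function names are illustrative, and because the inputs are the rounded values printed above, the results can only agree with the log to about three decimals.

```python
def kish_design_effect(weights):
    """Kish's design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def design_effect_from_summary(mean, std, n):
    """Same quantity from describe()-style statistics.

    pandas reports the sample standard deviation (ddof=1), so the
    variance is rescaled by (n - 1) / n before forming 1 + CV^2.
    """
    population_variance = std * std * (n - 1) / n
    return 1.0 + population_variance / (mean * mean)


# Made-up weights: equal weights give Deff = 1, unequal weights give more.
print(kish_design_effect([1.0, 1.0, 1.0, 1.0]))
print(kish_design_effect([1.0, 1.0, 2.0, 4.0]))

# (mean, std, count) copied from the table above.
summary = {"Female": (16.981, 11.905, 268), "Male": (7.067, 4.981, 644)}

for gender, (mean, std, n) in summary.items():
    deff = design_effect_from_summary(mean, std, n)
    print(f"{gender}: Deff ~ {deff:.3f}, ESS ~ {n / deff:.1f}")
```

Both batches land just under the `max_de` value of 1.5 shown in the CLI attributes, so each group keeps roughly two thirds of its nominal sample size.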

Alternative Weighting Methods¶

The CLI supports three adjustment methods:

  • IPW (Inverse Probability Weighting): The default method; fits a logistic regression to estimate propensity scores
  • CBPS (Covariate Balancing Propensity Score): Balances covariates while estimating propensity scores
  • Rake (Raking/Iterative Proportional Fitting): Adjusts weights iteratively to match marginal distributions
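
To make the last item concrete, here is a minimal, self-contained sketch of raking on two categorical variables. It is a textbook iterative proportional fitting loop over made-up records and target shares, not balance's implementation. The `rake` function, its parameters, and the data are all hypothetical.

```python
def rake(records, weights, targets, n_iter=50, tol=1e-10):
    """Iterative proportional fitting on categorical margins.

    records: list of dicts, one per respondent
    weights: starting weights, same length as records
    targets: {variable: {level: target share}}, shares sum to 1 per variable
    Assumes every target level occurs in records with positive weight.
    """
    weights = list(weights)
    for _ in range(n_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total of each level of this variable.
            level_totals = {level: 0.0 for level in shares}
            for rec, w in zip(records, weights):
                level_totals[rec[var]] += w
            # Rescale each record so this variable's margin hits its target.
            for i, rec in enumerate(records):
                factor = shares[rec[var]] * total / level_totals[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break  # every margin already matches, so stop early
    return weights


# Made-up respondents and target shares for illustration only.
records = [
    {"gender": "Female", "age_group": "18-34"},
    {"gender": "Female", "age_group": "35+"},
    {"gender": "Male", "age_group": "18-34"},
    {"gender": "Male", "age_group": "35+"},
    {"gender": "Male", "age_group": "35+"},
]
targets = {
    "gender": {"Female": 0.5, "Male": 0.5},
    "age_group": {"18-34": 0.4, "35+": 0.6},
}
raked = rake(records, [1.0] * len(records), targets)


def weighted_share(var, level):
    """Weighted share of one level after raking."""
    total = sum(raked)
    return sum(w for r, w in zip(records, raked) if r[var] == level) / total


print(round(weighted_share("gender", "Female"), 4))
print(round(weighted_share("age_group", "18-34"), 4))
```

Raking matches each margin separately and says nothing about the joint distribution, which is the main thing that sets it apart from the two propensity-based methods.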

Example: CBPS Method¶

CBPS simultaneously optimizes covariate balance and propensity score estimation:

In [13]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_cbps.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_cbps.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "cbps",
    ]

    print("CLI command with CBPS method:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    cbps_diagnostics_df = pd.read_csv(diagnostics_path)

# Verify the method used
print("\nAdjustment method used:")
print(cbps_diagnostics_df.query("metric == 'adjustment_method'")[["var", "val"]])
CLI command with CBPS method:
python -m balance.cli --input_file /tmp/tmp4k781k_q/input.csv --output_file /tmp/tmp4k781k_q/weights_cbps.csv --diagnostics_output_file /tmp/tmp4k781k_q/diagnostics_cbps.csv --covariate_columns gender,age_group,income --method cbps
INFO (2026-03-12 17:28:03,733) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:28:03,735) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:28:03,735) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:28:03,745) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:28:03,745) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:28:03,897) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:28:03,908) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:28:03,908) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:28:03,909) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-03-12 17:28:03,909) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:28:03,910) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
id                   str
is_respondent    float64
dtype: object
INFO (2026-03-12 17:28:03,911) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:28:03,918) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:28:03,932) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:28:03,932) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:28:03,932) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-03-12 17:28:03,932) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:28:03,933) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
id                   str
is_respondent    float64
dtype: object
INFO (2026-03-12 17:28:03,936) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:28:03,941) [cbps/cbps (line 537)]: Starting cbps function
INFO (2026-03-12 17:28:03,942) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:28:03,942) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:28:03,949) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:28:04,064) [cbps/cbps (line 588)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-03-12 17:28:04,065) [cbps/cbps (line 599)]: The number of columns in the model matrix: 16
INFO (2026-03-12 17:28:04,065) [cbps/cbps (line 600)]: The number of rows in the model matrix: 11000
INFO (2026-03-12 17:28:04,074) [cbps/cbps (line 669)]: Finding initial estimator for GMM optimization
INFO (2026-03-12 17:28:04,254) [cbps/cbps (line 696)]: Finding initial estimator for GMM optimization that minimizes the balance loss
INFO (2026-03-12 17:28:05,696) [cbps/cbps (line 732)]: Running GMM optimization
INFO (2026-03-12 17:28:08,514) [cbps/cbps (line 859)]: Done cbps function
INFO (2026-03-12 17:28:08,517) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:28:08,519) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using cbps
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: cbps
            design effect (Deff): 1.500
            effective sample size proportion (ESSP): 0.667
            effective sample size (ESS): 666.7
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:28:08,519) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:28:08,519) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:28:08,519) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting
INFO (2026-03-12 17:28:08,793) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:28:08,797) [cli/process_batch (line 799)]: balance diagnostics object:                          metric       val            var
0                          size    1000.0     sample_obs
1                          size       3.0  sample_covars
2                          size   10000.0     target_obs
3                          size       3.0  target_covars
4           weights_diagnostics       1.5  design_effect
..                          ...       ...            ...
86  covar_main_asmd_improvement  0.205323         income
87     covar_main_asmd_adjusted  0.175443     mean(asmd)
88   covar_main_asmd_unadjusted  0.326799     mean(asmd)
89  covar_main_asmd_improvement  0.151355     mean(asmd)
90           adjustment_failure         0            NaN

[91 rows x 3 columns]
INFO (2026-03-12 17:28:08,800) [cli/main (line 1184)]: Done fitting the model, writing output

Adjustment method used:
     var  val
28  cbps  0.0

Example: Rake Method¶

Raking iteratively adjusts weights to match target marginal distributions:

In [14]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_rake.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_rake.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "rake",
    ]

    print("CLI command with rake method:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    rake_diagnostics_df = pd.read_csv(diagnostics_path)

# Verify the method used
print("\nAdjustment method used:")
print(rake_diagnostics_df.query("metric == 'adjustment_method'")[["var", "val"]])
CLI command with rake method:
python -m balance.cli --input_file /tmp/tmp1ykvcp1o/input.csv --output_file /tmp/tmp1ykvcp1o/weights_rake.csv --diagnostics_output_file /tmp/tmp1ykvcp1o/diagnostics_rake.csv --covariate_columns gender,age_group,income --method rake
INFO (2026-03-12 17:28:11,239) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:28:11,241) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:28:11,241) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:28:11,251) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:28:11,251) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:28:11,406) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:28:11,417) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:28:11,417) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:28:11,418) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:28:11,418) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:28:11,419) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:28:11,420) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:28:11,428) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:28:11,441) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:28:11,441) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:28:11,442) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:28:11,442) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:28:11,443) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:28:11,445) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:28:11,452) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:28:11,452) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:28:11,459) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:28:11,501) [rake/rake (line 274)]: Final covariates and levels that will be used in raking: {'age_group': ['18-24', '25-34', '35-44', '45+'], 'gender': ['Female', 'Male', '__NaN__'], 'income': ['(-0.0009997440000000001, 0.44]', '(0.44, 1.664]', '(1.664, 3.472]', '(11.312, 15.139]', '(15.139, 20.567]', '(20.567, 29.504]', '(29.504, 128.536]', '(3.472, 5.663]', '(5.663, 8.211]', '(8.211, 11.312]']}.
INFO (2026-03-12 17:28:11,522) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:28:11,524) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using rake
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: rake
            design effect (Deff): 3.774
            effective sample size proportion (ESSP): 0.265
            effective sample size (ESS): 265.0
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:28:11,524) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:28:11,524) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:28:11,524) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting
INFO (2026-03-12 17:28:11,794) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:28:11,798) [cli/process_batch (line 799)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      3.773786  design_effect
..                          ...           ...            ...
61  covar_main_asmd_improvement      0.462436         income
62     covar_main_asmd_adjusted      0.014651     mean(asmd)
63   covar_main_asmd_unadjusted      0.326799     mean(asmd)
64  covar_main_asmd_improvement      0.312147     mean(asmd)
65           adjustment_failure      0.000000            NaN

[66 rows x 3 columns]
INFO (2026-03-12 17:28:11,800) [cli/main (line 1184)]: Done fitting the model, writing output

Adjustment method used:
     var  val
28  rake  0.0
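
To make the mechanics of raking concrete, here is a minimal, pure-Python sketch of iterative proportional fitting on a toy two-covariate sample. It is not balance's implementation: the function names (rake, weighted_margin, kish_design_effect), the toy rows, and the target proportions are all hypothetical and chosen only for illustration. The sketch assumes every covariate level found in the sample also appears in the target margins.

```python
from collections import defaultdict


def rake(rows, target_margins, n_iter=50, tol=1e-9):
    """Illustrative raking (iterative proportional fitting).

    rows: list of dicts, one per sample unit, mapping covariate -> level.
    target_margins: dict mapping covariate -> {level: target proportion}.
    Returns one weight per row; the weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    total = float(len(rows))
    for _ in range(n_iter):
        max_change = 0.0
        for covar, margin in target_margins.items():
            # Current weighted total for each level of this covariate.
            level_totals = defaultdict(float)
            for row, w in zip(rows, weights):
                level_totals[row[covar]] += w
            # Rescale so this covariate's weighted margin matches the target.
            for i, row in enumerate(rows):
                level = row[covar]
                factor = margin[level] * total / level_totals[level]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all covariates barely changes the weights.
        if max_change < tol:
            break
    return weights


def weighted_margin(rows, weights, covar):
    """Weighted proportion of each level of one covariate."""
    totals = defaultdict(float)
    for row, w in zip(rows, weights):
        totals[row[covar]] += w
    weight_sum = sum(weights)
    return {level: t / weight_sum for level, t in totals.items()}


def kish_design_effect(weights):
    """Kish's design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Hypothetical toy sample: 100 units cross-classified by two covariates.
rows = (
    [{"gender": "Female", "age_group": "18-24"}] * 10
    + [{"gender": "Female", "age_group": "25+"}] * 20
    + [{"gender": "Male", "age_group": "18-24"}] * 30
    + [{"gender": "Male", "age_group": "25+"}] * 40
)
# Hypothetical target marginal proportions.
targets = {
    "gender": {"Female": 0.5, "Male": 0.5},
    "age_group": {"18-24": 0.3, "25+": 0.7},
}

weights = rake(rows, targets)
gender_margin = weighted_margin(rows, weights, "gender")
age_margin = weighted_margin(rows, weights, "age_group")
print("gender margin:", gender_margin)
print("age_group margin:", age_margin)
print("design effect:", kish_design_effect(weights))
```

Only the marginals are matched here, not the joint distribution of gender and age_group, which is the defining trade-off of raking. The design effect helper uses Kish's formula, which appears to be the quantity reported as Deff in the log above; the logged values are consistent with the usual relation ESS = n / Deff, since 1000 / 3.774 is approximately 265.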

Next steps¶

  • Try --method cbps or --method rake for alternative weighting approaches.
  • Use --outcome_columns to control which columns are treated as outcomes.
  • Supply --ipw_logistic_regression_kwargs to tune the IPW model.
  • Use --succeed_on_weighting_failure for pipelines where you want null weights instead of errors.
  • Explore --covariate_columns_for_diagnostics and --rows_to_keep_for_diagnostics to customize diagnostic output.
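
As a starting point for experimenting with these options, the sketch below assembles a command list in the same style as the examples in this tutorial. The helper build_balance_cmd is hypothetical, and the value formats are assumptions rather than documented behavior: comma-separated column names, a JSON string for --ipw_logistic_regression_kwargs, and --succeed_on_weighting_failure as a value-less switch. The happiness outcome column and the file names are also assumed for illustration. Check python -m balance.cli --help for the authoritative formats. --rows_to_keep_for_diagnostics is left out because its expected expression format is not shown in this tutorial.

```python
import json
import shlex


def build_balance_cmd(
    input_path,
    output_path,
    diagnostics_path,
    covariates,
    method=None,
    outcome_columns=None,
    ipw_kwargs=None,
    succeed_on_weighting_failure=False,
    diagnostics_covariates=None,
):
    """Hypothetical helper: build an argument list for `python -m balance.cli`."""
    cmd = [
        "python", "-m", "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", ",".join(covariates),
    ]
    if method is not None:
        cmd += ["--method", method]
    if outcome_columns:
        # Assumption: outcome columns are passed comma-separated, like covariates.
        cmd += ["--outcome_columns", ",".join(outcome_columns)]
    if ipw_kwargs:
        # Assumption: the keyword arguments are passed as a JSON string.
        cmd += ["--ipw_logistic_regression_kwargs", json.dumps(ipw_kwargs)]
    if succeed_on_weighting_failure:
        # Assumption: this flag is a switch that takes no value.
        cmd.append("--succeed_on_weighting_failure")
    if diagnostics_covariates:
        cmd += ["--covariate_columns_for_diagnostics", ",".join(diagnostics_covariates)]
    return cmd


cmd = build_balance_cmd(
    "input.csv",
    "weights.csv",
    "diagnostics.csv",
    covariates=["gender", "age_group", "income"],
    outcome_columns=["happiness"],  # assumed outcome column name
    ipw_kwargs={"max_iter": 500},
    succeed_on_weighting_failure=True,
    diagnostics_covariates=["gender", "age_group"],
)

# shlex.join quotes arguments so the printed command can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.check_call(cmd) as in the cells above. Printing with shlex.join rather than " ".join keeps the JSON argument correctly quoted when the command is copied into a shell.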

Session info¶

For reproducibility, here is the session information:

In [15]:
import session_info
session_info.show(html=False, dependencies=True)
-----
balance             0.16.1
pandas              3.0.1
session_info        v1.0.1
-----
81d243bd2c585b0f4821__mypyc NA
PIL                         12.1.1
anyio                       NA
arrow                       1.4.0
asttokens                   NA
attr                        25.4.0
attrs                       25.4.0
babel                       2.18.0
certifi                     2026.02.25
charset_normalizer          3.4.5
comm                        0.2.3
cycler                      0.12.1
cython_runtime              NA
dateutil                    2.9.0.post0
debugpy                     1.8.20
decorator                   5.2.1
defusedxml                  0.7.1
executing                   2.2.1
fastjsonschema              NA
fqdn                        NA
idna                        3.11
ipykernel                   7.2.0
isoduration                 NA
jedi                        0.19.2
jinja2                      3.1.6
joblib                      1.5.3
json5                       0.13.0
jsonpointer                 3.0.0
jsonschema                  4.26.0
jsonschema_specifications   NA
jupyter_events              0.12.0
jupyter_server              2.17.0
jupyterlab_server           2.28.0
kiwisolver                  1.5.0
lark                        1.3.1
markupsafe                  3.0.3
matplotlib                  3.10.8
mpl_toolkits                NA
narwhals                    2.18.0
nbformat                    5.10.4
numpy                       2.4.3
packaging                   26.0
parso                       0.8.6
patsy                       1.0.2
platformdirs                4.9.4
plotly                      6.6.0
prometheus_client           NA
prompt_toolkit              3.0.52
psutil                      7.2.2
pure_eval                   0.2.3
pydev_ipython               NA
pydevconsole                NA
pydevd                      3.2.3
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.19.2
pyparsing                   3.3.2
pythonjsonlogger            NA
referencing                 NA
requests                    2.32.5
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
rfc3987_syntax              NA
rpds                        NA
scipy                       1.17.1
seaborn                     0.13.2
send2trash                  NA
six                         1.17.0
sklearn                     1.8.0
sphinxcontrib               NA
stack_data                  0.6.3
statsmodels                 0.14.6
threadpoolctl               3.6.0
tornado                     6.5.5
traitlets                   5.14.3
typing_extensions           NA
uri_template                NA
urllib3                     2.6.3
wcwidth                     0.6.0
webcolors                   NA
websocket                   1.9.0
yaml                        6.0.3
zmq                         27.1.0
zoneinfo                    NA
-----
IPython             9.11.0
jupyter_client      8.8.0
jupyter_core        5.9.1
jupyterlab          4.5.6
notebook            7.5.5
-----
Python 3.12.12 (main, Oct 10 2025, 01:01:16) [GCC 13.3.0]
Linux-6.14.0-1017-azure-x86_64-with-glibc2.39
-----
Session information updated at 2026-03-12 17:28
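
In a scripted CLI workflow where session_info is not installed, a lighter record of the key versions may be enough. The sketch below uses only the standard library; the helper name package_versions and the choice of packages to report are hypothetical.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical selection of packages relevant to this tutorial.
report = package_versions(["balance", "pandas", "numpy", "scipy"])
report["python"] = platform.python_version()

for name, version in sorted(report.items()):
    print(f"{name:10s} {version}")
```

Writing this report next to the weights and diagnostics files keeps each CLI run traceable to the environment that produced it.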