CLI tutorial¶

This tutorial walks through using the balance command-line interface (CLI) to adjust a sample dataset to a target. We will build a small demo dataset, run the CLI, and inspect the outputs.

A CLI is useful mainly because it fits into automation and data workflows. A command can be invoked from shell scripts, scheduled with cron, embedded in CI/CD pipelines, or orchestrated through tools like Airflow, all with little overhead. You can chain balance operations with other command-line tools, process batches of files in a loop, or trigger analyses based on events, and the command itself records exactly what was run. A non-zero exit code on failure lets automated systems halt a pipeline or send an alert when something goes wrong. In short, the CLI turns balance from an interactive tool into a building block for reproducible workflows.
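
As a minimal sketch of the exit-code behavior described above, the helper below runs a command and turns a non-zero exit status into a Python exception. The name run_step is hypothetical, and the demo uses stand-in commands rather than the balance CLI so that it runs anywhere.

```python
import subprocess
import sys


def run_step(cmd):
    """Run a command; raise RuntimeError if it exits with a non-zero status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # A scheduler or CI job would typically stop or alert at this point.
        raise RuntimeError(
            f"step failed with exit code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Stand-in commands (not the balance CLI): one succeeds, one exits with status 3.
ok_cmd = [sys.executable, "-c", "print('done')"]
bad_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

print(run_step(ok_cmd).strip())
try:
    run_step(bad_cmd)
except RuntimeError as err:
    print(err)
```

subprocess.check_call, used later in this tutorial, applies the same idea: it raises CalledProcessError when the command exits with a non-zero status.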

Prerequisites¶

Make sure balance is installed and the balance CLI is on your PATH. You can also run the CLI via python -m balance.cli from a checkout of the repository.
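
A quick way to check both prerequisites from Python is sketched below. It assumes the installed executable is named balance (the text above does not spell out the name), and the helper name balance_cli_status is hypothetical.

```python
import importlib.util
import shutil


def balance_cli_status():
    """Report whether the balance package and a `balance` executable are visible."""
    return {
        # True if the current interpreter can locate the package.
        "package_importable": importlib.util.find_spec("balance") is not None,
        # Path of a `balance` executable on PATH (assumed name), or None.
        "executable_on_path": shutil.which("balance"),
    }


print(balance_cli_status())
```

If the package is importable but no executable is found, the python -m balance.cli form used in the rest of this tutorial is the fallback.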

In [1]:
import os
import subprocess
import tempfile

import pandas as pd

from balance import load_data
INFO (2026-01-20 20:37:23,324) [__init__/<module> (line 72)]: Using balance version 0.15.0
balance (Version 0.15.0) loaded:
    📖 Documentation: https://import-balance.org/
    🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
    📄 Citation:
        Sarig, T., Galili, T., & Eilat, R. (2023).
        balance - a Python package for balancing biased data samples.
        https://arxiv.org/abs/2307.06024

    Tip: You can view this message anytime with balance.help()

Use the bundled demo data¶

balance ships with a small demo dataset, available via load_data(). To build the CLI input, add a sample indicator column and a weight column to each frame, then concatenate the sample and target frames.

In [2]:
target_df, sample_df = load_data()

sample_df = sample_df.copy()
target_df = target_df.copy()
sample_df["is_respondent"] = 1
target_df["is_respondent"] = 0
sample_df["weight"] = 1.0
target_df["weight"] = 1.0

load_data_input_df = pd.concat([sample_df, target_df], ignore_index=True)
load_data_input_df.head()
Out[2]:
   id  gender  age_group     income  happiness  is_respondent  weight
0   0    Male      25-34   6.428659  26.043029              1     1.0
1   1  Female      18-24   9.940280  66.885485              1     1.0
2   2    Male      18-24   2.673623  37.091922              1     1.0
3   3     NaN      18-24  10.550308  49.394050              1     1.0
4   4     NaN      18-24   2.689994  72.304208              1     1.0
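
Before handing a file to the CLI, it can help to confirm that the indicator column splits the rows as intended. The sketch below uses only the standard library; the helper name count_rows_by_indicator is hypothetical, and the demo writes a tiny stand-in file rather than the real demo data.

```python
import csv
import os
import tempfile
from collections import Counter


def count_rows_by_indicator(path, sample_column="is_respondent"):
    """Count rows per value of the sample indicator column in a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if sample_column not in (reader.fieldnames or []):
            raise ValueError(f"missing column: {sample_column}")
        # Values are read as strings, so the keys are "1" and "0".
        return Counter(row[sample_column] for row in reader)


# Tiny stand-in file (not the demo data): two sample rows and three target rows.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "input.csv")
    with open(demo_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "is_respondent", "weight"])
        writer.writerows(
            [[0, 1, 1.0], [1, 1, 1.0], [2, 0, 1.0], [3, 0, 1.0], [4, 0, 1.0]]
        )
    counts = count_rows_by_indicator(demo_path)

print(dict(counts))
```

On the combined demo frame written to CSV, the same check should report 1,000 sample rows and 10,000 target rows.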

Run the CLI¶

We'll write the input dataset to disk, then call the CLI to compute weights and diagnostics.

In [3]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_out.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_out.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file",
        input_path,
        "--output_file",
        output_path,
        "--diagnostics_output_file",
        diagnostics_path,
        "--covariate_columns",
        "gender,age_group,income",
        "--method",
        "ipw",
    ]

    print("CLI command:", " ".join(cmd))
    subprocess.check_call(cmd)

    load_data_adjusted_df = pd.read_csv(output_path)
    load_data_diagnostics_df = pd.read_csv(diagnostics_path)

load_data_adjusted_df.head()
CLI command: python -m balance.cli --input_file /tmp/tmp6mblcj1g/input.csv --output_file /tmp/tmp6mblcj1g/weights_out.csv --diagnostics_output_file /tmp/tmp6mblcj1g/diagnostics_out.csv --covariate_columns gender,age_group,income --method ipw
INFO (2026-01-20 20:37:24,837) [__init__/<module> (line 72)]: Using balance version 0.15.0
INFO (2026-01-20 20:37:24,839) [cli/main (line 1039)]: Running cli.main() using balance version 0.15.0
INFO (2026-01-20 20:37:24,839) [cli/main (line 1074)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.15.0'}
INFO (2026-01-20 20:37:24,849) [cli/load_and_check_input (line 869)]: Number of rows in input file: 11000
INFO (2026-01-20 20:37:24,849) [cli/load_and_check_input (line 875)]: Number of columns in input file: 7
WARNING (2026-01-20 20:37:24,998) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:25,007) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:25,007) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:25,008) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:37:25,008) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:25,009) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:37:25,010) [cli/process_batch (line 691)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
WARNING (2026-01-20 20:37:25,012) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:25,026) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:25,026) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:25,027) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:37:25,027) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:25,027) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:37:25,029) [cli/process_batch (line 702)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
INFO (2026-01-20 20:37:25,034) [ipw/ipw (line 694)]: Starting ipw function
INFO (2026-01-20 20:37:25,035) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-01-20 20:37:25,035) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-01-20 20:37:25,044) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-01-20 20:37:25,051) [ipw/ipw (line 728)]: Building model matrix
INFO (2026-01-20 20:37:25,148) [ipw/ipw (line 750)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-01-20 20:37:25,148) [ipw/ipw (line 753)]: The number of columns in the model matrix: 18
INFO (2026-01-20 20:37:25,148) [ipw/ipw (line 754)]: The number of rows in the model matrix: 11000
INFO (2026-01-20 20:37:26,643) [ipw/ipw (line 915)]: Done with sklearn
INFO (2026-01-20 20:37:26,643) [ipw/ipw (line 917)]: max_de: 1.5
INFO (2026-01-20 20:37:26,643) [ipw/choose_regularization (line 371)]: Starting choosing regularisation parameters
INFO (2026-01-20 20:37:34,450) [ipw/choose_regularization (line 457)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
9  0.064155       91   2.5       1.495568          0.535793  0.090706
INFO (2026-01-20 20:37:34,451) [ipw/ipw (line 972)]: Chosen lambda: 0.06415476458273757
INFO (2026-01-20 20:37:34,452) [ipw/ipw (line 990)]: Proportion null deviance explained 0.17451470039667905
INFO (2026-01-20 20:37:34,454) [cli/process_batch (line 725)]: Succeeded with adjusting sample to target
INFO (2026-01-20 20:37:34,456) [cli/process_batch (line 726)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 2.5
            design effect (Deff): 1.496
            effective sample size proportion (ESSP): 0.669
            effective sample size (ESS): 668.6
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness,is_respondent
	        
            3 common variables: gender,age_group,income
            
INFO (2026-01-20 20:37:34,456) [cli/process_batch (line 728)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-01-20 20:37:34,456) [cli/process_batch (line 732)]: Names of columns to keep for diagnostics: None 
INFO (2026-01-20 20:37:34,456) [sample_class/diagnostics (line 1792)]: Starting computation of diagnostics of the fitting
INFO (2026-01-20 20:37:34,700) [sample_class/diagnostics (line 2013)]: Done computing diagnostics
INFO (2026-01-20 20:37:34,704) [cli/process_batch (line 741)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      1.495568  design_effect
..                          ...           ...            ...
91  covar_main_asmd_improvement      0.182901         income
92     covar_main_asmd_adjusted      0.173284     mean(asmd)
93   covar_main_asmd_unadjusted      0.326799     mean(asmd)
94  covar_main_asmd_improvement      0.153514     mean(asmd)
95           adjustment_failure      0.000000            NaN

[96 rows x 3 columns]
INFO (2026-01-20 20:37:34,706) [cli/main (line 1128)]: Done fitting the model, writing output

Out[3]:
   id  gender  age_group     income  happiness  is_respondent    weight
0   0    Male      25-34   6.428659  26.043029            1.0  7.602327
1   1  Female      18-24   9.940280  66.885485            1.0  9.398526
2   2    Male      18-24   2.673623  37.091922            1.0  3.433026
3   3     NaN      18-24  10.550308  49.394050            1.0  6.491044
4   4     NaN      18-24   2.689994  72.304208            1.0  4.886635
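
The design effect and effective sample size reported in the log above can be reproduced from the weight column. The sketch below assumes balance reports Kish's design effect, which is consistent with the logged values (ESSP 0.669 is 1 / 1.496); the function names are hypothetical.

```python
def kish_design_effect(weights):
    """Kish's design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights):
    """Effective sample size implied by the design effect: n / Deff."""
    return len(weights) / kish_design_effect(weights)


# Equal weights carry no penalty: Deff is 1 and ESS equals n.
print(kish_design_effect([1.0, 1.0, 1.0, 1.0]))  # 1.0
# Unequal weights: 4 * (1 + 1 + 1 + 9) / 6**2 = 48 / 36
print(kish_design_effect([1.0, 1.0, 1.0, 3.0]))
# Consistency check against the log above: Deff 1.495568 with n = 1000.
print(round(1000 / 1.495568, 1))  # 668.6
```

Applied to the weight column of the adjusted output, kish_design_effect should land close to the logged 1.496, assuming the same definition.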

Inspect diagnostics¶

The diagnostics output is a flat table that includes adjustment metadata and balance metrics. The metric column identifies the type of diagnostic, while var indicates the variable (or NaN for overall summaries). It is most useful to inspect var in the context of the metric it belongs to. The cells below use the diagnostics from the previous CLI run (load_data_diagnostics_df).

In [4]:
(
    load_data_diagnostics_df.groupby("metric")["var"]
    .apply(lambda col: sorted(col.dropna().unique()))
    .sort_index()
)
Out[4]:
metric
adjustment_failure                                                            []
adjustment_method                                                          [ipw]
covar_asmd_adjusted            [age_group[T.25-34], age_group[T.35-44], age_g...
covar_asmd_improvement         [age_group[T.25-34], age_group[T.35-44], age_g...
covar_asmd_unadjusted          [age_group[T.25-34], age_group[T.35-44], age_g...
covar_main_asmd_adjusted                 [age_group, gender, income, mean(asmd)]
covar_main_asmd_improvement              [age_group, gender, income, mean(asmd)]
covar_main_asmd_unadjusted               [age_group, gender, income, mean(asmd)]
ipw_model_glance                                           [intercept_, n_iter_]
ipw_multi_class                                                           [auto]
ipw_penalty                                                                 [l2]
ipw_solver                                                               [lbfgs]
model_coef                     [C(_is_na_gender, one_hot_encoding_greater_2)[...
model_glance                   [deviance, l1_ratio, lambda, null_deviance, pr...
size                           [sample_covars, sample_obs, target_covars, tar...
weights_diagnostics            [describe_25%, describe_50%, describe_75%, des...
Name: var, dtype: object
In [5]:
load_data_diagnostics_df.query("metric == 'adjustment_method'")
Out[5]:
               metric  val  var
28  adjustment_method  0.0  ipw
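
Because the diagnostics file is a flat metric/val/var table, an automated job can read it without pandas and decide whether to accept a run. The sketch below is illustrative only: the thresholds and function names are hypothetical, and the sample rows are copied from the log of the first CLI run.

```python
import csv
import io

# A few rows copied from the diagnostics printed in the first CLI run.
SAMPLE_DIAGNOSTICS = """metric,val,var
weights_diagnostics,1.495568,design_effect
covar_main_asmd_adjusted,0.173284,mean(asmd)
covar_main_asmd_unadjusted,0.326799,mean(asmd)
adjustment_failure,0.0,
"""


def diagnostics_lookup(handle):
    """Map (metric, var) pairs to float values, skipping rows with no value."""
    return {
        (row["metric"], row["var"]): float(row["val"])
        for row in csv.DictReader(handle)
        if row["val"] != ""
    }


def passes_gate(lookup, max_design_effect=1.5, max_mean_asmd=0.2):
    """Hypothetical acceptance rule; the thresholds are illustrative only."""
    return (
        lookup[("adjustment_failure", "")] == 0.0
        and lookup[("weights_diagnostics", "design_effect")] <= max_design_effect
        and lookup[("covar_main_asmd_adjusted", "mean(asmd)")] <= max_mean_asmd
    )


lookup = diagnostics_lookup(io.StringIO(SAMPLE_DIAGNOSTICS))
print(passes_gate(lookup))
```

To use this on a real run, pass an open file handle for the diagnostics CSV instead of the in-memory sample.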

CLI Help and Arguments¶

You can view all available CLI arguments using --help. Because the full output is long, the snippet below prints only the first 40 lines.

In [6]:
# Print a shorter CLI help snippet
help_output = subprocess.run(
    ["python", "-m", "balance.cli", "--help"],
    check=False,
    capture_output=True,
    text=True,
).stdout
print("\n".join(help_output.splitlines()[:40]))
usage: cli.py [-h] --input_file INPUT_FILE --output_file OUTPUT_FILE
              [--diagnostics_output_file DIAGNOSTICS_OUTPUT_FILE]
              [--method METHOD] [--sample_column SAMPLE_COLUMN]
              [--id_column ID_COLUMN] [--weight_column WEIGHT_COLUMN]
              --covariate_columns COVARIATE_COLUMNS
              [--outcome_columns OUTCOME_COLUMNS]
              [--covariate_columns_for_diagnostics COVARIATE_COLUMNS_FOR_DIAGNOSTICS]
              [--rows_to_keep_for_diagnostics ROWS_TO_KEEP_FOR_DIAGNOSTICS]
              [--batch_columns BATCH_COLUMNS] [--keep_columns KEEP_COLUMNS]
              [--keep_row_column KEEP_ROW_COLUMN]
              [--sep_input_file SEP_INPUT_FILE]
              [--sep_output_file SEP_OUTPUT_FILE]
              [--sep_diagnostics_output_file SEP_DIAGNOSTICS_OUTPUT_FILE]
              [--no_output_header] [--succeed_on_weighting_failure]
              [--max_de MAX_DE] [--lambda_min LAMBDA_MIN]
              [--lambda_max LAMBDA_MAX] [--num_lambdas NUM_LAMBDAS]
              [--ipw_logistic_regression_kwargs IPW_LOGISTIC_REGRESSION_KWARGS]
              [--weight_trimming_mean_ratio WEIGHT_TRIMMING_MEAN_RATIO]
              [--one_hot_encoding ONE_HOT_ENCODING]
              [--transformations TRANSFORMATIONS] [--formula FORMULA]
              [--return_df_with_original_dtypes]
              [--standardize_types STANDARDIZE_TYPES]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE
                        Path to input sample/target
  --output_file OUTPUT_FILE
                        Path to write output weights

Key CLI Arguments Summary¶

Here are the most commonly used arguments:

Argument                          Default  Description
--method                          ipw      Adjustment method: ipw, cbps, or rake
--max_de                          1.5      Maximum design effect. Set to None to use lambda_1se instead
--lambda_min                      1e-05    Lower bound for the regularization parameter lambda (IPW only)
--lambda_max                      10       Upper bound for the regularization parameter lambda (IPW only)
--num_lambdas                     250      Number of lambda values to search (IPW only)
--weight_trimming_mean_ratio      20.0     Trim weights above mean(weights) * ratio
--transformations                 default  Covariate transformations. Use None to disable
--formula                         None     Custom model formula (e.g., "gender + income")
--one_hot_encoding                True     One-hot encode categorical features
--batch_columns                   None     Columns to group by for batch processing
--keep_columns                    None     Subset of columns to include in output
--outcome_columns                 None     Columns treated as outcomes (not covariates)
--ipw_logistic_regression_kwargs  None     JSON string of kwargs for sklearn LogisticRegression
--succeed_on_weighting_failure    False    Return null weights instead of failing on errors
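
When the same arguments are reused across runs, building the argument list from a dictionary keeps the options in one place. This is a minimal sketch; build_balance_cmd is a hypothetical helper, and it uses only flags that appear in the usage text above.

```python
import sys


def build_balance_cmd(options):
    """Turn a dict of CLI options into an argument list for subprocess."""
    # sys.executable runs the CLI with the same interpreter as this script.
    cmd = [sys.executable, "-m", "balance.cli"]
    for name, value in options.items():
        if value is True:
            # Switch-style flag with no value, e.g. --succeed_on_weighting_failure.
            cmd.append(f"--{name}")
        elif value is False or value is None:
            # Omitted entirely, so the CLI default applies.
            continue
        else:
            cmd.extend([f"--{name}", str(value)])
    return cmd


cmd = build_balance_cmd(
    {
        "input_file": "input.csv",
        "output_file": "weights_out.csv",
        "covariate_columns": "gender,age_group,income",
        "method": "ipw",
        "max_de": 2.0,
        "succeed_on_weighting_failure": True,
    }
)
print(cmd[3:])
```

Options that take an explicit value, such as --one_hot_encoding or --transformations, should be given as strings (for example "None") so that they are passed through rather than dropped.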

Example: Tuning IPW parameters¶

Below we run the CLI with custom regularization settings and a custom logistic regression solver:

In [7]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_tuned.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_tuned.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "ipw",
        # Tuning parameters
        "--max_de", "2.0",
        "--lambda_min", "1e-06",
        "--lambda_max", "100",
        "--num_lambdas", "500",
        "--weight_trimming_mean_ratio", "10.0",
        # Custom logistic regression settings
        "--ipw_logistic_regression_kwargs", '{"solver": "liblinear", "max_iter": 500}',
    ]

    print("CLI command:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    tuned_adjusted_df = pd.read_csv(output_path)

tuned_adjusted_df.head()
CLI command:
python -m balance.cli --input_file /tmp/tmpe0eif4oj/input.csv --output_file /tmp/tmpe0eif4oj/weights_tuned.csv --diagnostics_output_file /tmp/tmpe0eif4oj/diagnostics_tuned.csv --covariate_columns gender,age_group,income --method ipw --max_de 2.0 --lambda_min 1e-06 --lambda_max 100 --num_lambdas 500 --weight_trimming_mean_ratio 10.0 --ipw_logistic_regression_kwargs {"solver": "liblinear", "max_iter": 500}
INFO (2026-01-20 20:37:38,128) [__init__/<module> (line 72)]: Using balance version 0.15.0
INFO (2026-01-20 20:37:38,129) [cli/main (line 1039)]: Running cli.main() using balance version 0.15.0
INFO (2026-01-20 20:37:38,130) [cli/main (line 1074)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 2.0, 'lambda_min': 1e-06, 'lambda_max': 100.0, 'num_lambdas': 500, 'weight_trimming_mean_ratio': 10.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.15.0'}
INFO (2026-01-20 20:37:38,140) [cli/load_and_check_input (line 869)]: Number of rows in input file: 11000
INFO (2026-01-20 20:37:38,140) [cli/load_and_check_input (line 875)]: Number of columns in input file: 7
WARNING (2026-01-20 20:37:38,288) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:38,296) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:38,297) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:38,297) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:37:38,298) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:38,298) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:37:38,300) [cli/process_batch (line 691)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
WARNING (2026-01-20 20:37:38,302) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:38,316) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:38,316) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:38,317) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:37:38,317) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:38,317) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:37:38,319) [cli/process_batch (line 702)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
INFO (2026-01-20 20:37:38,324) [ipw/ipw (line 694)]: Starting ipw function
INFO (2026-01-20 20:37:38,325) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-01-20 20:37:38,325) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-01-20 20:37:38,335) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-01-20 20:37:38,342) [ipw/ipw (line 728)]: Building model matrix
INFO (2026-01-20 20:37:38,438) [ipw/ipw (line 750)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-01-20 20:37:38,438) [ipw/ipw (line 753)]: The number of columns in the model matrix: 18
INFO (2026-01-20 20:37:38,438) [ipw/ipw (line 754)]: The number of rows in the model matrix: 11000
INFO (2026-01-20 20:37:38,465) [ipw/ipw (line 915)]: Done with sklearn
INFO (2026-01-20 20:37:38,465) [ipw/ipw (line 917)]: max_de: 2.0
INFO (2026-01-20 20:37:38,465) [ipw/choose_regularization (line 371)]: Starting choosing regularisation parameters
INFO (2026-01-20 20:37:42,541) [ipw/choose_regularization (line 457)]: Best regularisation: 
     s  s_index  trim  design_effect  asmd_improvement      asmd
6 NaN        0   2.5       1.714071          0.634917  0.071337
INFO (2026-01-20 20:37:42,542) [ipw/ipw (line 972)]: Chosen lambda: nan
INFO (2026-01-20 20:37:42,543) [ipw/ipw (line 990)]: Proportion null deviance explained 0.18280833369391158
INFO (2026-01-20 20:37:42,545) [cli/process_batch (line 725)]: Succeeded with adjusting sample to target
INFO (2026-01-20 20:37:42,547) [cli/process_batch (line 726)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 2.5
            design effect (Deff): 1.714
            effective sample size proportion (ESSP): 0.583
            effective sample size (ESS): 583.4
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness,is_respondent
	        
            3 common variables: gender,age_group,income
            
INFO (2026-01-20 20:37:42,547) [cli/process_batch (line 728)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-01-20 20:37:42,547) [cli/process_batch (line 732)]: Names of columns to keep for diagnostics: None 
INFO (2026-01-20 20:37:42,547) [sample_class/diagnostics (line 1792)]: Starting computation of diagnostics of the fitting
INFO (2026-01-20 20:37:42,791) [sample_class/diagnostics (line 2013)]: Done computing diagnostics
INFO (2026-01-20 20:37:42,795) [cli/process_batch (line 741)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      1.714071  design_effect
..                          ...           ...            ...
91  covar_main_asmd_improvement      0.225463         income
92     covar_main_asmd_adjusted      0.143344     mean(asmd)
93   covar_main_asmd_unadjusted      0.326799     mean(asmd)
94  covar_main_asmd_improvement      0.183455     mean(asmd)
95           adjustment_failure      0.000000            NaN

[96 rows x 3 columns]
INFO (2026-01-20 20:37:42,797) [cli/main (line 1128)]: Done fitting the model, writing output

Out[7]:
   id  gender  age_group     income  happiness  is_respondent    weight
0   0    Male      25-34   6.428659  26.043029            1.0  6.714531
1   1  Female      18-24   9.940280  66.885485            1.0  8.721215
2   2    Male      18-24   2.673623  37.091922            1.0  2.537674
3   3     NaN      18-24  10.550308  49.394050            1.0  5.587013
4   4     NaN      18-24   2.689994  72.304208            1.0  3.883128
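
Note that the command printed above with " ".join(cmd) shows the JSON argument without quotes, so it cannot be pasted into a shell as is. The sketch below shows one way to print a copy-pastable command with the standard library's shlex.join (Python 3.8+); the argument list is shortened for illustration.

```python
import json
import shlex

kwargs = {"solver": "liblinear", "max_iter": 500}
cmd = [
    "python", "-m", "balance.cli",
    "--ipw_logistic_regression_kwargs", json.dumps(kwargs),
]

plain = " ".join(cmd)     # quotes around the JSON argument are lost
quoted = shlex.join(cmd)  # POSIX-shell quoting, safe to paste into a shell

print(plain)
print(quoted)
# Round trip: splitting the quoted string recovers the original argument list.
print(shlex.split(quoted) == cmd)  # True
```

This affects only what is printed. Passing the list to subprocess, as the cell above does, delivers the JSON string to the CLI as a single argument either way.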

Example: Using a Custom Formula¶

The --formula argument allows you to specify a custom model formula, including interaction terms. When using --formula, you should typically also set --transformations=None to prevent automatic transformations from interfering with your custom formula.

The formula uses patsy/R-style syntax:

  • gender + income: additive terms (no interaction)
  • gender * income: equivalent to gender + income + gender:income (main effects + interaction)
  • gender:income: only the interaction term
In [8]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_formula.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_formula.csv")

    # Use the demo data for the formula example
    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "ipw",
        # Disable transformations to use raw covariates in formula
        "--transformations", "None",
        # Use a formula with interaction term
        "--formula", "gender*income",
    ]

    print("CLI command with custom formula:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    formula_diagnostics_df = pd.read_csv(diagnostics_path)

# Check model coefficients to verify formula was applied
print("\nModel coefficients (showing interaction term):")
print(formula_diagnostics_df.query("metric == 'model_coef'")[["var", "val"]])
CLI command with custom formula:
python -m balance.cli --input_file /tmp/tmpw3y1r_ra/input.csv --output_file /tmp/tmpw3y1r_ra/weights_formula.csv --diagnostics_output_file /tmp/tmpw3y1r_ra/diagnostics_formula.csv --covariate_columns gender,age_group,income --method ipw --transformations None --formula gender*income
INFO (2026-01-20 20:37:44,511) [__init__/<module> (line 72)]: Using balance version 0.15.0
INFO (2026-01-20 20:37:44,512) [cli/main (line 1039)]: Running cli.main() using balance version 0.15.0
INFO (2026-01-20 20:37:44,513) [cli/main (line 1074)]: Attributes used by main() for running adjust: {'transformations': None, 'formula': 'gender*income', 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.15.0'}
INFO (2026-01-20 20:37:44,523) [cli/load_and_check_input (line 869)]: Number of rows in input file: 11000
INFO (2026-01-20 20:37:44,523) [cli/load_and_check_input (line 875)]: Number of columns in input file: 7
WARNING (2026-01-20 20:37:44,672) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:44,681) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:44,681) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:44,682) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:37:44,682) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:44,683) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:37:44,684) [cli/process_batch (line 691)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
WARNING (2026-01-20 20:37:44,686) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:44,700) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:44,700) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:44,701) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:37:44,701) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:44,702) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:37:44,704) [cli/process_batch (line 702)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
INFO (2026-01-20 20:37:44,708) [ipw/ipw (line 694)]: Starting ipw function
INFO (2026-01-20 20:37:44,709) [ipw/ipw (line 728)]: Building model matrix
INFO (2026-01-20 20:37:44,751) [ipw/ipw (line 750)]: The formula used to build the model matrix: ['gender*income']
INFO (2026-01-20 20:37:44,752) [ipw/ipw (line 753)]: The number of columns in the model matrix: 7
INFO (2026-01-20 20:37:44,752) [ipw/ipw (line 754)]: The number of rows in the model matrix: 11000
INFO (2026-01-20 20:37:46,079) [ipw/ipw (line 915)]: Done with sklearn
INFO (2026-01-20 20:37:46,079) [ipw/ipw (line 917)]: max_de: 1.5
INFO (2026-01-20 20:37:46,079) [ipw/choose_regularization (line 371)]: Starting choosing regularisation parameters
INFO (2026-01-20 20:37:52,076) [ipw/choose_regularization (line 457)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
9  0.043507       98   5.0       1.496216          0.517269  0.157756
INFO (2026-01-20 20:37:52,078) [ipw/ipw (line 972)]: Chosen lambda: 0.043506507030756265
INFO (2026-01-20 20:37:52,078) [ipw/ipw (line 990)]: Proportion null deviance explained 0.09595811553953071
WARNING (2026-01-20 20:37:52,078) [ipw/ipw (line 998)]: The propensity model has low fraction null deviance explained (0.09595811553953071). Results may not be accurate
INFO (2026-01-20 20:37:52,081) [cli/process_batch (line 725)]: Succeeded with adjusting sample to target
INFO (2026-01-20 20:37:52,083) [cli/process_batch (line 726)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 5.0
            design effect (Deff): 1.496
            effective sample size proportion (ESSP): 0.668
            effective sample size (ESS): 668.4
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness,is_respondent
	        
            3 common variables: gender,age_group,income
            
INFO (2026-01-20 20:37:52,083) [cli/process_batch (line 728)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-01-20 20:37:52,083) [cli/process_batch (line 732)]: Names of columns to keep for diagnostics: None 
INFO (2026-01-20 20:37:52,083) [sample_class/diagnostics (line 1792)]: Starting computation of diagnostics of the fitting
INFO (2026-01-20 20:37:52,329) [sample_class/diagnostics (line 2013)]: Done computing diagnostics
INFO (2026-01-20 20:37:52,333) [cli/process_batch (line 741)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      1.496216  design_effect
..                          ...           ...            ...
80  covar_main_asmd_improvement      0.301914         income
81     covar_main_asmd_adjusted      0.157756     mean(asmd)
82   covar_main_asmd_unadjusted      0.326799     mean(asmd)
83  covar_main_asmd_improvement      0.169043     mean(asmd)
84           adjustment_failure      0.000000            NaN

[85 rows x 3 columns]
INFO (2026-01-20 20:37:52,335) [cli/main (line 1128)]: Done fitting the model, writing output

Model coefficients (showing interaction term):
                                                  var       val
40                                          intercept  0.458758
41      C(gender, one_hot_encoding_greater_2)[Female] -0.189079
42  C(gender, one_hot_encoding_greater_2)[Female]:... -0.225013
43        C(gender, one_hot_encoding_greater_2)[Male]  0.178582
44  C(gender, one_hot_encoding_greater_2)[Male]:in... -0.198394
45         C(gender, one_hot_encoding_greater_2)[_NA]  0.007039
46  C(gender, one_hot_encoding_greater_2)[_NA]:income -0.091899
47                                             income -0.372909

Batch Processing Example¶

The --batch_columns argument allows you to run separate adjustments for each unique combination of values in the specified columns. This is useful when you want to compute weights independently for different subgroups (e.g., by gender or region).
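
To make "each unique combination of values" concrete, the minimal sketch below uses plain pandas (not the balance CLI) to list the batches that grouping on one or more columns would produce. The toy frame and the `region` column are invented for illustration, and the assumption that the CLI splits rows in a comparable way is ours.

```python
import pandas as pd

# Toy data; the column names and values are made up for illustration.
toy_df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
        "region": ["North", "North", "South", "South", "North", "South"],
        "is_respondent": [1, 1, 0, 0, 1, 0],
    }
)

batch_columns = ["gender", "region"]

# Each unique combination of values in batch_columns forms one batch.
batch_sizes = {
    batch_key: len(batch_df)
    for batch_key, batch_df in toy_df.groupby(batch_columns)
}

for batch_key, n_rows in batch_sizes.items():
    print(f"batch = {batch_key}: {n_rows} rows")
```

With two batch columns the keys are tuples such as `("Female", "North")`, which resembles the `batch = ('Female',)` labels in the CLI log output.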

In [9]:
# Create a dataset with a batch column for gender
batch_input_df = load_data_input_df.copy()

# The 'gender' column has values like 'Female', 'Male', and possibly NA
# Filter to only rows with non-null gender for this example
batch_input_df = batch_input_df[batch_input_df["gender"].notna()].copy()
print(f"Rows after filtering: {len(batch_input_df)}")
print(f"Gender distribution:\n{batch_input_df['gender'].value_counts()}")
Rows after filtering: 10014
Gender distribution:
gender
Male      5195
Female    4819
Name: count, dtype: int64
In [10]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input_batch.csv")
    output_path = os.path.join(tmpdir, "weights_batch.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_batch.csv")

    batch_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "age_group,income",  # Note: gender is now used as batch column
        "--outcome_columns", "happiness",
        "--batch_columns", "gender",  # Process each gender separately
        "--method", "ipw",
    ]

    print("CLI command with batch processing:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    batch_adjusted_df = pd.read_csv(output_path)
    batch_diagnostics_df = pd.read_csv(diagnostics_path)

print(f"\nOutput rows: {len(batch_adjusted_df)}")
batch_adjusted_df.head()
CLI command with batch processing:
python -m balance.cli --input_file /tmp/tmpzxn5xte8/input_batch.csv --output_file /tmp/tmpzxn5xte8/weights_batch.csv --diagnostics_output_file /tmp/tmpzxn5xte8/diagnostics_batch.csv --covariate_columns age_group,income --outcome_columns happiness --batch_columns gender --method ipw
INFO (2026-01-20 20:37:54,059) [__init__/<module> (line 72)]: Using balance version 0.15.0
INFO (2026-01-20 20:37:54,061) [cli/main (line 1039)]: Running cli.main() using balance version 0.15.0
INFO (2026-01-20 20:37:54,061) [cli/main (line 1074)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.15.0'}
INFO (2026-01-20 20:37:54,071) [cli/load_and_check_input (line 869)]: Number of rows in input file: 10014
INFO (2026-01-20 20:37:54,071) [cli/load_and_check_input (line 875)]: Number of columns in input file: 7
INFO (2026-01-20 20:37:54,072) [cli/main (line 1085)]: Running weighting for batch = ('Female',) 
WARNING (2026-01-20 20:37:54,223) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:54,230) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:54,230) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:54,231) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-01-20 20:37:54,231) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:54,232) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
id                object
is_respondent    float64
dtype: object
INFO (2026-01-20 20:37:54,233) [cli/process_batch (line 691)]: balance sample object: 
        balance Sample object
        268 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
WARNING (2026-01-20 20:37:54,234) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:54,244) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:54,244) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:54,245) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-01-20 20:37:54,245) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:54,246) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
id                object
is_respondent    float64
dtype: object
INFO (2026-01-20 20:37:54,247) [cli/process_batch (line 702)]: balance target object: 
        balance Sample object
        4551 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
INFO (2026-01-20 20:37:54,250) [ipw/ipw (line 694)]: Starting ipw function
INFO (2026-01-20 20:37:54,251) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-01-20 20:37:54,251) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['age_group', 'income']
INFO (2026-01-20 20:37:54,256) [adjustment/apply_transformations (line 469)]: Final variables in output: ['age_group', 'income']
INFO (2026-01-20 20:37:54,260) [ipw/ipw (line 728)]: Building model matrix
INFO (2026-01-20 20:37:54,293) [ipw/ipw (line 750)]: The formula used to build the model matrix: ['income + age_group']
INFO (2026-01-20 20:37:54,293) [ipw/ipw (line 753)]: The number of columns in the model matrix: 14
INFO (2026-01-20 20:37:54,293) [ipw/ipw (line 754)]: The number of rows in the model matrix: 4819
INFO (2026-01-20 20:37:55,233) [ipw/ipw (line 915)]: Done with sklearn
INFO (2026-01-20 20:37:55,233) [ipw/ipw (line 917)]: max_de: 1.5
INFO (2026-01-20 20:37:55,233) [ipw/choose_regularization (line 371)]: Starting choosing regularisation parameters
INFO (2026-01-20 20:37:59,335) [ipw/choose_regularization (line 457)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
6  0.105705       82   5.0         1.4897          0.494125  0.098702
INFO (2026-01-20 20:37:59,336) [ipw/ipw (line 972)]: Chosen lambda: 0.10570520810009826
INFO (2026-01-20 20:37:59,337) [ipw/ipw (line 990)]: Proportion null deviance explained 0.14889612147544162
INFO (2026-01-20 20:37:59,339) [cli/process_batch (line 725)]: Succeeded with adjusting sample to target
INFO (2026-01-20 20:37:59,341) [cli/process_batch (line 726)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        268 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 5.0
            design effect (Deff): 1.490
            effective sample size proportion (ESSP): 0.671
            effective sample size (ESS): 179.9
                
            target:
                 
	        balance Sample object
	        4551 observations x 2 variables: age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness
	        
            2 common variables: age_group,income
            
INFO (2026-01-20 20:37:59,341) [cli/process_batch (line 728)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-01-20 20:37:59,341) [cli/process_batch (line 732)]: Names of columns to keep for diagnostics: None 
INFO (2026-01-20 20:37:59,341) [sample_class/diagnostics (line 1792)]: Starting computation of diagnostics of the fitting
INFO (2026-01-20 20:37:59,470) [sample_class/diagnostics (line 2013)]: Done computing diagnostics
INFO (2026-01-20 20:37:59,474) [cli/process_batch (line 741)]: balance diagnostics object:                          metric          val            var
0                          size   268.000000     sample_obs
1                          size     2.000000  sample_covars
2                          size  4551.000000     target_obs
3                          size     2.000000  target_covars
4           weights_diagnostics     1.489700  design_effect
..                          ...          ...            ...
78  covar_main_asmd_improvement     0.185597         income
79     covar_main_asmd_adjusted     0.220390     mean(asmd)
80   covar_main_asmd_unadjusted     0.422500     mean(asmd)
81  covar_main_asmd_improvement     0.202110     mean(asmd)
82           adjustment_failure     0.000000            NaN

[83 rows x 3 columns]
INFO (2026-01-20 20:37:59,476) [cli/main (line 1102)]: Done processing batch ('Female',)
INFO (2026-01-20 20:37:59,476) [cli/main (line 1085)]: Running weighting for batch = ('Male',) 
WARNING (2026-01-20 20:37:59,477) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:59,484) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:59,484) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:59,485) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-01-20 20:37:59,485) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:59,486) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
id                object
is_respondent    float64
dtype: object
INFO (2026-01-20 20:37:59,487) [cli/process_batch (line 691)]: balance sample object: 
        balance Sample object
        644 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
WARNING (2026-01-20 20:37:59,488) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:37:59,497) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:37:59,497) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:37:59,498) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-01-20 20:37:59,498) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:37:59,498) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
id                object
is_respondent    float64
dtype: object
INFO (2026-01-20 20:37:59,500) [cli/process_batch (line 702)]: balance target object: 
        balance Sample object
        4551 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
INFO (2026-01-20 20:37:59,502) [ipw/ipw (line 694)]: Starting ipw function
INFO (2026-01-20 20:37:59,503) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-01-20 20:37:59,503) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['age_group', 'income']
INFO (2026-01-20 20:37:59,508) [adjustment/apply_transformations (line 469)]: Final variables in output: ['age_group', 'income']
INFO (2026-01-20 20:37:59,511) [ipw/ipw (line 728)]: Building model matrix
INFO (2026-01-20 20:37:59,543) [ipw/ipw (line 750)]: The formula used to build the model matrix: ['income + age_group']
INFO (2026-01-20 20:37:59,543) [ipw/ipw (line 753)]: The number of columns in the model matrix: 14
INFO (2026-01-20 20:37:59,543) [ipw/ipw (line 754)]: The number of rows in the model matrix: 5195
INFO (2026-01-20 20:38:00,393) [ipw/ipw (line 915)]: Done with sklearn
INFO (2026-01-20 20:38:00,393) [ipw/ipw (line 917)]: max_de: 1.5
INFO (2026-01-20 20:38:00,393) [ipw/choose_regularization (line 371)]: Starting choosing regularisation parameters
INFO (2026-01-20 20:38:04,420) [ipw/choose_regularization (line 457)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
9  0.111736       81   5.0       1.495973          0.566289  0.087357
INFO (2026-01-20 20:38:04,422) [ipw/ipw (line 972)]: Chosen lambda: 0.11173591019485084
INFO (2026-01-20 20:38:04,422) [ipw/ipw (line 990)]: Proportion null deviance explained 0.14267734119400044
INFO (2026-01-20 20:38:04,425) [cli/process_batch (line 725)]: Succeeded with adjusting sample to target
INFO (2026-01-20 20:38:04,427) [cli/process_batch (line 726)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        644 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 5.0
            design effect (Deff): 1.496
            effective sample size proportion (ESSP): 0.668
            effective sample size (ESS): 430.5
                
            target:
                 
	        balance Sample object
	        4551 observations x 2 variables: age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness
	        
            2 common variables: age_group,income
            
INFO (2026-01-20 20:38:04,427) [cli/process_batch (line 728)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-01-20 20:38:04,427) [cli/process_batch (line 732)]: Names of columns to keep for diagnostics: None 
INFO (2026-01-20 20:38:04,427) [sample_class/diagnostics (line 1792)]: Starting computation of diagnostics of the fitting
INFO (2026-01-20 20:38:04,559) [sample_class/diagnostics (line 2013)]: Done computing diagnostics
INFO (2026-01-20 20:38:04,563) [cli/process_batch (line 741)]: balance diagnostics object:                          metric          val            var
0                          size   644.000000     sample_obs
1                          size     2.000000  sample_covars
2                          size  4551.000000     target_obs
3                          size     2.000000  target_covars
4           weights_diagnostics     1.495973  design_effect
..                          ...          ...            ...
78  covar_main_asmd_improvement     0.235896         income
79     covar_main_asmd_adjusted     0.192202     mean(asmd)
80   covar_main_asmd_unadjusted     0.430017     mean(asmd)
81  covar_main_asmd_improvement     0.237816     mean(asmd)
82           adjustment_failure     0.000000            NaN

[83 rows x 3 columns]
INFO (2026-01-20 20:38:04,565) [cli/main (line 1102)]: Done processing batch ('Male',)
INFO (2026-01-20 20:38:04,565) [cli/main (line 1128)]: Done fitting the model, writing output

Output rows: 912
Out[10]:
   id age_group     income  happiness     weight  gender  is_respondent
0   1     18-24   9.940280  66.885485  10.380592  Female            1.0
1  92     35-44   0.185097  84.464522  18.177351  Female            1.0
2  94     35-44   1.183696  65.742184  20.852401  Female            1.0
3  95     18-24   3.716007  67.624539  10.522912  Female            1.0
4  98     35-44  16.751931  44.868651  40.368284  Female            1.0
In [11]:
# Inspect weights by gender - each group was adjusted independently
print("Weight statistics by gender (sample only):")
sample_only = batch_adjusted_df[batch_adjusted_df["is_respondent"] == 1]
print(sample_only.groupby("gender")["weight"].describe().round(3))
Weight statistics by gender (sample only):
        count    mean     std    min    25%     50%     75%     max
gender                                                             
Female  268.0  16.981  11.906  6.787  9.567  13.703  19.155  85.647
Male    644.0   7.067   4.981  2.913  3.260   5.775   9.234  35.371
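
The design effect and effective sample size reported in the logs can be approximated directly from a weight column. The sketch below uses Kish's formula, Deff = n * sum(w^2) / (sum(w))^2, with ESS = n / Deff. Treating this as the formula balance uses is an assumption (the logged ESS values are consistent with n / Deff), and the weights here are toy values rather than the tutorial output.

```python
import numpy as np
import pandas as pd


def kish_design_effect(weights):
    """Kish's design effect: n * sum(w^2) / (sum(w))^2."""
    w = np.asarray(weights, dtype=float)
    return len(w) * np.sum(w**2) / np.sum(w) ** 2


# Toy weights, not taken from the tutorial output.
toy_weights = pd.DataFrame(
    {
        "gender": ["Female", "Female", "Male", "Male", "Male"],
        "weight": [1.0, 3.0, 2.0, 2.0, 2.0],
    }
)

# One row per group: sample size, design effect, effective sample size.
summary = toy_weights.groupby("gender")["weight"].agg(
    n="count", design_effect=kish_design_effect
)
summary["ess"] = summary["n"] / summary["design_effect"]
print(summary.round(3))
```

Equal weights give a design effect of 1 (no loss of effective sample size), while more variable weights push it above 1.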

Alternative Weighting Methods¶

The CLI supports three adjustment methods:

  • IPW (Inverse Probability Weighting): The default method; uses logistic regression to estimate propensity scores
  • CBPS (Covariate Balancing Propensity Score): Balances covariates while estimating propensity scores
  • Rake (Raking/Iterative Proportional Fitting): Adjusts weights iteratively to match marginal distributions
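
Since the three methods differ only in the value passed to `--method`, the argument list can be assembled once and reused. The sketch below is one way to do that; `build_balance_cmd` is a hypothetical helper written for this illustration (it is not part of balance), the file names are placeholders, and the code only builds and prints the commands without running them.

```python
def build_balance_cmd(input_path, output_path, diagnostics_path, covariates, method):
    """Hypothetical helper: assemble the CLI argument list used in this tutorial."""
    return [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", ",".join(covariates),
        "--method", method,
    ]


covariates = ["gender", "age_group", "income"]

# One command per adjustment method; the paths are placeholders.
commands = {
    method: build_balance_cmd(
        "input.csv",
        f"weights_{method}.csv",
        f"diagnostics_{method}.csv",
        covariates,
        method,
    )
    for method in ("ipw", "cbps", "rake")
}

for method, cmd in commands.items():
    print(" ".join(cmd))
```

Each list can then be passed to `subprocess.check_call`, as in the cells of this tutorial.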

Example: CBPS Method¶

CBPS simultaneously optimizes covariate balance and propensity score estimation:

In [12]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_cbps.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_cbps.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "cbps",
    ]

    print("CLI command with CBPS method:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    cbps_diagnostics_df = pd.read_csv(diagnostics_path)

# Verify the method used
print("\nAdjustment method used:")
print(cbps_diagnostics_df.query("metric == 'adjustment_method'")[["var", "val"]])
CLI command with CBPS method:
python -m balance.cli --input_file /tmp/tmpvme2x18i/input.csv --output_file /tmp/tmpvme2x18i/weights_cbps.csv --diagnostics_output_file /tmp/tmpvme2x18i/diagnostics_cbps.csv --covariate_columns gender,age_group,income --method cbps
INFO (2026-01-20 20:38:06,308) [__init__/<module> (line 72)]: Using balance version 0.15.0
INFO (2026-01-20 20:38:06,310) [cli/main (line 1039)]: Running cli.main() using balance version 0.15.0
INFO (2026-01-20 20:38:06,310) [cli/main (line 1074)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.15.0'}
INFO (2026-01-20 20:38:06,320) [cli/load_and_check_input (line 869)]: Number of rows in input file: 11000
INFO (2026-01-20 20:38:06,320) [cli/load_and_check_input (line 875)]: Number of columns in input file: 7
WARNING (2026-01-20 20:38:06,469) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:38:06,477) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:38:06,477) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:38:06,478) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:38:06,478) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:38:06,479) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:38:06,480) [cli/process_batch (line 691)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
WARNING (2026-01-20 20:38:06,482) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:38:06,496) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:38:06,497) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:38:06,497) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:38:06,497) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:38:06,498) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:38:06,500) [cli/process_batch (line 702)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
INFO (2026-01-20 20:38:06,504) [cbps/cbps (line 537)]: Starting cbps function
INFO (2026-01-20 20:38:06,506) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-01-20 20:38:06,506) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-01-20 20:38:06,515) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-01-20 20:38:06,616) [cbps/cbps (line 588)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-01-20 20:38:06,617) [cbps/cbps (line 599)]: The number of columns in the model matrix: 16
INFO (2026-01-20 20:38:06,617) [cbps/cbps (line 600)]: The number of rows in the model matrix: 11000
INFO (2026-01-20 20:38:06,624) [cbps/cbps (line 669)]: Finding initial estimator for GMM optimization
INFO (2026-01-20 20:38:06,757) [cbps/cbps (line 696)]: Finding initial estimator for GMM optimization that minimizes the balance loss
INFO (2026-01-20 20:38:07,356) [cbps/cbps (line 732)]: Running GMM optimization
INFO (2026-01-20 20:38:08,711) [cbps/cbps (line 859)]: Done cbps function
INFO (2026-01-20 20:38:08,714) [cli/process_batch (line 725)]: Succeeded with adjusting sample to target
INFO (2026-01-20 20:38:08,717) [cli/process_batch (line 726)]: balance adjusted object: 
        Adjusted balance Sample object with target set using cbps
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
        adjustment details:
            method: cbps
            design effect (Deff): 1.500
            effective sample size proportion (ESSP): 0.667
            effective sample size (ESS): 666.7
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness,is_respondent
	        
            3 common variables: gender,age_group,income
            
INFO (2026-01-20 20:38:08,717) [cli/process_batch (line 728)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-01-20 20:38:08,717) [cli/process_batch (line 732)]: Names of columns to keep for diagnostics: None 
INFO (2026-01-20 20:38:08,717) [sample_class/diagnostics (line 1792)]: Starting computation of diagnostics of the fitting
INFO (2026-01-20 20:38:08,995) [sample_class/diagnostics (line 2013)]: Done computing diagnostics
INFO (2026-01-20 20:38:08,999) [cli/process_batch (line 741)]: balance diagnostics object:                          metric       val            var
0                          size    1000.0     sample_obs
1                          size       3.0  sample_covars
2                          size   10000.0     target_obs
3                          size       3.0  target_covars
4           weights_diagnostics       1.5  design_effect
..                          ...       ...            ...
86  covar_main_asmd_improvement  0.205326         income
87     covar_main_asmd_adjusted  0.175442     mean(asmd)
88   covar_main_asmd_unadjusted  0.326799     mean(asmd)
89  covar_main_asmd_improvement  0.151357     mean(asmd)
90           adjustment_failure         0            NaN

[91 rows x 3 columns]
INFO (2026-01-20 20:38:09,001) [cli/main (line 1128)]: Done fitting the model, writing output

Adjustment method used:
     var  val
28  cbps  0.0
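
The diagnostics file uses a long layout with `metric`, `val`, and `var` columns, so individual numbers are pulled out by filtering on `metric` and `var`. The sketch below rebuilds a handful of rows by hand, with values copied from the CBPS log above, to show the pattern; a real diagnostics file has many more rows, and the variable names are illustrative.

```python
import pandas as pd

# Toy diagnostics frame in the long (metric, val, var) layout shown in the logs.
# Values are copied from the CBPS run above.
toy_diagnostics = pd.DataFrame(
    {
        "metric": [
            "size",
            "size",
            "weights_diagnostics",
            "covar_main_asmd_adjusted",
            "covar_main_asmd_unadjusted",
            "covar_main_asmd_improvement",
            "adjustment_method",
        ],
        "val": [1000.0, 10000.0, 1.5, 0.175442, 0.326799, 0.151357, 0.0],
        "var": [
            "sample_obs",
            "target_obs",
            "design_effect",
            "mean(asmd)",
            "mean(asmd)",
            "mean(asmd)",
            "cbps",
        ],
    }
)

# Single value: the design effect of the adjusted weights.
is_deff = (toy_diagnostics["metric"] == "weights_diagnostics") & (
    toy_diagnostics["var"] == "design_effect"
)
design_effect = toy_diagnostics.loc[is_deff, "val"].iloc[0]

# Mean ASMD before and after adjustment, indexed by metric name.
is_mean_asmd = toy_diagnostics["var"] == "mean(asmd)"
asmd = toy_diagnostics.loc[is_mean_asmd].set_index("metric")["val"]

print(f"design effect: {design_effect}")
print(asmd)
```

In these rows the improvement equals the unadjusted mean ASMD minus the adjusted one (0.326799 - 0.175442 = 0.151357), and the `adjustment_method` row carries the method name in `var` rather than in `val`.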

Example: Rake Method¶

Raking iteratively adjusts weights to match target marginal distributions:
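
To make the iteration concrete, here is a minimal sketch of iterative proportional fitting on two categorical columns. It is a toy illustration and not balance's implementation: the `rake` function, the data, and the target proportions are all invented for this example.

```python
import pandas as pd


def rake(df, target_margins, n_iter=100, tol=1e-10):
    """Toy iterative proportional fitting on categorical columns.

    target_margins maps column name -> {category: target proportion}.
    """
    weights = pd.Series(1.0, index=df.index)
    for _ in range(n_iter):
        max_change = 0.0
        for col, targets in target_margins.items():
            # Current weighted share of each category in this column.
            shares = weights.groupby(df[col]).sum() / weights.sum()
            # Scale each row so this column's weighted shares hit the target.
            factors = df[col].map(lambda c: targets[c] / shares[c])
            max_change = max(max_change, (factors - 1).abs().max())
            weights = weights * factors
        # Stop once a full pass over all columns barely changes the weights.
        if max_change < tol:
            break
    return weights


# Toy sample; every gender/age combination appears at least once.
toy = pd.DataFrame(
    {
        "gender": ["F", "F", "F", "M", "M", "M", "M", "M"],
        "age": ["young", "young", "old", "young", "old", "old", "old", "old"],
    }
)
targets = {
    "gender": {"F": 0.5, "M": 0.5},
    "age": {"young": 0.4, "old": 0.6},
}

weights = rake(toy, targets)
weighted_gender = weights.groupby(toy["gender"]).sum() / weights.sum()
weighted_age = weights.groupby(toy["age"]).sum() / weights.sum()
print(weighted_gender.round(4))
print(weighted_age.round(4))
```

Each pass fixes one margin at a time, which slightly disturbs the others; repeating the passes brings all margins to their targets together. The CLI call below runs balance's own rake method on the demo data.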

In [13]:
with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_rake.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_rake.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "rake",
    ]

    print("CLI command with rake method:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    rake_diagnostics_df = pd.read_csv(diagnostics_path)

# Verify the method used
print("\nAdjustment method used:")
print(rake_diagnostics_df.query("metric == 'adjustment_method'")[["var", "val"]])
CLI command with rake method:
python -m balance.cli --input_file /tmp/tmptuaq2qin/input.csv --output_file /tmp/tmptuaq2qin/weights_rake.csv --diagnostics_output_file /tmp/tmptuaq2qin/diagnostics_rake.csv --covariate_columns gender,age_group,income --method rake
INFO (2026-01-20 20:38:10,720) [__init__/<module> (line 72)]: Using balance version 0.15.0
INFO (2026-01-20 20:38:10,722) [cli/main (line 1039)]: Running cli.main() using balance version 0.15.0
INFO (2026-01-20 20:38:10,722) [cli/main (line 1074)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.15.0'}
INFO (2026-01-20 20:38:10,732) [cli/load_and_check_input (line 869)]: Number of rows in input file: 11000
INFO (2026-01-20 20:38:10,732) [cli/load_and_check_input (line 875)]: Number of columns in input file: 7
WARNING (2026-01-20 20:38:10,881) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:38:10,890) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:38:10,890) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:38:10,891) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:38:10,891) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:38:10,892) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:38:10,893) [cli/process_batch (line 691)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
WARNING (2026-01-20 20:38:10,894) [sample_class/from_frame (line 457)]: Casting id column to string
WARNING (2026-01-20 20:38:10,909) [pandas_utils/_warn_of_df_dtypes_change (line 492)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-01-20 20:38:10,909) [pandas_utils/_warn_of_df_dtypes_change (line 503)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-01-20 20:38:10,909) [pandas_utils/_warn_of_df_dtypes_change (line 506)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-01-20 20:38:10,909) [pandas_utils/_warn_of_df_dtypes_change (line 507)]: The (new) dtypes saved in df (after the change):
WARNING (2026-01-20 20:38:10,910) [pandas_utils/_warn_of_df_dtypes_change (line 508)]: 
is_respondent    float64
id                object
dtype: object
INFO (2026-01-20 20:38:10,912) [cli/process_batch (line 702)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
INFO (2026-01-20 20:38:10,918) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-01-20 20:38:10,918) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-01-20 20:38:10,928) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-01-20 20:38:10,963) [rake/rake (line 274)]: Final covariates and levels that will be used in raking: {'age_group': ['18-24', '25-34', '35-44', '45+'], 'gender': ['Female', 'Male', '__NaN__'], 'income': ['(-0.0009997440000000001, 0.44]', '(0.44, 1.664]', '(1.664, 3.472]', '(11.312, 15.139]', '(15.139, 20.567]', '(20.567, 29.504]', '(29.504, 128.536]', '(3.472, 5.663]', '(5.663, 8.211]', '(8.211, 11.312]']}.
INFO (2026-01-20 20:38:10,983) [cli/process_batch (line 725)]: Succeeded with adjusting sample to target
INFO (2026-01-20 20:38:10,985) [cli/process_batch (line 726)]: balance adjusted object: 
        Adjusted balance Sample object with target set using rake
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness,is_respondent
        
        adjustment details:
            method: rake
            design effect (Deff): 3.774
            effective sample size proportion (ESSP): 0.265
            effective sample size (ESS): 265.0
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness,is_respondent
	        
            3 common variables: gender,age_group,income
            
INFO (2026-01-20 20:38:10,985) [cli/process_batch (line 728)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-01-20 20:38:10,985) [cli/process_batch (line 732)]: Names of columns to keep for diagnostics: None 
INFO (2026-01-20 20:38:10,985) [sample_class/diagnostics (line 1792)]: Starting computation of diagnostics of the fitting
INFO (2026-01-20 20:38:11,229) [sample_class/diagnostics (line 2013)]: Done computing diagnostics
INFO (2026-01-20 20:38:11,233) [cli/process_batch (line 741)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      3.773786  design_effect
..                          ...           ...            ...
61  covar_main_asmd_improvement      0.462436         income
62     covar_main_asmd_adjusted      0.014651     mean(asmd)
63   covar_main_asmd_unadjusted      0.326799     mean(asmd)
64  covar_main_asmd_improvement      0.312147     mean(asmd)
65           adjustment_failure      0.000000            NaN

[66 rows x 3 columns]
INFO (2026-01-20 20:38:11,235) [cli/main (line 1128)]: Done fitting the model, writing output

Adjustment method used:
     var  val
28  rake  0.0
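
The diagnostics file is a long-format table with metric, val, and var columns, so other entries can be pulled out the same way as the adjustment method. The following is a minimal sketch that uses a small hand-built frame copying a few rows from the log output above rather than the real CSV; the helper name get_metric is hypothetical and not part of balance. The logged values are consistent with ESSP = 1 / Deff, ESS = n / Deff, and ASMD improvement = unadjusted - adjusted, which is what the sketch recomputes.

```python
import pandas as pd

# A few rows copied from the diagnostics printed in the log above;
# in practice this frame would come from pd.read_csv(diagnostics_path).
diag = pd.DataFrame(
    {
        "metric": [
            "size",
            "weights_diagnostics",
            "covar_main_asmd_adjusted",
            "covar_main_asmd_unadjusted",
        ],
        "val": [1000.0, 3.773786, 0.014651, 0.326799],
        "var": ["sample_obs", "design_effect", "mean(asmd)", "mean(asmd)"],
    }
)


def get_metric(df, metric, var):
    """Return the single value stored for a (metric, var) pair (hypothetical helper)."""
    rows = df[(df["metric"] == metric) & (df["var"] == var)]
    return float(rows["val"].iloc[0])


n = get_metric(diag, "size", "sample_obs")
deff = get_metric(diag, "weights_diagnostics", "design_effect")
asmd_before = get_metric(diag, "covar_main_asmd_unadjusted", "mean(asmd)")
asmd_after = get_metric(diag, "covar_main_asmd_adjusted", "mean(asmd)")

essp = 1 / deff  # effective sample size proportion
ess = n / deff  # effective sample size
asmd_improvement = asmd_before - asmd_after

print(f"ESSP: {essp:.3f}, ESS: {ess:.1f}")
print(f"Mean ASMD improvement: {asmd_improvement:.6f}")
```

The recomputed ESSP and ESS round to the 0.265 and 265.0 reported in the adjusted object summary, and the improvement agrees with the logged 0.312147 up to rounding.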

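To make "iteratively adjusts weights to match target marginal distributions" concrete, here is a minimal sketch of raking (iterative proportional fitting) on a toy sample. It is not balance's implementation: the function rake_sketch, the toy data, and the target proportions are all made up for illustration. As the log above shows, balance also bins numeric covariates such as income and treats missing values as their own level before raking, which this sketch does not attempt.

```python
import pandas as pd


def rake_sketch(sample, target_margins, max_iter=100, tol=1e-8):
    """Iterative proportional fitting over categorical columns (illustrative only).

    sample: DataFrame with one column per covariate.
    target_margins: {column: {level: target proportion}}, each summing to 1.
    Returns a Series of weights that sums to len(sample).
    """
    w = pd.Series(1.0, index=sample.index)
    for _ in range(max_iter):
        max_shift = 0.0
        for col, margins in target_margins.items():
            # Current weighted share of each level of this covariate.
            current = w.groupby(sample[col]).sum() / w.sum()
            # Per-row factor that moves each level to its target share.
            factor = sample[col].map(lambda lvl: margins[lvl] / current[lvl])
            max_shift = max(max_shift, (factor - 1).abs().max())
            w = w * factor
        # Stop once a full pass over all covariates barely changes the weights.
        if max_shift < tol:
            break
    return w * len(sample) / w.sum()


# Toy sample (hypothetical data): 3 women and 5 men, skewed toward "18-24".
sample = pd.DataFrame(
    {
        "gender": ["Female"] * 3 + ["Male"] * 5,
        "age_group": ["18-24", "25-34", "25-34", "18-24", "18-24", "18-24", "25-34", "25-34"],
    }
)
target_margins = {
    "gender": {"Female": 0.5, "Male": 0.5},
    "age_group": {"18-24": 0.3, "25-34": 0.7},
}

weights = rake_sketch(sample, target_margins)
gender_share = weights.groupby(sample["gender"]).sum() / weights.sum()
age_share = weights.groupby(sample["age_group"]).sum() / weights.sum()

# Kish's design effect: n * sum(w^2) / (sum(w))^2.
deff = len(weights) * (weights**2).sum() / weights.sum() ** 2

print(gender_share)
print(age_share)
print(f"Design effect of the raked weights: {deff:.3f}")
```

Each pass rescales the weights so that one covariate matches its target margin, which slightly disturbs the others; repeating the passes brings all margins into agreement. The last line computes Kish's design effect, the standard formula for a figure like the Deff reported in the log; treating it as the exact statistic balance reports is an assumption here.
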
Next steps¶

  • Try --method cbps for another weighting approach, and compare its diagnostics with the rake run above.
  • Use --outcome_columns to control which columns are treated as outcomes.
  • Supply --ipw_logistic_regression_kwargs to tune the IPW model.
  • Use --succeed_on_weighting_failure for pipelines where you want null weights instead of errors.
  • Explore --covariate_columns_for_diagnostics and --rows_to_keep_for_diagnostics to customize diagnostic output.
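
When the CLI runs inside a pipeline, the calling script usually needs to decide what to do when a step exits with a non-zero status. subprocess.check_call, as used throughout this tutorial, raises CalledProcessError in that case. The sketch below catches it and returns the exit code instead. The helper name run_step is hypothetical, and the two commands are stand-in Python one-liners so the example runs without balance installed; in practice cmd would be the balance.cli argument list built in the cells above.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one pipeline step and return (ok, returncode) instead of raising."""
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as err:
        return False, err.returncode
    return True, 0


# Stand-in commands (assumptions for illustration); replace with the
# balance.cli command list from this tutorial.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
passing_cmd = [sys.executable, "-c", "pass"]

print(run_step(failing_cmd))
print(run_step(passing_cmd))
```

A caller can then branch on the returned tuple, for example to skip downstream steps or raise an alert, rather than letting the exception stop the whole script.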

Session info¶

For reproducibility, here is the session information:

In [14]:
import session_info
session_info.show(html=False, dependencies=True)
-----
balance             0.15.0
pandas              2.3.3
session_info        v1.0.1
-----
PIL                         11.3.0
anyio                       NA
arrow                       1.4.0
asttokens                   NA
attr                        25.4.0
attrs                       25.4.0
babel                       2.17.0
certifi                     2026.01.04
charset_normalizer          3.4.4
comm                        0.2.3
cycler                      0.12.1
cython_runtime              NA
dateutil                    2.9.0.post0
debugpy                     1.8.19
decorator                   5.2.1
defusedxml                  0.7.1
exceptiongroup              1.3.1
executing                   2.2.1
fastjsonschema              NA
fqdn                        NA
idna                        3.11
importlib_metadata          NA
importlib_resources         NA
ipykernel                   6.31.0
isoduration                 NA
jedi                        0.19.2
jinja2                      3.1.6
joblib                      1.5.3
json5                       0.13.0
jsonpointer                 3.0.0
jsonschema                  4.25.1
jsonschema_specifications   NA
jupyter_events              0.12.0
jupyter_server              2.17.0
jupyterlab_server           2.28.0
kiwisolver                  1.4.7
lark                        1.3.1
markupsafe                  3.0.3
matplotlib                  3.9.4
mpl_toolkits                NA
narwhals                    2.15.0
nbformat                    5.10.4
numpy                       1.26.4
overrides                   NA
packaging                   25.0
parso                       0.8.5
patsy                       1.0.2
pexpect                     4.9.0
platformdirs                4.4.0
plotly                      6.5.2
prometheus_client           NA
prompt_toolkit              3.0.52
psutil                      7.2.1
ptyprocess                  0.7.0
pure_eval                   0.2.3
pydev_ipython               NA
pydevconsole                NA
pydevd                      3.2.3
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.19.2
pyparsing                   3.3.1
pythonjsonlogger            NA
pytz                        2025.2
referencing                 NA
requests                    2.32.5
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
rfc3987_syntax              NA
rpds                        NA
scipy                       1.13.1
seaborn                     0.13.2
send2trash                  NA
six                         1.17.0
sklearn                     1.3.2
sphinxcontrib               NA
stack_data                  0.6.3
statsmodels                 0.14.6
threadpoolctl               3.6.0
tornado                     6.5.4
traitlets                   5.14.3
typing_extensions           NA
uri_template                NA
urllib3                     2.6.3
wcwidth                     0.2.14
webcolors                   NA
websocket                   1.9.0
yaml                        6.0.3
zipp                        NA
zmq                         27.1.0
zoneinfo                    NA
-----
IPython             8.18.1
jupyter_client      8.6.3
jupyter_core        5.8.1
jupyterlab          4.5.2
notebook            7.5.2
-----
Python 3.9.25 (main, Nov  3 2025, 15:16:36) [GCC 13.3.0]
Linux-6.11.0-1018-azure-x86_64-with-glibc2.39
-----
Session information updated at 2026-01-20 20:38