import os
import subprocess
import tempfile

import pandas as pd

from balance import load_data

INFO (2026-03-12 17:27:14,077) [__init__/<module> (line 72)]: Using balance version 0.16.1

balance (Version 0.16.1) loaded:
    📖 Documentation: https://import-balance.org/
    🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
    📄 Citation:
        Sarig, T., Galili, T., & Eilat, R. (2023).
        balance - a Python package for balancing biased data samples.
        https://arxiv.org/abs/2307.06024

    Tip: You can view this message anytime with balance.help()

target_df, sample_df = load_data()

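# Work on copies so the demo frames returned by load_data() are not modified
sample_df = sample_df.copy()
target_df = target_df.copy()
# Mark sample rows as respondents (1) and target rows as non-respondents (0)
sample_df["is_respondent"] = 1
target_df["is_respondent"] = 0
# Start every row from a unit weight
sample_df["weight"] = 1.0
target_df["weight"] = 1.0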

load_data_input_df = pd.concat([sample_df, target_df], ignore_index=True)
load_data_input_df.head()

with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_out.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_out.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "ipw",
        "--weights_impact_on_outcome_method", "t_test",
    ]

    print("CLI command:", " ".join(cmd))
    subprocess.check_call(cmd)

    load_data_adjusted_df = pd.read_csv(output_path)
    load_data_diagnostics_df = pd.read_csv(diagnostics_path)

load_data_adjusted_df.head()

CLI command: python -m balance.cli --input_file /tmp/tmpzj_gpg8u/input.csv --output_file /tmp/tmpzj_gpg8u/weights_out.csv --diagnostics_output_file /tmp/tmpzj_gpg8u/diagnostics_out.csv --covariate_columns gender,age_group,income --method ipw --weights_impact_on_outcome_method t_test

INFO (2026-03-12 17:27:16,236) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:27:16,238) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:27:16,238) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:27:16,249) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:27:16,249) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:27:16,401) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:16,412) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:16,412) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:16,413) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-03-12 17:27:16,413) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:16,414) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
id                   str
is_respondent    float64
dtype: object
INFO (2026-03-12 17:27:16,415) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:27:16,423) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:16,436) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:16,437) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:16,437) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-03-12 17:27:16,437) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:16,438) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
id                   str
is_respondent    float64
dtype: object
INFO (2026-03-12 17:27:16,441) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:27:16,445) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:16,447) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:27:16,447) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:27:16,454) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:27:16,461) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:16,570) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-03-12 17:27:16,570) [ipw/ipw (line 767)]: The number of columns in the model matrix: 18
INFO (2026-03-12 17:27:16,570) [ipw/ipw (line 768)]: The number of rows in the model matrix: 11000

INFO (2026-03-12 17:27:18,039) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:18,039) [ipw/ipw (line 992)]: max_de: 1.5
INFO (2026-03-12 17:27:18,039) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters

INFO (2026-03-12 17:27:26,575) [ipw/choose_regularization (line 454)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
9  0.064155       91   2.5        1.49551          0.535725  0.090719
INFO (2026-03-12 17:27:26,577) [ipw/ipw (line 1047)]: Chosen lambda: 0.06415476458273757
INFO (2026-03-12 17:27:26,577) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.17450914016991492
INFO (2026-03-12 17:27:26,581) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:27:26,583) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 2.5
            design effect (Deff): 1.496
            effective sample size proportion (ESSP): 0.669
            effective sample size (ESS): 668.7
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:27:26,583) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:27:26,583) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:27:26,583) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting

INFO (2026-03-12 17:27:26,859) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:27:26,864) [cli/process_batch (line 799)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      1.495510  design_effect
..                          ...           ...            ...
91  covar_main_asmd_improvement      0.182907         income
92     covar_main_asmd_adjusted      0.173301     mean(asmd)
93   covar_main_asmd_unadjusted      0.326799     mean(asmd)
94  covar_main_asmd_improvement      0.153497     mean(asmd)
95           adjustment_failure      0.000000            NaN

[96 rows x 3 columns]
INFO (2026-03-12 17:27:26,866) [cli/main (line 1184)]: Done fitting the model, writing output
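
The adjustment summary in the log above reports a design effect (Deff) of 1.496, an effective sample size proportion (ESSP) of 0.669, and an effective sample size (ESS) of 668.7 for 1000 respondents. Those figures are consistent with Kish's design effect, where ESS = n / Deff and ESSP = 1 / Deff. The sketch below illustrates that relationship under this assumption; `kish_design_effect` is a hypothetical helper written for this example, not a function taken from balance.

```python
def kish_design_effect(weights):
    """Kish's design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


# Equal weights carry no variance penalty, so the design effect is 1.
equal_deff = kish_design_effect([1.0, 1.0, 1.0, 1.0])

# Unequal weights inflate the design effect above 1.
unequal_deff = kish_design_effect([1.0, 1.0, 1.0, 3.0])

# Recompute ESS and ESSP from the design effect printed in the log
# (weights_diagnostics / design_effect = 1.495510, 1000 sample rows).
n_obs = 1000
deff = 1.495510
ess = n_obs / deff
essp = 1 / deff

print(equal_deff, unequal_deff)
print(round(ess, 1), round(essp, 3))
```

A larger design effect means the weighted sample behaves like a smaller unweighted one, which is the trade-off that the `--max_de` cap controls during the regularisation search.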

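# Look for outcome-impact metrics. No --outcome_columns were passed in the run
# above, so this filter may return no rows.
load_data_diagnostics_df[
    load_data_diagnostics_df["metric"].str.startswith("weights_impact_on_outcome_")
]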

(
    load_data_diagnostics_df.groupby("metric")["var"]
    .apply(lambda col: sorted(col.dropna().unique()))
    .sort_index()
)

metric
adjustment_failure                                                            []
adjustment_method                                                          [ipw]
covar_asmd_adjusted            [age_group[T.25-34], age_group[T.35-44], age_g...
covar_asmd_improvement         [age_group[T.25-34], age_group[T.35-44], age_g...
covar_asmd_unadjusted          [age_group[T.25-34], age_group[T.35-44], age_g...
covar_main_asmd_adjusted                 [age_group, gender, income, mean(asmd)]
covar_main_asmd_improvement              [age_group, gender, income, mean(asmd)]
covar_main_asmd_unadjusted               [age_group, gender, income, mean(asmd)]
ipw_model_glance                                           [intercept_, n_iter_]
ipw_multi_class                                                           [auto]
ipw_penalty                                                         [deprecated]
ipw_solver                                                               [lbfgs]
model_coef                     [C(_is_na_gender, one_hot_encoding_greater_2)[...
model_glance                   [deviance, l1_ratio, lambda, null_deviance, pr...
size                           [sample_covars, sample_obs, target_covars, tar...
weights_diagnostics            [describe_25%, describe_50%, describe_75%, des...
Name: var, dtype: object
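
The diagnostics file is in long format: one row per (`metric`, `var`) pair with the number in `val`. To compare related metrics side by side, the table can be pivoted to wide format. The sketch below uses a small hand-built frame containing only values that appear in the log above, rather than the real CSV, so most cells of the wide table are empty. It also checks that the logged `covar_main_asmd_improvement` for `mean(asmd)` matches unadjusted minus adjusted up to rounding, which is an observation about these numbers and not a statement about how balance computes the metric.

```python
import pandas as pd

# A few rows copied from the diagnostics printed in the log (long format).
rows = [
    ("size", 1000.0, "sample_obs"),
    ("size", 10000.0, "target_obs"),
    ("covar_main_asmd_adjusted", 0.173301, "mean(asmd)"),
    ("covar_main_asmd_unadjusted", 0.326799, "mean(asmd)"),
    ("covar_main_asmd_improvement", 0.153497, "mean(asmd)"),
    ("covar_main_asmd_improvement", 0.182907, "income"),
]
diag = pd.DataFrame(rows, columns=["metric", "val", "var"])

# Keep the per-covariate ASMD metrics and pivot: one row per covariate,
# one column per metric.
asmd_long = diag[diag["metric"].str.startswith("covar_main_asmd_")]
asmd_wide = asmd_long.pivot(index="var", columns="metric", values="val")
print(asmd_wide)

# Compare the logged improvement with unadjusted minus adjusted.
mean_row = asmd_wide.loc["mean(asmd)"]
gap = (
    mean_row["covar_main_asmd_unadjusted"]
    - mean_row["covar_main_asmd_adjusted"]
    - mean_row["covar_main_asmd_improvement"]
)
print(abs(gap) < 1e-5)
```

The same pivot should work on `load_data_diagnostics_df` as long as each (`metric`, `var`) pair occurs only once among the selected rows.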

load_data_diagnostics_df.query("metric == 'adjustment_method'")

# Print only the first 40 lines of the CLI help output
help_output = subprocess.run(
    ["python", "-m", "balance.cli", "--help"],
    check=False,
    capture_output=True,
    text=True,
).stdout
print("\n".join(help_output.splitlines()[:40]))

balance (Version 0.16.1) loaded:
    📖 Documentation: https://import-balance.org/
    🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
    📄 Citation:
        Sarig, T., Galili, T., & Eilat, R. (2023).
        balance - a Python package for balancing biased data samples.
        https://arxiv.org/abs/2307.06024

    Tip: You can view this message anytime with balance.help()

usage: cli.py [-h] --input_file INPUT_FILE --output_file OUTPUT_FILE
              [--diagnostics_output_file DIAGNOSTICS_OUTPUT_FILE]
              [--method METHOD] [--sample_column SAMPLE_COLUMN]
              [--id_column ID_COLUMN] [--weight_column WEIGHT_COLUMN]
              --covariate_columns COVARIATE_COLUMNS
              [--outcome_columns OUTCOME_COLUMNS]
              [--covariate_columns_for_diagnostics COVARIATE_COLUMNS_FOR_DIAGNOSTICS]
              [--rows_to_keep_for_diagnostics ROWS_TO_KEEP_FOR_DIAGNOSTICS]
              [--weights_impact_on_outcome_method WEIGHTS_IMPACT_ON_OUTCOME_METHOD]
              [--batch_columns BATCH_COLUMNS] [--keep_columns KEEP_COLUMNS]
              [--keep_row_column KEEP_ROW_COLUMN]
              [--sep_input_file SEP_INPUT_FILE]
              [--sep_output_file SEP_OUTPUT_FILE]
              [--sep_diagnostics_output_file SEP_DIAGNOSTICS_OUTPUT_FILE]
              [--no_output_header] [--succeed_on_weighting_failure]
              [--max_de MAX_DE] [--lambda_min LAMBDA_MIN]
              [--lambda_max LAMBDA_MAX] [--num_lambdas NUM_LAMBDAS]
              [--ipw_logistic_regression_kwargs IPW_LOGISTIC_REGRESSION_KWARGS]
              [--weight_trimming_mean_ratio WEIGHT_TRIMMING_MEAN_RATIO]
              [--one_hot_encoding ONE_HOT_ENCODING]
              [--transformations TRANSFORMATIONS] [--formula FORMULA]
              [--return_df_with_original_dtypes]
              [--standardize_types STANDARDIZE_TYPES]

options:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE
                        Path to input sample/target
  --output_file OUTPUT_FILE

with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_tuned.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_tuned.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "ipw",
        # Tuning parameters
        "--max_de", "2.0",
        "--lambda_min", "1e-06",
        "--lambda_max", "100",
        "--num_lambdas", "500",
        "--weight_trimming_mean_ratio", "10.0",
        # Custom logistic regression settings
        "--ipw_logistic_regression_kwargs", '{"solver": "liblinear", "max_iter": 500}',
    ]

    print("CLI command:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    tuned_adjusted_df = pd.read_csv(output_path)

tuned_adjusted_df.head()

CLI command:
python -m balance.cli --input_file /tmp/tmp__4to2tm/input.csv --output_file /tmp/tmp__4to2tm/weights_tuned.csv --diagnostics_output_file /tmp/tmp__4to2tm/diagnostics_tuned.csv --covariate_columns gender,age_group,income --method ipw --max_de 2.0 --lambda_min 1e-06 --lambda_max 100 --num_lambdas 500 --weight_trimming_mean_ratio 10.0 --ipw_logistic_regression_kwargs {"solver": "liblinear", "max_iter": 500}
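
The command printed above is built with `" ".join(cmd)`, so the JSON value of `--ipw_logistic_regression_kwargs` appears without shell quoting. That does not affect `subprocess.check_call`, which receives the argument list directly, but the printed line would not work if pasted into a shell. A minimal sketch of a copy-paste-safe alternative uses the standard library's `shlex.join`. The shortened `cmd` list here is for illustration only.

```python
import shlex

# Shortened version of the tuned command, keeping the JSON argument.
cmd = [
    "python",
    "-m",
    "balance.cli",
    "--ipw_logistic_regression_kwargs",
    '{"solver": "liblinear", "max_iter": 500}',
]

plain = " ".join(cmd)     # loses the argument boundaries
quoted = shlex.join(cmd)  # quotes arguments with spaces or special characters

print(plain)
print(quoted)

# Only the quoted form parses back into the original argument list.
print(shlex.split(quoted) == cmd)
print(shlex.split(plain) == cmd)
```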

INFO (2026-03-12 17:27:31,704) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:27:31,706) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:27:31,706) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 2.0, 'lambda_min': 1e-06, 'lambda_max': 100.0, 'num_lambdas': 500, 'weight_trimming_mean_ratio': 10.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:27:31,717) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:27:31,717) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:27:31,870) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:31,881) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:31,881) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:31,882) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:31,882) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:31,883) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:31,884) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:27:31,892) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:31,905) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:31,906) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:31,906) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:31,907) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:31,907) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:31,909) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:27:31,914) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:31,916) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:27:31,916) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:27:31,923) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:27:31,930) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:32,039) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-03-12 17:27:32,039) [ipw/ipw (line 767)]: The number of columns in the model matrix: 18
INFO (2026-03-12 17:27:32,039) [ipw/ipw (line 768)]: The number of rows in the model matrix: 11000
INFO (2026-03-12 17:27:32,066) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:32,066) [ipw/ipw (line 992)]: max_de: 2.0
INFO (2026-03-12 17:27:32,066) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters

INFO (2026-03-12 17:27:36,743) [ipw/choose_regularization (line 454)]: Best regularisation: 
     s  s_index  trim  design_effect  asmd_improvement      asmd
6 NaN        0   2.5       1.714071          0.634917  0.071337
INFO (2026-03-12 17:27:36,745) [ipw/ipw (line 1047)]: Chosen lambda: nan
INFO (2026-03-12 17:27:36,745) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.18280833369391136
INFO (2026-03-12 17:27:36,748) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:27:36,750) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 2.5
            design effect (Deff): 1.714
            effective sample size proportion (ESSP): 0.583
            effective sample size (ESS): 583.4
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:27:36,750) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:27:36,750) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:27:36,750) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting

INFO (2026-03-12 17:27:37,027) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:27:37,031) [cli/process_batch (line 799)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      1.714071  design_effect
..                          ...           ...            ...
91  covar_main_asmd_improvement      0.225463         income
92     covar_main_asmd_adjusted      0.143344     mean(asmd)
93   covar_main_asmd_unadjusted      0.326799     mean(asmd)
94  covar_main_asmd_improvement      0.183455     mean(asmd)
95           adjustment_failure      0.000000            NaN

[96 rows x 3 columns]
INFO (2026-03-12 17:27:37,034) [cli/main (line 1184)]: Done fitting the model, writing output
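
Raising `--max_de` from 1.5 to 2.0 let the tuned run accept a larger design effect in exchange for a lower adjusted mean ASMD. The sketch below puts the two runs side by side using only numbers copied from the logs. The relative ASMD reduction is computed here for illustration and is a different quantity from the `asmd_improvement` column in the regularisation table.

```python
# Values copied from the two runs' logs.
N_SAMPLE = 1000
MEAN_ASMD_UNADJUSTED = 0.326799

runs = {
    "default": {"max_de": 1.5, "design_effect": 1.495510, "mean_asmd_adjusted": 0.173301},
    "tuned": {"max_de": 2.0, "design_effect": 1.714071, "mean_asmd_adjusted": 0.143344},
}

summary = {}
for name, run in runs.items():
    # Effective sample size implied by the design effect (ESS = n / Deff).
    ess = N_SAMPLE / run["design_effect"]
    # Share of the unadjusted mean ASMD removed by the weights.
    reduction = (MEAN_ASMD_UNADJUSTED - run["mean_asmd_adjusted"]) / MEAN_ASMD_UNADJUSTED
    summary[name] = {
        "ess": round(ess, 1),
        "asmd_reduction": round(reduction, 3),
        "within_cap": run["design_effect"] <= run["max_de"],
    }
    print(name, summary[name])
```

In both runs the chosen design effect stays under the requested cap. The tuned run gives up roughly 85 effective observations for the better covariate balance.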

with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_formula.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_formula.csv")

    # Use the demo data for the formula example
    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "ipw",
        # Disable transformations so the formula uses the raw covariates
        "--transformations", "None",
        # Use a formula with an interaction term
        "--formula", "gender*income",
    ]

    print("CLI command with custom formula:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    formula_diagnostics_df = pd.read_csv(diagnostics_path)

# Check model coefficients to verify formula was applied
print("\nModel coefficients (showing interaction term):")
print(formula_diagnostics_df.query("metric == 'model_coef'")[["var", "val"]])

CLI command with custom formula:
python -m balance.cli --input_file /tmp/tmpzk20yf2u/input.csv --output_file /tmp/tmpzk20yf2u/weights_formula.csv --diagnostics_output_file /tmp/tmpzk20yf2u/diagnostics_formula.csv --covariate_columns gender,age_group,income --method ipw --transformations None --formula gender*income

INFO (2026-03-12 17:27:39,494) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:27:39,496) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:27:39,496) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': None, 'formula': 'gender*income', 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:27:39,507) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:27:39,507) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:27:39,660) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:39,671) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:39,671) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:39,672) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:39,672) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:39,673) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:39,674) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:27:39,681) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:39,694) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:39,695) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:39,695) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:39,695) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:39,696) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:39,699) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:27:39,704) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:39,705) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:39,753) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['gender*income']
INFO (2026-03-12 17:27:39,753) [ipw/ipw (line 767)]: The number of columns in the model matrix: 7
INFO (2026-03-12 17:27:39,753) [ipw/ipw (line 768)]: The number of rows in the model matrix: 11000

INFO (2026-03-12 17:27:41,213) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:41,213) [ipw/ipw (line 992)]: max_de: 1.5
INFO (2026-03-12 17:27:41,213) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters

INFO (2026-03-12 17:27:47,871) [ipw/choose_regularization (line 454)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
9  0.043507       98   5.0       1.495849          0.517118  0.157805
INFO (2026-03-12 17:27:47,873) [ipw/ipw (line 1047)]: Chosen lambda: 0.043506507030756265
INFO (2026-03-12 17:27:47,873) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.09595118841373662
WARNING (2026-03-12 17:27:47,873) [ipw/ipw (line 1073)]: The propensity model has low fraction null deviance explained (0.09595118841373662). Results may not be accurate
INFO (2026-03-12 17:27:47,876) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:27:47,878) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 5.0
            design effect (Deff): 1.496
            effective sample size proportion (ESSP): 0.669
            effective sample size (ESS): 668.5
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:27:47,878) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:27:47,878) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:27:47,878) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting

INFO (2026-03-12 17:27:48,157) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:27:48,162) [cli/process_batch (line 799)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      1.495849  design_effect
..                          ...           ...            ...
80  covar_main_asmd_improvement      0.301760         income
81     covar_main_asmd_adjusted      0.157805     mean(asmd)
82   covar_main_asmd_unadjusted      0.326799     mean(asmd)
83  covar_main_asmd_improvement      0.168993     mean(asmd)
84           adjustment_failure      0.000000            NaN

[85 rows x 3 columns]
INFO (2026-03-12 17:27:48,164) [cli/main (line 1184)]: Done fitting the model, writing output

Model coefficients (showing interaction term):
                                                  var       val
40                                          intercept  0.452742
41      C(gender, one_hot_encoding_greater_2)[Female] -0.186488
42  C(gender, one_hot_encoding_greater_2)[Female]:... -0.224843
43        C(gender, one_hot_encoding_greater_2)[Male]  0.181382
44  C(gender, one_hot_encoding_greater_2)[Male]:in... -0.198422
45         C(gender, one_hot_encoding_greater_2)[_NA]  0.008414
46  C(gender, one_hot_encoding_greater_2)[_NA]:income -0.091654
47                                             income -0.372709

# Build the input for batch processing; gender will serve as the batch column
batch_input_df = load_data_input_df.copy()

# 'gender' holds 'Female', 'Male', and some missing values;
# keep only rows with a non-null gender for this example
batch_input_df = batch_input_df[batch_input_df["gender"].notna()].copy()
print(f"Rows after filtering: {len(batch_input_df)}")
print(f"Gender distribution:\n{batch_input_df['gender'].value_counts()}")

Rows after filtering: 10014
Gender distribution:
gender
Male      5195
Female    4819
Name: count, dtype: int64

with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input_batch.csv")
    output_path = os.path.join(tmpdir, "weights_batch.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_batch.csv")

    batch_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "age_group,income",  # Note: gender is now used as batch column
        "--outcome_columns", "happiness",
        "--batch_columns", "gender",  # Process each gender separately
        "--method", "ipw",
    ]

    print("CLI command with batch processing:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    batch_adjusted_df = pd.read_csv(output_path)
    batch_diagnostics_df = pd.read_csv(diagnostics_path)

print(f"\nOutput rows: {len(batch_adjusted_df)}")
batch_adjusted_df.head()

CLI command with batch processing:
python -m balance.cli --input_file /tmp/tmppjmjw6pb/input_batch.csv --output_file /tmp/tmppjmjw6pb/weights_batch.csv --diagnostics_output_file /tmp/tmppjmjw6pb/diagnostics_batch.csv --covariate_columns age_group,income --outcome_columns happiness --batch_columns gender --method ipw

INFO (2026-03-12 17:27:50,670) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:27:50,672) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:27:50,672) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:27:50,682) [cli/load_and_check_input (line 926)]: Number of rows in input file: 10014
INFO (2026-03-12 17:27:50,682) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
INFO (2026-03-12 17:27:50,685) [cli/main (line 1141)]: Running weighting for batch = ('Female',) 
WARNING (2026-03-12 17:27:50,838) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:50,849) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:50,849) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:50,850) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:50,850) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:50,851) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:50,852) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        268 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
WARNING (2026-03-12 17:27:50,859) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:50,870) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:50,870) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:50,871) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:50,871) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:50,872) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:50,874) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        4551 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
INFO (2026-03-12 17:27:50,877) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:50,878) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:27:50,878) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['age_group', 'income']
INFO (2026-03-12 17:27:50,883) [adjustment/apply_transformations (line 469)]: Final variables in output: ['age_group', 'income']
INFO (2026-03-12 17:27:50,887) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:50,921) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + age_group']
INFO (2026-03-12 17:27:50,921) [ipw/ipw (line 767)]: The number of columns in the model matrix: 14
INFO (2026-03-12 17:27:50,921) [ipw/ipw (line 768)]: The number of rows in the model matrix: 4819

INFO (2026-03-12 17:27:51,811) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:51,812) [ipw/ipw (line 992)]: max_de: 1.5
INFO (2026-03-12 17:27:51,812) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters

INFO (2026-03-12 17:27:55,882) [ipw/choose_regularization (line 454)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement     asmd
6  0.105705       82   5.0       1.489687           0.49424  0.09868
INFO (2026-03-12 17:27:55,884) [ipw/ipw (line 1047)]: Chosen lambda: 0.10570520810009826
INFO (2026-03-12 17:27:55,884) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.14888521519197495
INFO (2026-03-12 17:27:55,888) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:27:55,890) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        268 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 5.0
            design effect (Deff): 1.490
            effective sample size proportion (ESSP): 0.671
            effective sample size (ESS): 179.9
                
            target:
                 
	        balance Sample object
	        4551 observations x 2 variables: age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness
	        
            2 common variables: age_group,income
            
INFO (2026-03-12 17:27:55,890) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:27:55,890) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:27:55,890) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting
INFO (2026-03-12 17:27:56,041) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:27:56,046) [cli/process_batch (line 799)]: balance diagnostics object:                          metric          val            var
0                          size   268.000000     sample_obs
1                          size     2.000000  sample_covars
2                          size  4551.000000     target_obs
3                          size     2.000000  target_covars
4           weights_diagnostics     1.489687  design_effect
..                          ...          ...            ...
86  covar_main_asmd_improvement     0.185596         income
87     covar_main_asmd_adjusted     0.220366     mean(asmd)
88   covar_main_asmd_unadjusted     0.422500     mean(asmd)
89  covar_main_asmd_improvement     0.202135     mean(asmd)
90           adjustment_failure     0.000000            NaN

[91 rows x 3 columns]
INFO (2026-03-12 17:27:56,048) [cli/main (line 1158)]: Done processing batch ('Female',)
INFO (2026-03-12 17:27:56,049) [cli/main (line 1141)]: Running weighting for batch = ('Male',) 
WARNING (2026-03-12 17:27:56,058) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:56,068) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:56,068) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:56,069) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:56,069) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:56,070) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:56,071) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        644 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
WARNING (2026-03-12 17:27:56,078) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:27:56,089) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:27:56,089) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:27:56,090) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:27:56,090) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:27:56,090) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:27:56,092) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        4551 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
INFO (2026-03-12 17:27:56,095) [ipw/ipw (line 703)]: Starting ipw function
INFO (2026-03-12 17:27:56,097) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:27:56,097) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['age_group', 'income']
INFO (2026-03-12 17:27:56,101) [adjustment/apply_transformations (line 469)]: Final variables in output: ['age_group', 'income']
INFO (2026-03-12 17:27:56,105) [ipw/ipw (line 738)]: Building model matrix
INFO (2026-03-12 17:27:56,139) [ipw/ipw (line 764)]: The formula used to build the model matrix: ['income + age_group']
INFO (2026-03-12 17:27:56,139) [ipw/ipw (line 767)]: The number of columns in the model matrix: 14
INFO (2026-03-12 17:27:56,139) [ipw/ipw (line 768)]: The number of rows in the model matrix: 5195

INFO (2026-03-12 17:27:56,947) [ipw/ipw (line 990)]: Done with sklearn
INFO (2026-03-12 17:27:56,948) [ipw/ipw (line 992)]: max_de: 1.5
INFO (2026-03-12 17:27:56,948) [ipw/choose_regularization (line 368)]: Starting choosing regularisation parameters

INFO (2026-03-12 17:28:01,056) [ipw/choose_regularization (line 454)]: Best regularisation: 
           s  s_index  trim  design_effect  asmd_improvement      asmd
9  0.111736       81   5.0       1.495967          0.566287  0.087357
INFO (2026-03-12 17:28:01,057) [ipw/ipw (line 1047)]: Chosen lambda: 0.11173591019485084
INFO (2026-03-12 17:28:01,058) [ipw/ipw (line 1065)]: Proportion null deviance explained 0.1426717197478461
INFO (2026-03-12 17:28:01,061) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:28:01,063) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using ipw
        644 observations x 2 variables: age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
        
        adjustment details:
            method: ipw
            weight trimming mean ratio: 5.0
            design effect (Deff): 1.496
            effective sample size proportion (ESSP): 0.668
            effective sample size (ESS): 430.5
                
            target:
                 
	        balance Sample object
	        4551 observations x 2 variables: age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: happiness
	        
            2 common variables: age_group,income
            
INFO (2026-03-12 17:28:01,063) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:28:01,063) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:28:01,063) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting
INFO (2026-03-12 17:28:01,219) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:28:01,224) [cli/process_batch (line 799)]: balance diagnostics object:                          metric          val            var
0                          size   644.000000     sample_obs
1                          size     2.000000  sample_covars
2                          size  4551.000000     target_obs
3                          size     2.000000  target_covars
4           weights_diagnostics     1.495967  design_effect
..                          ...          ...            ...
86  covar_main_asmd_improvement     0.235830         income
87     covar_main_asmd_adjusted     0.192214     mean(asmd)
88   covar_main_asmd_unadjusted     0.430017     mean(asmd)
89  covar_main_asmd_improvement     0.237804     mean(asmd)
90           adjustment_failure     0.000000            NaN

[91 rows x 3 columns]
INFO (2026-03-12 17:28:01,226) [cli/main (line 1158)]: Done processing batch ('Male',)
INFO (2026-03-12 17:28:01,227) [cli/main (line 1184)]: Done fitting the model, writing output

Output rows: 912

# Inspect weights by gender - each group was adjusted independently
print("Weight statistics by gender (sample only):")
sample_only = batch_adjusted_df[batch_adjusted_df["is_respondent"] == 1]
print(sample_only.groupby("gender")["weight"].describe().round(3))

Weight statistics by gender (sample only):
        count    mean     std    min    25%     50%     75%     max
gender                                                             
Female  268.0  16.981  11.905  6.785  9.566  13.702  19.158  85.648
Male    644.0   7.067   4.981  2.913  3.260   5.776   9.235  35.370
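
The log above reports a design effect (Deff), an effective sample size (ESS), and an effective sample size proportion (ESSP) for each batch. The sketch below shows how these quantities relate, assuming Deff is Kish's design effect, n * sum(w^2) / (sum(w))^2. The function names are illustrative and are not part of the balance API.

```python
def kish_design_effect(weights):
    """Kish's design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return n * total_sq / (total * total)


def effective_sample_size(weights):
    """Effective sample size: n / Deff."""
    return len(weights) / kish_design_effect(weights)


# Equal weights carry no variance penalty, so Deff is exactly 1.
print(kish_design_effect([2.0] * 5))

# One respondent weighted three times as heavily as the others.
print(effective_sample_size([1, 1, 1, 3]))

# Consistency check against the logged values: ESS = n / Deff.
print(round(268 / 1.489687, 1))  # Female batch, logged ESS: 179.9
print(round(644 / 1.495967, 1))  # Male batch, logged ESS: 430.5
```

Because Deff does not depend on the overall scale of the weights, the same function could be applied per group to the CLI output, for example `sample_only.groupby("gender")["weight"].apply(lambda w: kish_design_effect(list(w)))`, and should land close to the Deff values in the log.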

with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_cbps.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_cbps.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "cbps",
    ]

    print("CLI command with CBPS method:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    cbps_diagnostics_df = pd.read_csv(diagnostics_path)

# Verify the method used
print("\nAdjustment method used:")
print(cbps_diagnostics_df.query("metric == 'adjustment_method'")[["var", "val"]])

CLI command with CBPS method:
python -m balance.cli --input_file /tmp/tmp4k781k_q/input.csv --output_file /tmp/tmp4k781k_q/weights_cbps.csv --diagnostics_output_file /tmp/tmp4k781k_q/diagnostics_cbps.csv --covariate_columns gender,age_group,income --method cbps

INFO (2026-03-12 17:28:03,733) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:28:03,735) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:28:03,735) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:28:03,745) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:28:03,745) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:28:03,897) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:28:03,908) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:28:03,908) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:28:03,909) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-03-12 17:28:03,909) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:28:03,910) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
id                   str
is_respondent    float64
dtype: object
INFO (2026-03-12 17:28:03,911) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:28:03,918) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:28:03,932) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:28:03,932) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:28:03,932) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
id               int64
is_respondent    int64
dtype: object
WARNING (2026-03-12 17:28:03,932) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:28:03,933) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
id                   str
is_respondent    float64
dtype: object

INFO (2026-03-12 17:28:03,936) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:28:03,941) [cbps/cbps (line 537)]: Starting cbps function
INFO (2026-03-12 17:28:03,942) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:28:03,942) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:28:03,949) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:28:04,064) [cbps/cbps (line 588)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
INFO (2026-03-12 17:28:04,065) [cbps/cbps (line 599)]: The number of columns in the model matrix: 16
INFO (2026-03-12 17:28:04,065) [cbps/cbps (line 600)]: The number of rows in the model matrix: 11000
INFO (2026-03-12 17:28:04,074) [cbps/cbps (line 669)]: Finding initial estimator for GMM optimization

INFO (2026-03-12 17:28:04,254) [cbps/cbps (line 696)]: Finding initial estimator for GMM optimization that minimizes the balance loss

INFO (2026-03-12 17:28:05,696) [cbps/cbps (line 732)]: Running GMM optimization

INFO (2026-03-12 17:28:08,514) [cbps/cbps (line 859)]: Done cbps function
INFO (2026-03-12 17:28:08,517) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:28:08,519) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using cbps
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: cbps
            design effect (Deff): 1.500
            effective sample size proportion (ESSP): 0.667
            effective sample size (ESS): 666.7
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:28:08,519) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:28:08,519) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:28:08,519) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting

INFO (2026-03-12 17:28:08,793) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:28:08,797) [cli/process_batch (line 799)]: balance diagnostics object:                          metric       val            var
0                          size    1000.0     sample_obs
1                          size       3.0  sample_covars
2                          size   10000.0     target_obs
3                          size       3.0  target_covars
4           weights_diagnostics       1.5  design_effect
..                          ...       ...            ...
86  covar_main_asmd_improvement  0.205323         income
87     covar_main_asmd_adjusted  0.175443     mean(asmd)
88   covar_main_asmd_unadjusted  0.326799     mean(asmd)
89  covar_main_asmd_improvement  0.151355     mean(asmd)
90           adjustment_failure         0            NaN

[91 rows x 3 columns]
INFO (2026-03-12 17:28:08,800) [cli/main (line 1184)]: Done fitting the model, writing output

Adjustment method used:
     var  val
28  cbps  0.0

with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_rake.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_rake.csv")

    load_data_input_df.to_csv(input_path, index=False)

    cmd = [
        "python",
        "-m",
        "balance.cli",
        "--input_file", input_path,
        "--output_file", output_path,
        "--diagnostics_output_file", diagnostics_path,
        "--covariate_columns", "gender,age_group,income",
        "--method", "rake",
    ]

    print("CLI command with rake method:")
    print(" ".join(cmd))
    subprocess.check_call(cmd)

    rake_diagnostics_df = pd.read_csv(diagnostics_path)

# Verify the method used
print("\nAdjustment method used:")
print(rake_diagnostics_df.query("metric == 'adjustment_method'")[["var", "val"]])

CLI command with rake method:
python -m balance.cli --input_file /tmp/tmp1ykvcp1o/input.csv --output_file /tmp/tmp1ykvcp1o/weights_rake.csv --diagnostics_output_file /tmp/tmp1ykvcp1o/diagnostics_rake.csv --covariate_columns gender,age_group,income --method rake

INFO (2026-03-12 17:28:11,239) [__init__/<module> (line 72)]: Using balance version 0.16.1
INFO (2026-03-12 17:28:11,241) [cli/main (line 1095)]: Running cli.main() using balance version 0.16.1
INFO (2026-03-12 17:28:11,241) [cli/main (line 1130)]: Attributes used by main() for running adjust: {'transformations': 'default', 'formula': None, 'penalty_factor': None, 'one_hot_encoding': True, 'max_de': 1.5, 'lambda_min': 1e-05, 'lambda_max': 10, 'num_lambdas': 250, 'weight_trimming_mean_ratio': 20.0, 'sample_cls': <class 'balance.sample_class.Sample'>, 'sample_package_name': 'balance', 'sample_package_version': '0.16.1'}
INFO (2026-03-12 17:28:11,251) [cli/load_and_check_input (line 926)]: Number of rows in input file: 11000
INFO (2026-03-12 17:28:11,251) [cli/load_and_check_input (line 932)]: Number of columns in input file: 7
WARNING (2026-03-12 17:28:11,406) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:28:11,417) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:28:11,417) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:28:11,418) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:28:11,418) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:28:11,419) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:28:11,420) [cli/process_batch (line 747)]: balance sample object: 
        balance Sample object
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
WARNING (2026-03-12 17:28:11,428) [sample_class/from_frame (line 469)]: Casting id column to string
WARNING (2026-03-12 17:28:11,441) [pandas_utils/_warn_of_df_dtypes_change (line 514)]: The dtypes of sample._df were changed from the original dtypes of the input df, here are the differences - 
WARNING (2026-03-12 17:28:11,441) [pandas_utils/_warn_of_df_dtypes_change (line 525)]: The (old) dtypes that changed for df (before the change):
WARNING (2026-03-12 17:28:11,442) [pandas_utils/_warn_of_df_dtypes_change (line 528)]: 
is_respondent    int64
id               int64
dtype: object
WARNING (2026-03-12 17:28:11,442) [pandas_utils/_warn_of_df_dtypes_change (line 529)]: The (new) dtypes saved in df (after the change):
WARNING (2026-03-12 17:28:11,443) [pandas_utils/_warn_of_df_dtypes_change (line 530)]: 
is_respondent    float64
id                   str
dtype: object
INFO (2026-03-12 17:28:11,445) [cli/process_batch (line 758)]: balance target object: 
        balance Sample object
        10000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
INFO (2026-03-12 17:28:11,452) [adjustment/apply_transformations (line 433)]: Adding the variables: []
INFO (2026-03-12 17:28:11,452) [adjustment/apply_transformations (line 434)]: Transforming the variables: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:28:11,459) [adjustment/apply_transformations (line 469)]: Final variables in output: ['gender', 'age_group', 'income']
INFO (2026-03-12 17:28:11,501) [rake/rake (line 274)]: Final covariates and levels that will be used in raking: {'age_group': ['18-24', '25-34', '35-44', '45+'], 'gender': ['Female', 'Male', '__NaN__'], 'income': ['(-0.0009997440000000001, 0.44]', '(0.44, 1.664]', '(1.664, 3.472]', '(11.312, 15.139]', '(15.139, 20.567]', '(20.567, 29.504]', '(29.504, 128.536]', '(3.472, 5.663]', '(5.663, 8.211]', '(8.211, 11.312]']}.
INFO (2026-03-12 17:28:11,522) [cli/process_batch (line 781)]: Succeeded with adjusting sample to target
INFO (2026-03-12 17:28:11,524) [cli/process_batch (line 782)]: balance adjusted object: 
        Adjusted balance Sample object with target set using rake
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: None
        
        adjustment details:
            method: rake
            design effect (Deff): 3.774
            effective sample size proportion (ESSP): 0.265
            effective sample size (ESS): 265.0
                
            target:
                 
	        balance Sample object
	        10000 observations x 3 variables: gender,age_group,income
	        id_column: id, weight_column: weight,
	        outcome_columns: None
	        
            3 common variables: gender,age_group,income
            
INFO (2026-03-12 17:28:11,524) [cli/process_batch (line 784)]: Condition on which rows to keep for diagnostics: None 
INFO (2026-03-12 17:28:11,524) [cli/process_batch (line 788)]: Names of columns to keep for diagnostics: None 
INFO (2026-03-12 17:28:11,524) [sample_class/diagnostics (line 1826)]: Starting computation of diagnostics of the fitting

INFO (2026-03-12 17:28:11,794) [sample_class/diagnostics (line 2069)]: Done computing diagnostics
INFO (2026-03-12 17:28:11,798) [cli/process_batch (line 799)]: balance diagnostics object:                          metric           val            var
0                          size   1000.000000     sample_obs
1                          size      3.000000  sample_covars
2                          size  10000.000000     target_obs
3                          size      3.000000  target_covars
4           weights_diagnostics      3.773786  design_effect
..                          ...           ...            ...
61  covar_main_asmd_improvement      0.462436         income
62     covar_main_asmd_adjusted      0.014651     mean(asmd)
63   covar_main_asmd_unadjusted      0.326799     mean(asmd)
64  covar_main_asmd_improvement      0.312147     mean(asmd)
65           adjustment_failure      0.000000            NaN

[66 rows x 3 columns]
INFO (2026-03-12 17:28:11,800) [cli/main (line 1184)]: Done fitting the model, writing output

Adjustment method used:
     var  val
28  rake  0.0
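
The CBPS and rake runs above use the same input but trade covariate balance against weight variability differently. The sketch below puts the two side by side using numbers copied from the diagnostics printed in the logs. The dictionary layout and the `summarize` helper are illustrative only. It assumes the reported mean ASMD improvement is the unadjusted mean ASMD minus the adjusted one, which matches the logged values up to rounding.

```python
# Values copied from the CBPS and rake diagnostics in the log output above.
runs = {
    "cbps": {
        "n": 1000,
        "design_effect": 1.5,
        "asmd_unadjusted": 0.326799,
        "asmd_adjusted": 0.175443,
    },
    "rake": {
        "n": 1000,
        "design_effect": 3.773786,
        "asmd_unadjusted": 0.326799,
        "asmd_adjusted": 0.014651,
    },
}


def summarize(run):
    """Return the mean ASMD improvement and the effective sample size of a run."""
    return {
        "asmd_improvement": run["asmd_unadjusted"] - run["asmd_adjusted"],
        "ess": run["n"] / run["design_effect"],
    }


summary = {method: summarize(run) for method, run in runs.items()}
for method, stats in summary.items():
    print(
        f"{method}: mean ASMD improvement={stats['asmd_improvement']:.3f}, "
        f"ESS={stats['ess']:.1f}"
    )
```

In this run, rake removes almost all of the mean ASMD but leaves an effective sample size of roughly 265 out of 1000 respondents, while CBPS keeps about 667 effective respondents with a smaller improvement in balance.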

import session_info
session_info.show(html=False, dependencies=True)

-----
balance             0.16.1
pandas              3.0.1
session_info        v1.0.1
-----
81d243bd2c585b0f4821__mypyc NA
PIL                         12.1.1
anyio                       NA
arrow                       1.4.0
asttokens                   NA
attr                        25.4.0
attrs                       25.4.0
babel                       2.18.0
certifi                     2026.02.25
charset_normalizer          3.4.5
comm                        0.2.3
cycler                      0.12.1
cython_runtime              NA
dateutil                    2.9.0.post0
debugpy                     1.8.20
decorator                   5.2.1
defusedxml                  0.7.1
executing                   2.2.1
fastjsonschema              NA
fqdn                        NA
idna                        3.11
ipykernel                   7.2.0
isoduration                 NA
jedi                        0.19.2
jinja2                      3.1.6
joblib                      1.5.3
json5                       0.13.0
jsonpointer                 3.0.0
jsonschema                  4.26.0
jsonschema_specifications   NA
jupyter_events              0.12.0
jupyter_server              2.17.0
jupyterlab_server           2.28.0
kiwisolver                  1.5.0
lark                        1.3.1
markupsafe                  3.0.3
matplotlib                  3.10.8
mpl_toolkits                NA
narwhals                    2.18.0
nbformat                    5.10.4
numpy                       2.4.3
packaging                   26.0
parso                       0.8.6
patsy                       1.0.2
platformdirs                4.9.4
plotly                      6.6.0
prometheus_client           NA
prompt_toolkit              3.0.52
psutil                      7.2.2
pure_eval                   0.2.3
pydev_ipython               NA
pydevconsole                NA
pydevd                      3.2.3
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.19.2
pyparsing                   3.3.2
pythonjsonlogger            NA
referencing                 NA
requests                    2.32.5
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
rfc3987_syntax              NA
rpds                        NA
scipy                       1.17.1
seaborn                     0.13.2
send2trash                  NA
six                         1.17.0
sklearn                     1.8.0
sphinxcontrib               NA
stack_data                  0.6.3
statsmodels                 0.14.6
threadpoolctl               3.6.0
tornado                     6.5.5
traitlets                   5.14.3
typing_extensions           NA
uri_template                NA
urllib3                     2.6.3
wcwidth                     0.6.0
webcolors                   NA
websocket                   1.9.0
yaml                        6.0.3
zmq                         27.1.0
zoneinfo                    NA
-----
IPython             9.11.0
jupyter_client      8.8.0
jupyter_core        5.9.1
jupyterlab          4.5.6
notebook            7.5.5
-----
Python 3.12.12 (main, Oct 10 2025, 01:01:16) [GCC 13.3.0]
Linux-6.14.0-1017-azure-x86_64-with-glibc2.39
-----
Session information updated at 2026-03-12 17:28

| Argument | Default | Description |
|---|---|---|
| `--method` | `ipw` | Adjustment method: `ipw`, `cbps`, or `rake` |
| `--max_de` | `1.5` | Maximum design effect. Set to `None` to use `lambda_1se` instead |
| `--lambda_min` | `1e-05` | Lower bound for L1 penalty (IPW only) |
| `--lambda_max` | `10` | Upper bound for L1 penalty (IPW only) |
| `--num_lambdas` | `250` | Number of lambda values to search (IPW only) |
| `--weight_trimming_mean_ratio` | `20.0` | Trim weights above `mean(weights) * ratio` |
| `--transformations` | `default` | Covariate transformations. Use `None` to disable |
| `--formula` | `None` | Custom model formula (e.g., `"gender + income"`) |
| `--one_hot_encoding` | `True` | One-hot encode categorical features |
| `--batch_columns` | `None` | Columns to group by for batch processing |
| `--keep_columns` | `None` | Subset of columns to include in output |
| `--outcome_columns` | `None` | Columns treated as outcomes (not covariates) |
| `--ipw_logistic_regression_kwargs` | `None` | JSON string of kwargs for sklearn LogisticRegression |
| `--succeed_on_weighting_failure` | `False` | Return null weights instead of failing on errors |
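
As a rough illustration of how the arguments in this table combine, the sketch below assembles a `balance.cli` argument list from keyword options without running it. `build_balance_cli_command` is a hypothetical helper written for this example, not part of balance, and the tuning values (`max_de=2.0`, `lambda_min=1e-4`, and so on) are arbitrary choices for illustration rather than recommendations. It uses `sys.executable` instead of a bare `python` so the command targets the current interpreter.

```python
import shlex
import sys


def build_balance_cli_command(input_file, output_file, **options):
    """Build an argv list for `python -m balance.cli` (hypothetical helper).

    Each keyword option `name=value` becomes `--name value`.
    """
    cmd = [
        sys.executable,
        "-m",
        "balance.cli",
        "--input_file",
        input_file,
        "--output_file",
        output_file,
    ]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Illustrative values only: loosen the design-effect cap and narrow the
# lambda search for the IPW method.
cmd = build_balance_cli_command(
    "input.csv",
    "weights_out.csv",
    covariate_columns="gender,age_group,income",
    method="ipw",
    max_de=2.0,
    lambda_min=1e-4,
    lambda_max=1,
    num_lambdas=50,
)

# Show the command as a single shell-quoted string.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.check_call`, as in the tutorial's own CLI cell, once the input CSV exists.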

CLI tutorial

Prerequisites

Use the bundled demo data

Run the CLI

Inspect diagnostics

CLI Help and Arguments

Key CLI Arguments Summary

Example: Tuning IPW parameters

Example: Using a Custom Formula

Batch Processing Example

Alternative Weighting Methods

Example: CBPS Method

Example: Rake Method

Next steps

Session info

   id  gender age_group     income  happiness  is_respondent  weight
0   0    Male     25-34   6.428659  26.043029              1     1.0
1   1  Female     18-24   9.940280  66.885485              1     1.0
2   2    Male     18-24   2.673623  37.091922              1     1.0
3   3     NaN     18-24  10.550308  49.394050              1     1.0
4   4     NaN     18-24   2.689994  72.304208              1     1.0