balance.cli

class balance.cli.BalanceCLI(args: Namespace)

Helper class that encapsulates CLI argument handling and execution.

Examples

Tutorial:

https://import-balance.org/docs/tutorials/balance_cli_tutorial/

adapt_output(output_df: DataFrame) → DataFrame

Filter the raw output DataFrame down to the rows and columns the user requested.

  • First we filter to the rows we are supposed to keep.

  • Next we select the columns that need to be returned.

Parameters:

output_df – DataFrame produced by the adjustment step.

Returns:

Filtered DataFrame containing requested rows and columns.

Examples
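
The two filtering steps above can be sketched with plain pandas. This is an illustration rather than the library's implementation: the column names and the convention that a value of 1 marks a row to keep are assumptions.

```python
import pandas as pd

# Hypothetical output frame; all column names are illustrative only.
output_df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "weight": [0.8, 1.2, 1.5, 0.5],
        "age": [34, 51, 27, 45],
        "keep_row": [1, 0, 1, 1],
    }
)

keep_row_column = "keep_row"     # assumed keep-row indicator column
keep_columns = ["id", "weight"]  # assumed columns requested by the user

# Step 1: keep only the rows flagged for output (assumes 1 means "keep").
filtered = output_df[output_df[keep_row_column] == 1]

# Step 2: select the requested columns.
adapted = filtered[keep_columns]
```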

batch_columns() → List[str]

Return the list of batch column names.

Returns:

Batch column names parsed from the CLI argument.

Examples
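
A minimal sketch of how a list of column names might be parsed from a single CLI argument. It assumes the argument arrives as one comma-separated string and that an omitted argument is None; `split_columns` and the `batch_columns` attribute name are hypothetical stand-ins, not the library's code.

```python
from argparse import Namespace
from typing import List, Optional


def split_columns(raw: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated column argument; None stays None."""
    if raw is None:
        return None
    return raw.split(",")


# Hypothetical parsed arguments; the attribute name is an assumption.
args = Namespace(batch_columns="country,wave")

batch_columns = split_columns(args.batch_columns)
has_batch_columns = args.batch_columns is not None
```

The same pattern would cover the has_* predicates: each one only needs to report whether the underlying argument was supplied.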

check_input_columns(columns: List[str] | pd.Index) → None

Validate that the input frame includes all required columns.

Parameters:

columns – Available column names in the input data.

Returns:

None.

Examples
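
One way to picture this check is as a search for required names that are absent from the available ones. The helper below is a hypothetical sketch: the column names are made up, and how the library reports a missing column is not stated in this reference.

```python
from typing import Iterable, List


def find_missing_columns(required: List[str], available: Iterable[str]) -> List[str]:
    """Return the required column names that are absent, in their original order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Illustrative names only; the real required set depends on the CLI arguments.
required = ["id", "is_respondent", "weight", "age", "gender"]
available = ["id", "is_respondent", "weight", "age"]

missing = find_missing_columns(required, available)
if missing:
    print(f"Missing required columns: {missing}")
```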

covariate_columns() → List[str]

Return the list of covariate column names.

Returns:

Covariate column names parsed from the CLI argument.

covariate_columns_for_diagnostics() → List[str] | None

Return covariate columns used for diagnostics reporting.

Returns:

List of columns to keep in diagnostics or None.

formula() → str | None

Return the formula string used for model matrices.

Returns:

Formula string or None if unset.

has_batch_columns() → bool

Return True when batch columns are supplied.

Returns:

True if batch columns are set, otherwise False.

has_keep_columns() → bool

Return True when output keep columns are supplied.

Returns:

True if keep columns are set, otherwise False.

has_keep_row_column() → bool

Return True when a keep-row column is supplied.

Returns:

True if a keep-row column is set, otherwise False.

has_outcome_columns() → bool

Return True when outcome columns are explicitly supplied.

Returns:

True if outcome columns are set, otherwise False.

id_column() → str

Return the identifier column name.

Returns:

Name of the ID column.

keep_columns() → List[str] | None

Return the subset of columns to keep in outputs.

Returns:

List of columns to keep or None if unspecified.

keep_row_column() → str | None

Return the keep-row indicator column name.

Returns:

Name of the keep-row indicator column, or None if unset.

lambda_max() → float | None

Return the maximum L1 penalty setting.

Returns:

Maximum L1 penalty value or None.

lambda_min() → float | None

Return the minimum L1 penalty setting.

Returns:

Minimum L1 penalty value or None.

load_and_check_input() → DataFrame

Read the input file and log basic information.

Returns:

DataFrame loaded from the input file.

logistic_regression_kwargs() → Dict[str, Any] | None

Parse JSON keyword arguments for the IPW logistic regression model.

Returns:

Parsed keyword arguments dictionary or None.

Examples
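
A minimal sketch of parsing keyword arguments from a JSON string, using only the standard library. `parse_json_kwargs` is a hypothetical helper, and rejecting non-object JSON is a choice made for this sketch, not documented library behavior.

```python
import json
from typing import Any, Dict, Optional


def parse_json_kwargs(raw: Optional[str]) -> Optional[Dict[str, Any]]:
    """Parse a JSON object string into keyword arguments; None stays None."""
    if raw is None:
        return None
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object of keyword arguments.")
    return parsed


kwargs = parse_json_kwargs('{"C": 0.5, "max_iter": 200}')
```

A dictionary like this could then be unpacked into a scikit-learn estimator, for example `LogisticRegression(**kwargs)`, where `C` and `max_iter` are real constructor parameters.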

logistic_regression_model() → ClassifierMixin | None

Build a LogisticRegression model when IPW kwargs are supplied.

Returns:

Configured LogisticRegression instance or None.

main() → None

Run the CLI workflow from input loading to output writing.

Returns:

None.

max_de() → float | None

Return the max design effect setting.

Returns:

Maximum design effect or None if unset.

Examples
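
To give a feel for what a design effect cap constrains, the sketch below computes Kish's design effect for a set of weights. It assumes the setting refers to Kish's formula, n * sum(w^2) / (sum(w))^2; the helper name is hypothetical.

```python
from typing import Sequence


def kish_design_effect(weights: Sequence[float]) -> float:
    """Kish's design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


# Equal weights give the minimum possible design effect of 1.
equal = kish_design_effect([1.0, 1.0, 1.0, 1.0])

# One dominant weight inflates the design effect.
unequal = kish_design_effect([1.0, 1.0, 1.0, 5.0])
```

Under this reading, the second set of weights (design effect 1.75) would exceed a cap of 1.5, the default shown in the process_batch signature.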

method() → str

Return the adjustment method name.

Returns:

The adjustment method string (for example, "ipw").

num_lambdas() → int | None

Return the number of lambda values to search over.

Returns:

Number of lambdas as an integer or None.

Examples
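
The three lambda settings together describe a search grid. The sketch below builds one under the assumption that candidate penalties are evenly spaced on a log scale from the maximum down to the minimum; this reference does not state how the library actually spaces them, and `log_spaced_lambdas` is a hypothetical helper.

```python
import math
from typing import List


def log_spaced_lambdas(lambda_min: float, lambda_max: float, num_lambdas: int) -> List[float]:
    """Return num_lambdas values from lambda_max down to lambda_min, evenly spaced in log scale."""
    if num_lambdas < 2:
        return [float(lambda_max)]
    log_max = math.log(lambda_max)
    log_min = math.log(lambda_min)
    step = (log_min - log_max) / (num_lambdas - 1)
    return [math.exp(log_max + i * step) for i in range(num_lambdas)]


# Values taken from the defaults shown in the process_batch signature.
grid = log_spaced_lambdas(lambda_min=1e-05, lambda_max=10, num_lambdas=250)
```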

one_hot_encoding() → bool | None

Return the parsed one-hot encoding flag.

Returns:

True/False for one-hot encoding, or None if unset.

Examples
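
A three-state flag (True, False, or unset) can be parsed as sketched below. The accepted spellings and the helper name `parse_optional_bool` are assumptions for illustration; the strings the CLI actually accepts are not listed in this reference.

```python
from typing import Optional


def parse_optional_bool(raw: Optional[str]) -> Optional[bool]:
    """Map 'true'/'false' (case-insensitive) to a bool; None stays None."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"Expected 'true' or 'false', got {raw!r}")


enabled = parse_optional_bool("True")
unset = parse_optional_bool(None)
```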

outcome_columns() → List[str] | None

Return the list of outcome columns if provided.

Returns:

List of outcome columns or None if unset.

process_batch(batch_df: pd.DataFrame, transformations: Dict[str, Any] | str | None = 'default', formula: str | None = None, penalty_factor: None = None, one_hot_encoding: bool = False, max_de: float | None = 1.5, lambda_min: float | None = 1e-05, lambda_max: float | None = 10, num_lambdas: int | None = 250, weight_trimming_mean_ratio: float | None = 20, sample_cls: Type[balance_sample_cls] = <class 'balance.sample_class.Sample'>, sample_package_name: str = 'balance') → Dict[str, pd.DataFrame]

Run adjustment for a batch of data and return outputs.

Parameters:
  • batch_df – Input data for the current batch.

  • transformations – Transformations argument for Sample.adjust.

  • formula – Optional formula for model matrices.

  • penalty_factor – Optional penalty factor passed to adjust.

  • one_hot_encoding – Whether to one-hot encode categorical features.

  • max_de – Maximum design effect constraint.

  • lambda_min – Minimum penalty value for IPW.

  • lambda_max – Maximum penalty value for IPW.

  • num_lambdas – Number of penalty values to search.

  • weight_trimming_mean_ratio – Mean ratio for trimming weights.

  • sample_cls – Sample implementation used to build sample/target.

  • sample_package_name – Name used in logging.

Returns:

Dict with adjusted data and diagnostics frames.

rows_to_keep_for_diagnostics() → str | None

Return the diagnostics row-filter expression.

Returns:

The pandas expression string used to filter rows, or None if unset.

Examples
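
A pandas expression string can select rows through `DataFrame.query`. The sketch below shows the general idea with made-up data; the expression and column names are hypothetical, and whether the library applies the string through `query`, `eval`, or another route is not stated here.

```python
import pandas as pd

# Hypothetical input; column names are illustrative only.
diagnostics_input = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "age": [17, 34, 51, 16],
        "is_respondent": [1, 1, 0, 1],
    }
)

rows_to_keep = "age >= 18"  # assumed row-filter expression

# Keep only the rows for which the expression evaluates to True.
kept = diagnostics_input.query(rows_to_keep)
```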

sample_column() → str

Return the column indicating sample membership.

Returns:

Name of the sample indicator column.

split_sample(df: DataFrame) → Tuple[DataFrame, DataFrame]

Split the input frame into sample and target partitions.

Parameters:

df – Input DataFrame containing sample and target rows.

Returns:

A tuple of (sample_df, target_df).

Examples
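
The split can be pictured as a boolean partition on the sample indicator column. The sketch below assumes a value of 1 marks sample rows and everything else is target; the helper and column names are hypothetical.

```python
from typing import Tuple

import pandas as pd


def split_by_indicator(df: pd.DataFrame, sample_column: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Partition df into (sample rows, target rows) using an indicator column."""
    in_sample = df[sample_column] == 1  # assumes 1 marks sample membership
    return df[in_sample], df[~in_sample]


df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "is_respondent": [1, 0, 1, 0],
    }
)

sample_df, target_df = split_by_indicator(df, "is_respondent")
```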

standardize_types() → bool

Return whether to standardize input types in Sample.from_frame.

Returns:

True if standardization is enabled, otherwise False.

transformations() → str | None

Return the transformations config for adjustment.

Returns:

Transformations setting or None if disabled.

update_attributes_for_main_used_by_adjust() → None

Prepare cached attributes for main to use in adjustment.

Returns:

None.

weight_column() → str

Return the weight column name.

Returns:

Name of the weight column.

weight_trimming_mean_ratio() → float | None

Return the mean ratio used for trimming weights.

Returns:

Weight trimming ratio or None if unset.

Examples
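
A rough sketch of what trimming by a mean ratio could look like: weights above ratio times the mean weight are capped at that value. This is an assumption about the general technique, not the library's exact procedure, and any rescaling after trimming is omitted. The helper name is hypothetical.

```python
from typing import List, Optional, Sequence


def trim_weights_by_mean_ratio(weights: Sequence[float], ratio: Optional[float]) -> List[float]:
    """Cap each weight at ratio * mean(weights); a ratio of None disables trimming."""
    if ratio is None:
        return list(weights)
    cap = ratio * (sum(weights) / len(weights))
    return [min(w, cap) for w in weights]


weights = [1.0, 1.0, 1.0, 1.0, 96.0]  # mean weight is 20.0

# With ratio 2.0 the cap is 40.0, so only the last weight is trimmed.
trimmed = trim_weights_by_mean_ratio(weights, ratio=2.0)
```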

weights_impact_on_outcome_method() → str | None

Return the outcome weight impact method for diagnostics.

Returns:

The method name or None if disabled.

write_outputs(output_df: DataFrame, diagnostics_df: DataFrame) → None

Write adjusted output and diagnostics CSV files.

Parameters:
  • output_df – Adjusted output DataFrame to write.

  • diagnostics_df – Diagnostics DataFrame to write.

Returns:

None.

Examples
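
Writing two frames as CSV can be sketched with `DataFrame.to_csv`. In-memory buffers stand in for the real output files here, and the frame contents and column names are made up for illustration; the options the library passes to the CSV writer are not documented in this reference.

```python
import io

import pandas as pd

# Hypothetical frames; column names and values are illustrative only.
output_df = pd.DataFrame({"id": [1, 2], "weight": [0.9, 1.1]})
diagnostics_df = pd.DataFrame({"metric": ["size"], "val": [2.0], "var": ["sample_obs"]})

# In-memory stand-ins for the output and diagnostics file handles.
output_buffer = io.StringIO()
diagnostics_buffer = io.StringIO()

# Write each frame as CSV without the pandas index column.
output_df.to_csv(output_buffer, index=False)
diagnostics_df.to_csv(diagnostics_buffer, index=False)
```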

balance.cli.add_arguments_to_parser(parser: ArgumentParser) → ArgumentParser

Register CLI arguments on an argparse parser.

Parameters:

parser – Parser to add arguments to.

Returns:

The parser instance with CLI arguments registered.

Examples
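
The register-and-return pattern described here can be sketched with the standard argparse module. The functions and flags below are hypothetical stand-ins chosen for illustration; they are not the arguments the balance CLI actually registers.

```python
from argparse import ArgumentParser


def add_demo_arguments(parser: ArgumentParser) -> ArgumentParser:
    """Register a few illustrative arguments and return the same parser."""
    parser.add_argument("--method", default="ipw")
    parser.add_argument("--id_column", default="id")
    parser.add_argument("--covariate_columns", required=True)
    return parser


def make_demo_parser() -> ArgumentParser:
    """Create a parser and register the illustrative arguments on it."""
    return add_demo_arguments(ArgumentParser(description="Demo CLI parser"))


# Parse an explicit argument list instead of reading sys.argv.
args = make_demo_parser().parse_args(["--covariate_columns", "age,gender"])
```

Returning the parser from the registration function lets callers reuse the same arguments on a parser they created themselves, for example a subparser of a larger tool.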

balance.cli.main() → None

Entry point for the balance CLI.

Returns:

None.

balance.cli.make_parser() → ArgumentParser

Create and return the CLI argument parser.

Returns:

A configured ArgumentParser for the balance CLI.
