balance.cli

class balance.cli.BalanceCLI(args: Namespace)[source]

Helper class that encapsulates CLI argument handling and execution.

Examples

Tutorial:

https://import-balance.org/docs/tutorials/balance_cli_tutorial/

adapt_output(output_df: DataFrame) DataFrame[source]

Filter raw output dataframe to user’s requested rows/columns.

  • First we filter to the rows we are supposed to keep.

  • Next we select the columns that need to be returned.

Parameters:

output_df – DataFrame produced by the adjustment step.

Returns:

Filtered DataFrame containing requested rows and columns.

Examples
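
The two steps listed above can be sketched as follows. This is a minimal illustration, not the library's implementation: it uses plain lists of dictionaries in place of a pandas DataFrame so that it runs with the standard library alone. The helper name `filter_output`, the column names, and the rule that a keep-row value of 1 means "keep" are assumptions.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def filter_output(
    rows: List[Row],
    keep_row_column: Optional[str] = None,
    keep_columns: Optional[List[str]] = None,
) -> List[Row]:
    """Keep flagged rows first, then project onto the requested columns."""
    # Step 1: row filter (assumption: a value of 1 marks a row to keep).
    if keep_row_column is not None:
        rows = [row for row in rows if row[keep_row_column] == 1]
    # Step 2: column selection, preserving the requested column order.
    if keep_columns is not None:
        rows = [{col: row[col] for col in keep_columns} for row in rows]
    return rows


# Hypothetical adjusted output with a keep-row indicator column.
raw = [
    {"id": "a", "weight": 1.2, "keep": 1, "age": 30},
    {"id": "b", "weight": 0.8, "keep": 0, "age": 41},
]
filtered = filter_output(raw, keep_row_column="keep", keep_columns=["id", "weight"])
```

Filtering rows before selecting columns matters here: the keep-row indicator does not have to be one of the kept columns, so it must still be present when the row filter runs.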

batch_columns() List[str][source]

Return the list of batch column names.

Returns:

Batch column names parsed from the CLI argument.

Examples
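
A rough sketch of how a list of column names might be parsed from a single CLI argument. The flag name `--batch_columns`, the comma-separated format, and the helper `split_columns` are assumptions made for illustration; check the parser built by make_parser() for the actual argument definitions.

```python
import argparse
from typing import List

parser = argparse.ArgumentParser()
# Hypothetical flag, assumed to hold a comma-separated list of column names.
parser.add_argument("--batch_columns", default=None)


def split_columns(raw: str) -> List[str]:
    """Split a comma-separated string into trimmed, non-empty column names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


args = parser.parse_args(["--batch_columns", "country, wave"])
# Mirrors the has_*/getter pairing: test for presence, then parse.
has_batches = args.batch_columns is not None
batch_cols = split_columns(args.batch_columns) if has_batches else []
```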

check_input_columns(columns: List[str] | Index) None[source]

Validate the input frame includes required columns.

Parameters:

columns – Available column names in the input data.

Returns:

None.

Examples
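
One way to picture this kind of validation is a set-membership check that fails loudly when a required column is absent. The helper `check_required_columns`, the choice of `ValueError`, and the column names are assumptions; the real method may determine the required columns and signal failure differently.

```python
from typing import Iterable, List


def check_required_columns(available: Iterable[str], required: List[str]) -> None:
    """Raise if any required column is absent from the available columns."""
    present = set(available)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(f"Input is missing required columns: {missing}")


# All required columns are present, so this call returns None silently.
check_required_columns(["id", "is_respondent", "weight", "age"], ["id", "weight"])

# A missing column is reported by name.
try:
    check_required_columns(["id", "age"], ["id", "weight"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```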

covariate_columns() List[str][source]

Return the list of covariate column names.

Returns:

Covariate column names parsed from the CLI argument.

Examples

covariate_columns_for_diagnostics() List[str] | None[source]

Return covariate columns used for diagnostics reporting.

Returns:

List of columns to keep in diagnostics or None.

Examples

formula() str | None[source]

Return the formula string used for model matrices.

Returns:

Formula string or None if unset.

Examples

has_batch_columns() bool[source]

Return True when batch columns are supplied.

Returns:

True if batch columns are set, otherwise False.

Examples

has_keep_columns() bool[source]

Return True when output keep columns are supplied.

Keep columns control which columns appear in the final output CSV. After adjustment, the output DataFrame is subset to contain only these columns (see adapt_output()).

Keep columns that are not the id column, the weight column, a covariate, or an explicit outcome column are placed into ignore_columns by process_batch(). They are still carried through the Sample and remain available in the output.

Returns:

True if keep columns are set, otherwise False.

Examples
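
The rule described above, that keep columns with no other role end up in ignore_columns, can be sketched as a simple set difference. The helper `columns_to_ignore` and the column names are hypothetical; this illustrates the stated rule and is not the code used by process_batch().

```python
from typing import List, Optional


def columns_to_ignore(
    keep_columns: List[str],
    id_column: str,
    weight_column: str,
    covariate_columns: List[str],
    outcome_columns: Optional[List[str]] = None,
) -> List[str]:
    """Return the keep columns that have no other role in the adjustment."""
    known = {id_column, weight_column, *covariate_columns, *(outcome_columns or [])}
    return [col for col in keep_columns if col not in known]


ignored = columns_to_ignore(
    keep_columns=["id", "weight", "age", "region"],
    id_column="id",
    weight_column="weight",
    covariate_columns=["age", "gender"],
)
```

In this example only `region` is ignored during adjustment, yet it would still be part of the final output because it is a keep column.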

has_keep_row_column() bool[source]

Return True when a keep-row column is supplied.

Returns:

True if a keep-row column is set, otherwise False.

Examples

has_outcome_columns() bool[source]

Return True when outcome columns are explicitly supplied.

Returns:

True if outcome columns are set, otherwise False.

Examples

id_column() str[source]

Return the identifier column name.

Returns:

Name of the ID column.

Examples

keep_columns() List[str] | None[source]

Return the subset of columns to keep in outputs.

These columns are used to filter the final output DataFrame. Keep columns that are not the id column, the weight column, a covariate, or an explicit outcome column are placed into ignore_columns during processing, but they are still retained by the Sample and included in the output.

Returns:

List of columns to keep or None if unspecified.

Examples

keep_row_column() str | None[source]

Return the keep-row indicator column name.

Returns:

Name of the keep-row indicator column, or None if unset.

Examples

lambda_max() float | None[source]

Return the maximum L1 penalty setting.

Returns:

Maximum L1 penalty value or None.

Examples

lambda_min() float | None[source]

Return the minimum L1 penalty setting.

Returns:

Minimum L1 penalty value or None.

Examples

load_and_check_input() DataFrame[source]

Read the input file and log basic information.

Returns:

DataFrame loaded from the input file.

Examples

logistic_regression_kwargs() Dict[str, Any] | None[source]

Parse JSON keyword arguments for the IPW logistic regression model.

Returns:

Parsed keyword arguments dictionary or None.

Examples
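
A minimal sketch of turning a JSON string into a keyword-argument dictionary. The helper `parse_json_kwargs` and the rejection of non-object JSON are assumptions; `solver` and `max_iter` are shown only as familiar scikit-learn LogisticRegression parameters.

```python
import json
from typing import Any, Dict, Optional


def parse_json_kwargs(raw: Optional[str]) -> Optional[Dict[str, Any]]:
    """Decode a JSON object string into keyword arguments, or None if unset."""
    if raw is None:
        return None
    parsed = json.loads(raw)
    # Keyword arguments must map names to values, so only a JSON object fits.
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object of keyword arguments")
    return parsed


kwargs = parse_json_kwargs('{"solver": "lbfgs", "max_iter": 500}')
```

A dictionary like this could then be unpacked into a model constructor, for example `LogisticRegression(**kwargs)` from scikit-learn.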

logistic_regression_model() ClassifierMixin | None[source]

Build a LogisticRegression model when IPW kwargs are supplied.

Returns:

Configured LogisticRegression instance or None.

Examples

main() None[source]

Run the CLI workflow from input loading to output writing.

Returns:

None.

Examples

max_de() float | None[source]

Return the max design effect setting.

Returns:

Maximum design effect or None if unset.

Examples

method() str[source]

Return the adjustment method name.

Returns:

The adjustment method string (for example, "ipw").

Examples

num_lambdas() int | None[source]

Return the number of lambda values to search over.

Returns:

Number of lambdas as an integer or None.

Examples

one_hot_encoding() bool | None[source]

Return the parsed one-hot encoding flag.

Returns:

True/False for one-hot encoding, or None if unset.

Examples
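
The three-valued return (True, False, or None) suggests a parser along these lines. The helper `parse_optional_bool` and the accepted spellings "true" and "false" are assumptions; the actual CLI may accept other values.

```python
from typing import Optional


def parse_optional_bool(raw: Optional[str]) -> Optional[bool]:
    """Map a textual flag to True/False, keeping None when it was not supplied."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"Cannot interpret {raw!r} as a boolean")


unset = parse_optional_bool(None)
enabled = parse_optional_bool("True")
disabled = parse_optional_bool("false")
```

Keeping None distinct from False lets the caller tell "explicitly disabled" apart from "not specified", so a downstream default can still apply.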

outcome_columns() List[str] | None[source]

Return the list of outcome columns if provided.

Returns:

List of outcome columns or None if unset.

Examples

process_batch(batch_df: DataFrame, transformations: Dict[str, Any] | str | None = 'default', formula: str | None = None, penalty_factor: None = None, one_hot_encoding: bool = False, max_de: float | None = 1.5, lambda_min: float | None = 1e-05, lambda_max: float | None = 10, num_lambdas: int | None = 250, weight_trimming_mean_ratio: float | None = 20, sample_cls: Type[Sample] = <class 'balance.sample_class.Sample'>, sample_package_name: str = 'balance') Dict[str, DataFrame][source]

Run adjustment for a batch of data and return outputs.

Parameters:
  • batch_df – Input data for the current batch.

  • transformations – Transformations argument for Sample.adjust.

  • formula – Optional formula for model matrices.

  • penalty_factor – Optional penalty factor passed to adjust.

  • one_hot_encoding – Whether to one-hot encode categorical features.

  • max_de – Maximum design effect constraint.

  • lambda_min – Minimum penalty value for IPW.

  • lambda_max – Maximum penalty value for IPW.

  • num_lambdas – Number of penalty values to search.

  • weight_trimming_mean_ratio – Mean ratio for trimming weights.

  • sample_cls – Sample implementation used to build sample/target.

  • sample_package_name – Name used in logging.

Returns:

Dict with adjusted data and diagnostics frames.

Examples

rows_to_keep_for_diagnostics() str | None[source]

Return the diagnostics row-filter expression.

Returns:

The pandas expression string used to filter rows, or None if unset.

Examples

sample_column() str[source]

Return the column indicating sample membership.

Returns:

Name of the sample indicator column.

Examples

split_sample(df: DataFrame) Tuple[DataFrame, DataFrame][source]

Split the input frame into sample and target partitions.

Parameters:

df – Input DataFrame containing sample and target rows.

Returns:

A tuple of (sample_df, target_df).

Examples
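
A sketch of partitioning rows by a sample indicator column, again with plain dictionaries instead of a DataFrame. The helper `split_rows`, the column name `is_respondent`, and the convention that 1 marks sample rows and 0 marks target rows are assumptions.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_rows(rows: List[Row], sample_column: str) -> Tuple[List[Row], List[Row]]:
    """Return (sample_rows, target_rows) based on the indicator column."""
    # Assumption: 1 marks sample rows and 0 marks target rows.
    sample_rows = [row for row in rows if row[sample_column] == 1]
    target_rows = [row for row in rows if row[sample_column] == 0]
    return sample_rows, target_rows


rows = [
    {"id": "a", "is_respondent": 1, "age": 30},
    {"id": "b", "is_respondent": 0, "age": 41},
    {"id": "c", "is_respondent": 1, "age": 25},
]
sample_rows, target_rows = split_rows(rows, "is_respondent")
```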

standardize_types() bool[source]

Return whether to standardize input types in Sample.from_frame.

Returns:

True if standardization is enabled, otherwise False.

Examples

transformations() str | None[source]

Return the transformations config for adjustment.

Returns:

Transformations setting or None if disabled.

Examples

update_attributes_for_main_used_by_adjust() None[source]

Prepare cached attributes for main to use in adjustment.

Returns:

None.

Examples

weight_column() str[source]

Return the weight column name.

Returns:

Name of the weight column.

Examples

weight_trimming_mean_ratio() float | None[source]

Return the mean ratio used for trimming weights.

Returns:

Weight trimming ratio or None if unset.

Examples

weights_impact_on_outcome_method() str | None[source]

Return the outcome weight impact method for diagnostics.

Returns:

The method name or None if disabled.

Examples

write_outputs(output_df: DataFrame, diagnostics_df: DataFrame) None[source]

Write adjusted output and diagnostics CSV files.

Parameters:
  • output_df – Adjusted output DataFrame to write.

  • diagnostics_df – Diagnostics DataFrame to write.

Returns:

None.

Examples
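
Writing the two result tables as CSV can be pictured with the standard `csv` module. The helper `write_csv`, the in-memory buffers standing in for output files, and the diagnostics column names are all hypothetical; the real method works with DataFrames and with destinations taken from the CLI arguments.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_csv(rows: List[Dict[str, object]], handle: TextIO) -> None:
    """Write rows as CSV with a header taken from the first row's keys."""
    if not rows:
        return
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# In-memory buffers stand in for the output and diagnostics files.
output_buffer = io.StringIO()
diagnostics_buffer = io.StringIO()
write_csv([{"id": "a", "weight": 1.2}], output_buffer)
write_csv([{"metric": "design_effect", "val": 1.1}], diagnostics_buffer)
```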

balance.cli.add_arguments_to_parser(parser: ArgumentParser) ArgumentParser[source]

Register CLI arguments on an argparse parser.

Parameters:

parser – Parser to add arguments to.

Returns:

The parser instance with CLI arguments registered.

Examples
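
The register-and-return pattern described here can be sketched with `argparse` directly. The function names `add_example_arguments` and `make_example_parser`, and the flags `--input_file` and `--method`, are placeholders chosen for illustration and are not a listing of the balance CLI's actual arguments.

```python
import argparse


def add_example_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register arguments on an existing parser and hand the same parser back."""
    # Hypothetical flags, shown only to illustrate the pattern.
    parser.add_argument("--input_file", required=True)
    parser.add_argument("--method", default="ipw")
    return parser


def make_example_parser() -> argparse.ArgumentParser:
    """Create a fresh parser and populate it with the example arguments."""
    return add_example_arguments(argparse.ArgumentParser(description="example CLI"))


args = make_example_parser().parse_args(["--input_file", "data.csv"])
```

Separating argument registration from parser creation lets another tool attach the same arguments to its own parser, for example a subcommand parser, without duplicating the definitions.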

balance.cli.main() None[source]

Entry point for the balance CLI.

Returns:

None.

Examples

balance.cli.make_parser() ArgumentParser[source]

Create and return the CLI argument parser.

Returns:

A configured ArgumentParser for the balance CLI.

Examples