balance.cli
- class balance.cli.BalanceCLI(args: Namespace)
Helper class that encapsulates CLI argument handling and execution.
Examples
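BalanceCLI wraps an argparse Namespace, and most of its methods are thin accessors over that object. The sketch below uses a hypothetical, stripped-down class (MiniBalanceCLI is not part of balance) to show that pattern. The Namespace attribute names are assumptions chosen to mirror the method names; in real use the Namespace would come from argparse.ArgumentParser.parse_args().

```python
from argparse import Namespace
from typing import Optional


class MiniBalanceCLI:
    """Hypothetical stand-in showing the accessor pattern; not the balance implementation."""

    def __init__(self, args: Namespace) -> None:
        # Keep the parsed arguments; each accessor reads one (assumed) attribute.
        self.args = args

    def id_column(self) -> str:
        return self.args.id_column

    def keep_row_column(self) -> Optional[str]:
        return self.args.keep_row_column

    def has_keep_row_column(self) -> bool:
        # The has_* helpers reduce to "was this option supplied?"
        return self.args.keep_row_column is not None


cli = MiniBalanceCLI(Namespace(id_column="id", keep_row_column=None))
print(cli.id_column(), cli.has_keep_row_column())  # id False
```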
- adapt_output(output_df: DataFrame) → DataFrame
Filter the raw output DataFrame to the rows and columns the user requested.
First, the rows are filtered to those that should be kept.
Then, the columns that need to be returned are selected.
- Parameters:
output_df – DataFrame produced by the adjustment step.
- Returns:
Filtered DataFrame containing requested rows and columns.
Examples
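The two-step filter described above (rows first, then columns) can be sketched on plain Python records instead of a DataFrame, to stay dependency-free. The helper adapt_rows is hypothetical, and tying the row filter to a keep-row indicator column is an assumption based on the keep_row_column() option.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def adapt_rows(
    records: List[Record],
    keep_row_column: Optional[str] = None,
    keep_columns: Optional[List[str]] = None,
) -> List[Record]:
    """Hypothetical analogue of adapt_output operating on dicts, not a DataFrame."""
    # Step 1 (assumed): keep only rows whose keep-row indicator is truthy.
    if keep_row_column is not None:
        records = [r for r in records if r[keep_row_column]]
    # Step 2: project each remaining row onto the requested columns, in order.
    if keep_columns is not None:
        records = [{c: r[c] for c in keep_columns} for r in records]
    return records


rows = [
    {"id": 1, "weight": 0.5, "age": 30, "keep": 1},
    {"id": 2, "weight": 1.5, "age": 40, "keep": 0},
]
print(adapt_rows(rows, keep_row_column="keep", keep_columns=["id", "weight"]))
# [{'id': 1, 'weight': 0.5}]
```

Filtering rows before selecting columns matters here: the indicator column can be used for filtering even if it is not among the columns returned.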
- batch_columns() → List[str]
Return the list of batch column names.
- Returns:
Batch column names parsed from the CLI argument.
Examples
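Column-list options such as batch columns are typically passed on the command line as a single string. A minimal sketch of turning that string into a list, assuming a comma-separated format; parse_column_list is a hypothetical helper, not a balance function.

```python
from typing import List, Optional


def parse_column_list(raw: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated CLI value into column names (hypothetical helper)."""
    if raw is None or raw.strip() == "":
        return None
    # Strip whitespace around each name and drop empty entries (e.g. a trailing comma).
    return [name.strip() for name in raw.split(",") if name.strip()]


print(parse_column_list("region, wave"))  # ['region', 'wave']
print(parse_column_list(None))  # None
```

Under this assumption, has_batch_columns() amounts to checking that the parsed value is not None.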
- check_input_columns(columns: List[str] | Index) → None
Validate that the input frame includes the required columns.
- Parameters:
columns – Available column names in the input data.
- Returns:
None.
Examples
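A minimal sketch of this kind of validation. Which columns count as required is not stated here, so the required list is passed in explicitly, and raising ValueError is an assumption; check_required_columns is a hypothetical helper.

```python
from typing import Iterable, List


def check_required_columns(available: Iterable[str], required: List[str]) -> None:
    """Raise if any required column is absent (hypothetical analogue of check_input_columns)."""
    present = set(available)
    missing = [c for c in required if c not in present]
    if missing:
        # Report every missing column at once rather than failing on the first.
        raise ValueError(f"Missing required columns: {missing}")


# Passes silently: both required columns are present.
check_required_columns(["id", "is_respondent", "age"], ["id", "is_respondent"])
```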
- covariate_columns() → List[str]
Return the list of covariate column names.
- Returns:
Covariate column names parsed from the CLI argument.
Examples
- covariate_columns_for_diagnostics() → List[str] | None
Return covariate columns used for diagnostics reporting.
- Returns:
List of columns to keep in diagnostics, or None.
Examples
- formula() → str | None
Return the formula string used for model matrices.
- Returns:
Formula string, or None if unset.
Examples
- has_batch_columns() → bool
Return True when batch columns are supplied.
- Returns:
True if batch columns are set, otherwise False.
Examples
- has_keep_columns() → bool
Return True when output keep columns are supplied.
Keep columns control which columns appear in the final output CSV. After adjustment, the output DataFrame is subset to contain only these columns (see adapt_output()).
Note that keep columns that are not the id, weight, a covariate, or an explicit outcome column are placed into ignore_columns by process_batch(). They are still carried through the Sample and are available in the output.
- Returns:
True if keep columns are set, otherwise False.
Examples
- has_keep_row_column() → bool
Return True when a keep-row column is supplied.
- Returns:
True if a keep-row column is set, otherwise False.
Examples
- has_outcome_columns() → bool
Return True when outcome columns are explicitly supplied.
- Returns:
True if outcome columns are set, otherwise False.
Examples
- id_column() → str
Return the identifier column name.
- Returns:
Name of the ID column.
Examples
- keep_columns() → List[str] | None
Return the subset of columns to keep in outputs.
These columns are used to filter the final output DataFrame. Keep columns that are not the id, weight, a covariate, or an explicit outcome column are placed into ignore_columns during processing, but they are still retained by the Sample and included in the output.
- Returns:
List of columns to keep, or None if unspecified.
Examples
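The rule above (keep columns with no other role end up in ignore_columns) can be expressed as a small set difference. A sketch under that reading; split_keep_columns is a hypothetical helper, and the exact bookkeeping inside process_batch() may differ.

```python
from typing import List, Optional


def split_keep_columns(
    keep_columns: List[str],
    id_column: str,
    weight_column: str,
    covariate_columns: List[str],
    outcome_columns: Optional[List[str]] = None,
) -> List[str]:
    """Return the keep columns that would be placed into ignore_columns (hypothetical)."""
    # Columns that already have a role in the Sample are not treated as ignored.
    reserved = {id_column, weight_column, *covariate_columns, *(outcome_columns or [])}
    # Preserve the user's ordering of the remaining keep columns.
    return [c for c in keep_columns if c not in reserved]


print(split_keep_columns(["id", "weight", "age", "region"], "id", "weight", ["age"]))
# ['region']
```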
- keep_row_column() → str | None
Return the keep-row indicator column name.
- Returns:
Name of the keep-row indicator column, or None if unset.
Examples
- lambda_max() → float | None
Return the maximum L1 penalty setting.
- Returns:
Maximum L1 penalty value, or None.
Examples
- lambda_min() → float | None
Return the minimum L1 penalty setting.
- Returns:
Minimum L1 penalty value, or None.
Examples
- load_and_check_input() → DataFrame
Read the input file and log basic information.
- Returns:
DataFrame loaded from the input file.
Examples
- logistic_regression_kwargs() → Dict[str, Any] | None
Parse JSON keyword arguments for the IPW logistic regression model.
- Returns:
Parsed keyword arguments dictionary, or None.
Examples
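A minimal sketch of parsing a JSON string into keyword arguments, assuming the option is given as a JSON object and is absent (None) when unset. parse_json_kwargs is hypothetical; max_iter and solver are genuine scikit-learn LogisticRegression parameters used only as sample input.

```python
import json
from typing import Any, Dict, Optional


def parse_json_kwargs(raw: Optional[str]) -> Optional[Dict[str, Any]]:
    """Parse a JSON object string into keyword arguments (hypothetical helper)."""
    if raw is None:
        return None
    parsed = json.loads(raw)
    # Keyword arguments must be a mapping; reject JSON lists, numbers, strings, etc.
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object for keyword arguments")
    return parsed


kwargs = parse_json_kwargs('{"max_iter": 500, "solver": "lbfgs"}')
print(kwargs)  # {'max_iter': 500, 'solver': 'lbfgs'}
```

A dictionary like this can be unpacked into sklearn.linear_model.LogisticRegression(**kwargs), which is presumably how logistic_regression_model() uses it.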
- logistic_regression_model() → ClassifierMixin | None
Build a LogisticRegression model when IPW kwargs are supplied.
- Returns:
Configured LogisticRegression instance, or None.
Examples
- main() → None
Run the CLI workflow from input loading to output writing.
- Returns:
None.
Examples
- max_de() → float | None
Return the maximum design effect setting.
- Returns:
Maximum design effect, or None if unset.
Examples
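Assuming the design effect constrained by max_de is Kish's approximation (an assumption; the formula itself is standard), it can be computed from the weights alone as n·Σw² / (Σw)², which equals 1 plus the squared population coefficient of variation of the weights. kish_design_effect is a hypothetical helper.

```python
from typing import Sequence


def kish_design_effect(weights: Sequence[float]) -> float:
    """Kish's design effect n * sum(w^2) / (sum(w))^2 (hypothetical helper)."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


print(kish_design_effect([1.0, 1.0, 1.0, 1.0]))  # 1.0 (equal weights: no loss)
print(kish_design_effect([1.0, 1.0, 1.0, 3.0]))  # 1.3333333333333333
```

With the process_batch() default max_de=1.5, the second weight vector would fall within the constraint.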
- method() → str
Return the adjustment method name.
- Returns:
The adjustment method string (for example, "ipw").
Examples
- num_lambdas() → int | None
Return the number of lambda values to search over.
- Returns:
Number of lambda values as an integer, or None.
Examples
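lambda_min, lambda_max, and num_lambdas together describe a set of penalty values to search. One common choice is a log-spaced grid running from the largest penalty down to the smallest; the sketch below assumes that spacing, which may not match how balance actually builds its grid. log_spaced_lambdas is a hypothetical helper, and the sample arguments are the process_batch() defaults.

```python
import math
from typing import List


def log_spaced_lambdas(lambda_min: float, lambda_max: float, num_lambdas: int) -> List[float]:
    """Return num_lambdas values evenly spaced in log10 from lambda_max down to lambda_min."""
    if num_lambdas == 1:
        return [lambda_max]
    lo, hi = math.log10(lambda_min), math.log10(lambda_max)
    step = (hi - lo) / (num_lambdas - 1)
    # Largest penalty first: a path from strong to weak L1 regularization.
    return [10 ** (hi - i * step) for i in range(num_lambdas)]


grid = log_spaced_lambdas(1e-05, 10, 250)
print(len(grid), grid[0], grid[-1])
```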
- one_hot_encoding() → bool | None
Return the parsed one-hot encoding flag.
- Returns:
True or False for one-hot encoding, or None if unset.
Examples
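A "parsed" flag matters because command-line values arrive as strings, and in Python bool("False") is True. A minimal sketch of a tri-state parser, assuming the accepted spellings shown; parse_optional_bool is hypothetical.

```python
from typing import Optional


def parse_optional_bool(raw: Optional[str]) -> Optional[bool]:
    """Map a CLI string such as "True" or "false" to a bool, keeping None for unset."""
    if raw is None:
        return None
    value = raw.strip().lower()
    # Accepted spellings are an assumption for this sketch.
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no"}:
        return False
    raise ValueError(f"Cannot interpret {raw!r} as a boolean")


print(parse_optional_bool("False"), parse_optional_bool(None))  # False None
```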
- outcome_columns() → List[str] | None
Return the list of outcome columns if provided.
- Returns:
List of outcome columns, or None if unset.
Examples
- process_batch(batch_df: DataFrame, transformations: Dict[str, Any] | str | None = 'default', formula: str | None = None, penalty_factor: None = None, one_hot_encoding: bool = False, max_de: float | None = 1.5, lambda_min: float | None = 1e-05, lambda_max: float | None = 10, num_lambdas: int | None = 250, weight_trimming_mean_ratio: float | None = 20, sample_cls: Type[Sample] = balance.sample_class.Sample, sample_package_name: str = 'balance') → Dict[str, DataFrame]
Run adjustment for a batch of data and return outputs.
- Parameters:
batch_df – Input data for the current batch.
transformations – Transformations argument for Sample.adjust.
formula – Optional formula for model matrices.
penalty_factor – Optional penalty factor passed to adjust.
one_hot_encoding – Whether to one-hot encode categorical features.
max_de – Maximum design effect constraint.
lambda_min – Minimum penalty value for IPW.
lambda_max – Maximum penalty value for IPW.
num_lambdas – Number of penalty values to search.
weight_trimming_mean_ratio – Mean ratio for trimming weights.
sample_cls – Sample implementation used to build sample/target.
sample_package_name – Name used in logging.
- Returns:
Dict with adjusted data and diagnostics frames.
Examples
- rows_to_keep_for_diagnostics() → str | None
Return the diagnostics row-filter expression.
- Returns:
The pandas expression string used to filter rows, or None if unset.
Examples
- sample_column() → str
Return the column indicating sample membership.
- Returns:
Name of the sample indicator column.
Examples
- split_sample(df: DataFrame) → Tuple[DataFrame, DataFrame]
Split the input frame into sample and target partitions.
- Parameters:
df – Input DataFrame containing sample and target rows.
- Returns:
A tuple of (sample_df, target_df).
Examples
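A dependency-free sketch of splitting rows into sample and target partitions by the sample indicator column. It assumes a truthy indicator (such as 1) marks a sample row and a falsy one (such as 0) marks a target row; split_by_indicator and the column name is_respondent are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_indicator(
    records: List[Record], sample_column: str
) -> Tuple[List[Record], List[Record]]:
    """Partition rows into (sample, target) by a truthy/falsy indicator (hypothetical)."""
    # Assumed convention: truthy indicator -> sample row, falsy -> target row.
    sample = [r for r in records if r[sample_column]]
    target = [r for r in records if not r[sample_column]]
    return sample, target


rows = [
    {"id": 1, "is_respondent": 1, "age": 30},
    {"id": 2, "is_respondent": 0, "age": 45},
    {"id": 3, "is_respondent": 0, "age": 52},
]
sample_rows, target_rows = split_by_indicator(rows, "is_respondent")
print([r["id"] for r in sample_rows], [r["id"] for r in target_rows])  # [1] [2, 3]
```

With pandas, the same split is usually written with boolean indexing, for example df[df[col] == 1] and df[df[col] == 0].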
- standardize_types() → bool
Return whether to standardize input types in Sample.from_frame.
- Returns:
True if standardization is enabled, otherwise False.
Examples
- transformations() → str | None
Return the transformations config for adjustment.
- Returns:
Transformations setting, or None if disabled.
Examples
- update_attributes_for_main_used_by_adjust() → None
Prepare cached attributes for main to use in adjustment.
- Returns:
None.
Examples
- weight_column() → str
Return the weight column name.
- Returns:
Name of the weight column.
Examples
- weight_trimming_mean_ratio() → float | None
Return the mean ratio used for trimming weights.
- Returns:
Weight trimming ratio, or None if unset.
Examples
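One plausible reading of a "mean ratio" for trimming is that any weight above ratio × mean(weights) is capped at that value. The sketch below implements only that clipping step. It is an assumption about the semantics, it does not rescale the weights afterwards, and trim_by_mean_ratio is a hypothetical helper.

```python
from typing import List, Sequence


def trim_by_mean_ratio(weights: Sequence[float], ratio: float) -> List[float]:
    """Cap each weight at ratio * mean(weights) (hypothetical sketch; no rescaling)."""
    cap = ratio * (sum(weights) / len(weights))
    return [min(w, cap) for w in weights]


# mean = 20.0, so with ratio=2 the cap is 40.0 and only the extreme weight changes.
print(trim_by_mean_ratio([1.0, 1.0, 1.0, 1.0, 96.0], ratio=2))
# [1.0, 1.0, 1.0, 1.0, 40.0]
```

With the process_batch() default ratio of 20, the cap for this input would be 400.0, so no weight would be trimmed.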