balance.balance_frame¶

BalanceFrame: workflow orchestrator for survey/observational data reweighting.

Pairs a responder SampleFrame with a target SampleFrame and exposes a non-mutating adjust() method that returns a new, weight-augmented BalanceFrame.

class balance.balance_frame.BalanceFrame(sample: SampleFrame | None = None, target: SampleFrame | None = None)[source]¶

A pairing of responder and target SampleFrames for survey weighting.

BalanceFrame holds two SampleFrame instances — responders (the sample to be reweighted) and target (the population benchmark) — and provides methods for adjusting responder weights and computing diagnostics.

BalanceFrame is immutable by convention: adjust() returns a new BalanceFrame rather than modifying the existing one. This makes it safe to keep a reference to the pre-adjustment state.

Construct instances through the public constructor BalanceFrame(sample=..., target=...), which delegates to the internal _create() factory.

responders¶

The responder sample.

Type: SampleFrame

target¶

The target population.

Type: SampleFrame

is_adjusted¶

Whether adjust() has been called.

Type: bool

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.is_adjusted
False
>>> adjusted = bf.adjust(method="ipw")
>>> adjusted.is_adjusted
True
>>> bf.is_adjusted  # original unchanged
False

adjust(target: BalanceFrame | None = None, method: str | Callable[..., Any] = 'ipw', *args: Any, **kwargs: Any) → Self[source]¶

Adjust responder weights to match the target. Returns a NEW BalanceFrame.

The original BalanceFrame is not modified (immutable pattern). The returned BalanceFrame has is_adjusted == True and the pre-adjustment responders stored in unadjusted.

The active weight column always keeps its original name (e.g., "weight"). Its values are overwritten with the new adjusted weights. The full weight history is tracked via additional columns:

Weight columns after each adjustment¶
Stage	Weight columns in `responders`	Values of the active `weight` column
Before adjust	`weight`	original design weights
1st adjust	`weight`, `weight_pre_adjust`, `weight_adjusted_1`	= `weight_adjusted_1` values
2nd adjust	adds `weight_adjusted_2`	= `weight_adjusted_2` values
3rd adjust	adds `weight_adjusted_3`	= `weight_adjusted_3` values
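
The bookkeeping in the table can be pictured with a plain dictionary of columns. This is a minimal sketch rather than the library's implementation: record_adjustment is a hypothetical helper and the weight values are made up for illustration.

```python
def record_adjustment(columns, new_weights):
    """Return a new column mapping with one more adjustment recorded.

    Hypothetical helper: it only mimics the column naming described above.
    """
    updated = dict(columns)  # copy so the input mapping is left untouched
    if "weight_pre_adjust" not in updated:
        # Keep the original design weights the first time we adjust.
        updated["weight_pre_adjust"] = list(updated["weight"])
    # Count the history columns already present to pick the next suffix.
    step = sum(name.startswith("weight_adjusted_") for name in updated) + 1
    updated[f"weight_adjusted_{step}"] = list(new_weights)
    # The active column keeps its name; only its values change.
    updated["weight"] = list(new_weights)
    return updated


before = {"weight": [1.0, 1.0, 1.0]}
first = record_adjustment(before, [0.8, 1.0, 1.2])
second = record_adjustment(first, [0.7, 1.1, 1.2])
```

After the second call, second["weight"] holds the same values as second["weight_adjusted_2"], while second["weight_pre_adjust"] still holds the original design weights.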

Compound / sequential adjustments: adjust() can be called multiple times. Each call uses the current (previously adjusted) weights as design weights, so adjustments compound. For example, run IPW first to correct broad imbalances, then rake on a specific variable for fine-tuning:

adjusted_ipw = bf.adjust(method="ipw", max_de=2)
adjusted_final = adjusted_ipw.adjust(method="rake")

The original unadjusted baseline is always preserved:

_sf_sample_pre_adjust always points to the original (pre-first-adjustment) SampleFrame.
_links["unadjusted"] always points to the original unadjusted BalanceFrame, so 3-way comparisons (adjusted vs original vs target) and asmd_improvement() show total improvement across all adjustment steps.
model stores only the latest adjustment’s model dict.

Parameters:

target – Optional target BalanceFrame/Sample. If provided, calls set_target(target) first, then adjusts. If None, uses the already-set target.
method – The weighting method to use. Built-in options: "ipw", "cbps", "rake", "poststratify", "null". A callable with the same signature as the built-in methods is also accepted.
*args – Positional arguments (forwarded on recursive call only).
**kwargs – Additional keyword arguments forwarded to the adjustment function (e.g. max_de, transformations).

Returns:

A new, adjusted BalanceFrame.

Raises:

ValueError – If method is a string that doesn’t match any registered adjustment method.

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2, 3], "x": [10.0, 20.0, 30.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [4, 5, 6], "x": [15.0, 25.0, 35.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> adjusted = bf.adjust(method="ipw")
>>> adjusted.is_adjusted
True
>>> adjusted2 = adjusted.adjust(method="null")
>>> adjusted2.is_adjusted
True

property adjustment_history: list[dict[str, Any]]¶

Chronological adjustment model history.

Returns a best-effort read-only copy of each recorded adjustment step so callers can inspect compound reweighting workflows without mutating internal state. The model property remains the latest adjustment model for backward compatibility.

Deep-copy is attempted per entry; when a payload is not deepcopy-safe, the method falls back to copying the step dictionary and nested model mapping (when present).
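
The copy-with-fallback behaviour described above can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: copy_step is a hypothetical helper, and a threading.Lock stands in for any payload that copy.deepcopy cannot handle.

```python
import copy
import threading


def copy_step(step):
    """Deep-copy one history entry, falling back to a shallower copy."""
    try:
        return copy.deepcopy(step)
    except Exception:
        # Fallback: copy the step dict and, when present, the nested model
        # mapping. Values inside the model are shared with the original.
        fallback = dict(step)
        model = fallback.get("model")
        if isinstance(model, dict):
            fallback["model"] = dict(model)
        return fallback


history = [
    {"method": "ipw", "model": {"penalty": 0.1}},
    {"method": "custom", "model": {"lock": threading.Lock()}},  # not deepcopy-safe
]
snapshot = [copy_step(step) for step in history]

# Edits to the snapshot do not reach the recorded history.
snapshot[0]["model"]["penalty"] = 99
snapshot[1]["model"]["extra"] = True
```

The second entry shows why the copy is only best effort: the mapping itself is copied, but the lock object inside it is still shared.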

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [1.0, 2.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [1.5, 2.5], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt).adjust(method="null")
>>> len(bf.adjustment_history)
1

covars(formula: str | list[str] | None = None) → Any[source]¶

Return a BalanceDFCovars for the responders.

The returned object carries linked target (and unadjusted, if adjusted) views so that methods like .mean() and .asmd() automatically include comparisons across sources.

Parameters: formula – Optional formula string (or list) for model matrix construction. Passed through to BalanceDFCovars.
Returns: Covariate view with linked sources.
Return type: BalanceDFCovars

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.covars().df.columns.tolist()
['x']

design_matrix(on: Literal['sample'], *, data: BalanceFrame | None = None) → DataFrame[source]¶

design_matrix(on: Literal['target'], *, data: BalanceFrame | None = None) → DataFrame

design_matrix(on: Literal['both'] = 'both', *, data: BalanceFrame | None = None) → tuple[DataFrame, DataFrame]

Return the IPW model’s design matrices.

Returns the model matrices (feature matrices) built by the stored preprocessing pipeline — after formula expansion, one-hot encoding, NA indicator addition, scaling, and penalty weighting.

When data is provided, the stored preprocessing is applied to data’s covariates and the result is returned without caching. When data is None (default), stored/cached matrices for this object’s own data are returned (original behavior).

Parameters:

on – Which population’s matrix to return. "sample" returns the respondent matrix, "target" returns the target matrix, and "both" returns (sample_matrix, target_matrix).
data – An optional BalanceFrame whose covariates are transformed using this object’s stored preprocessing pipeline. The data BalanceFrame does not need to be adjusted — it just provides covariates. Must have matching covariate column names.

Returns:

A model-matrix DataFrame, or a tuple of two DataFrames when on="both".

Raises:

ValueError – If the object is not IPW-adjusted, if target is missing for on in {"target", "both"}, if recomputation of sample-side artifacts is required but no target is available, if on is invalid, or if data has mismatched covariate columns.

Notes

When data is None and stored fit artifacts are stale for the current rows (e.g., after set_fitted_model()), this method recomputes and caches refreshed matrices. That cache update is an intentional in-memory mutation. When data is provided, no caching occurs.
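
The caching rule in this note (cache results for the object's own rows, never for rows passed through data) can be illustrated with a toy class. Everything here is hypothetical: CachedTransformer, its matrix method, and the scaling step are stand-ins chosen only to make the control flow visible.

```python
class CachedTransformer:
    """Toy object that caches results for its own rows only."""

    def __init__(self, own_rows, scale):
        self._own_rows = list(own_rows)
        self._scale = scale
        self._cache = None
        self.compute_calls = 0  # counts how often the transform actually runs

    def _compute(self, rows):
        self.compute_calls += 1
        return [value * self._scale for value in rows]

    def matrix(self, data=None):
        if data is not None:
            # External rows: apply the stored transform, store nothing.
            return self._compute(data)
        if self._cache is None:
            # Own rows: compute once, then reuse the cached result.
            self._cache = self._compute(self._own_rows)
        return self._cache


transformer = CachedTransformer(own_rows=[1.0, 2.0], scale=10.0)
own_first = transformer.matrix()
own_second = transformer.matrix()          # served from the cache
external = transformer.matrix(data=[3.0])  # recomputed on every call
external_again = transformer.matrix(data=[3.0])
```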

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).fit(method="ipw")
>>> x_s, x_t = adjusted.design_matrix(on="both")
>>> x_s.shape[0], x_t.shape[0]
(2, 2)

property df: DataFrame¶

Flat user-facing DataFrame from the responders.

Returns the responder data with columns ordered as: id → covariates → outcomes → weight → ignored.

Returns: Ordered copy of the responder’s data.
Return type: pd.DataFrame

property df_all: DataFrame¶

Combined DataFrame with all samples, distinguished by a "source" column.

Concatenates the responder, target, and (if adjusted) unadjusted DataFrames vertically, adding a "source" column with values "self", "target", and "unadjusted" respectively.

Returns:

A DataFrame with all rows from the responder, target, and optionally unadjusted SampleFrames, plus a "source" column.

Return type:

pd.DataFrame

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.df_all["source"].unique().tolist()
['self', 'target']

property df_ignored: DataFrame | None¶

Ignored columns from the responder SampleFrame, or None.

property df_responders: DataFrame¶

The responder data as a DataFrame.

property df_responders_unadjusted: DataFrame¶

The original (pre-adjustment) responder data as a DataFrame.

property df_target: DataFrame | None¶

The target data as a DataFrame, or None if not yet set.

diagnostics(weights_impact_on_outcome_method: str | None = 't_test', weights_impact_on_outcome_conf_level: float = 0.95) → DataFrame[source]¶

Table of diagnostics about the adjusted BalanceFrame.

Produces a DataFrame with columns ["metric", "val", "var"] containing size information, weight diagnostics, model details, covariate ASMD, and optionally outcome-weight impact statistics. Delegates to _build_diagnostics().

Parameters:

weights_impact_on_outcome_method – Method for computing outcome-weight impact. Pass None to skip. Defaults to "t_test".
weights_impact_on_outcome_conf_level – Confidence level for outcome impact intervals. Defaults to 0.95.

Returns:

Diagnostics table with columns: ["metric", "val", "var"].

Return type:

pd.DataFrame

Raises:

ValueError – If this BalanceFrame has not been adjusted.

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": ["1", "2"], "x": [0, 1],
...                   "weight": [1.0, 2.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": ["3", "4"], "x": [0, 1],
...                   "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> adjusted = bf.adjust(method="null")
>>> adjusted.diagnostics().columns.tolist()
['metric', 'val', 'var']

fit(*, target: BalanceFrame | SampleFrame | None = None, method: str | Callable[..., Any] = 'ipw', inplace: bool = True, **kwargs: Any) → Self[source]¶

Fit a weighting model and return the fitted BalanceFrame.

This is the sklearn-style entry point for survey weight adjustment. Like sklearn’s fit(), it learns model parameters, mutates self (by default), and returns self. In survey weighting, fitting the propensity model inherently produces adjusted weights (the two are inseparable), so the returned object contains both the fitted model and the adjusted weights — analogous to how KMeans.fit() stores labels_ on the fitted object.

Workflow — basic fitting (sklearn-style, inplace=True):

bf = BalanceFrame(sample=respondents, target=population)
bf.fit(method="ipw")       # mutates bf, returns bf
bf.weights().df            # the adjusted weights

Workflow — functional style (inplace=False):

adjusted = bf.fit(method="ipw", inplace=False)

Workflow — fit on subset, apply to holdout:

fitted = train_bf.fit(method="ipw")
scored = holdout_bf.set_fitted_model(fitted, inplace=False)
holdout_weights = scored.predict_weights()

Alternatively, design_matrix(), predict_proba(), and predict_weights() accept a data= argument so the holdout workflow becomes a single line: fitted.predict_weights(data=holdout_bf).

Parameters:

target – Optional target population to set before fitting. If provided, this method calls set_target(target, inplace=False) first, preserving immutability.
method – Adjustment method name ("ipw", "cbps", "rake", "poststratify", "null") or a custom callable with the weighting-method signature.
inplace – If True (default), mutate this object with the fitted state and return self — matching sklearn’s fit() convention. If False, return a new adjusted BalanceFrame without modifying self.
**kwargs – Keyword arguments forwarded to adjust().

Returns:

The fitted BalanceFrame — self when inplace=True, a new object when inplace=False.
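
The two return conventions can be seen side by side in a toy class. This is a sketch of the general inplace pattern, not of BalanceFrame itself: ToyFrame is hypothetical, and normalising the weights stands in for a real adjustment.

```python
import copy


class ToyFrame:
    """Hypothetical stand-in that mimics the inplace / copy return rule."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.is_adjusted = False

    def fit(self, *, inplace=True):
        # Work on self when inplace, otherwise on an independent deep copy.
        result = self if inplace else copy.deepcopy(self)
        total = sum(result.weights)
        result.weights = [w / total for w in result.weights]
        result.is_adjusted = True
        return result


original = ToyFrame([1.0, 3.0])
functional = original.fit(inplace=False)   # new object, original untouched
untouched_weights = list(original.weights)
untouched_flag = original.is_adjusted
in_place = original.fit()                  # mutates and returns the same object
```

Keeping a reference to the pre-fit state is only safe in the inplace=False form; after the last line, original and in_place are the same adjusted object.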

Raises:

ValueError – If no target is available and none is provided, if method is invalid, or if na_action='drop' is combined with stored fit artifacts.

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).fit(method="null")
>>> bool(adjusted.is_adjusted)
True

Notes

For the built-in IPW method, fit() enables store_fit_metadata=True and store_fit_matrices=True by default so design_matrix()/predict_proba()/predict_weights() can consume fit-time artifacts. This may increase memory usage for large inputs; pass these kwargs explicitly as False to opt out.

For the built-in CBPS method, fit() enables store_fit_metadata=True by default so predict_weights() can reconstruct CBPS scoring artifacts. Pass store_fit_metadata=False to opt out.

For the built-in poststratify method, fit() enables store_fit_metadata=True by default so predict_weights() can reconstruct poststratification cell-ratio artifacts, while direct adjust(method='poststratify') remains metadata-light unless store_fit_metadata=True is passed explicitly.

For the built-in rake method, fit() enables store_fit_metadata=True by default so predict_weights() can replay/transfer fitted rake artifacts. This may increase memory usage because contingency tables and fit metadata are stored; pass store_fit_metadata=False to opt out. When metadata is stored, transformations must be pickleable (lambdas/closures are rejected at fit time so the resulting BalanceFrame remains serializable).

In-place predict_weights() works for any rake fit, including the default transformations='default'. Transfer scoring (predict_weights(data=...)), however, rejects models fitted with transformations='default' or with explicit dicts containing known data-dependent helpers (quantize, fct_lump), because those transformations recompute bins/levels from each scoring sample, which would silently invalidate the stored cell ratios. To use rake’s data=... transfer path, pass deterministic transformations at fit time or re-fit rake on the scoring data.

classmethod from_sample(sample: Any) → BalanceFrame[source]¶

Convert a Sample to a BalanceFrame.

The Sample must have a target set (via Sample.set_target). If the Sample is adjusted, the adjustment state (unadjusted responders, model) is preserved.

Parameters:

sample – A Sample instance with a target.

Returns:

A new BalanceFrame mirroring the Sample’s data, target, and adjustment state.

Return type:

BalanceFrame

Raises:

TypeError – If sample is not a Sample instance.
ValueError – If sample does not have a target set.

Examples

>>> import pandas as pd
>>> from balance.sample_class import Sample
>>> from balance.balance_frame import BalanceFrame
>>> s = Sample.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> t = Sample.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame.from_sample(s.set_target(t))
>>> bf.is_adjusted
False

property has_target: _CallableBool¶

Whether this BalanceFrame has a target population set.

Returns a dual-use _CallableBool: both bf.has_target and bf.has_target() work (the latter for backward compatibility).
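
A value that works both as a plain boolean and as a zero-argument call can be built from __bool__ and __call__. The class below is a hypothetical sketch of that idea; it is not the library's _CallableBool, whose actual definition is not shown here.

```python
class CallableFlag:
    """Hypothetical dual-use flag: truthy/falsy and callable."""

    def __init__(self, value):
        self._value = bool(value)

    def __bool__(self):
        # Supports `if flag:` and `bool(flag)`.
        return self._value

    def __call__(self):
        # Supports the legacy `flag()` spelling.
        return self._value

    def __eq__(self, other):
        return self._value == bool(other)

    def __hash__(self):
        return hash(self._value)

    def __repr__(self):
        return repr(self._value)


flag = CallableFlag(True)
as_attribute = bool(flag)   # attribute-style use
as_call = flag()            # call-style use
```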

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp)
>>> bf.has_target
False
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> _ = bf.set_target(tgt)
>>> bf.has_target
True

property id_column: str | None¶

The id column name, delegated to _sf_sample.

Changed in version 0.20.0: returns the column name (str) instead of the data (pd.Series). Use id_series for the data.

property id_series: Series | None¶

The id column as a Series, delegated to _sf_sample.

property is_adjusted: _CallableBool¶

Whether this BalanceFrame has been adjusted.

Returns a _CallableBool so both bf.is_adjusted (property) and bf.is_adjusted() (legacy call) work.

For compound adjustments (calling adjust() multiple times), is_adjusted is True after the first adjustment and remains True for all subsequent adjustments. The original unadjusted baseline is always preserved in _sf_sample_pre_adjust.

keep_only_some_rows_columns(rows_to_keep: str | None = None, columns_to_keep: list[str] | None = None) → BalanceFrame[source]¶

Return a new BalanceFrame with filtered rows and/or columns.

Returns a deep copy with the requested subset applied to the responder, target, and (if adjusted) unadjusted SampleFrames. The original BalanceFrame is unchanged (immutable pattern).

Parameters:

rows_to_keep – A boolean expression string evaluated via pd.DataFrame.eval to select rows. Applied to each SampleFrame’s underlying DataFrame. For example: 'x > 10' or 'gender == "Female"'. Defaults to None (all rows kept).
columns_to_keep – Covariate column names to retain. Special columns (id, weight) are always kept. Defaults to None (all columns kept).

Returns:

A new BalanceFrame with the filters applied.

Return type:

BalanceFrame

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2, 3], "x": [10.0, 20.0, 30.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [4, 5, 6], "x": [15.0, 25.0, 35.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> filtered = bf.keep_only_some_rows_columns(rows_to_keep="x > 15")
>>> len(filtered._sf_sample._df)
2

property model: dict[str, Any] | None¶

The adjustment model dictionary, or None if not adjusted.

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.model is None
True

model_matrix() → DataFrame[source]¶

Return the model matrix of the responder covariates.

Constructs a model matrix using balance.util.model_matrix(), adding NA indicators for null values.

Returns: The model matrix.
Return type: pd.DataFrame

outcomes() → Any | None[source]¶

Return a BalanceDFOutcomes, or None.

Returns None if the responder SampleFrame has no outcome columns.

Returns:

Outcome view with linked sources, or None if no outcomes are defined.

Return type:

BalanceDFOutcomes or None

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0],
...                   "y": [1.0, 0.0], "weight": [1.0, 1.0]}),
...     outcome_columns=["y"])
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.outcomes().df.columns.tolist()
['y']

predict_proba(on: Literal['sample'], output: Literal['probability', 'link'] = 'probability', *, data: BalanceFrame | None = None) → Series[source]¶

predict_proba(on: Literal['target'], output: Literal['probability', 'link'] = 'probability', *, data: BalanceFrame | None = None) → Series

predict_proba(on: Literal['both'] = 'both', output: Literal['probability', 'link'] = 'probability', *, data: BalanceFrame | None = None) → tuple[Series, Series]

Return IPW propensity scores.

Returns the propensity scores from the fitted IPW model, that is, the predicted probability that a row belongs to the sample rather than the target. A target row with a high propensity is well represented in the sample; a low score indicates underrepresentation.

When data is provided, the stored model is applied to data’s covariates and fresh predictions are returned without caching. When data is None (default), stored/cached predictions for this object’s own data are returned (original behavior).

Parameters:

on – Which population to predict on ("sample", "target", or "both").
output – Output scale. "probability" returns class-1 propensity probabilities. "link" returns logit-transformed values.
data – An optional BalanceFrame whose covariates are scored using this object’s stored model. Must have matching covariate column names. The data BalanceFrame needs a target for on="target" or on="both".
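
The relationship between the two output scales can be checked by hand. The sketch below assumes that "link" means the standard logit, log(p / (1 - p)); the helper names to_link and to_probability are hypothetical and the probabilities are made up.

```python
import math


def to_link(probability):
    """Logit transform: probability scale to link scale."""
    return math.log(probability / (1.0 - probability))


def to_probability(link):
    """Inverse logit: link scale back to probability scale."""
    return 1.0 / (1.0 + math.exp(-link))


probabilities = [0.2, 0.5, 0.8]
links = [to_link(p) for p in probabilities]
round_trip = [to_probability(z) for z in links]
```

A probability of 0.5 maps to a link of 0, and probabilities that mirror each other around 0.5 map to links of equal size and opposite sign.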

Returns:

A prediction Series, or a tuple of two Series when on="both".

Raises:

ValueError – If the object is not IPW-adjusted, if target is missing for on in {"target", "both"}, if recomputation of sample-side predictions is required but no target is available, if on is invalid, or if data has mismatched covariate columns.

Notes

When data is None and stored fit-time predictions are stale for the current rows, this method may recompute and cache refreshed probabilities/links. When data is provided, no caching occurs.

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).fit(method="ipw")
>>> p = adjusted.predict_proba(on="target", output="probability")
>>> int(p.shape[0])
2

predict_weights(*, data: BalanceFrame | None = None) → Series[source]¶

Predict responder weights from the fitted model’s artifacts.

Reconstructs adjusted survey weights from stored fit-time artifacts (propensity links, design weights, class balancing, trimming parameters). On the fitted object itself, the result is numerically equivalent to self.weights().df (within floating-point tolerance) and serves as a validation that the stored artifacts are sufficient to reproduce the adjustment.

When data is provided, computes weights for data’s sample using the stored model, without caching. This is the one-liner alternative to the set_fitted_model workflow:

fitted.predict_weights(data=holdout_bf)

When data is None (default), uses this object’s own data (original behavior).

Dispatches by the adjustment method stored in the model dict:

IPW: uses stored fit-time metadata (links, class balancing, trimming, and design weights) to reproduce fitted responder weights.
CBPS: rebuilds the CBPS scoring artifacts from stored metadata and supports both in-place and data=... holdout scoring.
Poststratify: replays fitted cell-ratio metadata in-place and supports data=... transfer scoring. Models fitted with transformations='default' or direct data-dependent helpers such as balance.utils.data_transformation.quantize() / balance.utils.data_transformation.fct_lump() are rejected for transfer; pass deterministic transformations explicitly or re-fit poststratify on the scoring data.
Rake: replays fitted cell-ratio artifacts in-place and supports data=... transfer scoring. See balance.weighting_methods.rake.rake() Notes for validity constraints and interpretation.
Other methods: not yet supported — will raise with guidance.
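
To make the IPW entry more concrete, here is a textbook inverse-odds sketch of turning links and design weights into weights. It is an assumption-laden illustration, not the library's routine: it ignores class balancing and trimming, the function names are hypothetical, and rescaling to a fixed total is an assumed final step.

```python
import math


def inverse_odds_weights(links, design_weights):
    """Multiply each design weight by exp(-link), i.e. by (1 - p) / p."""
    return [w * math.exp(-z) for z, w in zip(links, design_weights)]


def rescale_to_total(weights, total):
    """Scale weights proportionally so that they sum to `total`."""
    factor = total / sum(weights)
    return [w * factor for w in weights]


links = [-0.5, 0.0, 0.5]           # made-up link values
design_weights = [1.0, 1.0, 2.0]   # made-up design weights
raw = inverse_odds_weights(links, design_weights)
scaled = rescale_to_total(raw, total=100.0)
```

Rows with a low propensity (negative link) receive weights above their design weight, which is the direction of correction the adjustment aims for.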

Parameters: data – An optional BalanceFrame whose sample covariates are scored using this object’s stored model. Must have matching covariate column names and a target set. Supported for IPW, CBPS, rake, and poststratify.
Returns: A Series of predicted responder weights.
Raises: ValueError – If no fitted model is available, if the method is unsupported, if required target data is missing, or if data has mismatched covariate columns.

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).fit(method="ipw")
>>> w = adjusted.predict_weights()
>>> int(w.shape[0])
2

property responders: SampleFrame¶

Alias for _sf_sample, kept for backward compatibility; it will be removed.

set_as_pre_adjust(*, inplace: bool = False) → Self[source]¶

Set the current responder state as the new pre-adjust baseline.

This “locks in” the current responder weights (which may already be adjusted and/or trimmed) as the baseline for future diagnostics and subsequent adjustments.

Parameters: inplace – If True, mutate this object and return it. If False (default), return a new object with a deep-copied responder frame and reset baseline.
Returns: BalanceFrame with _sf_sample_pre_adjust reset to the current responder SampleFrame state. In copy mode (inplace=False), only the responder frame is deep-copied and used to construct a new object (the full _links graph is not deep-copied). In inplace mode, the baseline is set to the existing responder frame object so baseline/current share identity, matching unadjusted-object semantics elsewhere in the API. Any current adjustment model is cleared because the object is no longer considered adjusted after this operation.

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).adjust(method="null")
>>> baseline_locked = adjusted.set_as_pre_adjust()  # copy mode
>>> baseline_locked.is_adjusted
False
>>> _ = adjusted.set_as_pre_adjust(inplace=True)  # inplace mode

set_fitted_model(fitted: BalanceFrame, *, inplace: bool = True) → Self[source]¶

Apply a fitted model from another BalanceFrame, producing a fully adjusted result.

This enables fit-then-apply workflows: fit on one BalanceFrame (e.g., a 20k subset) and apply the fitted model to another BalanceFrame (e.g., the remaining 980k) with the same covariate schema. The returned object is fully adjusted (is_adjusted is True, model is set, summary() works with 3-way comparison).

Workflow (inplace=False — returns new adjusted object):

fitted = train_bf.fit(method="ipw")
scored = holdout_bf.set_fitted_model(fitted, inplace=False)
scored.summary()  # full diagnostics on holdout

Workflow (inplace=True, default — mutates self):

holdout_bf.set_fitted_model(fitted)
holdout_bf.summary()

Currently set_fitted_model applies fitted IPW models directly. Other methods may have fit artifacts but are routed through their dedicated predict_weights(data=...) workflows instead of this method-specific application path.

Parameters:

fitted – A BalanceFrame already adjusted with a supported method. Its fitted model is used to compute holdout weights.
inplace – If True (default), mutate this object and return self. If False, return a new BalanceFrame with computed weights, leaving self unchanged.

Returns:

A fully adjusted BalanceFrame with holdout weights applied. self when inplace=True, a new object when inplace=False.

Raises:

ValueError – If fitted has no stored model, if the model method is not yet supported, or if covariate column names differ between self and fitted.

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> train_resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> train_tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> holdout_resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [5, 6], "x": [0.1, 0.9], "weight": [1.0, 1.0]}))
>>> holdout_tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [7, 8], "x": [0.3, 0.7], "weight": [1.0, 1.0]}))
>>> train_bf = BalanceFrame(sample=train_resp, target=train_tgt)
>>> holdout_bf = BalanceFrame(sample=holdout_resp, target=holdout_tgt)
>>> fitted = train_bf.fit(method="ipw")
>>> scored = holdout_bf.set_fitted_model(fitted, inplace=False)
>>> scored.is_adjusted
True
>>> scored.model is not None
True

set_target(target: BalanceFrame | SampleFrame, inplace: bool | None = None) → Self[source]¶

Set or replace the target population.

When target is a BalanceFrame (or subclass such as Sample), a deep copy of self is returned with the target set (immutable pattern). When target is a raw SampleFrame, the behaviour depends on inplace: True mutates self, False returns a new BalanceFrame.

Parameters:

target – The target population — a BalanceFrame/Sample or a SampleFrame.
inplace – If True, mutates self (only valid for SampleFrame targets). If False, returns a new copy. Defaults to None which auto-selects: copy for BalanceFrame targets, inplace for SampleFrame targets.
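
The auto-selection rule for inplace=None can be written as a small decision function. This sketch is hypothetical: resolve_inplace is not part of the library, and raising ValueError for inplace=True with a BalanceFrame target is an assumption, since the text only says that combination is not valid.

```python
def resolve_inplace(inplace, target_is_balance_frame):
    """Return the effective inplace flag for a given target kind."""
    if inplace is None:
        # Auto-select: copy for BalanceFrame targets, inplace for SampleFrame.
        return not target_is_balance_frame
    if inplace and target_is_balance_frame:
        # Assumed behaviour; the documentation only calls this invalid.
        raise ValueError("inplace=True is only valid for SampleFrame targets")
    return inplace


auto_for_balance_frame = resolve_inplace(None, target_is_balance_frame=True)
auto_for_sample_frame = resolve_inplace(None, target_is_balance_frame=False)
explicit_copy = resolve_inplace(False, target_is_balance_frame=False)
```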

Returns:

BalanceFrame with the new target set.

Raises:

TypeError / ValueError – If target is not a BalanceFrame or SampleFrame, or if the responders and the target share no covariate columns.

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp)
>>> _ = bf.set_target(tgt)
>>> bf.has_target()
True

set_unadjusted(second: BalanceFrame) → Self[source]¶

Set the unadjusted link for comparative analysis.

Returns a deep copy with _sf_sample_pre_adjust pointing at second’s responder SampleFrame, and _links["unadjusted"] pointing at second.

Parameters: second – A BalanceFrame (or subclass) whose responder data becomes the unadjusted baseline.
Returns: A new BalanceFrame with the unadjusted link set.
Raises: TypeError – If second is not a BalanceFrame.

set_weights(weights: Series | float | None, *, use_index: bool = False) → None[source]¶

Set or replace the responder weights.

Delegates to the underlying SampleFrame’s set_weights.

When called on an unadjusted BalanceFrame (is_adjusted is False), _sf_sample and _sf_sample_pre_adjust share the same DataFrame, so the change is visible to both automatically — changing base weights is not an adjustment.

Warning

If this BalanceFrame has already been fitted (i.e., adjust() has been called), calling set_weights() changes the design weights but does not invalidate the stored fit artifacts (_adjustment_model). The link values in those artifacts were computed using the old weights, so predict_weights() will use new current_sample_weights with stale links, producing a mathematical inconsistency. Users should re-fit (call adjust() again) after changing weights on an already-fitted BalanceFrame.

Parameters:

weights – New weights. A Series, a scalar (broadcast to all rows), or None (sets all to 1.0).
use_index – If True, align weights by index instead of requiring matching length. See SampleFrame.set_weights().
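
The three accepted forms of weights can be sketched without pandas. The normalize_weights helper below is hypothetical. It uses a plain sequence in place of a Series and assumes a length mismatch is an error when index alignment is not requested.

```python
from numbers import Real
from typing import List, Sequence, Union


def normalize_weights(
    weights: Union[Sequence[float], float, None], n_rows: int
) -> List[float]:
    """Hypothetical helper: expand the weights argument to one value per row."""
    if weights is None:
        # None sets every weight to 1.0.
        return [1.0] * n_rows
    if isinstance(weights, Real):
        # A scalar is broadcast to all rows.
        return [float(weights)] * n_rows
    values = [float(w) for w in weights]
    if len(values) != n_rows:
        # Assumption: without index alignment, lengths must match.
        raise ValueError(f"expected {n_rows} weights, got {len(values)}")
    return values
```

For an already-adjusted BalanceFrame, the warning above still applies: replacing the weights does not refresh the stored fit.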

summary() → str[source]¶

Consolidated summary of covariate balance, weight health, and outcomes.

Produces a multi-line summary combining covariate ASMD / KLD diagnostics, weight design effect, and outcome means. Delegates to _build_summary() after computing the necessary intermediate values.

When no target is set, returns a minimal summary with weight diagnostics and outcome means only.

Returns: A human-readable multi-line summary string.
Return type: str

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2, 3, 4], "x": [0, 1, 1, 0],
...                   "weight": [1.0, 2.0, 1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [5, 6, 7, 8], "x": [0, 0, 1, 1],
...                   "weight": [1.0, 1.0, 1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> adjusted = bf.adjust(method="null")
>>> "Covariate diagnostics:" in adjusted.summary()
True
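
Two of the quantities named in the summary, ASMD and the weight design effect, can be worked by hand. The sketch below is not the library's implementation. It assumes ASMD is the absolute difference of weighted means divided by the target's population-form weighted standard deviation, and that the design effect is Kish's formula. The library may use different estimators, so its reported values can differ.

```python
from math import sqrt
from typing import Sequence


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


def weighted_std(x: Sequence[float], w: Sequence[float]) -> float:
    """Population-form weighted standard deviation (an assumption of this sketch)."""
    m = weighted_mean(x, w)
    return sqrt(sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w))


def asmd(x_s, w_s, x_t, w_t) -> float:
    """Absolute standardized mean difference, scaled by the target's spread."""
    return abs(weighted_mean(x_s, w_s) - weighted_mean(x_t, w_t)) / weighted_std(x_t, w_t)


def kish_design_effect(w: Sequence[float]) -> float:
    """Kish's approximation: n * sum(w^2) / (sum(w))^2."""
    return len(w) * sum(wi * wi for wi in w) / sum(w) ** 2


# Data from the example above.
resp_x, resp_w = [0, 1, 1, 0], [1.0, 2.0, 1.0, 1.0]
tgt_x, tgt_w = [0, 0, 1, 1], [1.0, 1.0, 1.0, 1.0]
x_asmd = asmd(resp_x, resp_w, tgt_x, tgt_w)
deff = kish_design_effect(resp_w)
```

Under these definitions the responder weighted mean of x is 0.6 against a target mean of 0.5 with spread 0.5, so x_asmd is about 0.2, and deff is 4 * 7 / 25 = 1.12.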

property target: SampleFrame | None¶: Alias for _sf_target (backward compat, will be removed).

to_csv(path_or_buf: str | Path | IO | None = None, **kwargs: Any) → str | None[source]¶

Write the combined DataFrame to CSV.

Writes the output of df (responder + target + unadjusted rows with a "source" column) to a CSV file or string. Delegates to to_csv_with_defaults().

Parameters:

path_or_buf – Destination. If None, returns the CSV as a string.
**kwargs – Additional keyword arguments passed to pd.DataFrame.to_csv().

Returns:

CSV string if path_or_buf is None, else None.

Return type:

str or None

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> "id" in bf.to_csv()
True
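
The path_or_buf behaviour (return a string when None, otherwise write a file and return None) follows the usual pandas convention. The sketch below reproduces it with the standard library only. The rows_to_csv helper and the "source" labels are placeholders, not the library's actual values.

```python
import csv
import io
from typing import Dict, List, Optional


def rows_to_csv(
    rows: List[Dict[str, object]], path_or_buf: Optional[str] = None
) -> Optional[str]:
    """Hypothetical helper: write dict rows as CSV, or return the text if no path is given."""
    fieldnames = list(rows[0].keys())

    def write(handle) -> None:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)

    if path_or_buf is None:
        buffer = io.StringIO()
        write(buffer)
        return buffer.getvalue()
    with open(path_or_buf, "w", newline="") as handle:
        write(handle)
    return None


# Illustrative combined rows; the "source" labels here are placeholders.
combined = [
    {"id": 1, "x": 10.0, "weight": 1.0, "source": "responders"},
    {"id": 2, "x": 20.0, "weight": 1.0, "source": "responders"},
    {"id": 3, "x": 15.0, "weight": 1.0, "source": "target"},
    {"id": 4, "x": 25.0, "weight": 1.0, "source": "target"},
]
csv_text = rows_to_csv(combined)
```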

to_download(tempdir: str | None = None) → Any[source]¶

Create a downloadable file link of the combined DataFrame.

Writes df to a temporary CSV file and returns an IPython FileLink for interactive download.

Parameters: tempdir – Directory for the temp file. If None, uses tempfile.gettempdir().
Returns: An IPython file link for downloading the CSV.
Return type: FileLink

Examples

>>> import tempfile
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> link = bf.to_download(tempdir=tempfile.gettempdir())

to_sample() → Any[source]¶

Convert this BalanceFrame back to a Sample.

Reconstructs a Sample with the responder data and target set. If this BalanceFrame is adjusted, the returned Sample will also be adjusted — is_adjusted() returns True, has_target() returns True, and the original (unadjusted) weights are preserved via the "unadjusted" link.

Returns:

A Sample mirroring this BalanceFrame’s data, target, and adjustment state.

Return type:

Sample

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2, 3], "x": [10.0, 20.0, 30.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [4, 5, 6], "x": [15.0, 25.0, 35.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> s = bf.to_sample()
>>> s.has_target()
True

trim(ratio=..., percentile=..., keep_sum_of_weights=..., target_sum_weights=..., inplace=False)¶

Trim extreme weights using mean-ratio clipping or percentile winsorization.

Delegates to SampleFrame.trim() for computation and weight history tracking, then wraps the result in a new BalanceFrame (preserving target, pre-adjust baseline, and links).

Parameters:

ratio – Mean-ratio upper bound. Mutually exclusive with percentile.
percentile – Percentile(s) for winsorization. Mutually exclusive with ratio.
keep_sum_of_weights – Whether to rescale after trimming to preserve the original sum of weights.
target_sum_weights – If provided, rescale trimmed weights so their sum equals this target.
inplace – If True, mutate this BalanceFrame’s weights and return it. If False (default), return a new BalanceFrame.

Returns:

The BalanceFrame with trimmed weights (self if inplace, else a new instance).
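
Mean-ratio clipping with optional rescaling can be sketched as follows. This is a minimal illustration, not SampleFrame.trim() itself. It assumes positive weights, caps each weight at ratio times the mean weight, and does not cover the percentile path or target_sum_weights.

```python
from typing import List


def trim_by_mean_ratio(
    weights: List[float], ratio: float, keep_sum_of_weights: bool = True
) -> List[float]:
    """Clip weights above ratio * mean(weights); optionally rescale to the original sum."""
    original_sum = sum(weights)
    cap = ratio * original_sum / len(weights)
    trimmed = [min(w, cap) for w in weights]
    if keep_sum_of_weights:
        # Rescale so the trimmed weights add up to the original total.
        scale = original_sum / sum(trimmed)
        trimmed = [w * scale for w in trimmed]
    return trimmed


clipped = trim_by_mean_ratio([1.0, 1.0, 1.0, 9.0], ratio=2.0, keep_sum_of_weights=False)
rescaled = trim_by_mean_ratio([1.0, 1.0, 1.0, 9.0], ratio=2.0)
```

Here the mean weight is 3, so the cap is 6 and clipped is [1.0, 1.0, 1.0, 6.0]. With rescaling, every weight is multiplied by 12 / 9, which restores the sum of 12 but lifts the largest weight back above the cap.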

property unadjusted: SampleFrame | None¶: Alias for _sf_sample_pre_adjust if adjusted, else None (backward compat).

property weight_series: Series | None¶: The active weight as a Series, delegated to _sf_sample.

weights() → Any[source]¶

Return a BalanceDFWeights for the responders.

The returned object carries linked target (and unadjusted, if adjusted) views for comparative weight analysis.

Returns: Weight view with linked sources.
Return type: BalanceDFWeights

Examples

>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 2.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.weights().df.columns.tolist()
['weight']
