balance.balance_frame
BalanceFrame: workflow orchestrator for survey/observational data reweighting.
Pairs a responder SampleFrame with a target SampleFrame and exposes an immutable adjust() method that returns a new, weight-augmented BalanceFrame.
- class balance.balance_frame.BalanceFrame(sample: SampleFrame | None = None, target: SampleFrame | None = None)
A pairing of responder and target SampleFrames for survey weighting.
BalanceFrame holds two SampleFrame instances, responders (the sample to be reweighted) and target (the population benchmark), and provides methods for adjusting responder weights and computing diagnostics.
BalanceFrame is immutable by convention: adjust() returns a new BalanceFrame rather than modifying the existing one. This makes it safe to keep a reference to the pre-adjustment state.
Must be constructed via the public constructor BalanceFrame(sample=..., target=...), which delegates to the internal _create() factory.
- responders
The responder sample.
- Type: SampleFrame
- target
The target population.
- Type: SampleFrame | None
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.is_adjusted
False
>>> adjusted = bf.adjust(method="ipw")
>>> adjusted.is_adjusted
True
>>> bf.is_adjusted  # original unchanged
False
- adjust(target: BalanceFrame | None = None, method: str | Callable[..., Any] = 'ipw', *args: Any, **kwargs: Any) Self
Adjust responder weights to match the target. Returns a NEW BalanceFrame.
The original BalanceFrame is not modified (immutable pattern). The returned BalanceFrame has is_adjusted == True and the pre-adjustment responders stored in unadjusted.
The active weight column always keeps its original name (e.g., "weight"). Its values are overwritten with the new adjusted weights. The full weight history is tracked via additional columns:
Weight columns after each adjustment
After            Weight columns in responders                    Active ("weight")
Before adjust    weight                                          original design weights
1st adjust       weight, weight_pre_adjust, weight_adjusted_1    = weight_adjusted_1 values
2nd adjust       weight_adjusted_2                               = weight_adjusted_2 values
3rd adjust       weight_adjusted_3                               = weight_adjusted_3 values
Compound / sequential adjustments: adjust() can be called multiple times. Each call uses the current (previously adjusted) weights as design weights, so adjustments compound. For example, run IPW first to correct broad imbalances, then rake on a specific variable for fine-tuning:
    adjusted_ipw = bf.adjust(method="ipw", max_de=2)
    adjusted_final = adjusted_ipw.adjust(method="rake")
The original unadjusted baseline is always preserved:
- _sf_sample_pre_adjust always points to the original (pre-first-adjustment) SampleFrame.
- _links["unadjusted"] always points to the original unadjusted BalanceFrame, so 3-way comparisons (adjusted vs original vs target) and asmd_improvement() show total improvement across all adjustment steps.
- model stores only the latest adjustment’s model dict.
- Parameters:
target – Optional target BalanceFrame/Sample. If provided, calls set_target(target) first, then adjusts. If None, uses the already-set target.
method – The weighting method to use. Built-in options: "ipw", "cbps", "rake", "poststratify", "null". A callable with the same signature as the built-in methods is also accepted.
*args – Positional arguments (forwarded on recursive call only).
**kwargs – Additional keyword arguments forwarded to the adjustment function (e.g. max_de, transformations).
- Returns:
A new, adjusted BalanceFrame.
- Raises:
ValueError – If method is a string that doesn’t match any registered adjustment method.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2, 3], "x": [10.0, 20.0, 30.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [4, 5, 6], "x": [15.0, 25.0, 35.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> adjusted = bf.adjust(method="ipw")
>>> adjusted.is_adjusted
True
>>> adjusted2 = adjusted.adjust(method="null")
>>> adjusted2.is_adjusted
True
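The weight-column bookkeeping described for adjust() can be sketched in plain Python. This is a hypothetical illustration, not the library's implementation: the record_adjustment helper and the dict-of-lists "table" are assumptions made for the example, and only the column names follow the scheme documented above.

```python
from typing import Dict, List


def record_adjustment(
    columns: Dict[str, List[float]], new_weights: List[float]
) -> Dict[str, List[float]]:
    """Return a NEW column dict with one more adjustment recorded.

    Hypothetical helper: mirrors the documented naming scheme only,
    not the actual BalanceFrame internals.
    """
    # Copy every column so the caller's dict is never mutated.
    out = {name: list(values) for name, values in columns.items()}
    if "weight_pre_adjust" not in out:
        # First adjustment: keep the original design weights.
        out["weight_pre_adjust"] = list(out["weight"])
    step = sum(1 for name in out if name.startswith("weight_adjusted_")) + 1
    out[f"weight_adjusted_{step}"] = list(new_weights)
    # The active column keeps its name; only its values change.
    out["weight"] = list(new_weights)
    return out


base = {"weight": [1.0, 1.0, 1.0]}
first = record_adjustment(base, [0.5, 1.0, 1.5])
second = record_adjustment(first, [0.6, 0.9, 1.5])
print(sorted(second))
```

Returning a fresh dict on every call is what keeps earlier states (base, first) usable for comparison, which is the same reason adjust() returns a new object.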
- covars(formula: str | list[str] | None = None) Any
Return a BalanceDFCovars for the responders.
The returned object carries linked target (and unadjusted, if adjusted) views so that methods like .mean() and .asmd() automatically include comparisons across sources.
- Parameters:
formula – Optional formula string (or list) for model matrix construction. Passed through to BalanceDFCovars.
- Returns:
Covariate view with linked sources.
- Return type:
BalanceDFCovars
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.covars().df.columns.tolist()
['x']
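To make the kind of comparison behind .asmd() concrete, here is a minimal sketch of an absolute standardized mean difference for one numeric covariate. It assumes the difference of weighted means is scaled by the target's weighted standard deviation with a population-style denominator; the helper names are hypothetical and the library's exact conventions may differ.

```python
import math
from typing import List


def weighted_mean(values: List[float], weights: List[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_std(values: List[float], weights: List[float]) -> float:
    # Population-style weighted standard deviation (an assumption; the
    # library may use a different degrees-of-freedom convention).
    mu = weighted_mean(values, weights)
    var = sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(var)


def asmd(
    sample: List[float],
    sample_w: List[float],
    target: List[float],
    target_w: List[float],
) -> float:
    """Absolute standardized mean difference, scaled by the target's std."""
    diff = weighted_mean(sample, sample_w) - weighted_mean(target, target_w)
    return abs(diff) / weighted_std(target, target_w)


# Same toy data as the doctest above: x in the responders vs. x in the target.
print(asmd([10.0, 20.0], [1.0, 1.0], [15.0, 25.0], [1.0, 1.0]))
```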
- design_matrix(on: Literal['sample'], *, data: BalanceFrame | None = None) DataFrame
- design_matrix(on: Literal['target'], *, data: BalanceFrame | None = None) DataFrame
- design_matrix(on: Literal['both'] = 'both', *, data: BalanceFrame | None = None) tuple[DataFrame, DataFrame]
Return the IPW model’s design matrices.
Returns the model matrices (feature matrices) built by the stored preprocessing pipeline, that is, after formula expansion, one-hot encoding, NA indicator addition, scaling, and penalty weighting.
When data is provided, the stored preprocessing is applied to data’s covariates and the result is returned without caching. When data is None (default), stored/cached matrices for this object’s own data are returned (original behavior).
- Parameters:
on – Which population’s matrix to return. "sample" returns the respondent matrix, "target" returns the target matrix, and "both" returns (sample_matrix, target_matrix).
data – An optional BalanceFrame whose covariates are transformed using this object’s stored preprocessing pipeline. The data BalanceFrame does not need to be adjusted; it only provides covariates. Must have matching covariate column names.
- Returns:
A model-matrix DataFrame, or a tuple of two DataFrames when on="both".
- Raises:
ValueError – If the object is not IPW-adjusted, if target is missing for on in {"target", "both"}, if recomputation of sample-side artifacts is required but no target is available, if on is invalid, or if data has mismatched covariate columns.
Notes
When data is None and stored fit artifacts are stale for the current rows (e.g., after set_fitted_model()), this method recomputes and caches refreshed matrices. That cache update is an intentional in-memory mutation. When data is provided, no caching occurs.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).fit(method="ipw")
>>> x_s, x_t = adjusted.design_matrix(on="both")
>>> x_s.shape[0], x_t.shape[0]
(2, 2)
- property df: DataFrame
Flat user-facing DataFrame from the responders.
Returns the responder data with columns ordered as: id → covariates → outcomes → weight → ignored.
- Returns:
Ordered copy of the responder’s data.
- Return type:
pd.DataFrame
- property df_all: DataFrame
Combined DataFrame with all samples, distinguished by a "source" column.
Concatenates the responder, target, and (if adjusted) unadjusted DataFrames vertically, adding a "source" column with values "self", "target", and "unadjusted" respectively.
- Returns:
A DataFrame with all rows from responder, target, and optionally unadjusted SampleFrames, plus a "source" column.
- Return type:
pd.DataFrame
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.df_all["source"].unique().tolist()
['self', 'target']
- property df_ignored: DataFrame | None
Ignored columns from the responder SampleFrame, or None.
- property df_responders: DataFrame
The responder data as a DataFrame.
- property df_responders_unadjusted: DataFrame
The original (pre-adjustment) responder data as a DataFrame.
- property df_target: DataFrame | None
The target data as a DataFrame, or None if not yet set.
- diagnostics(weights_impact_on_outcome_method: str | None = 't_test', weights_impact_on_outcome_conf_level: float = 0.95) DataFrame
Table of diagnostics about the adjusted BalanceFrame.
Produces a DataFrame with columns ["metric", "val", "var"] containing size information, weight diagnostics, model details, covariate ASMD, and optionally outcome-weight impact statistics. Delegates to _build_diagnostics().
- Parameters:
weights_impact_on_outcome_method – Method for computing outcome-weight impact. Pass None to skip. Defaults to "t_test".
weights_impact_on_outcome_conf_level – Confidence level for outcome impact intervals. Defaults to 0.95.
- Returns:
Diagnostics table with columns ["metric", "val", "var"].
- Return type:
pd.DataFrame
- Raises:
ValueError – If this BalanceFrame has not been adjusted.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": ["1", "2"], "x": [0, 1],
...                   "weight": [1.0, 2.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": ["3", "4"], "x": [0, 1],
...                   "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> adjusted = bf.adjust(method="null")
>>> adjusted.diagnostics().columns.tolist()
['metric', 'val', 'var']
- fit(*, target: BalanceFrame | SampleFrame | None = None, method: str | Callable[..., Any] = 'ipw', inplace: bool = True, **kwargs: Any) Self
Fit a weighting model and return the fitted BalanceFrame.
This is the sklearn-style entry point for survey weight adjustment. Like sklearn’s fit(), it learns model parameters, mutates self (by default), and returns self. In survey weighting, fitting the propensity model inherently produces adjusted weights (the two are inseparable), so the returned object contains both the fitted model and the adjusted weights, analogous to how KMeans.fit() stores labels_ on the fitted object.
Workflow (basic fitting, sklearn-style, inplace=True):
    bf = BalanceFrame(sample=respondents, target=population)
    bf.fit(method="ipw")  # mutates bf, returns bf
    bf.weights().df  # the adjusted weights
Workflow (functional style, inplace=False):
    adjusted = bf.fit(method="ipw", inplace=False)
Workflow (fit on subset, apply to holdout):
    fitted = train_bf.fit(method="ipw")
    scored = holdout_bf.set_fitted_model(fitted, inplace=False)
    holdout_weights = scored.predict_weights()
Alternatively, design_matrix(), predict_proba(), and predict_weights() accept a data= argument so the holdout workflow becomes a single line: fitted.predict_weights(data=holdout_bf).
- Parameters:
target – Optional target population to set before fitting. If provided, this method calls set_target(target, inplace=False) first, preserving immutability.
method – Adjustment method name ("ipw", "cbps", "rake", "poststratify", "null") or a custom callable with the weighting-method signature.
inplace – If True (default), mutate this object with the fitted state and return self, matching sklearn’s fit() convention. If False, return a new adjusted BalanceFrame without modifying self.
**kwargs – Keyword arguments forwarded to adjust().
- Returns:
The fitted BalanceFrame: self when inplace=True, a new object when inplace=False.
- Raises:
ValueError – If no target is available and none is provided, if method is invalid, or if na_action='drop' is combined with stored fit artifacts.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).fit(method="null")
>>> bool(adjusted.is_adjusted)
True
Notes
For the built-in IPW method, fit() enables store_fit_metadata=True and store_fit_matrices=True by default so design_matrix() / predict_proba() / predict_weights() can consume fit-time artifacts. This may increase memory usage for large inputs; pass these kwargs explicitly as False to opt out.
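The inplace convention of fit() (mutate and return self by default, or return a modified copy) can be sketched with a toy class. ToyFrame and its normalization "model" are hypothetical stand-ins chosen for the illustration and have nothing to do with the actual fitting logic.

```python
import copy
from typing import List, Optional


class ToyFrame:
    """Hypothetical stand-in used only to illustrate the inplace convention."""

    def __init__(self, weights: List[float]) -> None:
        self.weights = list(weights)
        self.model: Optional[dict] = None

    def fit(self, *, inplace: bool = True) -> "ToyFrame":
        # Work on self (sklearn-style) or on a deep copy (functional style).
        out = self if inplace else copy.deepcopy(self)
        total = sum(out.weights)
        # Toy "model": rescale weights so they sum to 1.
        out.weights = [w / total for w in out.weights]
        out.model = {"method": "toy"}
        return out


original = ToyFrame([1.0, 3.0])
fitted_copy = original.fit(inplace=False)   # original is left untouched
same_object = original.fit()                # original is mutated and returned
print(fitted_copy is original, same_object is original)
```

Returning the object in both modes lets callers chain calls without caring which mode was used.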
- classmethod from_sample(sample: Any) BalanceFrame
Convert a Sample to a BalanceFrame.
The Sample must have a target set (via Sample.set_target). If the Sample is adjusted, the adjustment state (unadjusted responders, model) is preserved.
- Parameters:
sample – A Sample instance with a target.
- Returns:
A new BalanceFrame mirroring the Sample’s data, target, and adjustment state.
- Return type:
BalanceFrame
- Raises:
TypeError – If sample is not a Sample instance.
ValueError – If sample does not have a target set.
Examples
>>> import pandas as pd
>>> from balance.sample_class import Sample
>>> from balance.balance_frame import BalanceFrame
>>> s = Sample.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> t = Sample.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame.from_sample(s.set_target(t))
>>> bf.is_adjusted
False
- property has_target: _CallableBool
Whether this BalanceFrame has a target population set.
Returns a dual-use _CallableBool: both bf.has_target and bf.has_target() work (the latter for backward compatibility).
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp)
>>> bf.has_target
False
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> _ = bf.set_target(tgt)  # SampleFrame target: set in place, return value discarded
>>> bf.has_target
True
- property id_column: str | None
The id column name, delegated to _sf_sample.
Changed in 0.20.0 to return the name (str) instead of data (pd.Series). Use id_series for data.
- property id_series: Series | None
The id column as a Series, delegated to _sf_sample.
- property is_adjusted: _CallableBool
Whether this BalanceFrame has been adjusted.
Returns a _CallableBool so both bf.is_adjusted (property) and bf.is_adjusted() (legacy call) work.
For compound adjustments (calling adjust() multiple times), is_adjusted is True after the first adjustment and remains True for all subsequent adjustments. The original unadjusted baseline is always preserved in _sf_sample_pre_adjust.
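A value that works both as bf.is_adjusted and as bf.is_adjusted() needs to be truthy like a bool and callable at the same time. The class below is a minimal sketch of one way to get that behavior; it is a hypothetical stand-in, and the library's own _CallableBool may be implemented differently.

```python
class CallableBool:
    """Hypothetical dual-use boolean (not the library's _CallableBool)."""

    def __init__(self, value: bool) -> None:
        self._value = bool(value)

    def __bool__(self) -> bool:
        # Supports `if flag:` and `not flag`.
        return self._value

    def __call__(self) -> bool:
        # Supports the legacy `flag()` spelling.
        return self._value

    def __eq__(self, other: object) -> bool:
        return self._value == bool(other)

    def __hash__(self) -> int:
        return hash(self._value)

    def __repr__(self) -> str:
        # Prints like a plain bool, e.g. in doctest output.
        return repr(self._value)


flag = CallableBool(True)
print(flag, flag(), bool(flag))
```

Because such an object is not a real bool, identity checks like `flag is True` fail, which may be why some doctests in this page wrap the value in bool(...).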
- keep_only_some_rows_columns(rows_to_keep: str | None = None, columns_to_keep: list[str] | None = None) BalanceFrame
Return a new BalanceFrame with filtered rows and/or columns.
Returns a deep copy with the requested subset applied to the responder, target, and (if adjusted) unadjusted SampleFrames. The original BalanceFrame is unchanged (immutable pattern).
- Parameters:
rows_to_keep – A boolean expression string evaluated via pd.DataFrame.eval to select rows. Applied to each SampleFrame’s underlying DataFrame. For example: 'x > 10' or 'gender == "Female"'. Defaults to None (all rows kept).
columns_to_keep – Covariate column names to retain. Special columns (id, weight) are always kept. Defaults to None (all columns kept).
- Returns:
A new BalanceFrame with the filters applied.
- Return type:
BalanceFrame
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2, 3], "x": [10.0, 20.0, 30.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [4, 5, 6], "x": [15.0, 25.0, 35.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> filtered = bf.keep_only_some_rows_columns(rows_to_keep="x > 15")
>>> len(filtered._sf_sample._df)
2
- property model: dict[str, Any] | None
The adjustment model dictionary, or None if not adjusted.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.model is None
True
- model_matrix() DataFrame
Return the model matrix of the responder covariates.
Constructs a model matrix using balance.util.model_matrix(), adding NA indicators for null values.
- Returns:
The model matrix.
- Return type:
pd.DataFrame
- outcomes() Any | None
Return a BalanceDFOutcomes, or None.
Returns None if the responder SampleFrame has no outcome columns.
- Returns:
Outcome view with linked sources, or None if no outcomes are defined.
- Return type:
BalanceDFOutcomes or None
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0],
...                   "y": [1.0, 0.0], "weight": [1.0, 1.0]}),
...     outcome_columns=["y"])
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.outcomes().df.columns.tolist()
['y']
- predict_proba(on: Literal['sample'], output: Literal['probability', 'link'] = 'probability', *, data: BalanceFrame | None = None) Series
- predict_proba(on: Literal['target'], output: Literal['probability', 'link'] = 'probability', *, data: BalanceFrame | None = None) Series
- predict_proba(on: Literal['both'] = 'both', output: Literal['probability', 'link'] = 'probability', *, data: BalanceFrame | None = None) tuple[Series, Series]
Return IPW propensity scores.
Returns the propensity scores (predicted probabilities of being in the sample vs target) from the fitted IPW model. A target row with high propensity is well-represented in the sample; a low score indicates underrepresentation.
When data is provided, the stored model is applied to data’s covariates and fresh predictions are returned without caching. When data is None (default), stored/cached predictions for this object’s own data are returned (original behavior).
- Parameters:
on – Which population to predict on ("sample", "target", or "both").
output – Output scale. "probability" returns class-1 propensity probabilities. "link" returns logit-transformed values.
data – An optional BalanceFrame whose covariates are scored using this object’s stored model. Must have matching covariate column names. The data BalanceFrame needs a target for on="target" or on="both".
- Returns:
A prediction Series, or a tuple of two Series when on="both".
- Raises:
ValueError – If the object is not IPW-adjusted, if target is missing for on in {"target", "both"}, if recomputation of sample-side predictions is required but no target is available, if on is invalid, or if data has mismatched covariate columns.
Notes
When data is None and stored fit-time predictions are stale for the current rows, this method may recompute and cache refreshed probabilities/links. When data is provided, no caching occurs.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).fit(method="ipw")
>>> p = adjusted.predict_proba(on="target", output="probability")
>>> int(p.shape[0])
2
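The two output scales of predict_proba() are related by the logit transform. The helper names below are hypothetical; the sketch only shows the standard conversion between a probability and its log-odds ("link") value.

```python
import math


def to_link(p: float) -> float:
    """Logit transform: probability -> log-odds."""
    return math.log(p / (1.0 - p))


def to_probability(link: float) -> float:
    """Inverse logit (sigmoid): log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-link))


for p in (0.2, 0.5, 0.8):
    link = to_link(p)
    # Round-tripping through the link scale recovers the probability.
    print(p, link, to_probability(link))
```

A probability of 0.5 maps to a link of 0; probabilities above 0.5 give positive links and those below give negative links.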
- predict_weights(*, data: BalanceFrame | None = None) Series
Predict responder weights from the fitted model’s artifacts.
Reconstructs adjusted survey weights from stored fit-time artifacts (propensity links, design weights, class balancing, trimming parameters). On the fitted object itself, the result is numerically equivalent to self.weights().df (within floating-point tolerance) and serves as a validation that the stored artifacts are sufficient to reproduce the adjustment.
When data is provided, computes weights for data’s sample using the stored model, without caching. This is the one-liner alternative to the set_fitted_model workflow:
    fitted.predict_weights(data=holdout_bf)
When data is None (default), uses this object’s own data (original behavior).
Dispatches by the adjustment method stored in the model dict:
- IPW: uses stored fit-time metadata (links, class balancing, trimming, and design weights) to reproduce fitted responder weights.
- Other methods: not yet supported; raises with guidance.
- Parameters:
data – An optional BalanceFrame whose sample covariates are scored using this object’s stored model. Must have matching covariate column names and a target set.
- Returns:
A Series of predicted responder weights.
- Raises:
ValueError – If no fitted model is available, if the method is unsupported, if required target data is missing, or if data has mismatched covariate columns.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).fit(method="ipw")
>>> w = adjusted.predict_weights()
>>> int(w.shape[0])
2
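As a rough illustration of how weights can be rebuilt from stored links and design weights, the sketch below uses the textbook inverse-propensity odds weight, d * (1 - p) / p, which equals d * exp(-link) on the logit scale. This is an assumption made for the example: the helper is hypothetical, and the class-balancing and trimming steps that predict_weights() also replays are deliberately left out.

```python
import math
from typing import List


def weights_from_links(
    links: List[float], design_weights: List[float]
) -> List[float]:
    """Textbook odds weights: d * (1 - p) / p, written as d * exp(-link).

    Illustrative only; omits class balancing and trimming.
    """
    return [d * math.exp(-link) for link, d in zip(links, design_weights)]


# A responder with propensity 0.2 (link = log(0.25)) is underrepresented
# and gets a larger weight than one with propensity 0.5 (link = 0).
links = [math.log(0.25), 0.0]
print(weights_from_links(links, [1.0, 1.0]))
```

The formula also shows why stale links matter: if design weights change after fitting, the product mixes new weights with links that were estimated under the old ones.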
- property responders: SampleFrame
Alias for _sf_sample (backward compat, will be removed).
- set_as_pre_adjust(*, inplace: bool = False) Self
Set the current responder state as the new pre-adjust baseline.
This “locks in” the current responder weights (which may already be adjusted and/or trimmed) as the baseline for future diagnostics and subsequent adjustments.
- Parameters:
inplace – If True, mutate this object and return it. If False (default), return a new object with a deep-copied responder frame and reset baseline.
- Returns:
BalanceFrame with _sf_sample_pre_adjust reset to the current responder SampleFrame state. In copy mode (inplace=False), only the responder frame is deep-copied and used to construct a new object (the full _links graph is not deep-copied). In inplace mode, the baseline is set to the existing responder frame object so baseline/current share identity, matching unadjusted-object semantics elsewhere in the API. Any current adjustment model is cleared because the object is no longer considered adjusted after this operation.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> adjusted = BalanceFrame(sample=resp, target=tgt).adjust(method="null")
>>> baseline_locked = adjusted.set_as_pre_adjust()  # copy mode
>>> baseline_locked.is_adjusted
False
>>> _ = adjusted.set_as_pre_adjust(inplace=True)  # inplace mode
- set_fitted_model(fitted: BalanceFrame, *, inplace: bool = True) Self
Apply a fitted model from another BalanceFrame, producing a fully adjusted result.
This enables fit-then-apply workflows: fit on one BalanceFrame (e.g., a 20k subset) and apply the fitted model to another BalanceFrame (e.g., the remaining 980k) with the same covariate schema. The returned object is fully adjusted (is_adjusted is True, model is set, summary() works with 3-way comparison).
Workflow (inplace=False, returns new adjusted object):
    fitted = train_bf.fit(method="ipw")
    scored = holdout_bf.set_fitted_model(fitted, inplace=False)
    scored.summary()  # full diagnostics on holdout
Workflow (inplace=True, default, mutates self):
    holdout_bf.set_fitted_model(fitted)
    holdout_bf.summary()
Currently supports IPW models. Other methods (CBPS, rake, poststratify) will be supported once they store fit-time artifacts.
- Parameters:
fitted – A BalanceFrame already adjusted with a supported method. Its fitted model is used to compute holdout weights.
inplace – If True (default), mutate this object and return self. If False, return a new BalanceFrame with computed weights, leaving self unchanged.
- Returns:
A fully adjusted BalanceFrame with holdout weights applied: self when inplace=True, a new object when inplace=False.
- Raises:
ValueError – If fitted has no stored model, if the model method is not yet supported, or if covariate column names differ between self and fitted.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> train_resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [0.0, 1.0], "weight": [1.0, 1.0]}))
>>> train_tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [0.2, 0.8], "weight": [1.0, 1.0]}))
>>> holdout_resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [5, 6], "x": [0.1, 0.9], "weight": [1.0, 1.0]}))
>>> holdout_tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [7, 8], "x": [0.3, 0.7], "weight": [1.0, 1.0]}))
>>> train_bf = BalanceFrame(sample=train_resp, target=train_tgt)
>>> holdout_bf = BalanceFrame(sample=holdout_resp, target=holdout_tgt)
>>> fitted = train_bf.fit(method="ipw")
>>> scored = holdout_bf.set_fitted_model(fitted, inplace=False)
>>> scored.is_adjusted
True
>>> scored.model is not None
True
- set_target(target: BalanceFrame | SampleFrame, inplace: bool | None = None) Self
Set or replace the target population.
When target is a BalanceFrame (or subclass such as Sample), a deep copy of self is returned with the target set (immutable pattern). When target is a raw SampleFrame, the behaviour depends on inplace: True mutates self, False returns a new BalanceFrame.
- Parameters:
target – The target population — a BalanceFrame/Sample or a SampleFrame.
inplace – If True, mutates self (only valid for SampleFrame targets). If False, returns a new copy. Defaults to None which auto-selects: copy for BalanceFrame targets, inplace for SampleFrame targets.
- Returns:
BalanceFrame with the new target set.
- Raises:
TypeError / ValueError – If target is not a BalanceFrame or SampleFrame, or if they share no covariate columns.
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp)
>>> _ = bf.set_target(tgt)  # SampleFrame target: set in place, return value discarded
>>> bf.has_target()
True
- set_unadjusted(second: BalanceFrame) Self
Set the unadjusted link for comparative analysis.
Returns a deep copy with _sf_sample_pre_adjust pointing at second’s responder SampleFrame, and _links["unadjusted"] pointing at second.
- Parameters:
second – A BalanceFrame (or subclass) whose responder data becomes the unadjusted baseline.
- Returns:
A new BalanceFrame with the unadjusted link set.
- Raises:
TypeError – If second is not a BalanceFrame.
- set_weights(weights: Series | float | None, *, use_index: bool = False) None
Set or replace the responder weights.
Delegates to the underlying SampleFrame’s set_weights.
When called on an unadjusted BalanceFrame (is_adjusted is False), _sf_sample and _sf_sample_pre_adjust share the same DataFrame, so the change is visible to both automatically; changing base weights is not an adjustment.
Warning
If this BalanceFrame has already been fitted (i.e., adjust() has been called), calling set_weights() changes the design weights but does not invalidate the stored fit artifacts (_adjustment_model). The link values in those artifacts were computed using the old weights, so predict_weights() will use new current_sample_weights with stale links, producing a mathematical inconsistency. Users should re-fit (call adjust() again) after changing weights on an already-fitted BalanceFrame.
- Parameters:
weights – New weights. A Series, a scalar (broadcast to all rows), or None (sets all to 1.0).
use_index – If True, align weights by index instead of requiring matching length. See SampleFrame.set_weights().
- summary() str
Consolidated summary of covariate balance, weight health, and outcomes.
Produces a multi-line summary combining covariate ASMD / KLD diagnostics, weight design effect, and outcome means. Delegates to _build_summary() after computing the necessary intermediate values.
When no target is set, returns a minimal summary with weight diagnostics and outcome means only.
- Returns:
A human-readable multi-line summary string.
- Return type:
str
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2, 3, 4], "x": [0, 1, 1, 0],
...                   "weight": [1.0, 2.0, 1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [5, 6, 7, 8], "x": [0, 0, 1, 1],
...                   "weight": [1.0, 1.0, 1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> adjusted = bf.adjust(method="null")
>>> "Covariate diagnostics:" in adjusted.summary()
True
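The "weight design effect" mentioned in the summary is commonly computed with Kish's approximation, n * sum(w^2) / (sum(w))^2. The sketch below assumes that formula; the function name is hypothetical and the library may report the quantity differently.

```python
from typing import List


def kish_design_effect(weights: List[float]) -> float:
    """Kish's approximate design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Equal weights give a design effect of 1; unequal weights push it above 1.
print(kish_design_effect([1.0, 1.0, 1.0, 1.0]))
print(kish_design_effect([1.0, 2.0, 1.0, 1.0]))
```

A larger value means the weights are more variable, which shrinks the effective sample size (n divided by the design effect).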
- property target: SampleFrame | None
Alias for _sf_target (backward compat, will be removed).
- to_csv(path_or_buf: str | Path | IO | None = None, **kwargs: Any) str | None
Write the combined DataFrame to CSV.
Writes the output of df (responder + target + unadjusted rows with a "source" column) to a CSV file or string. Delegates to to_csv_with_defaults().
- Parameters:
path_or_buf – Destination. If None, returns the CSV as a string.
**kwargs – Additional keyword arguments passed to pd.DataFrame.to_csv().
- Returns:
CSV string if path_or_buf is None, else None.
- Return type:
str or None
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> "id" in bf.to_csv()
True
- to_download(tempdir: str | None = None) Any
Create a downloadable file link of the combined DataFrame.
Writes df to a temporary CSV file and returns an IPython FileLink for interactive download.
- Parameters:
tempdir – Directory for the temp file. If None, uses tempfile.gettempdir().
- Returns:
An IPython file link for downloading the CSV.
- Return type:
FileLink
Examples
>>> import tempfile
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> link = bf.to_download(tempdir=tempfile.gettempdir())
- to_sample() Any
Convert this BalanceFrame back to a Sample.
Reconstructs a Sample with the responder data and target set. If this BalanceFrame is adjusted, the returned Sample will also be adjusted: is_adjusted() returns True, has_target() returns True, and the original (unadjusted) weights are preserved via the "unadjusted" link.
- Returns:
A Sample mirroring this BalanceFrame’s data, target, and adjustment state.
- Return type:
Sample
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2, 3], "x": [10.0, 20.0, 30.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [4, 5, 6], "x": [15.0, 25.0, 35.0],
...                   "weight": [1.0, 1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> s = bf.to_sample()
>>> s.has_target()
True
- trim(ratio: float | int | None = None, percentile: float | tuple[float, float] | None = None, keep_sum_of_weights: bool = True, target_sum_weights: float | int | np.floating | None = None, *, inplace: bool = False) Self
Trim extreme weights using mean-ratio clipping or percentile winsorization.
Delegates to SampleFrame.trim() for computation and weight history tracking, then wraps the result in a new BalanceFrame (preserving target, pre-adjust baseline, and links).
- Parameters:
ratio – Mean-ratio upper bound. Mutually exclusive with percentile.
percentile – Percentile(s) for winsorization. Mutually exclusive with ratio.
keep_sum_of_weights – Whether to rescale after trimming to preserve the original sum of weights.
target_sum_weights – If provided, rescale trimmed weights so their sum equals this target.
inplace – If True, mutate this BalanceFrame’s weights and return it. If False (default), return a new BalanceFrame.
- Returns:
The BalanceFrame with trimmed weights (self if inplace, else a new instance).
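Mean-ratio clipping with an optional rescale can be sketched as follows. The helper is hypothetical and performs a single clipping pass, capping each weight at ratio times the mean weight; SampleFrame.trim() may differ in detail (for example in how it handles the rescaling step).

```python
from typing import List


def trim_by_mean_ratio(
    weights: List[float], ratio: float, keep_sum_of_weights: bool = True
) -> List[float]:
    """Clip weights above ratio * mean(weights); optionally restore the sum.

    Illustrative only: one clipping pass, no percentile mode.
    """
    total = sum(weights)
    cap = ratio * total / len(weights)
    clipped = [min(w, cap) for w in weights]
    if keep_sum_of_weights:
        # Rescale so the trimmed weights sum to the original total.
        scale = total / sum(clipped)
        clipped = [w * scale for w in clipped]
    return clipped


trimmed = trim_by_mean_ratio([1.0, 1.0, 1.0, 9.0], ratio=2.0)
print(trimmed, sum(trimmed))
```

Note that rescaling after clipping can lift the largest weight back above the cap, so in this simple version the ratio bound holds exactly only when keep_sum_of_weights is False.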
- property unadjusted: SampleFrame | None
Alias for _sf_sample_pre_adjust if adjusted, else None (backward compat).
- property weight_series: Series | None
The active weight as a Series, delegated to _sf_sample.
- weights() Any
Return a BalanceDFWeights for the responders.
The returned object carries linked target (and unadjusted, if adjusted) views for comparative weight analysis.
- Returns:
Weight view with linked sources.
- Return type:
BalanceDFWeights
Examples
>>> import pandas as pd
>>> from balance.sample_frame import SampleFrame
>>> from balance.balance_frame import BalanceFrame
>>> resp = SampleFrame.from_frame(
...     pd.DataFrame({"id": [1, 2], "x": [10.0, 20.0], "weight": [1.0, 2.0]}))
>>> tgt = SampleFrame.from_frame(
...     pd.DataFrame({"id": [3, 4], "x": [15.0, 25.0], "weight": [1.0, 1.0]}))
>>> bf = BalanceFrame(sample=resp, target=tgt)
>>> bf.weights().df.columns.tolist()
['weight']