balance.weighting_methods.rake

balance.weighting_methods.rake.prepare_marginal_dist_for_raking(dict_of_dicts: Dict[str, Dict[str, float]]) -> DataFrame

Realizes a nested dictionary of proportions into a DataFrame.

Parameters:

dict_of_dicts – A nested dictionary where the outer keys are column names and the inner dictionaries have keys as category labels and values as their proportions (float).

Returns:

A DataFrame with columns specified by the outer keys of the input dictionary and rows containing the category labels according to their proportions. An additional “id” column is added with integer values as row identifiers.

Example

::

    print(prepare_marginal_dist_for_raking({
        "A": {"a": 0.5, "b": 0.5},
        "B": {"x": 0.2, "y": 0.8}
    }))
    # Returns a DataFrame with columns A, B, and id
    #    A  B  id
    # 0  a  x   0
    # 1  b  y   1
    # 2  a  y   2
    # 3  b  y   3
    # 4  a  y   4
    # 5  b  x   5
    # 6  a  y   6
    # 7  b  y   7
    # 8  a  y   8
    # 9  b  y   9
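The example above can be pictured as follows: each column's proportions are turned into the shortest repeating pattern of labels, and the patterns are tiled to a common length. The sketch below is a pure-Python illustration of that idea, not the library's implementation; the helper name realize_proportions and its max_denominator parameter are hypothetical, it returns a plain dictionary of lists rather than a DataFrame, and math.lcm with several arguments assumes Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm needs Python 3.9+


def realize_proportions(dict_of_dicts, max_denominator=1000):
    """Hypothetical helper: expand {column: {label: proportion}} into columns of labels."""
    patterns = {}
    for column, proportions in dict_of_dicts.items():
        # Approximate each float proportion by a small exact fraction (0.2 -> 1/5).
        fracs = {
            label: Fraction(p).limit_denominator(max_denominator)
            for label, p in proportions.items()
        }
        if sum(fracs.values()) != 1:
            raise ValueError(f"proportions for {column!r} do not sum to 1")
        # Shortest pattern length on which every proportion is a whole count.
        base = lcm(*(f.denominator for f in fracs.values()))
        pattern = []
        for label, f in fracs.items():
            pattern.extend([label] * int(f * base))
        patterns[column] = pattern

    # Tile every pattern to a shared number of rows.
    n_rows = lcm(*(len(p) for p in patterns.values()))
    columns = {c: p * (n_rows // len(p)) for c, p in patterns.items()}
    columns["id"] = list(range(n_rows))  # integer row identifiers
    return columns


columns = realize_proportions({
    "A": {"a": 0.5, "b": 0.5},
    "B": {"x": 0.2, "y": 0.8},
})
print(columns["B"])  # ['x', 'y', 'y', 'y', 'y', 'x', 'y', 'y', 'y', 'y']
```

With the inputs from the example, this sketch yields ten rows, the smallest count on which halves and fifths can both be represented exactly.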

balance.weighting_methods.rake.rake(sample_df: DataFrame, sample_weights: Series, target_df: DataFrame, target_weights: Series, variables: List[str] | None = None, transformations: Dict[str, Callable] | str = 'default', na_action: str = 'add_indicator', max_iteration: int = 1000, convergence_rate: float = 0.0005, rate_tolerance: float = 1e-08, *args, **kwargs) -> Dict

Perform raking (using the iterative proportional fitting algorithm). See: https://en.wikipedia.org/wiki/Iterative_proportional_fitting

Returns weights normalised to the sum of the target weights.
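To make the algorithm concrete, here is a minimal pure-Python sketch of iterative proportional fitting. It is an illustration under stated assumptions, not the code behind rake: the names rake_sketch and weighted_shares are hypothetical, the sample is a list of dictionaries instead of a DataFrame, the target is given directly as marginal proportions plus a total weight, and the two stopping rules are one plausible reading of convergence_rate and rate_tolerance.

```python
def weighted_shares(rows, weights, variable):
    """Weighted proportion of each category of `variable` in the sample."""
    total = sum(weights)
    shares = {}
    for row, w in zip(rows, weights):
        shares[row[variable]] = shares.get(row[variable], 0.0) + w / total
    return shares


def rake_sketch(rows, weights, target_margins, target_total,
                max_iteration=1000, convergence_rate=0.0005, rate_tolerance=1e-8):
    """Hypothetical IPF sketch; target_margins is {variable: {category: proportion}}."""
    weights = [float(w) for w in weights]
    previous_gap = None
    converged = False
    for _ in range(max_iteration):
        # One pass: rescale weights so each variable in turn matches its target margin.
        for variable, margin in target_margins.items():
            current = weighted_shares(rows, weights, variable)
            weights = [
                w * margin[row[variable]] / current[row[variable]]
                for row, w in zip(rows, weights)
            ]
        # Largest remaining difference between sample and target proportions.
        gap = 0.0
        for variable, margin in target_margins.items():
            current = weighted_shares(rows, weights, variable)
            for category, proportion in margin.items():
                gap = max(gap, abs(current.get(category, 0.0) - proportion))
        stalled = previous_gap is not None and abs(previous_gap - gap) < rate_tolerance
        if gap < convergence_rate or stalled:
            converged = True
            break
        previous_gap = gap
    # Normalise so the weights sum to the total target weight.
    scale = target_total / sum(weights)
    return [w * scale for w in weights], converged


rows = [
    {"gender": "f", "age": "young"},
    {"gender": "f", "age": "old"},
    {"gender": "m", "age": "young"},
    {"gender": "m", "age": "old"},
]
margins = {"gender": {"f": 0.6, "m": 0.4}, "age": {"young": 0.3, "old": 0.7}}
raked, ok = rake_sketch(rows, [1, 1, 1, 1], margins, target_total=100)
print(ok, [round(w, 3) for w in raked])
```

Each rescaling step preserves the total weight, so the final normalisation only has to apply a single scale factor.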

Arguments:

sample_df – (pandas DataFrame) a DataFrame representing the sample.

sample_weights – (pandas Series) design weights for the sample.

target_df – (pandas DataFrame) a DataFrame representing the target.

target_weights – (pandas Series) design weights for the target.

variables – (list of strings) variables to include in the model. If None, all variables common to sample_df and target_df are used.

transformations – (dict or string) transformations to apply to the data before fitting the model. Default is “default” (see the apply_transformations function).

na_action – (string) what to do with NAs. Default is “add_indicator”, which adds NaN as a group (called “__NaN__”) for each weighting variable (post-transformation); “drop” removes rows with any missing values on any variable from both sample and target.

max_iteration – (int) maximum number of iterations for the iterative proportional fitting algorithm.

convergence_rate – (float) convergence criterion; the maximum difference in proportions between the sample and target marginal distributions on any covariate for the algorithm to be considered converged.

rate_tolerance – (float) convergence criterion; if the convergence rate does not move by more than this amount, then the algorithm is also considered to have converged.
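The two na_action options described above can be sketched in plain Python. This is only an illustration of the documented behaviour: the function apply_na_action and the helper is_missing are hypothetical names, rows are plain dictionaries rather than DataFrame rows, and the real function would also have to keep the weights aligned with whichever rows survive.

```python
import math

NAN_LABEL = "__NaN__"  # group label named in the na_action description


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_na_action(rows, variables, na_action="add_indicator"):
    """Hypothetical helper: handle missing values on the weighting variables."""
    if na_action == "add_indicator":
        # Missing values become their own category, so no rows are lost.
        return [
            {v: (NAN_LABEL if is_missing(row[v]) else row[v]) for v in variables}
            for row in rows
        ]
    if na_action == "drop":
        # Remove every row that has a missing value on any weighting variable.
        return [
            {v: row[v] for v in variables}
            for row in rows
            if not any(is_missing(row[v]) for v in variables)
        ]
    raise ValueError(f"unknown na_action: {na_action!r}")


sample = [
    {"age": "young", "gender": None},
    {"age": float("nan"), "gender": "f"},
    {"age": "old", "gender": "m"},
]
print(apply_na_action(sample, ["age", "gender"], "add_indicator"))
print(apply_na_action(sample, ["age", "gender"], "drop"))
```

Applying the same call to both the sample rows and the target rows mirrors the documented rule that “drop” affects both data sets.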

Returns: A dictionary including:

“weight” – the weights for the sample.

“model” – parameters of the model: iterations (a DataFrame with iteration numbers and convergence rate information at all steps) and converged (a flag with the output status: 0 for failure and 1 for success).