balance.stats_and_plots.ascii_plots
- balance.stats_and_plots.ascii_plots.ascii_comparative_hist(dfs: List[DataFrameWithWeight], names: List[str], column: str, weighted: bool = True, n_bins: int | None = None, bar_width: int | None = None) -> str
Produces a columnar, baseline-relative ASCII histogram.
The first dataset is the baseline. Subsequent datasets show bars split into segments that indicate how each bin compares to the baseline.
- How to read the output:
Each row is a bin range. The first column is the baseline dataset, shown with solid █ bars. Every other column uses three markers:
  - █ (solid fill): the portion of the bar shared with the baseline, up to the smaller of the two proportions. This is the “common” part.
  - ▒ (medium shade): the portion that exceeds the baseline. The bin has more mass than the baseline in this range.
  - ] (right bracket): a deficit relative to the baseline. The gap before the bracket shows how much mass is missing compared with the baseline in this range.
A number with no bar means the percentage is too small to render at the chosen bar_width.
All percentages are normalized so each column sums to 100%.
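The segment rules above can be sketched in plain Python. This is a minimal illustration and not balance's implementation. The helper name render_comparative_bar is hypothetical. Converting percentages to characters with round() is an assumption, and so is placing the ] at the baseline's bar length.

```python
def render_comparative_bar(base_pct, pct, max_pct, bar_width):
    """Render one non-baseline cell of a comparative histogram row.

    base_pct: the baseline's share (%) for this bin.
    pct: this dataset's share (%) for the same bin.
    max_pct: the largest share in the whole table, which maps to bar_width.
    """

    def to_chars(p):
        # Assumption: round() converts a share to whole characters
        # (Python rounds exact halves to even).
        return int(round(p / max_pct * bar_width))

    bar_len = to_chars(pct)
    base_len = to_chars(base_pct)
    common = min(bar_len, base_len)
    if bar_len >= base_len:
        # Excess: the shared part is solid, the extra mass is medium shade.
        return "█" * common + "▒" * (bar_len - base_len)
    # Deficit: the shared part is solid, then a gap up to the baseline
    # length, closed by ']' (placement of the bracket is an assumption).
    return "█" * common + " " * (base_len - bar_len) + "]"


print(render_comparative_bar(50.0, 75.0, 75.0, 20))  # 13 solid + 7 shade
print(render_comparative_bar(50.0, 25.0, 75.0, 20))  # 7 solid + 6-space gap + ]
```

Take bar_width=20 with a largest share of 75%. A 75% cell against a 50% baseline then renders as 13 solid plus 7 shaded characters, which is consistent with the first Sample cell of the example.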
- Parameters:
dfs – List of DataFrameWithWeight dicts. The first entry is used as the baseline for comparison.
names – Names for each DataFrame (e.g., [“Target”, “Sample”]).
column – The numeric column name to plot.
weighted – Whether to use weights. Defaults to True.
n_bins – Number of histogram bins. Defaults to None, which auto-detects using Sturges’ rule.
bar_width – Maximum character width for bars. Defaults to None, which auto-detects based on terminal width.
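Both None defaults could be resolved along the following lines. Sturges' rule is shown in its textbook form, k = ⌈log2 n⌉ + 1. Two details are assumptions: which n balance feeds into the rule (per dataset or pooled), and the bar-width heuristic. The hypothetical names default_bar_width, reserved and minimum are illustrative only.

```python
import math
import shutil


def sturges_bins(n_observations: int) -> int:
    """Textbook Sturges' rule: k = ceil(log2(n)) + 1 bins."""
    if n_observations <= 1:
        return 1
    return math.ceil(math.log2(n_observations)) + 1


def default_bar_width(reserved: int = 40, minimum: int = 10) -> int:
    """Hypothetical heuristic: terminal width minus room for labels."""
    # get_terminal_size checks the COLUMNS environment variable, then the
    # attached terminal, and uses the fallback when neither is available.
    columns = shutil.get_terminal_size(fallback=(80, 24)).columns
    return max(minimum, columns - reserved)


print(sturges_bins(1000))  # 11 bins for 1,000 observations
print(default_bar_width())
```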
- Returns:
ASCII comparative histogram text.
Example
>>> print(ascii_comparative_hist(dfs, names=["Target", "Sample"],
...                              column="income", n_bins=2, bar_width=20))
=== income (numeric, comparative) ===
Range          | Target (%)         | Sample (%)
---------------------------------------------------------------
[10.00, 25.00) | █████████████ 50.0 | █████████████▒▒▒▒▒▒▒ 75.0
[25.00, 40.00] | █████████████ 50.0 | ███████      ] 25.0
---------------------------------------------------------------
Total          | 100.0              | 100.0
In the Sample column above, bin [10, 25) shows a ▒ excess (75% vs. the 50% baseline), while bin [25, 40] shows a ] deficit (25% vs. the 50% baseline).
- balance.stats_and_plots.ascii_plots.ascii_plot_bar(dfs: List[DataFrameWithWeight], names: List[str], column: str, weighted: bool = True, bar_width: int | None = None, dist_type: str | None = None, separate_categories: bool = True) -> str
Produces an ASCII grouped barplot for a single categorical variable.
Uses relative_frequency_table() to compute weighted proportions for each dataset, then renders grouped horizontal bars.
- How to read the output:
Each row is a category value. Within a row, each dataset gets its own bar drawn with a distinct fill character (█ and ▒ in the example below). The percentage at the end of each bar is the weighted proportion of that category within its dataset, so the proportions within each dataset sum to 100%.
Bar lengths are scaled so that the longest bar across all datasets spans the full bar_width.
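The proportion and scaling steps can be sketched in plain Python. weighted_proportions and bar_lengths are hypothetical stand-ins written for illustration; they are not relative_frequency_table() itself. Rounding to whole characters with round() is an assumption.

```python
from collections import defaultdict


def weighted_proportions(values, weights):
    """Weighted share (%) of each category within one dataset; sums to 100."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {k: 100.0 * t / grand_total for k, t in totals.items()}


def bar_lengths(tables, bar_width):
    """Scale every share so the single largest one spans bar_width chars."""
    longest = max(p for table in tables for p in table.values())
    return [
        {k: int(round(p / longest * bar_width)) for k, p in table.items()}
        for table in tables
    ]


a = weighted_proportions(["red", "blue", "blue", "green"], [1.0] * 4)
b = weighted_proportions(["red", "red", "blue", "green"], [1.0] * 4)
print(bar_lengths([a, b], 20))
```

With the color data from the example below, the 50% categories get 20 characters and the 25% categories get 10. This matches the bar lengths shown there.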
- Parameters:
dfs – List of DataFrameWithWeight dicts.
names – Names for each DataFrame (e.g., [“self”, “target”]).
column – The categorical column name to plot.
weighted – Whether to use weights. Defaults to True.
bar_width – Maximum character width for bars. Defaults to None, which auto-detects based on terminal width.
dist_type – Accepted for compatibility but only “hist_ascii” is supported. A warning is logged if any other value is passed.
separate_categories – If True, insert a blank line between categories for readability. Defaults to True.
- Returns:
ASCII barplot text for this variable.
Example
>>> import pandas as pd
>>> from balance.stats_and_plots.ascii_plots import ascii_plot_bar
>>> df_a = pd.DataFrame({"color": ["red", "blue", "blue", "green"]})
>>> df_b = pd.DataFrame({"color": ["red", "red", "blue", "green"]})
>>> dfs = [
...     {"df": df_a, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
...     {"df": df_b, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
... ]
>>> print(ascii_plot_bar(dfs, names=["self", "target"],
...                      column="color", bar_width=20))
=== color (categorical) ===
Category | sample population
         |
blue     | ████████████████████ (50.0%)
         | ▒▒▒▒▒▒▒▒▒▒ (25.0%)

green    | ██████████ (25.0%)
         | ▒▒▒▒▒▒▒▒▒▒ (25.0%)

red      | ██████████ (25.0%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (50.0%)

Legend: █ sample ▒ population
Bar lengths are proportional to weighted frequency within each dataset.
- balance.stats_and_plots.ascii_plots.ascii_plot_dist(dfs: List[DataFrameWithWeight], names: List[str] | None = None, variables: List[str] | None = None, numeric_n_values_threshold: int = 15, weighted: bool = True, n_bins: int | None = None, bar_width: int | None = None, dist_type: str | None = None, separate_categories: bool = True, comparative: bool = True) -> str
Produces ASCII text comparing weighted distributions across datasets.
Iterates over variables, classifying each as categorical or numeric (using the same logic as seaborn_plot_dist()), then delegates to the appropriate plotting function. Two display modes are available for numeric variables:
  - comparative (comparative=True, the default): numeric variables are rendered with ascii_comparative_hist(), a columnar layout where the first dataset is the baseline and subsequent datasets show excess or deficit relative to it.
  - grouped (comparative=False): numeric variables are rendered with ascii_plot_hist(), a grouped-bar layout where each dataset gets its own bar per bin (the same style used for categorical variables).
Categorical variables always use ascii_plot_bar(), regardless of this setting. The output is both printed to stdout and returned as a string.
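The classify-then-delegate step can be sketched as follows. This is a minimal sketch that follows the parameter descriptions on this page; it does not reproduce seaborn_plot_dist() exactly. The helpers classify and pick_plotter are hypothetical. Two parts of the rule are assumptions: treating non-numeric columns as always categorical, and the "fewer than threshold" cutoff. Missing values are not handled.

```python
from numbers import Number


def classify(values, numeric_n_values_threshold=15):
    """Return "categorical" or "numeric" for one column given as a list."""
    # Assumed rule: non-numeric columns are categorical; numeric columns with
    # fewer distinct values than the threshold are also treated as categorical.
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    if not is_numeric or len(set(values)) < numeric_n_values_threshold:
        return "categorical"
    return "numeric"


def pick_plotter(kind, comparative=True):
    """Name of the renderer ascii_plot_dist would delegate to (per its docs)."""
    if kind == "categorical":
        return "ascii_plot_bar"
    return "ascii_comparative_hist" if comparative else "ascii_plot_hist"


age = [10.0, 20.0, 30.0, 40.0]
print(pick_plotter(classify(age, numeric_n_values_threshold=0)))
```

With numeric_n_values_threshold=0, as in the examples below, age is classified as numeric and goes to the comparative histogram. With the default threshold of 15, its 4 distinct values would make it categorical.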
- Parameters:
dfs – List of DataFrameWithWeight dicts.
names – Names for each DataFrame (e.g., [“self”, “unadjusted”, “target”]). If None, defaults to “df_0”, “df_1”, etc.
variables – Subset of variables to plot. None means all.
numeric_n_values_threshold – Numeric columns with fewer unique values than this are treated as categorical; non-numeric columns are always categorical. Defaults to 15.
weighted – Whether to use weights. Defaults to True.
n_bins – Number of bins for numeric histograms. Defaults to None, which auto-detects using Sturges’ rule.
bar_width – Maximum character width for the longest bar. Defaults to None, which auto-detects based on terminal width.
dist_type – Accepted for compatibility but only “hist_ascii” is supported. A warning is logged if any other value is passed.
separate_categories – If True, insert a blank line between categories in barplots for readability. Defaults to True.
comparative – If True (default), numeric variables use a columnar comparative histogram (ascii_comparative_hist()) that highlights differences relative to a baseline dataset. If False, numeric variables use a grouped-bar histogram (ascii_plot_hist()) instead.
- Returns:
The full ASCII output text.
Examples
>>> import pandas as pd
>>> from balance.stats_and_plots.ascii_plots import ascii_plot_dist
>>> df_a = pd.DataFrame({
...     "color": ["red", "blue", "blue", "green"],
...     "age": [10.0, 20.0, 30.0, 40.0],
... })
>>> df_b = pd.DataFrame({
...     "color": ["red", "red", "blue", "green"],
...     "age": [10.0, 10.0, 10.0, 40.0],
... })
>>> dfs = [
...     {"df": df_a, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
...     {"df": df_b, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
... ]
>>> print(ascii_plot_dist(dfs, names=["self", "target"],
...                       numeric_n_values_threshold=0, n_bins=2, bar_width=20))
=== color (categorical) ===
Category | population sample
         |
blue     | ██████████ (25.0%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (50.0%)

green    | ██████████ (25.0%)
         | ▒▒▒▒▒▒▒▒▒▒ (25.0%)

red      | ████████████████████ (50.0%)
         | ▒▒▒▒▒▒▒▒▒▒ (25.0%)

Legend: █ population ▒ sample
Bar lengths are proportional to weighted frequency within each dataset.
=== age (numeric, comparative) ===
Range          | population (%)     | sample (%)
---------------------------------------------------------------
[10.00, 25.00) | █████████████ 50.0 | █████████████▒▒▒▒▒▒▒ 75.0
[25.00, 40.00] | █████████████ 50.0 | ███████      ] 25.0
---------------------------------------------------------------
Total          | 100.0              | 100.0
Key: █ = shared with population, ▒ = excess, ] = deficit
To use grouped-bar histograms (the same style as for categorical variables) instead of comparative histograms for numeric variables, pass comparative=False:
>>> print(ascii_plot_dist(dfs, names=["self", "target"],
...                       numeric_n_values_threshold=0, n_bins=2, bar_width=20,
...                       comparative=False))
=== color (categorical) ===
...
=== age (numeric) ===
Bin            | population sample
               |
[10.00, 25.00) | ████████████████████ (75.0%)
               | ▒▒▒▒▒▒▒▒▒▒▒▒▒ (50.0%)
[25.00, 40.00] | ███████ (25.0%)
               | ▒▒▒▒▒▒▒▒▒▒▒▒▒ (50.0%)
Legend: █ population ▒ sample
Bar lengths are proportional to weighted frequency within each dataset.
- balance.stats_and_plots.ascii_plots.ascii_plot_hist(dfs: List[DataFrameWithWeight], names: List[str], column: str, weighted: bool = True, n_bins: int | None = None, bar_width: int | None = None, dist_type: str | None = None) -> str
Produces an ASCII histogram for a single numeric variable.
Computes weighted histogram bins across all datasets using a shared bin range, then renders grouped horizontal bars for each bin.
- How to read the output:
Each row is a numeric bin range. Within a row, each dataset gets its own bar drawn with a distinct fill character (█ and ▒ in the example below). The percentage at the end of each bar is the weighted proportion of observations that fall in that bin within its dataset, so the proportions within each dataset sum to 100%.
Bar lengths are scaled so that the longest bar across all datasets spans the full bar_width.
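A shared bin range can be sketched as equal-width edges spanning the pooled minimum and maximum of all datasets. Each dataset's weights are then summed per bin. This is an illustrative sketch, not balance's code. The helpers shared_edges and weighted_hist_pct are hypothetical. Two choices are assumptions, although they match the ranges printed in the example: equal-width pooled edges, and half-open bins with a closed last bin.

```python
import bisect


def shared_edges(datasets, n_bins):
    """Equal-width edges spanning the pooled min and max of all datasets."""
    lo = min(min(values) for values, _ in datasets)
    hi = max(max(values) for values, _ in datasets)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins)] + [hi]


def weighted_hist_pct(values, weights, edges):
    """Weighted share (%) per bin: bins are [a, b) except the last, [a, b]."""
    n_bins = len(edges) - 1
    totals = [0.0] * n_bins
    for v, w in zip(values, weights):
        # bisect_right finds the bin whose left edge is <= v; clamping sends
        # v == max into the last (closed) bin.
        idx = min(max(bisect.bisect_right(edges, v) - 1, 0), n_bins - 1)
        totals[idx] += w
    grand = sum(totals)
    return [100.0 * t / grand for t in totals]


a = ([10.0, 20.0, 30.0, 40.0], [1.0] * 4)
b = ([10.0, 10.0, 10.0, 40.0], [1.0] * 4)
edges = shared_edges([a, b], n_bins=2)
print(edges, weighted_hist_pct(*a, edges), weighted_hist_pct(*b, edges))
```

With the age data from the example below and n_bins=2, this gives edges [10, 25, 40]. The resulting shares are 50/50 and 75/25, consistent with the percentages shown there.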
- Parameters:
dfs – List of DataFrameWithWeight dicts.
names – Names for each DataFrame (e.g., [“self”, “target”]).
column – The numeric column name to plot.
weighted – Whether to use weights. Defaults to True.
n_bins – Number of histogram bins. Defaults to None, which auto-detects using Sturges’ rule.
bar_width – Maximum character width for bars. Defaults to None, which auto-detects based on terminal width.
dist_type – Accepted for compatibility but only “hist_ascii” is supported. A warning is logged if any other value is passed.
- Returns:
ASCII histogram text for this variable.
Example
>>> import pandas as pd
>>> from balance.stats_and_plots.ascii_plots import ascii_plot_hist
>>> df_a = pd.DataFrame({"age": [10.0, 20.0, 30.0, 40.0]})
>>> df_b = pd.DataFrame({"age": [10.0, 10.0, 10.0, 40.0]})
>>> dfs = [
...     {"df": df_a, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
...     {"df": df_b, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
... ]
>>> print(ascii_plot_hist(dfs, names=["self", "target"],
...                       column="age", n_bins=2, bar_width=20))
=== age (numeric) ===
Bin            | sample population
               |
[10.00, 25.00) | █████████████ (50.0%)
               | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (75.0%)
[25.00, 40.00] | █████████████ (50.0%)
               | ▒▒▒▒▒▒▒ (25.0%)
Legend: █ sample ▒ population
Bar lengths are proportional to weighted frequency within each dataset.