balance.stats_and_plots.ascii_plots

balance.stats_and_plots.ascii_plots.ascii_comparative_hist(dfs: List[DataFrameWithWeight], names: List[str], column: str, weighted: bool = True, n_bins: int | None = None, bar_width: int | None = None) -> str

Produces a columnar, baseline-relative ASCII histogram.

The first dataset is the baseline. Subsequent datasets show bars split into segments that indicate how each bin compares to the baseline.

How to read the output:

Each row is a bin range. The first column is the baseline dataset, shown with solid bars. For every other column:

  • █ (solid fill) = the portion of the bar that matches the baseline proportion. This is the “common” part.

  • ▒ (medium shade) = the portion that exceeds the baseline. This dataset has more mass than the baseline in this bin.

  • ] (right bracket) = a deficit relative to the baseline. The gap before the bracket shows how much mass is missing compared to the baseline in this bin.

  • A number without any bar means the percentage is too small to render at the chosen bar_width.

All percentages are normalized so each column sums to 100%.

Parameters:
  • dfs – List of DataFrameWithWeight dicts. The first entry is used as the baseline for comparison.

  • names – Names for each DataFrame (e.g., [“Target”, “Sample”]).

  • column – The numeric column name to plot.

  • weighted – Whether to use weights. Defaults to True.

  • n_bins – Number of histogram bins. Defaults to None, which auto-detects using Sturges’ rule.

  • bar_width – Maximum character width for bars. Defaults to None, which auto-detects based on terminal width.

Returns:

ASCII comparative histogram text.
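When n_bins is None, the bin count comes from Sturges’ rule, which picks k = ⌈log₂ n⌉ + 1 bins for n observations. The sketch below is a minimal illustration, assuming n is the number of values being binned; the docs do not say whether the library counts only the baseline, pools all datasets, or uses weights. sturges_bins is a hypothetical helper, not part of balance.

```python
import math


def sturges_bins(n: int) -> int:
    """Hypothetical helper: Sturges' rule, k = ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1


# Assumption: n counts the observations being binned.
print(sturges_bins(4))    # 3
print(sturges_bins(100))  # 8
```

NumPy exposes the same rule through numpy.histogram_bin_edges(x, bins="sturges"), which yields the same bin count for these inputs.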

Example

>>> print(ascii_comparative_hist(dfs, names=["Target", "Sample"],
...       column="income", n_bins=2, bar_width=20))
=== income (numeric, comparative) ===

Range          | Target (%)         | Sample (%)
---------------------------------------------------------------
[10.00, 25.00) | █████████████ 50.0 | █████████████▒▒▒▒▒▒▒ 75.0
[25.00, 40.00] | █████████████ 50.0 | ███████     ] 25.0
---------------------------------------------------------------
Total          | 100.0              | 100.0

In the Sample column above, bin [10, 25) shows an excess (75% vs. 50% baseline), while bin [25, 40] shows a deficit marked by ] (25% vs. 50% baseline).
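The bar arithmetic in this example can be reproduced with a short sketch. It assumes, from the example output rather than from documented behavior, that every cell is scaled so the largest percentage in the table spans bar_width, that lengths are rounded to the nearest character, and that the deficit bracket sits where the baseline bar would end. render_cell is a hypothetical helper, not part of balance.

```python
def render_cell(baseline_pct: float, pct: float, max_pct: float, bar_width: int) -> str:
    """Hypothetical helper: draw one cell of a comparative histogram."""
    scale = bar_width / max_pct             # assumption: largest % spans bar_width
    base_len = round(baseline_pct * scale)  # where the baseline bar ends
    own_len = round(pct * scale)            # length of this dataset's bar
    bar = "█" * min(base_len, own_len)      # common part shared with the baseline
    if own_len > base_len:
        bar += "▒" * (own_len - base_len)   # excess over the baseline
    elif own_len < base_len:
        # deficit: pad with spaces and put "]" in the baseline bar's last cell
        bar += " " * (base_len - own_len - 1) + "]"
    # a bar that rounds to zero characters leaves only the number
    return f"{bar} {pct:.1f}" if bar else f"{pct:.1f}"


# Sample column of the example: largest percentage 75, bar_width 20
print(render_cell(50.0, 75.0, 75.0, 20))  # █████████████▒▒▒▒▒▒▒ 75.0
print(render_cell(50.0, 25.0, 75.0, 20))  # ███████     ] 25.0
```

A baseline cell is the case where both percentages are equal, so render_cell(50.0, 50.0, 75.0, 20) gives the 13-character solid bar of the Target column. The zero-length branch corresponds to the “number without any bar” case described above.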

balance.stats_and_plots.ascii_plots.ascii_plot_bar(dfs: List[DataFrameWithWeight], names: List[str], column: str, weighted: bool = True, bar_width: int | None = None, dist_type: str | None = None, separate_categories: bool = True) -> str

Produces an ASCII grouped barplot for a single categorical variable.

Uses relative_frequency_table() to compute weighted proportions for each dataset, then renders grouped horizontal bars.

How to read the output:

Each row is a category value. Within a row, each dataset gets its own bar drawn with a distinct fill character (█, ▒, etc.).

  • The percentage at the end of each bar is the weighted proportion of that category within its dataset (i.e., proportions within each dataset sum to 100%).

  • Bar lengths are scaled so that the longest bar across all datasets spans the full bar_width.
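The proportions and scaling described above can be sketched in pure Python. This sketch stands in for relative_frequency_table() and does not call it; weighted_proportions, grouped_rows and FILLS are hypothetical names. It assumes categories are sorted alphabetically, fill characters are assigned per dataset in order, the label column is 8 characters wide, and bar lengths are rounded to the nearest character. Under those assumptions it reproduces the bar rows of the example below.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

FILLS = ["█", "▒"]  # assumption: fill characters cycle over datasets in order


def weighted_proportions(values: Sequence[str], weights: Sequence[float]) -> Dict[str, float]:
    """Weighted share (%) of each category within one dataset."""
    totals: Dict[str, float] = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand = sum(totals.values())
    return {cat: 100 * t / grand for cat, t in totals.items()}


def grouped_rows(props_per_dataset: List[Dict[str, float]], bar_width: int) -> List[str]:
    """One line per (category, dataset); the longest bar spans bar_width."""
    max_pct = max(p for props in props_per_dataset for p in props.values())
    scale = bar_width / max_pct
    categories = sorted({cat for props in props_per_dataset for cat in props})
    lines = []
    for cat in categories:
        for i, props in enumerate(props_per_dataset):
            pct = props.get(cat, 0.0)  # category absent from this dataset -> 0%
            bar = FILLS[i % len(FILLS)] * round(pct * scale)
            label = cat if i == 0 else ""  # category name only on the first line
            lines.append(f"{label:<8} | {bar} ({pct:.1f}%)")
    return lines


self_props = weighted_proportions(["red", "blue", "blue", "green"], [1.0] * 4)
target_props = weighted_proportions(["red", "red", "blue", "green"], [1.0] * 4)
print("\n".join(grouped_rows([self_props, target_props], bar_width=20)))
```

Because the scale is shared, a 25% bar is always half the length of a 50% bar, whichever dataset it belongs to.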

Parameters:
  • dfs – List of DataFrameWithWeight dicts.

  • names – Names for each DataFrame (e.g., [“self”, “target”]).

  • column – The categorical column name to plot.

  • weighted – Whether to use weights. Defaults to True.

  • bar_width – Maximum character width for bars. Defaults to None, which auto-detects based on terminal width.

  • dist_type – Accepted for compatibility but only “hist_ascii” is supported. A warning is logged if any other value is passed.

  • separate_categories – If True, insert a blank line between categories for readability. Defaults to True.

Returns:

ASCII barplot text for this variable.

Example

>>> df_a = pd.DataFrame({"color": ["red", "blue", "blue", "green"]})
>>> df_b = pd.DataFrame({"color": ["red", "red", "blue", "green"]})
>>> dfs = [
...     {"df": df_a, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
...     {"df": df_b, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
... ]
>>> print(ascii_plot_bar(dfs, names=["self", "target"],
...       column="color", bar_width=20))
=== color (categorical) ===

Category | sample  population
         |
blue     | ████████████████████ (50.0%)
         | ▒▒▒▒▒▒▒▒▒▒ (25.0%)

green    | ██████████ (25.0%)
         | ▒▒▒▒▒▒▒▒▒▒ (25.0%)

red      | ██████████ (25.0%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (50.0%)

Legend: █ sample  ▒ population
Bar lengths are proportional to weighted frequency within each dataset.

balance.stats_and_plots.ascii_plots.ascii_plot_dist(dfs: List[DataFrameWithWeight], names: List[str] | None = None, variables: List[str] | None = None, numeric_n_values_threshold: int = 15, weighted: bool = True, n_bins: int | None = None, bar_width: int | None = None, dist_type: str | None = None, separate_categories: bool = True, comparative: bool = True) -> str

Produces ASCII text comparing weighted distributions across datasets.

Iterates over variables, classifying each as categorical or numeric (using the same logic as seaborn_plot_dist()), then delegates to the appropriate plotting function.

Two display modes are available for numeric variables:

  • comparative (comparative=True, the default): numeric variables are rendered with ascii_comparative_hist(), a columnar layout where the first dataset is the baseline and subsequent datasets show excess / deficit relative to it.

  • grouped (comparative=False): numeric variables are rendered with ascii_plot_hist(), a grouped-bar layout where each dataset gets its own bar per bin (the same style used for categorical variables).

Categorical variables always use ascii_plot_bar() regardless of this setting.

The output is both printed to stdout and returned as a string.
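The per-variable dispatch can be sketched as follows. The threshold rule comes from the numeric_n_values_threshold description below; the numeric check is an assumption, since the real classification follows seaborn_plot_dist() and may inspect dtypes instead. is_numeric and choose_renderer are hypothetical helpers that return the name of the function a variable would be delegated to rather than calling balance.

```python
from typing import Sequence


def is_numeric(values: Sequence[object]) -> bool:
    """Assumption: a column is numeric when every value is an int or float."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def choose_renderer(values: Sequence[object],
                    numeric_n_values_threshold: int = 15,
                    comparative: bool = True) -> str:
    """Name of the ascii_plots function a variable would be delegated to."""
    n_unique = len(set(values))
    if not is_numeric(values) or n_unique < numeric_n_values_threshold:
        return "ascii_plot_bar"  # categorical, whatever `comparative` is
    return "ascii_comparative_hist" if comparative else "ascii_plot_hist"


age = [10.0, 20.0, 30.0, 40.0]
print(choose_renderer(age))                                # ascii_plot_bar
print(choose_renderer(age, numeric_n_values_threshold=0))  # ascii_comparative_hist
```

Under this rule, age (4 unique values) is treated as categorical at the default threshold of 15, which is presumably why the examples below pass numeric_n_values_threshold=0.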

Parameters:
  • dfs – List of DataFrameWithWeight dicts.

  • names – Names for each DataFrame (e.g., [“self”, “unadjusted”, “target”]). If None, defaults to “df_0”, “df_1”, etc.

  • variables – Subset of variables to plot. None means all.

  • numeric_n_values_threshold – Columns with fewer unique values than this are treated as categorical. Defaults to 15.

  • weighted – Whether to use weights. Defaults to True.

  • n_bins – Number of bins for numeric histograms. Defaults to None, which auto-detects using Sturges’ rule.

  • bar_width – Maximum character width for the longest bar. Defaults to None, which auto-detects based on terminal width.

  • dist_type – Accepted for compatibility but only “hist_ascii” is supported. A warning is logged if any other value is passed.

  • separate_categories – If True, insert a blank line between categories in barplots for readability. Defaults to True.

  • comparative – If True (default), numeric variables use a columnar comparative histogram (ascii_comparative_hist()) that highlights differences relative to a baseline dataset. If False, numeric variables use a grouped-bar histogram (ascii_plot_hist()) instead.

Returns:

The full ASCII output text.

Examples

>>> import pandas as pd
>>> from balance.stats_and_plots.ascii_plots import ascii_plot_dist
>>> df_a = pd.DataFrame({
...     "color": ["red", "blue", "blue", "green"],
...     "age": [10.0, 20.0, 30.0, 40.0],
... })
>>> df_b = pd.DataFrame({
...     "color": ["red", "red", "blue", "green"],
...     "age": [10.0, 10.0, 10.0, 40.0],
... })
>>> dfs = [
...     {"df": df_a, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
...     {"df": df_b, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
... ]
>>> print(ascii_plot_dist(dfs, names=["self", "target"],
...       numeric_n_values_threshold=0, n_bins=2, bar_width=20))
=== color (categorical) ===

Category | population  sample
         |
blue     | ██████████ (25.0%)
         | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (50.0%)

green    | ██████████ (25.0%)
         | ▒▒▒▒▒▒▒▒▒▒ (25.0%)

red      | ████████████████████ (50.0%)
         | ▒▒▒▒▒▒▒▒▒▒ (25.0%)

Legend: █ population  ▒ sample
Bar lengths are proportional to weighted frequency within each dataset.

=== age (numeric, comparative) ===

Range          | population (%)     | sample (%)
---------------------------------------------------------------
[10.00, 25.00) | █████████████ 50.0 | █████████████▒▒▒▒▒▒▒ 75.0
[25.00, 40.00] | █████████████ 50.0 | ███████     ] 25.0
---------------------------------------------------------------
Total          | 100.0              | 100.0

Key: █ = shared with population, ▒ = excess,    ] = deficit

To use grouped-bar histograms (same style as categorical) instead of comparative histograms for numeric variables, pass comparative=False:

>>> print(ascii_plot_dist(dfs, names=["self", "target"],
...       numeric_n_values_threshold=0, n_bins=2, bar_width=20,
...       comparative=False))
=== color (categorical) ===
...
=== age (numeric) ===

Bin            | population  sample
               |
[10.00, 25.00) | ████████████████████ (75.0%)
               | ▒▒▒▒▒▒▒▒▒▒▒▒▒ (50.0%)
[25.00, 40.00] | ███████ (25.0%)
               | ▒▒▒▒▒▒▒▒▒▒▒▒▒ (50.0%)

Legend: █ population  ▒ sample
Bar lengths are proportional to weighted frequency within each dataset.

balance.stats_and_plots.ascii_plots.ascii_plot_hist(dfs: List[DataFrameWithWeight], names: List[str], column: str, weighted: bool = True, n_bins: int | None = None, bar_width: int | None = None, dist_type: str | None = None) -> str

Produces an ASCII histogram for a single numeric variable.

Computes weighted histogram bins across all datasets using a shared bin range, then renders grouped horizontal bars for each bin.

How to read the output:

Each row is a numeric bin range. Within a row, each dataset gets its own bar drawn with a distinct fill character (█, ▒, etc.).

  • The percentage at the end of each bar is the weighted proportion of observations falling in that bin within its dataset (i.e., proportions within each dataset sum to 100%).

  • Bar lengths are scaled so that the longest bar across all datasets spans the full bar_width.
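Shared bin edges keep the bars comparable across datasets. A minimal pure-Python sketch, assuming equal-width bins over the pooled minimum and maximum of all datasets, bins that are half-open except for a closed last bin (as the [25.00, 40.00] label in the example suggests, and as numpy.histogram also does), and weights summed per bin and normalized to 100%. shared_edges and weighted_hist_pct are hypothetical helpers; the library may compute its bins differently.

```python
from bisect import bisect_right
from typing import List, Sequence


def shared_edges(datasets: Sequence[Sequence[float]], n_bins: int) -> List[float]:
    """Equal-width edges over the pooled range of all datasets."""
    lo = min(min(values) for values in datasets)
    hi = max(max(values) for values in datasets)
    width = (hi - lo) / n_bins
    # pin the last edge to the maximum to avoid floating-point drift
    return [lo + i * width for i in range(n_bins)] + [hi]


def weighted_hist_pct(values: Sequence[float], weights: Sequence[float],
                      edges: Sequence[float]) -> List[float]:
    """Weighted share (%) of each bin; bins are [a, b) except the last, [a, b]."""
    n_bins = len(edges) - 1
    totals = [0.0] * n_bins
    for v, w in zip(values, weights):
        # bisect_right locates the bin whose left edge is the largest edge <= v;
        # min() folds the maximum value into the closed last bin
        idx = min(bisect_right(edges, v) - 1, n_bins - 1)
        totals[idx] += w
    grand = sum(totals)
    return [100 * t / grand for t in totals]


age_a = [10.0, 20.0, 30.0, 40.0]
age_b = [10.0, 10.0, 10.0, 40.0]
edges = shared_edges([age_a, age_b], n_bins=2)   # [10.0, 25.0, 40.0]
print(weighted_hist_pct(age_a, [1.0] * 4, edges))  # [50.0, 50.0]
print(weighted_hist_pct(age_b, [1.0] * 4, edges))  # [75.0, 25.0]
```

These percentages match the example below; bar lengths would then be scaled exactly as for categorical bars.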

Parameters:
  • dfs – List of DataFrameWithWeight dicts.

  • names – Names for each DataFrame (e.g., [“self”, “target”]).

  • column – The numeric column name to plot.

  • weighted – Whether to use weights. Defaults to True.

  • n_bins – Number of histogram bins. Defaults to None, which auto-detects using Sturges’ rule.

  • bar_width – Maximum character width for bars. Defaults to None, which auto-detects based on terminal width.

  • dist_type – Accepted for compatibility but only “hist_ascii” is supported. A warning is logged if any other value is passed.

Returns:

ASCII histogram text for this variable.

Example

>>> df_a = pd.DataFrame({"age": [10.0, 20.0, 30.0, 40.0]})
>>> df_b = pd.DataFrame({"age": [10.0, 10.0, 10.0, 40.0]})
>>> dfs = [
...     {"df": df_a, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
...     {"df": df_b, "weight": pd.Series([1.0, 1.0, 1.0, 1.0])},
... ]
>>> print(ascii_plot_hist(dfs, names=["self", "target"],
...       column="age", n_bins=2, bar_width=20))
=== age (numeric) ===

Bin            | sample  population
               |
[10.00, 25.00) | █████████████ (50.0%)
               | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ (75.0%)
[25.00, 40.00] | █████████████ (50.0%)
               | ▒▒▒▒▒▒▒ (25.0%)

Legend: █ sample  ▒ population
Bar lengths are proportional to weighted frequency within each dataset.