0.12.1 (2025-11-03)
New Features
- Added a welcome message when importing the package.
Documentation
- Added 'CHANGELOG' to the docs website. https://import-balance.org/docs/docs/CHANGELOG/
Bug Fixes
- Fixed plotly figures in all the tutorials. https://import-balance.org/docs/tutorials/
0.12.0 (2025-10-14)
New Features
- Support for Python 3.13 and 3.14.
- Update setup.py and CI/CD integration to include Python 3.13 and 3.14.
- Remove upper version constraints from numpy, pandas, scipy, and scikit-learn dependencies for Python 3.12+.
Contributors
@talgalili, @wesleytlee
0.11.0 (2025-09-24)
New Features
- Python 3.12 support - Complete support for Python 3.12 alongside existing Python 3.9, 3.10, and 3.11 support (with CI/CD integration).
- Implemented Python version-specific dependency constraints - Added conditional version ranges for numpy, pandas, scipy, and scikit-learn that vary based on Python version (e.g., numpy>=1.21.0,<2.0 for Python <3.12, numpy>=1.24.0,<2.1 for Python >=3.12)
- Pandas compatibility improvements - Replaced value_counts(dropna=False) with groupby().size() in frequency table creation to avoid a FutureWarning.
- Fixed various pandas deprecation warnings and improved DataFrame handling.
- Improved raking algorithm - Completely refactored rake weighting from DataFrame-based to array-based ipfn algorithm using multi-dimensional arrays and itertools for better performance and compatibility with latest Python versions. Variables are now automatically alphabetized to ensure consistent results regardless of input order.
- poststratify method enhancement - New strict_matching parameter (default True) handles cases where sample cells are not present in the target data. When False, it issues a warning and assigns weight 0 to uncovered samples.
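The array-based raking mentioned above belongs to the family of iterative proportional fitting (IPF) algorithms. The sketch below shows the core idea for two variables with NumPy, assuming a sample cross-tabulation with no empty cells and target marginal totals that share the same grand total. The function name ipf_2d and its arguments are hypothetical; this is not balance's internal implementation.

```python
import numpy as np


def ipf_2d(table, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Hypothetical sketch of iterative proportional fitting on a 2-D table.

    Alternately rescales rows and columns so the table's margins match the
    target row and column totals. Assumes every cell of `table` is positive.
    """
    fitted = np.asarray(table, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Scale each row so the row sums equal the row targets.
        fitted *= (row_targets / fitted.sum(axis=1))[:, None]
        # Scale each column so the column sums equal the column targets.
        fitted *= (col_targets / fitted.sum(axis=0))[None, :]
        # Columns now match exactly; stop once the rows also match.
        if np.allclose(fitted.sum(axis=1), row_targets, rtol=0.0, atol=tol):
            break
    return fitted


# Sample counts by (gender, age group) and hypothetical target marginal totals.
sample = np.array([[30.0, 10.0], [20.0, 40.0]])
fitted = ipf_2d(sample, row_targets=[50, 50], col_targets=[60, 40])

# Per-cell weight = fitted count / sample count.
weights = fitted / sample
```

Each unit in a cell would then receive that cell's weight. With more variables the same alternating rescaling runs over every axis of a multi-dimensional array.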
Bug Fixes
- Type annotations - Enhanced Pyre type hints throughout the codebase, particularly in utility functions
- Sample class improvements - Fixed weight type assignment (ensuring float64 type), improved DataFrame manipulation with .infer_objects(copy=False) for pandas compatibility, and enhanced weight setting logic.
- Website dependencies - Updated various website dependencies including Docusaurus and related packages.
Tests
Comprehensive test refactoring, including:
- Enhanced test validation - Added detailed explanations of test methodologies and expected behaviors in docstrings
- Improved test coverage - Tests now include edge cases like NaN handling, different data types, and error conditions
- Improved test organization (more granular) across all test modules (test_stats_and_plots.py, test_balancedf.py, test_ipw.py, test_rake.py, test_cli.py, test_weighted_comparisons_plots.py, test_cbps.py, test_testutil.py, test_adjustment.py, test_util.py, test_sample.py)
- Updated GitHub workflows to include Python 3.12 in build and test matrix
- Fix 261 "pandas deprecation" warnings!
- Added type annotations - Converted test_balancedf.py to pyre-strict.
Documentation
- GitHub issue template for support questions - Added structured template to help users ask questions about using the balance package
Contributors
@talgalili, @wesleytlee, @dependabot
0.10.0 (2025-01-06)
New Features
- Dependency on glmnet has been removed, and the ipw method now uses sklearn.
- The transition to sklearn should enable support for newer Python versions (3.11) as well as the Windows OS!
- The ipw method uses logistic regression with L2 penalties instead of L1 penalties for computational reasons. The transition from glmnet to sklearn and the use of L2 penalties will lead to slightly different generated weights compared to previous versions of Balance.
- Unfortunately, the sklearn-based ipw method is generally slower than the previous version by 2-5x. Consider using the new arguments lambda_min, lambda_max, and num_lambdas for a more efficient search over the ipw penalization space.
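For readers new to the approach, the sketch below shows the general idea of inverse propensity weighting with an L2-penalized scikit-learn logistic regression. It is a simplified illustration, not balance's ipw implementation: the function name, the single fixed C value (scikit-learn's inverse of the penalty strength, so a lambda grid would correspond to C = 1/lambda), and the odds-based weight formula are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def ipw_weights_sketch(sample_X, target_X, C=1.0):
    """Hypothetical IPW sketch: weights that make the sample resemble the target."""
    X = np.vstack([sample_X, target_X])
    # Label 1 = row comes from the sample, 0 = row comes from the target.
    y = np.concatenate([np.ones(len(sample_X)), np.zeros(len(target_X))])
    # LogisticRegression uses an L2 penalty by default; C is its inverse strength.
    model = LogisticRegression(C=C)
    model.fit(X, y)
    p = model.predict_proba(sample_X)[:, 1]  # P(row is in the sample | covariates)
    # Odds of belonging to the target rather than the sample.
    w = (1 - p) / p
    # Rescale so the weights sum to the number of target rows.
    return w * len(target_X) / w.sum()


rng = np.random.default_rng(0)
sample_X = rng.normal(0.5, 1.0, size=(200, 2))  # sample is shifted upwards
target_X = rng.normal(0.0, 1.0, size=(300, 2))
w = ipw_weights_sketch(sample_X, target_X)
```

Searching over several penalty strengths means refitting the model once per value, which is one reason a narrower lambda range can save time.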
Misc
- Update license from GPL v2 to MIT license.
- Updated Python and package compatibility. Balance is now compatible with Python 3.11, but no longer compatible with Python 3.8 due to typing errors. Balance is currently incompatible with Python 3.12 due to the removal of distutils.
Contributors
@wesleytlee, @talgalili, @SarigT
0.9.1 (2023-07-30)
Bug Fixes
- Fix E721 flake8 issue (see: https://github.com/facebookresearch/balance/actions/runs/5704381365/job/15457952704)
- Remove support for Python 3.11 from release.yml
Documentation
- Added links to presentation given at ISA 2023.
- Fixed misc typos.
0.9.0 (2023-05-22)
News
- Remove support for Python 3.11 due to new test failures. This will remain the case until glmnet is replaced by sklearn, hopefully before the end of the year.
New Features
- All plotly functions: add kwargs to pass arguments to update_layout in all plotly figures. This is useful for controlling the width and height of the plot, for example when saving a high-resolution image.
- Add a summary method to BalanceWeightsDF (i.e.: Sample.weights().summary()) to easily access summary statistics of the survey weights. Sample.diagnostics() now also uses this new summary method in its internal implementation.
- The BalanceWeightsDF.plot method now relies on the default BalanceDF.plot method. This means that instead of a static seaborn kde plot we get an interactive plotly version.
Bug Fixes
- datasets
  - Remove a no-op in load_data and accommodate the deprecation of pandas syntax by using a list rather than a set when selecting df columns (thanks @ahakso for the PR).
  - Make the outcome variable (happiness) display properly in the tutorials (so we can see the benefit of the weighting process). This included fixing the simulation code in the target.
- Fix Sample.outcomes().summary() so it outputs the ci columns without truncating them.
Documentation
- Fix text based on updates from versions 0.7.0 and 0.8.0.
- Fix tutorials to include the outcome in the target.
Contributors
@talgalili, @SarigT, @ahakso
0.8.0 (2023-04-26)
New Features
- Add rake method to .adjust (currently in beta, given that it doesn't handle a marginal target as input).
- Add a new function, prepare_marginal_dist_for_raking, to take in a dict of marginal proportions and turn them into a pandas DataFrame. This can serve as an input target population for raking.
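To illustrate why a DataFrame can stand in for marginal proportions, here is a hypothetical sketch (not the actual prepare_marginal_dist_for_raking, whose output format may differ) that expands a dict of proportions into rows. Each column is built independently, so its level shares match the marginals while the joint distribution is arbitrary, which is acceptable for raking because raking only uses the marginals.

```python
import pandas as pd


def marginals_to_frame_sketch(marginals, n_rows=100):
    """Hypothetical sketch: expand marginal proportions into a DataFrame.

    Each column is built independently so that the share of each level
    matches its marginal proportion; the joint distribution is arbitrary.
    """
    columns = {}
    for variable, proportions in marginals.items():
        values = []
        for level, share in proportions.items():
            values.extend([level] * round(share * n_rows))
        if len(values) != n_rows:
            raise ValueError(f"proportions for {variable!r} do not fit into {n_rows} rows")
        columns[variable] = values
    return pd.DataFrame(columns)


target = marginals_to_frame_sketch(
    {"gender": {"female": 0.5, "male": 0.5}, "age": {"18-34": 0.3, "35+": 0.7}}
)
```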
Misc
- The ipw function now has max_de=None as the default (instead of 1.5). This version is faster, and the user can still choose a threshold as desired.
- Added hex sticker graphics files.
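max_de is a threshold on the design effect of the resulting weights. Assuming it refers to Kish's design effect (an assumption, since the entry above only names the parameter), the quantity is n * sum(w^2) / (sum(w))^2. It equals 1 for equal weights and grows as the weights become more variable. A minimal sketch:

```python
import numpy as np


def kish_design_effect(weights):
    """Kish's design effect: n * sum(w^2) / (sum(w))^2.

    Equivalent to 1 + CV(w)^2, with CV computed from the population
    (ddof=0) standard deviation of the weights.
    """
    w = np.asarray(weights, dtype=float)
    return len(w) * np.sum(w**2) / np.sum(w) ** 2


print(kish_design_effect([1, 1, 1, 1]))  # → 1.0 (equal weights)
print(kish_design_effect([1, 1, 1, 5]))  # → 1.75 (one large weight)
```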
Documentation
- New section on raking.
- New notebook (in the tutorial section):
- quickstart_rake - like the quickstart tutorial, but shows how to use the rake (raking) algorithm and compares the results to IPW (logistic regression with LASSO).
Contributors
@talgalili, @SarigT
0.7.0 (2023-04-10)
New Features
- Add plotly_plot_density function: Plots interactive density plots of the given variables using kernel density estimation.
- Modified plotly_plot_dist and plot_dist to also support 'kde' plots. These are now also the default options. This automatically percolates to the BalanceDF.plot() methods.
- Sample.from_frame can now guess that a column called "weights" is a weight column (instead of only doing so if the column is called "weight").
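Kernel density estimation, which the new 'kde' plots rely on, places a smooth kernel on each observation and adds them up. Below is a minimal NumPy sketch of a weighted Gaussian KDE, assuming a fixed bandwidth chosen by the caller; it illustrates the technique only and is not the code behind plotly_plot_density.

```python
import numpy as np


def weighted_gaussian_kde(values, weights, grid, bandwidth):
    """Evaluate a weighted Gaussian kernel density estimate on `grid`."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the density integrates to 1
    grid = np.asarray(grid, dtype=float)
    # Standardized distance between every grid point and every observation.
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Weighted sum of the kernels at each grid point.
    return kernels @ weights


grid = np.linspace(-5, 5, 1001)
density = weighted_gaussian_kde([-1.0, 0.0, 2.0], [1.0, 2.0, 1.0], grid, bandwidth=0.5)
```

The weights simply scale each observation's kernel, which is how a weighted sample can produce a different density curve than the unweighted one.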
Bug Fixes
- Fix rm_mutual_nas: it now remembers the index of the pandas.Series that were used as input. This fixed erroneous plots produced by seaborn functions that use rm_mutual_nas.
- Fix plot_hist_kde to work when dist_type = "ecdf".
- Fix plot_hist_kde and plot_bar when the input has only "self" and "target", by fixing _return_sample_palette.
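The rm_mutual_nas fix is about keeping the index of pandas.Series inputs when dropping rows with missing values. A minimal sketch of that idea, using a hypothetical helper rather than balance's rm_mutual_nas: positions that are missing in any input are dropped from all inputs, and Series keep their original index labels.

```python
import numpy as np
import pandas as pd


def drop_mutual_nas_sketch(*arrays):
    """Hypothetical sketch: drop positions that are NA in any input.

    All inputs must have the same length. pandas.Series inputs keep their
    original index labels for the rows that remain.
    """
    mask = np.zeros(len(arrays[0]), dtype=bool)
    for arr in arrays:
        mask |= pd.isna(np.asarray(arr, dtype=object))
    keep = ~mask
    result = []
    for arr in arrays:
        if isinstance(arr, pd.Series):
            result.append(arr.iloc[keep])  # positional mask, original index kept
        else:
            result.append(np.asarray(arr)[keep])
    return result


values = pd.Series([1.0, np.nan, 3.0, 4.0], index=[10, 11, 12, 13])
weights = [1.0, 2.0, None, 1.0]
kept_values, kept_weights = drop_mutual_nas_sketch(values, weights)
```

Keeping the original labels matters when the result is later aligned with other Series by index, which is where a reset index would silently mismatch rows.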
Misc
- All plotting functions moved internally to expect the weight column to be called weight, instead of weights.
- All adjust (ipw, cbps, poststratify, null) functions now export a dict with a key called weight instead of weights.
Contributors
@talgalili, @SarigT
0.6.0 (2023-04-05)
New Features
- Variance of the weighted mean
  - Add the var_of_weighted_mean function (from balance.stats_and_plots.weighted_stats import var_of_weighted_mean): Computes the variance of the weighted average (pi estimator for ratio-mean) of a list of values and their corresponding weights.
    - Added the var_of_mean option to stat in the descriptive_stats function (based on var_of_weighted_mean).
    - Added the .var_of_mean() method to BalanceDF.
  - Add the ci_of_weighted_mean function (from balance.stats_and_plots.weighted_stats import ci_of_weighted_mean): Computes the confidence intervals of the weighted mean using the (just added) variance of the weighted mean.
    - Added the ci_of_mean option to stat in the descriptive_stats function (based on ci_of_weighted_mean). Also added kwargs support.
    - Added the .ci_of_mean() method to BalanceDF.
    - Added the .mean_with_ci() method to BalanceDF.
    - Updated .summary() methods to include the output of ci_of_mean.
- All bar plots now have an added ylim argument to control the limits of the y axis. For example, use plot_dist(dfs1, names=["self", "unadjusted", "target"], ylim=(0, 1)) or s3_null.covars().plot(ylim=(0, 1)).
- Improve the choose_variables function to control the order of the returned variables
  - The return type is now a list (and not a Tuple).
  - The order of the returned list is based on the variables argument. If it is not supplied, it is based on the order of the column names in the DataFrames. The df_for_var_order arg controls which df to use.
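For reference, one common form of the variance of the weighted mean under the pi estimator for a ratio mean is Var(ybar_w) = sum(w_i^2 * (y_i - ybar_w)^2) / (sum(w_i))^2, and a normal-approximation confidence interval follows from it. This formula is an assumption here (check the var_of_weighted_mean docstring for the exact form balance uses), and the function names below are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def var_of_weighted_mean_sketch(values, weights):
    """Variance of the weighted mean: sum(w^2 (y - ybar_w)^2) / (sum w)^2."""
    y = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    ybar_w = np.sum(w * y) / np.sum(w)
    return np.sum(w**2 * (y - ybar_w) ** 2) / np.sum(w) ** 2


def ci_of_weighted_mean_sketch(values, weights, conf_level=0.95):
    """Normal-approximation confidence interval around the weighted mean."""
    y = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    ybar_w = np.sum(w * y) / np.sum(w)
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)  # about 1.96 for 95%
    half_width = z * np.sqrt(var_of_weighted_mean_sketch(y, w))
    return ybar_w - half_width, ybar_w + half_width


print(var_of_weighted_mean_sketch([1, 2, 3], [1, 1, 1]))  # 2/9, about 0.222
print(ci_of_weighted_mean_sketch([1, 2, 3], [1, 1, 1]))
```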
Misc
- The _prepare_input_model_matrix and downstream functions (e.g.: model_matrix, sample.outcomes().mean(), etc.) can now handle DataFrames with special characters in the column names, by replacing special characters with '_' (or '_i', if we end up with columns with duplicate names). They also handle cases in which the column names have duplicates (using the new _make_df_column_names_unique function).
- Improve choose_variables to control the order of the returned variables
  - The return type is now a list (and not a Tuple).
  - The order of the returned list is based on the variables argument. If it is not supplied, it is based on the column names in the DataFrames. The df_for_var_order arg controls which df to use.
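The column-name handling described above can be sketched as two steps: replace special characters, then make the names unique. The helper below is hypothetical and not balance's _make_df_column_names_unique; in particular, the numeric suffix scheme ('_1', '_2', ...) is an assumption.

```python
import re


def make_column_names_safe_sketch(columns):
    """Hypothetical sketch: replace special characters and de-duplicate names.

    Every character that is not a word character (letter, digit or
    underscore) becomes '_'; repeated names get a numeric suffix.
    """
    cleaned = [re.sub(r"\W", "_", str(name)) for name in columns]
    used = set()
    result = []
    for name in cleaned:
        candidate, i = name, 0
        # Keep increasing the suffix until the name is not already taken.
        while candidate in used:
            i += 1
            candidate = f"{name}_{i}"
        used.add(candidate)
        result.append(candidate)
    return result


print(make_column_names_safe_sketch(["a b", "a-b", "c"]))
```

The loop checks every candidate against all names already emitted, so a suffixed name cannot collide with a column that already carried that suffix.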
Contributors
@talgalili, @SarigT
0.5.0 (2023-03-06)
New Features
- The datasets.load_data function now also supports the input "sim_data_cbps", which loads the simulated data used in the CBPS R vs Python tutorial. It is also used in unit testing to compare the CBPS weights produced by Python (i.e.: balance) with those produced by R (i.e.: the CBPS package). The tests show that the weights from the two implementations have a correlation of >0.98 (both Pearson and Spearman).
- cli improvements:
  - Add an option to set formula (as string) in the cli.
Documentation
- New notebook (in the tutorial section):
  - Comparing results of fitting CBPS between R's CBPS package and Python's balance package (using simulated data).
Contributors
@stevemandala, @talgalili, @SarigT
0.4.0 (2023-02-08)
New Features
- Added two new flags to the cli:
  - --standardize_types: This gives cli users the ability to set the standardize_types parameter in Sample.from_frame to True or False. To learn more about this parameter, see: https://import-balance.org/api_reference/html/balance.sample_class.html#balance.sample_class.Sample.from_frame
  - --return_df_with_original_dtypes: the Sample object now stores the dtypes of the original df that was read using Sample.from_frame. This can be used to restore the original dtypes of the file output from the cli. This is relevant in cases in which we want to convert the dtypes of columns back from how they are stored in Sample to their original types (e.g.: if something was Int32 it would be turned into float32 in balance.Sample, and using the new flag will return that column, when using the cli, to the Int32 type). This feature may not be robust to various edge cases, so use with caution.
- In the logging:
- Added warnings about dtype changes. E.g.: if using Sample.from_frame with a column that has Int32, it will be turned into float32 in the internal storage of sample. Now there will be a warning message indicating this change.
- Increase the default length of logger printing (from 500 to 2000)
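The dtype round-trip behind --return_df_with_original_dtypes can be illustrated with plain pandas. This is a sketch of the general idea, assuming the stored dtypes are simply reapplied with astype; it does not show how balance itself stores or restores them.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "b", "c"], "age": pd.array([25, None, 40], dtype="Int32")})

# Remember the original dtypes before any conversion (hypothetical workflow).
original_dtypes = df.dtypes.to_dict()

# Nullable integers become floats, with missing values stored as NaN.
internal = df.astype({"age": "float32"})

# Later, reapply the remembered dtypes to the output.
restored = internal.astype(original_dtypes)
```

The restore step only works cleanly when the float values are still whole numbers; values changed by processing (for example, rescaled) would not cast back to Int32, which is one of the edge cases the entry warns about.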
Bug Fixes
- Fix pandas warning: SettingWithCopyWarning in from_frame (and other places in sample_class.py)
- Sample.from_frame has a new argument, use_deepcopy, to decide if changes made to the df inside the sample object would also change the original df that was provided to the sample object. The default is now set to True since it's more likely that we'd like to keep the changes inside the sample object to the df contained in it, and not have them spill into the original df.
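The trade-off behind use_deepcopy can be shown with a tiny hypothetical container (FrameHolder is not part of balance): with a deep copy, edits inside the container stay there; without one, the container and the caller share the same DataFrame object.

```python
import copy

import pandas as pd


class FrameHolder:
    """Hypothetical container illustrating the use_deepcopy trade-off."""

    def __init__(self, df, use_deepcopy=True):
        # With use_deepcopy=True the holder works on its own copy, so later
        # changes inside the holder do not spill into the caller's df.
        self.df = copy.deepcopy(df) if use_deepcopy else df


original = pd.DataFrame({"x": [1, 2, 3]})

isolated = FrameHolder(original, use_deepcopy=True)
isolated.df.loc[0, "x"] = 100  # does not touch `original`

shared = FrameHolder(original, use_deepcopy=False)
shared.df.loc[1, "x"] = 200  # modifies `original`, since it is the same object
```

Copying costs memory for large inputs, which is the main reason to ever turn it off.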
Contributors
@SarigT, @talgalili
0.3.1 (2023-02-01)
Bug Fixes
- Sample.from_frame now also converts int16 and int8 to float16, thus helping to avoid TypeError: Cannot interpret 'Int16Dtype()' as a data type style errors.
Documentation
- Added ISSUE_TEMPLATE
Contributors
@talgalili, @stevemandala, @SarigT
0.3.0 (2023-01-30)
New Features
- Added compatibility for Python 3.11 (by supporting SciPy 1.9.2) (props to @tomwagstaff-opml for flagging this issue).
- Added the session-info package as a dependency.
Bug Fixes
- Fixed pip install from source on Windows machines (props to @tomwagstaff-opml for the bug report).
Documentation
- Added session_info.show() outputs to the end of the three tutorials (at: https://import-balance.org/docs/tutorials/).
- Misc updates to the README.
Contributors
@stevemandala, @SarigT, @talgalili
0.2.0 (2023-01-19)
New Features
- cli improvements:
- Add an option to set weight_trimming_mean_ratio = None for no trimming.
- Add an option to set transformations to be None (i.e. no transformations).
- Add an option to adapt the title in:
- stats_and_plots.weighted_comparisons_plots.plot_bar
- stats_and_plots.weighted_comparisons_plots.plot_hist_kde
Bug Fixes
- Fix (and simplify) balanceDF.plot to organize the order of groups (now unadjusted/self is left, adjusted/self center, and target is on the right)
- Fix plotly functions to use the red color for self when only compared to target (since in that case it is likely unadjusted): balance.stats_and_plots.weighted_comparisons_plots.plotly_plot_qq and balance.stats_and_plots.weighted_comparisons_plots.plotly_plot_bar
- Fix seaborn_plot_dist: output None by default (instead of axis object). Added a return_Axes argument to control this behavior.
- Fix some test_cbps tests that were failing due to non-exact matches (we made the test less sensitive)
Documentation
- New blog section, with the post: Bringing "balance" to your data
- New tutorial:
- quickstart_cbps - like the quickstart tutorial, but shows how to use the CBPS algorithm and compares the results to IPW (logistic regression with LASSO).
- balance_transformations_and_formulas - This tutorial showcases ways in which transformations, formulas and penalty can be included in your pre-processing of the covariates before adjusting for them.
- API docs:
- New: highlighting on codeblocks
- a bunch of text fixes.
- Update README.md
- logo
- with contributors
- typo fixes (props to @zbraiterman and @luca-martial).
- Added section about "Releasing a new version" to CONTRIBUTING.md
- Available under "Docs/Contributing" section of website
Misc
- Added automated GitHub Action package builds & deployment to PyPI on release.
- See release.yml
Contributors
@stevemandala, @SarigT, @talgalili
0.1.0 (2022-11-20)
Summary
- balance released to the world!
Contributors
@SarigT, @talgalili, @stevemandala