
7 min read

🎉 balance v0.15.0 is out!

What is balance?

balance is a Python package (from Meta) offering a simple workflow and methods for dealing with biased data samples when looking to infer from them to some population of interest. Biased samples often occur in survey statistics when respondents present non-response bias or when surveys suffer from sampling bias (that is, when the missing responses are not missing completely at random). A similar issue arises in observational studies when comparing treated vs. untreated groups, and in any data that suffers from selection bias.

Highlights from v0.15.0 (since v0.12.0):

✅ More control over modeling: The ability to run any sklearn model (instead of just LogisticRegression) to fit inverse-propensity-score weights. Plus formula-driven summaries and explicit missing-data handling.

✅ Stronger diagnostics: The way weights influence covariate imbalance can now be evaluated not just with ASMD (as before), but also with various distribution distance metrics (KLD, EMD, CVMD, KS).

✅ Reliable code: Test coverage was increased to 100%, with full type-checking across the whole codebase. Plus CLI enhancements and improved docs/tutorials.
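To make the first two highlights concrete, here is a minimal sketch of the idea behind them, not balance's own API. It assumes only that the model exposes the sklearn-style `fit` / `predict_proba` interface; `CategoryFrequencyClassifier`, `ipw_weights`, and `asmd` are hypothetical names written for this illustration, and standardizing by the target's standard deviation is an assumption of the sketch. The toy classifier keeps the block dependency-free; in practice you would pass a real sklearn estimator instead.

```python
from collections import Counter
from statistics import fmean, pstdev


class CategoryFrequencyClassifier:
    """Toy stand-in for an sklearn classifier (hypothetical, for illustration).

    It estimates P(y=1 | row) as the observed share of positives per distinct row.
    """

    def fit(self, X, y):
        positives, totals = Counter(), Counter()
        for row, label in zip(X, y):
            key = tuple(row)
            totals[key] += 1
            positives[key] += label
        self.rate_ = {key: positives[key] / totals[key] for key in totals}
        return self

    def predict_proba(self, X):
        # Same layout as sklearn: one [P(y=0), P(y=1)] pair per row.
        return [[1 - self.rate_[tuple(r)], self.rate_[tuple(r)]] for r in X]


def ipw_weights(model, X_sample, X_target):
    """Inverse-propensity weights for the sample rows (illustrative sketch)."""
    # Stack sample (label 1) and target (label 0), then model P(in sample | x).
    X = list(X_sample) + list(X_target)
    y = [1] * len(X_sample) + [0] * len(X_target)
    model.fit(X, y)
    # Weight each sample row by the odds of belonging to the target.
    raw = [(1 - proba[1]) / proba[1] for proba in model.predict_proba(X_sample)]
    # Rescale so the weights sum to the target size (a choice made for this sketch).
    scale = len(X_target) / sum(raw)
    return [w * scale for w in raw]


def asmd(sample_values, target_values, weights=None):
    """Absolute standardized mean difference, standardized by the target's std."""
    if weights is None:
        weights = [1.0] * len(sample_values)
    weighted_mean = sum(w * v for w, v in zip(weights, sample_values)) / sum(weights)
    return abs(weighted_mean - fmean(target_values)) / pstdev(target_values)


# One binary covariate (1 = "older respondent"): the sample under-represents it.
sample = [[0], [0], [0], [1]]
target = [[0], [0], [1], [1]]

weights = ipw_weights(CategoryFrequencyClassifier(), sample, target)
asmd_before = asmd([r[0] for r in sample], [r[0] for r in target])
asmd_after = asmd([r[0] for r in sample], [r[0] for r in target], weights)
print(asmd_before, asmd_after)
```

In this toy case the single under-represented respondent is up-weighted, and the imbalance on the covariate shrinks from 0.5 to roughly zero. The distribution distance metrics mentioned above serve the same purpose as this ASMD check, but compare whole distributions rather than only means.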


4 min read

tl;dr – balance v0.12.0

We're excited to announce balance v0.12.0! Since our initial release, balance has evolved into a comprehensive Python package for adjusting biased samples. This post highlights the most significant improvements from v0.1.0 (2022-11-20) through v0.12.0 (2025-10-14), showcasing how we've made balance easier to use:

  • Expanded compatibility: Now supports Python 3.9–3.14 on Windows, macOS, and Linux, with smarter dependency management and a switch to the MIT license.
  • Major upgrades: Improved statistical methods (IPW, raking, poststratification), interactive Plotly visualizations, and new variance/confidence interval tools.
  • Better experience: Enhanced CLI, bug fixes, and expanded docs/tutorials for easier use and learning.
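For readers unfamiliar with raking, one of the methods named above, the following is a minimal sketch of the underlying technique (iterative proportional fitting) in plain Python. It is not balance's implementation; the `rake` function, its parameters, and the toy data are hypothetical and exist only for this illustration.

```python
def rake(rows, target_margins, max_iter=100, tol=1e-10):
    """Rake unit weights so weighted marginal shares match the target shares.

    rows: list of dicts mapping variable name -> category.
    target_margins: dict mapping variable name -> {category: target share}.
    Every category seen in `rows` must appear in the matching margin.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, margin in target_margins.items():
            total = sum(weights)
            # Current weighted total per category of this variable.
            current = {category: 0.0 for category in margin}
            for row, w in zip(rows, weights):
                current[row[var]] += w
            # Scale each row so this variable's weighted shares hit the target.
            for i, row in enumerate(rows):
                factor = margin[row[var]] * total / current[row[var]]
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all variables barely changes the weights.
        if largest_adjustment < tol:
            break
    return weights


# Toy sample: young women are over-represented relative to the target margins.
rows = (
    [{"age": "young", "gender": "f"}] * 3
    + [{"age": "young", "gender": "m"}]
    + [{"age": "old", "gender": "f"}]
    + [{"age": "old", "gender": "m"}]
)
target_margins = {
    "age": {"young": 0.5, "old": 0.5},
    "gender": {"f": 0.5, "m": 0.5},
}

weights = rake(rows, target_margins)
total = sum(weights)
young_share = sum(w for r, w in zip(rows, weights) if r["age"] == "young") / total
female_share = sum(w for r, w in zip(rows, weights) if r["gender"] == "f") / total
print(young_share, female_share)
```

Raking only needs the target's marginal shares for each variable, whereas poststratification needs the target share of every joint cell (here, each age-by-gender combination). That makes raking the usual choice when only marginal population figures are available.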


5 min read

In research and data science, we sometimes encounter biased data: that is, data that has not been sampled completely at random and therefore over- or under-represents parts of the population of interest. Survey data is a common example. Surveys play an important role in measuring subjective user experience indicators, such as sentiment and opinions, which cannot be measured by other means. But because survey data is collected from a self-selected group of participants, it needs to be analyzed carefully.