
Covariate Balancing Propensity Score (CBPS)

The Covariate Balancing Propensity Score (CBPS) algorithm estimates the propensity score in a way that optimizes both the covariate balance and the prediction accuracy of sample inclusion, relative to some target population of interest. Its main advantage appears when the researcher wants better covariate balance than traditional propensity score methods provide, for example because the assignment model might be misspecified, and would like to avoid fitting follow-up models to improve the balance of the covariates.

References and implementation

Reference: Imai, K., & Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 243-263.

R package: (github repo )

The implementation of CBPS in balance is based on the R package, but is enhanced to match balance's workflow by adding feature transformations, the ability to bound the design effect by running a constrained optimization, and weight trimming.

For the implementation in balance see code.

The CBPS implementation in balance was written by Luke Sonnet and Tal Sarig.


Goal: Estimate a propensity score that also maximizes the covariate balance.

Background: When estimating a propensity score, there is often an iterative process of adjusting the model and choosing covariates to improve covariate balance. The goal of CBPS is to let the researcher avoid this iterative process by providing an estimator that optimizes the propensity score fit and the balance of the covariates together.

Advantages of this method over propensity score methods:

  1. Preferable when the propensity score model is misspecified, which may otherwise bias the estimated quantity.
  2. Simple to adjust and extend to other settings in causal inference.
  3. Inherits theoretical properties of GMM (generalized method of moments) and EL (empirical likelihood), which offer some theoretical guarantees for the method.


A full description of the methodology is given in Imai and Ratkovic (2014). We provide a short description here.

Consider a sample of respondents of size $n$ and a random sample from a target population of size $N$. For each $i \in \text{Sample} \cup \text{Target}$, let $I_i$ be the indicator for inclusion in the sample (0 for target and 1 for sample) and $X_i$ be a vector of observed covariates. The propensity score is defined as the conditional probability of being included in the sample given the covariates, $P(I_i=1 \mid X_i=x)$.

Let $Y_i$ be the potential outcome, observed only for $i \in \text{Sample}$.


Assumptions:

  1. The propensity score is bounded away from 0 and 1 (all individuals have a positive theoretical probability of being in the respondents group): $0<P(I_i=1 \mid X_i=x)<1$ for all $x$.
  2. Ignorability assumption: $(Y_i(0), Y_i(1)) \perp I_i \mid X_i$, where $Y_i(0)$ indicates the response of unit $i$ if it is from the sample, and $Y_i(1)$ indicates the hypothetical response of unit $i$ if it is from the target population. Rosenbaum and Rubin (1983) [2] showed that this assumption implies that the outcome is independent of inclusion in the sample given the (theoretical) propensity score (this is the "dimension reduction" property of the propensity score), i.e., $(Y_i(0), Y_i(1)) \perp I_i \mid P(I_i=1 \mid X_i=x)$.

Recap - Propensity score estimation

Using a logistic regression model, the propensity score is modeled by $\pi_\beta(X_i)=P(I_i=1 \mid X_i=x)=\frac{\exp(X_i^T \beta)}{1+\exp(X_i^T \beta)}$ for all $i \in \text{Sample}$.
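As a minimal sketch of the logistic model, the function below evaluates the propensity score for a design matrix and a coefficient vector. The function name `propensity` and the toy values of `X` and `beta` are assumptions made for illustration; they are not part of balance or the R package.

```python
import numpy as np

def propensity(X, beta):
    """Logistic propensity score exp(X_i^T beta) / (1 + exp(X_i^T beta))."""
    z = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    # 1 / (1 + exp(-z)) is algebraically the same expression.
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical design matrix: an intercept column plus one covariate.
X = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 2.0]])
beta = np.array([0.0, 1.0])
pi = propensity(X, beta)
```

Every value of `pi` lies strictly between 0 and 1, and a unit whose linear predictor is zero gets a propensity of exactly 0.5.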

This is estimated by maximizing the log-likelihood, which results in:

$$\hat{\beta}_{MLE}=\arg\max_\beta \sum_{i=1}^n I_i\log(\pi_\beta(X_i))+(1-I_i)\log(1-\pi_\beta(X_i))$$

which implies the first order condition:

$$\frac{1}{n}\sum_{i=1}^n \left[ \frac{I_i\pi^\prime_\beta(X_i)}{\pi_\beta(X_i)} -\frac{(1-I_i)\pi^\prime_\beta(X_i)}{1-\pi_\beta(X_i)}\right]=0$$

where $\pi^\prime_\beta$ is the derivative of $\pi_\beta$ with respect to $\beta^T$. This condition can be viewed as balancing a certain function of the covariates, in this case the derivative of the propensity score, $\pi^\prime_\beta(X_i)$.
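As a minimal sketch of this first-order condition, the snippet below fits a logistic regression by Newton-Raphson on simulated data and evaluates the score at the optimum. For the logistic model, $\pi^\prime_\beta(X_i)=\pi_\beta(X_i)(1-\pi_\beta(X_i))X_i$, so the condition reduces to $\frac{1}{n}\sum_i (I_i-\pi_\beta(X_i))X_i=0$. The helper name `fit_logistic_mle`, the simulated data, and the coefficient values are assumptions made for illustration.

```python
import numpy as np

def fit_logistic_mle(X, I, n_iter=25):
    """Newton-Raphson for the logistic log-likelihood; returns beta_hat."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (I - pi)                           # sum_i (I_i - pi_i) X_i
        hessian = (X * (pi * (1 - pi))[:, None]).T @ X   # X^T diag(pi (1 - pi)) X
        beta = beta + np.linalg.solve(hessian, score)
    return beta

# Simulated data (hypothetical): intercept plus one standard-normal covariate.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_pi = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.3, 0.8]))))
I = (rng.uniform(size=n) < true_pi).astype(float)

beta_mle = fit_logistic_mle(X, I)
pi_hat = 1.0 / (1.0 + np.exp(-X @ beta_mle))
# Sample first-order condition evaluated at the MLE; each entry should be ~0.
foc = X.T @ (I - pi_hat) / n
```

At the MLE, `foc` is zero up to numerical precision, which is the sense in which maximum likelihood already "balances" the function $\pi^\prime_\beta(X_i)$.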


Generally, we can extend the above to hold for any function $f$: $\mathbb{E} \left\{ \frac{I_if(X_i)}{\pi_\beta(X_i)} -\frac{(1-I_i)f(X_i)}{1-\pi_\beta(X_i)}\right\} =0$ (given that the expectation exists). CBPS chooses $f(x)=x$ as the balancing function, in addition to the traditional logistic regression condition (this is what is implemented in R and in balance), but in general any function the researcher chooses could be used here. The function $f(x)=x$ results in balancing the first moment of each covariate.
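To make the balancing condition with $f(x)=x$ concrete, the sketch below computes its sample analog: sample units are weighted by $1/\pi$, target units by $-1/(1-\pi)$, and the weighted covariates are averaged. The function name `balance_moment` and the four-unit toy data are assumptions made for illustration.

```python
import numpy as np

def balance_moment(X, I, pi):
    """Sample analog of E[ I f(X)/pi - (1 - I) f(X)/(1 - pi) ] with f(x) = x."""
    w = I / pi - (1 - I) / (1 - pi)      # signed inverse-probability weights
    return (w[:, None] * X).mean(axis=0)

# Toy data (hypothetical): two sample units and two target units.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 3.0], [1.0, 5.0]])
I = np.array([1.0, 1.0, 0.0, 0.0])
pi = np.full(4, 0.5)                     # constant propensity of 1/2
moment = balance_moment(X, I, pi)
```

Here `moment` evaluates to `[0, -1]`: the intercept column is balanced, while the covariate is not (its weighted sample mean is 3 against a weighted target mean of 4). CBPS searches for the $\beta$ whose propensity scores drive this vector toward zero.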

Estimation of CBPS

The estimation is done using the Generalized Method of Moments (GMM): given moment conditions of the form $\mathbb{E}\{g(X_i,\theta)\}=0$, the optimal solution minimizes the norm of the sample analog, $\bar{g}(\theta)=\frac{1}{n}\sum_{i=1}^n g(x_i,\theta)$, with respect to $\theta$. This results in an estimator of the form:

$$\hat{\theta}=\arg\min_\theta \bar{g}(\theta)^T W \bar{g}(\theta),$$

where $W$ is a positive semi-definite matrix, often chosen to be the inverse of the variance matrix, $W(\theta)=\left(\frac{1}{n}\sum_{i=1}^n g(x_i,\theta)g^T(x_i,\theta)\right)^{-1}$ (which is unknown). This can be solved with an iterative algorithm: start with $\hat{W}=I$, compute $\hat{\theta}$ and $W(\hat{\theta})$, and so on (for two-step GMM, we stop after re-estimating $\hat{\theta}$ once with the updated $\hat{W}$). Alternatively, it can be solved with the continuously updating GMM algorithm, which estimates $\hat{\theta}$ and $\hat{W}$ at the same time. The model is over-identified if the number of equations is larger than the number of parameters.

For CBPS, the sample analog of the covariate balancing moment condition is $\frac{1}{n}\sum_{i=1}^n\left[\frac{I_ix_i}{\pi_\beta(x_i)} -\frac{(1-I_i)x_i}{1-\pi_\beta(x_i)}\right]=0$, which can be written as $\frac{1}{n}\sum_{i=1}^n\frac{I_i-\pi_\beta(x_i)}{\pi_\beta(x_i)(1-\pi_\beta(x_i))}x_i=0$ (for $I_i\in\{0,1\}$).
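The compact form of the balancing condition follows from putting the two inverse-probability weights over a common denominator. A short derivation, writing $\pi_i$ as shorthand for $\pi_\beta(x_i)$ (a notation introduced here only for brevity):

$$\frac{I_i}{\pi_i}-\frac{1-I_i}{1-\pi_i}=\frac{I_i(1-\pi_i)-(1-I_i)\pi_i}{\pi_i(1-\pi_i)}=\frac{I_i-\pi_i}{\pi_i(1-\pi_i)}$$

For $I_i=1$ the weight equals $1/\pi_i$, and for $I_i=0$ it equals $-1/(1-\pi_i)$, so the condition compares the inverse-probability-weighted covariates of the sample with those of the target.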


Let

$$g_i(I_i,X_i)=\left(\begin{matrix} \frac{I_i-\pi_\beta(X_i)}{\pi_\beta(X_i)(1-\pi_\beta(X_i))}\pi^\prime_\beta(X_i) \\ \frac{I_i-\pi_\beta(X_i)}{\pi_\beta(X_i)(1-\pi_\beta(X_i))} X_i \end{matrix}\right)$$

be the vector of moment conditions we would like to solve. It stacks the two conditions: maximizing the log-likelihood and balancing the covariates. Note that this system is over-identified, since the number of equations is larger than the number of parameters. Another option is the "just-identified" ("exact") CBPS, which uses only the covariate balancing conditions and not the propensity score condition.
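The stacked moment vector can be sketched as a function that returns one row per unit: the first block is the score condition and the second block is the covariate balancing condition. For the logistic model $\pi^\prime_\beta(X_i)=\pi_\beta(X_i)(1-\pi_\beta(X_i))X_i$, so the first block simplifies to $(I_i-\pi_\beta(X_i))X_i$. The function name `cbps_moments` and the toy data are assumptions made for illustration, not the balance implementation.

```python
import numpy as np

def cbps_moments(beta, X, I):
    """Return the n x 2k matrix whose i-th row is g_i(I_i, X_i)."""
    pi = 1.0 / (1.0 + np.exp(-X @ beta))
    resid = I - pi
    score_part = resid[:, None] * X                         # uses pi' = pi (1 - pi) X
    balance_part = (resid / (pi * (1 - pi)))[:, None] * X   # covariate balancing rows
    return np.hstack([score_part, balance_part])

# Toy data (hypothetical): two sample units and two target units.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 3.0], [1.0, 5.0]])
I = np.array([1.0, 1.0, 0.0, 0.0])
g = cbps_moments(np.zeros(2), X, I)   # beta = 0 gives pi = 1/2 for every unit
g_bar = g.mean(axis=0)                # sample average of the moments
```

With $k$ covariates (including the intercept) there are $2k$ moments but only $k$ parameters, which is why the system is over-identified. Keeping only the second block gives the just-identified variant.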

Using GMM, we have

$$\hat{\beta}=\arg\min_\beta \bar{g}^T \Sigma^{-1}_\beta \bar{g}$$

where $\bar{g}=\frac{1}{n}\sum_{i=1}^n g_i$ and $\Sigma_\beta=\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n g_i g^T_i \mid X_i\right]$, which can be estimated by

$$\hat{\Sigma}_\beta=\frac{1}{n}\sum_{i=1}^n \left( \begin{matrix} \pi_\beta(X_i)(1-\pi_\beta(X_i))X_iX_i^T & X_i X_i^T \\ X_iX_i^T & \left[\pi_\beta(X_i)(1-\pi_\beta(X_i))\right]^{-1}X_iX_i^T \end{matrix} \right)$$

To optimize this, we use two-step GMM with gradient-based optimization, starting from $\hat{\beta}_{MLE}$ (from the original logistic regression):

  1. $\beta_0=\hat{\beta}_{MLE}$
  2. $\hat{W}_0=\hat{\Sigma}_{\beta_0}^{-1}$
  3. $\hat{\beta}=\arg\min_\beta \bar{g}^T\hat{W}_0\bar{g}$, using gradient-based optimization
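The three steps above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the balance or R implementation: the helper names (`propensity`, `fit_mle`, `g_bar`, `sigma_hat`, `two_step_cbps`) and the simulated data are hypothetical, BFGS with numerical gradients stands in for "gradient-based optimization", a pseudo-inverse is used to guard against a near-singular weight matrix, and the lower-right block of the covariance estimate is taken as $X_iX_i^T/[\pi(1-\pi)]$, the conditional variance implied by the balancing moment. Feature transformations, design-effect constraints, and weight trimming are omitted.

```python
import numpy as np
from scipy.optimize import minimize

def propensity(beta, X):
    return 1.0 / (1.0 + np.exp(-X @ beta))

def fit_mle(X, I, n_iter=25):
    """Logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = propensity(beta, X)
        hessian = (X * (pi * (1 - pi))[:, None]).T @ X
        beta = beta + np.linalg.solve(hessian, X.T @ (I - pi))
    return beta

def g_bar(beta, X, I):
    """Stacked sample moments: score condition, then covariate balance."""
    pi = propensity(beta, X)
    resid = I - pi
    score = (resid[:, None] * X).mean(axis=0)
    balance = ((resid / (pi * (1 - pi)))[:, None] * X).mean(axis=0)
    return np.concatenate([score, balance])

def sigma_hat(beta, X):
    """Estimated covariance of the stacked moments, conditional on X."""
    n = X.shape[0]
    pi = propensity(beta, X)
    v = pi * (1 - pi)
    xx = X.T @ X / n
    return np.block([
        [(X * v[:, None]).T @ X / n, xx],
        [xx, (X / v[:, None]).T @ X / n],
    ])

def two_step_cbps(X, I):
    beta0 = fit_mle(X, I)                          # step 1: start from the MLE
    W0 = np.linalg.pinv(sigma_hat(beta0, X))       # step 2: fixed weight matrix

    def loss(beta):
        g = g_bar(beta, X, I)
        return g @ W0 @ g

    result = minimize(loss, beta0, method="BFGS")  # step 3: minimize g^T W0 g
    return beta0, result.x, loss

# Simulated data (hypothetical): inclusion depends on a squared term,
# so a propensity model that is linear in X is misspecified.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
logit = -0.7 + 0.7 * X[:, 1] + 0.5 * X[:, 2] ** 2
I = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

beta_mle, beta_cbps, loss = two_step_cbps(X, I)
```

Comparing `loss(beta_mle)` with `loss(beta_cbps)` shows how much the GMM objective improves over the plain logistic fit; at `beta_mle` the score block of the moments is already zero, so any remaining loss comes from covariate imbalance.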


[1] Imai, K., & Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 243-263.

[2] Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.