Introduction
In an ideal world, we would evaluate the impact of an intervention using a randomised controlled trial (RCT). Random assignment balances the treatment and control groups, making it easier to attribute outcome differences to the intervention. In many business, healthcare, and public policy settings, however, randomisation is not feasible. Instead, we rely on observational data where individuals self-select into a treatment, or where treatment assignment is influenced by operational constraints. This introduces confounding: treated and untreated groups differ in meaningful ways before the treatment even happens. Propensity scoring provides a practical set of methods to reduce this bias and estimate treatment effects more credibly. For learners building causal thinking through a data analyst course, propensity scores are an essential bridge between predictive modelling and decision-focused analytics.
The Core Challenge in Observational Studies
Suppose you want to measure whether a marketing campaign increases conversions. People who saw the campaign might already be more engaged customers, making conversions higher even without the campaign. Similarly, a new clinical protocol might be given more often to patients with certain risk profiles. In both cases, naive comparisons (treated vs untreated) mix the treatment effect with pre-existing differences.
Propensity score methods aim to correct for this by creating comparable groups based on observed covariates. The key assumption is no unmeasured confounding: once you condition on the relevant observed covariates, treatment assignment is “as if random” within strata of those covariates. This is a strong assumption that the data alone cannot verify, but when it is plausible and the resulting balance is carefully checked, propensity scoring can substantially reduce bias in causal estimates.
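The marketing example can be made concrete with a toy calculation. The counts below are hypothetical and chosen so that the campaign has no effect within either customer segment; the segment names and the `conversion_rate` helper are illustrative assumptions, not part of any real dataset.

```python
# Hypothetical counts: (segment, treated, n_customers, n_conversions).
# Within each segment the conversion rate is identical for treated and
# untreated customers, so the true campaign effect is zero.
rows = [
    ("engaged", 1, 80, 40),
    ("engaged", 0, 20, 10),
    ("not_engaged", 1, 20, 2),
    ("not_engaged", 0, 80, 8),
]


def conversion_rate(subset):
    """Pooled conversion rate over a list of rows."""
    customers = sum(n for _, _, n, _ in subset)
    conversions = sum(c for _, _, _, c in subset)
    return conversions / customers


treated = [r for r in rows if r[1] == 1]
untreated = [r for r in rows if r[1] == 0]

# Naive comparison: ignores that engaged customers were targeted more often.
naive_diff = conversion_rate(treated) - conversion_rate(untreated)

# Comparison within each segment: like is compared with like.
within_segment_diff = {}
for segment in ("engaged", "not_engaged"):
    seg_t = [r for r in treated if r[0] == segment]
    seg_u = [r for r in untreated if r[0] == segment]
    within_segment_diff[segment] = conversion_rate(seg_t) - conversion_rate(seg_u)

print(f"naive difference: {naive_diff:.2f}")
print(f"within-segment differences: {within_segment_diff}")
```

The naive difference is large even though the within-segment differences vanish, which is the pattern confounding produces when the treated group is dominated by customers who would have converted anyway.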
What Is a Propensity Score?
A propensity score is the probability that an individual receives the treatment given their observed characteristics:
$$e(x) = P(T = 1 \mid X = x)$$
Where:
- $T$ is the treatment indicator (1 = treated, 0 = not treated)
- $X$ is a vector of observed covariates (age, prior purchases, baseline health indicators, etc.)
The important result is that if treatment assignment is ignorable given $X$, then it is also ignorable given the propensity score $e(X)$. In practical terms, instead of matching or balancing on a high-dimensional covariate set, you can balance on a single score.
Estimating the propensity score usually involves a classification model: logistic regression is common, but tree-based models, gradient boosting, or other classifiers can also be used if they improve balance.
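As a rough illustration of the estimation step, the sketch below fits a logistic regression by plain gradient ascent using only the standard library. The simulated data, the function names (`fit_logistic`, `propensity`), and the learning-rate settings are all assumptions made for this example; in practice an established implementation from a statistics or machine learning library would normally be used instead.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, lr=0.5, n_iter=1000):
    """Fit logistic regression by batch gradient ascent on the log-likelihood.

    X is a list of covariate rows, t a list of 0/1 treatment flags.
    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            pred = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = ti - pred
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity(w, x):
    """Estimated probability of treatment for one covariate row x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Simulated data (an assumption): one standardised covariate that raises
# the chance of being treated.
rng = random.Random(0)
X = [[rng.gauss(0, 1)] for _ in range(300)]
t = [1 if rng.random() < sigmoid(0.5 + 1.0 * x[0]) else 0 for x in X]

w = fit_logistic(X, t)
scores = [propensity(w, x) for x in X]
print(f"min score {min(scores):.3f}, max score {max(scores):.3f}")
```

The fitted scores are only an input to the later steps. Whether the model is adequate is judged by the covariate balance it produces, not by how well it classifies treatment.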
Common Propensity Scoring Techniques
There are several ways to use propensity scores. The choice depends on data size, overlap between groups, and the estimand you want (such as average treatment effect or effect on the treated).
1) Propensity Score Matching
Matching pairs treated individuals with similar untreated individuals based on propensity score. Common matching variants include nearest neighbour matching, caliper matching (only match within a small score range), and matching with replacement.
Strengths: intuitive and often produces well-balanced groups.
Limitations: may discard unmatched observations; results depend on matching choices.
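A minimal sketch of greedy 1:1 nearest neighbour matching with a caliper and without replacement follows. The scores, outcomes, caliper width, and the `match_nearest` helper are hypothetical; real analyses often use more careful matching orders or optimal matching.

```python
def match_nearest(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Each treated unit takes the closest still-available control, but only
    if the score distance is within the caliper. Returns a list of
    (treated_index, control_index) pairs; unmatched treated units are dropped.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, ps_t in enumerate(treated_scores):
        if not available:
            break
        j = min(available, key=lambda k: abs(control_scores[k] - ps_t))
        if abs(control_scores[j] - ps_t) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical propensity scores and binary outcomes.
treated_scores = [0.62, 0.33, 0.90]
control_scores = [0.30, 0.60, 0.58, 0.10]
treated_y = [1, 0, 1]
control_y = [0, 0, 1, 0]

pairs = match_nearest(treated_scores, control_scores)

# Mean outcome difference across matched pairs (effect on the matched treated).
att = sum(treated_y[i] - control_y[j] for i, j in pairs) / len(pairs)
print(pairs, att)
```

The third treated unit (score 0.90) has no control within the caliper and is discarded, which illustrates the limitation noted above: the estimate then describes only the treated units that could be matched.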
2) Inverse Probability of Treatment Weighting (IPTW)
IPTW reweights observations to create a pseudo-population in which treatment assignment is independent of covariates. Treated units get weight $1/e(X)$, untreated units get weight $1/(1 - e(X))$.
Strengths: uses more of the data than strict matching and directly targets average treatment effects.
Limitations: extreme weights can occur when $e(X)$ is very close to 0 or 1; trimming or stabilised weights may be needed.
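The weighting step can be sketched as below, assuming the propensity scores have already been estimated. The data are hypothetical, and the helper names `iptw_weights` and `weighted_mean_difference` are introduced only for this example. The estimate uses normalised weighted means, which is one common choice among several.

```python
def iptw_weights(t, e, stabilised=False):
    """One weight per unit: 1/e for treated, 1/(1 - e) for untreated.

    With stabilised=True, each weight is multiplied by the marginal
    probability of the unit's own treatment status.
    """
    p_treated = sum(t) / len(t)
    weights = []
    for ti, ei in zip(t, e):
        if ti == 1:
            w = 1.0 / ei
            if stabilised:
                w *= p_treated
        else:
            w = 1.0 / (1.0 - ei)
            if stabilised:
                w *= 1.0 - p_treated
        weights.append(w)
    return weights


def weighted_mean_difference(y, t, w):
    """Difference of normalised weighted outcome means, treated minus untreated."""
    num1 = sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == 1)
    den1 = sum(wi for ti, wi in zip(t, w) if ti == 1)
    num0 = sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == 0)
    den0 = sum(wi for ti, wi in zip(t, w) if ti == 0)
    return num1 / den1 - num0 / den0


# Hypothetical outcomes, treatment flags, and estimated propensity scores.
y = [1, 0, 1, 1, 0, 0]
t = [1, 1, 1, 0, 0, 0]
e = [0.8, 0.5, 0.5, 0.5, 0.5, 0.2]

w = iptw_weights(t, e)
ate = weighted_mean_difference(y, t, w)
ate_stabilised = weighted_mean_difference(y, t, iptw_weights(t, e, stabilised=True))
print(f"ATE estimate: {ate:.3f}, largest weight: {max(w):.2f}")
```

Because the means are normalised within each group, stabilising the weights leaves this particular estimate unchanged; stabilisation mainly matters for the spread of the weights and for variance estimation.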
3) Stratification (Subclassification)
Here you divide observations into strata (for example quintiles) based on propensity score. You estimate treatment effects within each stratum and then combine them.
Strengths: simple, transparent, and often robust for communication.
Limitations: coarse stratification can leave residual imbalance; more strata can help but requires enough data.
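A small sketch of subclassification, assuming scores are already available: units are sorted by propensity score, cut into equal-sized strata, and the within-stratum differences are averaged with weights proportional to stratum size. The data and the `stratified_ate` helper are hypothetical, and two strata are used only to keep the example short.

```python
from statistics import mean


def stratified_ate(y, t, e, n_strata=5):
    """Average within-stratum mean differences, weighted by stratum size.

    Strata are equal-sized groups of units ordered by propensity score.
    Strata lacking either treated or untreated units are skipped, since no
    comparison is possible there.
    """
    order = sorted(range(len(e)), key=lambda i: e[i])
    n = len(order)
    total, used = 0.0, 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        y1 = [y[i] for i in idx if t[i] == 1]
        y0 = [y[i] for i in idx if t[i] == 0]
        if not y1 or not y0:
            continue
        total += (mean(y1) - mean(y0)) * len(idx)
        used += len(idx)
    if used == 0:
        raise ValueError("no stratum contains both treated and untreated units")
    return total / used


# Hypothetical data: the treatment adds 2 to the outcome in both strata,
# but high-score units are treated more often and have higher baselines.
e = [0.2, 0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.8]
t = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
y = [3, 1, 1, 1, 1, 5, 5, 5, 5, 3]

naive = mean(yi for yi, ti in zip(y, t) if ti == 1) - mean(
    yi for yi, ti in zip(y, t) if ti == 0
)
ate = stratified_ate(y, t, e, n_strata=2)
print(f"naive: {naive:.1f}, stratified: {ate:.1f}")
```

In this constructed example the naive difference overstates the effect, while the stratified estimate recovers the within-stratum difference of 2.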
4) Covariate Adjustment Using the Propensity Score
You include the propensity score as a covariate in an outcome regression model. While easy, this method can be less reliable than balancing methods if the outcome model is misspecified.
A practical workflow is to combine methods: for example, propensity weighting plus an outcome regression (a “doubly robust” approach), which gives a consistent estimate if either the propensity model or the outcome model is correctly specified, though not necessarily if both are wrong.
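One widely used doubly robust form is the augmented inverse probability weighting (AIPW) estimator, sketched here under the assumption that the propensity scores `e` and the outcome-model predictions `mu1` (predicted outcome if treated) and `mu0` (predicted outcome if untreated) have already been computed. The numbers and the `aipw_ate` name are hypothetical.

```python
def aipw_ate(y, t, e, mu1, mu0):
    """Augmented IPW estimate of the average treatment effect.

    Starts from the outcome-model contrast mu1 - mu0 and adds
    inverse-probability-weighted residual corrections for each arm.
    """
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0):
        total += (
            m1 - m0
            + ti * (yi - m1) / ei
            - (1 - ti) * (yi - m0) / (1 - ei)
        )
    return total / len(y)


# Hypothetical data with a constant treatment effect of 2.
y = [3, 1, 5, 3]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]

# Case 1: the outcome model predicts perfectly.
good_outcome_model = aipw_ate(y, t, e, mu1=[3, 3, 5, 5], mu0=[1, 1, 3, 3])

# Case 2: a useless outcome model (all zeros). The estimator then reduces to
# plain inverse probability weighting and relies on the propensity scores.
bad_outcome_model = aipw_ate(y, t, e, mu1=[0, 0, 0, 0], mu0=[0, 0, 0, 0])

print(good_outcome_model, bad_outcome_model)
```

Both calls return the same value in this toy setting, which illustrates the sense in which the estimator can tolerate one of the two models being wrong.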
Balance Diagnostics: The Step You Cannot Skip
A common mistake is to treat the propensity score model as a prediction problem and optimise accuracy. The goal is not to predict treatment perfectly; it is to achieve covariate balance between treated and control groups.
Key diagnostics include:
- Standardised mean differences (SMD): for each covariate, the difference in group means divided by a pooled standard deviation; absolute values below ~0.1 are often used as a balance target.
- Propensity score overlap: check whether treated and untreated groups share a common support region. Lack of overlap means the effect is not identifiable for those regions.
- Weight distribution (for IPTW): inspect extreme weights; consider trimming or stabilising.
These diagnostics are exactly the kind of disciplined practice that a data analysis course in Pune should encourage: not just running models, but validating assumptions and checking whether the method achieved its purpose.
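The first two diagnostics can be sketched with the standard library alone. The covariate values and scores below are hypothetical, the helper names `smd` and `common_support` are introduced for this example, and the pooled standard deviation uses the simple average of the two group variances, which is one common convention.

```python
import math
from statistics import mean, variance


def smd(x_treated, x_control):
    """Standardised mean difference for one covariate.

    Difference in group means divided by the square root of the average of
    the two sample variances.
    """
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


def common_support(scores_treated, scores_control):
    """Range of propensity scores covered by both groups, as (low, high)."""
    low = max(min(scores_treated), min(scores_control))
    high = min(max(scores_treated), max(scores_control))
    return low, high


# Hypothetical covariate (age) before any adjustment.
age_treated = [30, 40, 50]
age_control = [20, 30, 40]
age_smd = smd(age_treated, age_control)

# Hypothetical estimated propensity scores.
support = common_support([0.4, 0.6, 0.9], [0.1, 0.3, 0.7])

print(f"SMD for age: {age_smd:.2f}")
print(f"common support: {support}")
```

An SMD this far above 0.1 would signal clear imbalance on age. In a full analysis the same check is repeated after matching or weighting, and units outside the common support range are candidates for exclusion.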
Interpreting Results and Limits
Propensity scores help address confounding from observed covariates, but they cannot correct for unmeasured confounders. If a critical variable is missing (for example, customer intent or disease severity that is not recorded), bias may remain. Sensitivity analysis, domain knowledge, and careful feature selection are necessary to support credible conclusions.
Also be explicit about which effect you estimate. Matching controls to treated units typically targets the average treatment effect on the treated (ATT), while IPTW with the weights described above targets the average treatment effect (ATE) in the whole population. Clear reporting matters.
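The link between weights and estimand can be illustrated with a short sketch. The ATE weights follow the IPTW definition given earlier; the ATT weights shown (1 for treated units, $e(X)/(1 - e(X))$ for untreated units) are a standard choice but are not discussed elsewhere in this article, so treat them as an added assumption. The `estimand_weights` helper is hypothetical.

```python
def estimand_weights(t, e, estimand="ATE"):
    """Propensity-based weights for two common estimands.

    ATE: treated 1/e, untreated 1/(1 - e).
    ATT: treated 1, untreated e/(1 - e), which reweights the untreated
    group to resemble the treated group.
    """
    if estimand not in ("ATE", "ATT"):
        raise ValueError("estimand must be 'ATE' or 'ATT'")
    weights = []
    for ti, ei in zip(t, e):
        if estimand == "ATE":
            weights.append(1.0 / ei if ti == 1 else 1.0 / (1.0 - ei))
        else:
            weights.append(1.0 if ti == 1 else ei / (1.0 - ei))
    return weights


# Hypothetical units: one treated (score 0.5), one untreated (score 0.2).
t = [1, 0]
e = [0.5, 0.2]

ate_w = estimand_weights(t, e, "ATE")
att_w = estimand_weights(t, e, "ATT")
print(ate_w, att_w)
```

Under the ATT weights, an untreated unit with a low propensity score counts for little, because it resembles few treated units. Reporting which weighting scheme was used tells the reader which population the estimate describes.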
Conclusion
Propensity scoring techniques offer practical tools for estimating treatment effects when randomised experiments are not possible. By modelling the probability of treatment assignment and using that score for matching, weighting, or stratification, analysts can reduce confounding and produce more credible causal estimates from observational data. The value of these methods lies not just in computation, but in disciplined validation: balance checks, overlap assessment, and transparent interpretation. For professionals advancing through a data analyst course or applying causal methods learnt in a data analysis course in Pune, propensity scores are a foundational capability for evidence-based decision-making in real-world environments.