06/12/2025, Main Stream. Language: English
Stratified sampling helps ensure representative data subsets, especially when working with imbalanced groups. In this talk, we’ll explore its use in survey analysis, demonstrate a Python-based implementation, and share best practices for improving data reliability and reducing bias. Attendees will gain a clear understanding of when and how to use stratified sampling effectively in real-world scenarios.
Whether you're conducting surveys, building predictive models, or working with imbalanced datasets, stratified sampling is a powerful technique to ensure your data is representative and your insights are reliable.
In survey research, especially within diverse populations like college students, simple random sampling can leave key subgroups underrepresented. Stratified sampling addresses this by dividing the population into distinct strata (e.g., age groups, majors, demographics) and sampling proportionally from each. This approach reduces the risk of a skewed, unrepresentative sample and, when the strata differ on the outcome being measured, improves the precision of statistical estimates.
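A minimal sketch of proportional allocation in plain Python (standard library only). The record layout, the `age_group` field, the helper names `proportional_allocation` and `stratified_sample`, and the population sizes are all hypothetical, chosen for illustration; this is not the code from the talk's repository. Largest-remainder rounding is one common way to make the per-stratum counts add up exactly to the requested total.

```python
import random
from collections import defaultdict


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.

    Uses integer largest-remainder rounding so the allocations sum exactly to n.
    """
    total = sum(stratum_sizes.values())
    if n > total:
        raise ValueError("sample size exceeds population size")
    alloc = {h: n * size // total for h, size in stratum_sizes.items()}
    remainder = {h: n * size % total for h, size in stratum_sizes.items()}
    shortfall = n - sum(alloc.values())
    # Give the leftover units to the strata with the largest remainders.
    for h in sorted(remainder, key=remainder.get, reverse=True)[:shortfall]:
        alloc[h] += 1
    return alloc


def stratified_sample(records, key, n, seed=None):
    """Draw a proportional stratified sample of size n without replacement."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    alloc = proportional_allocation({h: len(v) for h, v in strata.items()}, n)
    sample = []
    for h, members in strata.items():
        sample.extend(rng.sample(members, alloc[h]))
    return sample


# Hypothetical population: 600 students aged 18-22, 300 aged 23-29, 100 aged 30+.
population = (
    [{"id": i, "age_group": "18-22"} for i in range(600)]
    + [{"id": i, "age_group": "23-29"} for i in range(600, 900)]
    + [{"id": i, "age_group": "30+"} for i in range(900, 1000)]
)
sample = stratified_sample(population, key=lambda r: r["age_group"], n=50, seed=1)
```

With these made-up sizes, a sample of 50 always contains 30, 15, and 5 students from the three age groups, whereas a simple random sample of 50 could by chance contain only one or two students aged 30+.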
In this presentation, I will demonstrate how to use Python to implement stratified sampling in a student survey at our college. By maintaining proportional representation across age groups, we ensured that our sample reflected the diversity of the student body. I’ll walk through the conceptual foundations of stratified sampling, compare it with simple random sampling, and highlight its advantages in survey research and data analysis.
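To make the comparison with simple random sampling concrete, the simulation below is a sketch that draws repeated samples of 50 from a synthetic population and measures how much the estimated mean score varies under each design. The population, its score distributions, and the function names are invented for illustration. The size of the gain depends on how different the strata means are, and it shrinks toward zero when the strata are similar.

```python
import random
import statistics


def make_population(seed=0):
    """Hypothetical population of (age_group, score) pairs; strata differ in mean score."""
    rng = random.Random(seed)
    spec = [("18-22", 600, 70), ("23-29", 300, 80), ("30+", 100, 90)]
    return [(group, rng.gauss(mu, 5)) for group, size, mu in spec for _ in range(size)]


def srs_mean(population, n, rng):
    """Mean score from one simple random sample of size n."""
    return statistics.fmean(score for _, score in rng.sample(population, n))


def stratified_mean(population, n, rng):
    """Mean score from one proportional stratified sample of size n.

    Assumes n * N_h / N is a whole number for every stratum (true for n=50 here).
    """
    strata = {}
    for group, score in population:
        strata.setdefault(group, []).append(score)
    N = len(population)
    estimate = 0.0
    for scores in strata.values():
        n_h = n * len(scores) // N
        # Weight each stratum's sample mean by its population share N_h / N.
        estimate += len(scores) / N * statistics.fmean(rng.sample(scores, n_h))
    return estimate


population = make_population()
rng = random.Random(1)
srs_est = [srs_mean(population, 50, rng) for _ in range(1000)]
strat_est = [stratified_mean(population, 50, rng) for _ in range(1000)]
print(f"SRS        spread of estimates (sd): {statistics.stdev(srs_est):.2f}")
print(f"Stratified spread of estimates (sd): {statistics.stdev(strat_est):.2f}")
```

Both designs center on the true population mean; the stratified estimates cluster more tightly around it because the between-group differences no longer contribute to sampling variability.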
We’ll also explore how to calculate weighted means when sample proportions differ from population proportions, which is a crucial step in producing unbiased estimates. Code examples and visualizations will be shared via GitHub to support hands-on learning and reproducibility.
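One common way to compute such a weighted mean is to give each respondent in stratum h the weight P_h / p_h, where P_h is the stratum's known population share and p_h is its share of the sample; the result equals the sum over strata of P_h times the stratum's sample mean. The sketch below assumes every stratum appears at least once in the sample; the function name and the example numbers are hypothetical.

```python
import statistics


def weighted_mean(responses, population_share):
    """Weighted mean score when sample shares differ from population shares.

    responses: list of (stratum, score) pairs from the sample.
    population_share: dict mapping stratum -> known population proportion (sums to 1).
    Each respondent in stratum h gets weight P_h / p_h, where p_h is h's share of the sample.
    """
    n = len(responses)
    sample_count = {}
    for h, _ in responses:
        sample_count[h] = sample_count.get(h, 0) + 1
    weights = {h: population_share[h] / (c / n) for h, c in sample_count.items()}
    total_weight = sum(weights[h] for h, _ in responses)
    return sum(weights[h] * y for h, y in responses) / total_weight


# Hypothetical sample of 10 that over-represents older students.
shares = {"18-22": 0.6, "23-29": 0.3, "30+": 0.1}
responses = [("18-22", 70)] * 4 + [("23-29", 80)] * 3 + [("30+", 90)] * 3
print(statistics.fmean(y for _, y in responses))   # unweighted: 79.0
print(round(weighted_mean(responses, shares), 2))  # weighted: 75.0
```

Here the 30+ group makes up 30% of the sample but only 10% of the population, so the unweighted mean is pulled upward; weighting restores each group's population share (0.6 × 70 + 0.3 × 80 + 0.1 × 90 = 75).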
Learning Outcomes:
1. Understand when to use stratified sampling over simple random sampling
2. Learn how to define subgroups (strata) and implement sampling in Python
3. Evaluate sample representativeness by checking distribution patterns
4. Apply weights to adjust for unequal proportions in score estimation
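For outcome 3, a quick representativeness check compares each stratum's share of the sample with its known population share, optionally summarized by Pearson's chi-square goodness-of-fit statistic. The sketch below is one possible approach; the function name, labels, and counts are hypothetical. In this made-up example the statistic is about 5.83, just under the 5% critical value of about 5.99 for two degrees of freedom, even though the 30+ group is twice its population share. With small samples, it helps to read the per-stratum differences rather than rely on the test alone. If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` returns the same statistic together with a p-value.

```python
def compare_distributions(sample_groups, population_share):
    """Compare each stratum's share of the sample with its known population share.

    sample_groups: list of stratum labels, one per respondent.
    population_share: dict mapping stratum -> population proportion (sums to 1).
    Returns ({stratum: (sample_share, population_share, difference)}, chi_square).
    """
    n = len(sample_groups)
    counts = {h: 0 for h in population_share}
    for h in sample_groups:
        counts[h] += 1  # raises KeyError if a label is missing from population_share
    table = {}
    chi_square = 0.0
    for h, p in population_share.items():
        expected = n * p
        chi_square += (counts[h] - expected) ** 2 / expected  # Pearson's statistic
        table[h] = (counts[h] / n, p, counts[h] / n - p)
    return table, chi_square


# Hypothetical sample of 50 that over-represents the 30+ group.
shares = {"18-22": 0.6, "23-29": 0.3, "30+": 0.1}
groups = ["18-22"] * 25 + ["23-29"] * 15 + ["30+"] * 10
table, chi2 = compare_distributions(groups, shares)
for h, (s, p, d) in table.items():
    print(f"{h:>6}  sample {s:.2f}  population {p:.2f}  diff {d:+.2f}")
print(f"chi-square = {chi2:.2f}")
```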
Cary Jim holds a Ph.D. in Information Science with a concentration in Data Science. She brings a strong background in user research and analytics, applying a holistic approach to understanding user characteristics and behavior across diverse environments. Cary is passionate about leveraging data to drive meaningful insights and enjoys getting hands-on with the latest technology tools. Outside of work, she loves experimenting with new recipes in the kitchen and unwinding with British detective shows.
