Code Author: Shuhan Yang
This project investigates whether Uber acts as a substitute for or a complement to public transit across metropolitan areas. Using double machine learning and panel data analysis, we estimate the causal effect of Uber's presence on public transit ridership.
Key Finding: Uber's presence is estimated to increase public transit ridership by a statistically significant 4.28%, suggesting a complementary rather than substitutive relationship.
- Dataset Size: 76,000+ panel observations
- Coverage: Multiple Metropolitan Statistical Areas (MSAs)
- Time Period: Multi-year panel data
- Key Variables: Transit ridership, Uber market presence, demographic controls, economic indicators
- Double Machine Learning (DML): Applied to estimate causal effects while controlling for high-dimensional confounders
- Panel Data Analysis: Exploited temporal and cross-sectional variation
- Robustness Checks: Multiple model specifications and sample restrictions
- Lasso Regression: For feature selection and regularization
- Random Forest: For non-linear pattern detection
- Cross-Validation: For model selection and hyperparameter tuning
- Advanced Econometric Methods: Double machine learning implementation for causal inference
- Comprehensive Robustness Checks: Multiple model specifications and sample restrictions
- Heterogeneity Analysis: Treatment effect variation by agency and MSA characteristics
- Scalable Data Processing: Efficient handling of 76,000+ observations
- Statistical Validation: Rigorous testing of model assumptions and results
- Data Preprocessing: Panel data cleaning and variable construction
- Feature Engineering: Creation of interaction terms and control variables
- Model Training: Implementation of ML algorithms with cross-validation
- Causal Estimation: Double machine learning for treatment effect estimation
- Robustness Testing: Multiple specifications and sample restrictions
- Heterogeneity Analysis: Subgroup analysis by MSA characteristics
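The causal estimation step above can be illustrated with a minimal sketch of cross-fitted double machine learning for a partially linear model. This is not the project's actual code: the function names (`ridge_fit_predict`, `dml_partial_linear`), the synthetic data, and the closed-form ridge regression used as the nuisance learner are all assumptions made for illustration. Ridge stands in here for the Lasso and Random Forest learners named above so that the example needs only NumPy.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Hypothetical stand-in nuisance learner: closed-form ridge regression."""
    # Prepend an intercept column (it is penalised too, which is fine for a sketch).
    A = np.column_stack([np.ones(len(X_train)), X_train])
    B = np.column_stack([np.ones(len(X_test)), X_test])
    coef = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y_train)
    return B @ coef


def dml_partial_linear(y, d, X, fit_predict=ridge_fit_predict, n_folds=5, seed=0):
    """Cross-fitted DML estimate of theta in y = theta * d + g(X) + noise."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Residualise outcome and treatment using models fit on the other folds.
        y_res[test_idx] = y[test_idx] - fit_predict(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict(X[train], d[train], X[test_idx])
    # Final stage: regress outcome residuals on treatment residuals.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Heteroskedasticity-robust standard error of the final-stage slope.
    eps = y_res - theta * d_res
    se = np.sqrt(np.sum((d_res * eps) ** 2)) / (d_res @ d_res)
    return theta, se


# Synthetic data (illustrative only, not the project's panel).
rng = np.random.default_rng(42)
n, p = 4000, 10
X = rng.normal(size=(n, p))                           # confounders
d = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)      # treatment depends on X
true_theta = 0.5
y = true_theta * d + 2.0 * X[:, 0] + X[:, 2] + rng.normal(size=n)

theta_hat, se = dml_partial_linear(y, d, X)
print(f"theta_hat = {theta_hat:.3f} (se = {se:.3f}), true theta = {true_theta}")
```

The `fit_predict` argument accepts any callable with the same three-argument shape, so a Lasso or Random Forest learner could be swapped in. For panel data, standard errors would normally also be clustered by unit (for example by agency or MSA), which this sketch does not do.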