Understanding how global crises shape financial markets is crucial for investors, policymakers, and financial analysts. This project analyzes more than two decades of S&P 500 data (2000-2024) to show how major historical events affect different market sectors and to forecast the index with classical time series (ARIMA) and deep learning (LSTM) models.
Why this matters:
- For Investors: Identify sectors that demonstrate resilience during crises and optimize portfolio diversification strategies
- For Policymakers: Understand differential recovery patterns across industries to design targeted economic interventions
- For Analysts: Leverage predictive models to forecast market responses to future global disruptions
- For Researchers: Access reproducible analysis combining classical econometrics (ARIMA) with modern deep learning (LSTM)
1. Which sectors drive the S&P 500 index?
   - Using ANOVA and linear regression to identify primary market drivers
2. How do different industries respond to major historical events?
   - Statistical comparison of sector performance before/after crises
3. Can we forecast the S&P 500 index effectively?
   - Comparative evaluation of ARIMA vs. LSTM models

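
To make the first question concrete, here is a minimal sketch of the one-way ANOVA F statistic, which compares variation between groups (for example, sectors) with variation within them. The project's own analysis is written in R; this Python version is only an illustration, and the function name `one_way_anova_f` and the sample values are hypothetical, not project data.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical values for three sectors (illustrative only)
sector_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(sector_groups)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

A large F relative to the F distribution with those degrees of freedom suggests that at least one group mean differs from the others.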
Great Recession: financial crisis triggered by the subprime mortgage collapse
- Impact: Financials, Energy, and Real Estate sectors devastated
- Insight: Banking sector took 3+ years to recover

COVID-19: global health crisis causing unprecedented economic disruption
- Impact: Technology sector surged; Real Estate and Utilities declined
- Insight: Digital transformation accelerated by 5-7 years

Russia-Ukraine: geopolitical conflict affecting global supply chains
- Impact: Energy and Industrials gained; Technology remained resilient
- Insight: Energy sector benefited from supply disruptions and pricing power

| Crisis | Most Resilient | Most Vulnerable |
|---|---|---|
| Great Recession | Health Care, Consumer Staples | Financials, Real Estate, Energy |
| COVID-19 | Information Technology, Energy | Real Estate, Utilities |
| Russia-Ukraine | Information Technology, Energy | Real Estate |

Cross-Event Pattern: Real Estate was among the most vulnerable sectors in all three crises, while Information Technology was among the most resilient during the COVID-19 and Russia-Ukraine periods.
Our analysis reveals that LSTM neural networks outperform traditional ARIMA models in forecasting accuracy:
- ARIMA: Fast computation (~minutes), moderate accuracy
- LSTM: High accuracy, captures complex patterns, requires more computational resources (~hours)
Practical Implication: For short-term trading decisions requiring rapid updates, ARIMA suffices. For strategic portfolio management, LSTM's superior accuracy justifies the computational cost.
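
A comparison like the one above usually comes down to error metrics on held-out data. The sketch below shows two common ones, RMSE and MAE. It is written in Python for illustration only; the forecasts labeled "model A" and "model B" are made-up numbers, not output from the project's ARIMA or LSTM models.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average size of the miss, in index points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical held-out index levels and two sets of forecasts (illustrative only)
actual = [4500.0, 4520.0, 4490.0, 4510.0, 4550.0]
forecast_a = [4480.0, 4500.0, 4515.0, 4495.0, 4520.0]
forecast_b = [4495.0, 4515.0, 4500.0, 4505.0, 4540.0]

for name, preds in [("model A", forecast_a), ("model B", forecast_b)]:
    print(f"{name}: RMSE={rmse(actual, preds):.2f}, MAE={mae(actual, preds):.2f}")
```

The model with the lower error on the same held-out window is the more accurate one for that window. Whether the gain justifies longer training time is the trade-off described above.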
- All sector impacts on closing prices are highly significant (p < 2.2e-16)
- T-tests confirm statistically significant changes in sector performance post-crisis (p < 0.05)
- Information Technology and Consumer Discretionary are primary index drivers
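
The before/after comparison behind the t-test result can be sketched as a Welch two-sample t statistic. This is a Python illustration under stated assumptions: the project itself runs its tests in R, the helper name `welch_t` is hypothetical, and the return series are synthetic rather than project data.

```python
import math
from statistics import mean, variance

def welch_t(before, after):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n1, n2 = len(before), len(after)
    m1, m2 = mean(before), mean(after)
    v1, v2 = variance(before), variance(after)  # sample variances (n - 1 denominator)
    se2 = v1 / n1 + v2 / n2                     # squared standard error of the difference
    t = (m2 - m1) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Synthetic daily returns (percent) for one hypothetical sector; not project data
before = [0.4, 0.1, -0.2, 0.3, 0.5, 0.0, 0.2, -0.1]
after = [-1.2, -0.8, 0.3, -1.5, -0.6, -2.0, 0.1, -0.9]

t_stat, df = welch_t(before, after)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A negative t here means the mean return after the event is lower than before it. Turning t and df into a p-value requires the t distribution, which a statistics package such as R's `t.test` provides.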
Statistical Analysis: R (≥ 4.0) with ANOVA, t-tests, time series analysis
Machine Learning: LSTM neural networks (Keras/TensorFlow)
Data: 500+ companies, 11 GICS sectors, 6,000+ trading days
Reproducibility: R Markdown notebooks with full documentation
See INSTALLATION.md for setup instructions.

```bash
# Clone the repository
git clone https://github.com/DATS6101-TeamNeo/final-project.git
cd final-project
# Open analysis/Main.Rmd in RStudio and knit to generate the full report
```

For detailed installation and usage instructions, see INSTALLATION.md.

```
├── analysis/                      # R Markdown analysis notebooks
│   ├── Main.Rmd                   # Complete analysis (START HERE)
│   ├── Summary.Rmd                # Executive summary
│   ├── EDA.Rmd                    # Exploratory data analysis
│   └── ARIMA_baseline.Rmd         # ARIMA forecasting model
├── reports/                       # Generated HTML reports
│   ├── Main.html
│   ├── Summary.html
│   ├── EDA.html
│   └── ARIMA_baseline.html
├── data/
│   ├── raw/                       # Historical datasets
│   │   ├── sp500_companies.csv
│   │   └── sp500_index.csv
│   └── scripts/                   # Data collection scripts
│       └── Generate_SandP500.Rmd
├── models/
│   └── checkpoints.h5             # Pre-trained LSTM model
├── figures/                       # Generated plots and images
├── docs/
│   ├── INSTALLATION.md            # Setup guide
│   └── Final-Project-Proposal.pdf
├── README.md
└── LICENSE
```

- Diversification Strategy: Allocate higher weights to Health Care and Technology during periods of high economic uncertainty
- Crisis Hedging: Reduce exposure to Real Estate and traditional Energy before anticipated market disruptions
- Stress Testing: Use sector-specific volatility patterns from historical crises to model portfolio risk
- Recovery Timelines: Plan liquidity based on observed sector recovery periods (Financials: 3+ years; Technology: <1 year)
- Targeted Stimulus: Prioritize support for Real Estate and Financials during financial crises; support travel, hospitality, and retail during pandemic-type events
- Industry Monitoring: Focus regulatory attention on sectors showing abnormal volatility patterns
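
The recovery timelines mentioned above can be measured in more than one way. The sketch below assumes one simple definition: the number of trading days from the start of a crisis until a price series first closes at or above its pre-crisis peak. It is a Python illustration only; `recovery_days` and both price series are hypothetical, and the project may define recovery differently.

```python
def recovery_days(prices, crisis_start):
    """Trading days from index crisis_start until prices first regain the pre-crisis peak.

    Returns 0 if the price at crisis_start is already at or above the peak,
    and None if the peak is never regained within the series.
    """
    peak = max(prices[:crisis_start])  # highest close before the crisis begins
    for i in range(crisis_start, len(prices)):
        if prices[i] >= peak:
            return i - crisis_start
    return None

# Hypothetical sector price series (illustrative only); the crisis starts at index 2
fast_sector = [100, 105, 90, 95, 106, 110]
slow_sector = [100, 105, 80, 82, 85, 88]

print(recovery_days(fast_sector, crisis_start=2))
print(recovery_days(slow_sector, crisis_start=2))
```

With daily closing prices per sector, the same function gives a comparable recovery figure for each sector and crisis, which can then feed liquidity planning.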
Course: DATS 6101 - Introduction to Data Science
Institution: The George Washington University
Team: Phanindra Kumar Kalaga, Prudhvi Chekuri, Bharat Khandelwal, Dinesh Chandra Gaddam
This project demonstrates:
- Integration of classical statistics with modern machine learning
- Reproducible research practices using R Markdown
- Real-world application of data science to financial markets
- Rigorous hypothesis testing and model validation
| Name | Email |
|---|---|
| Phanindra Kumar Kalaga | [email protected] |
| Prudhvi Chekuri | [email protected] |
| Bharat Khandelwal | [email protected] |
| Dinesh Chandra Gaddam | [email protected] |
- INSTALLATION.md - Setup and installation guide
- Main.html - Complete analysis report (knit analysis/Main.Rmd to generate)
- Final-Project-Proposal.pdf - Original project proposal
- GitHub Repository: DATS6101-TeamNeo/final-project
- Dataset Source: S&P 500 GICS on Kaggle
- Python Implementation: Kaggle Notebook
MIT License - see LICENSE file for details
Copyright (c) 2024 Team Neo
- Professor and TAs of DATS 6101 at The George Washington University
- Yahoo Finance for comprehensive historical data
- R and TensorFlow communities for excellent open-source tools
⭐ If this research helped your work, please star this repository!
Last Updated: November 2024