This project performs quantitative analysis of Indian equities by applying Principal Component Analysis (PCA) and KMeans clustering to historical return data for Nifty 50 stocks. It groups stocks with similar return patterns and volatility, providing a foundation for portfolio diversification, risk management, and factor analysis.
Objective: cluster Nifty 50 stocks by their return structure and volatility characteristics using:
- Log return calculation
- Dimensionality reduction (PCA)
- KMeans clustering
- Volatility and correlation analysis
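For reference, these are the standard definitions behind log returns, volatility and correlation, assuming a price series $P_t$ sampled at a fixed interval over $T$ periods. Whether the project annualizes volatility is not stated, so none is applied here.

```latex
r_t = \ln\!\left(\frac{P_t}{P_{t-1}}\right), \qquad
\bar{r} = \frac{1}{T}\sum_{t=1}^{T} r_t, \qquad
\sigma = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\left(r_t - \bar{r}\right)^2}, \qquad
\rho_{ij} = \frac{\operatorname{cov}(r_i, r_j)}{\sigma_i \, \sigma_j}
```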
Tech stack:
- Python
- Streamlit for the interactive UI
- yfinance (Yahoo Finance data) for data fetching
- scikit-learn for PCA and KMeans
- matplotlib and seaborn for visualizations
- pandas and numpy for data manipulation
Features:
- Interactive selection of the stock universe and cluster count
- 2D and 3D visualization of PCA-based clusters
- Cluster members and their statistical summary
- Volatility distribution analysis per cluster
- Time-series and correlation analysis between clusters
Visualizations:
- 2D PCA cluster plot: shows how stocks group by return pattern, with clearly distinguishable clusters (e.g., IT stocks, PSU stocks, the Adani Group).
- 3D PCA cluster plot: gives a clearer view of the spatial separation between clusters and exposes outliers (e.g., INDUSINDBK.NS forming its own cluster).
- Cluster return time series: tracks each cluster's average return over time, helping identify consistent outperformers and laggards.
- Volatility boxplots: show how volatility varies across clusters; the Adani Group cluster is more volatile, while the FMCG cluster is less so.
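A minimal sketch of preparing per-stock volatility for a boxplot by cluster, assuming `log_returns` is a dates-by-tickers DataFrame and `labels` is a Series mapping each ticker to its cluster. The function name and the optional `periods_per_year` annualization are hypothetical, not taken from the project.

```python
from typing import Optional

import numpy as np
import pandas as pd


def volatility_by_cluster(log_returns: pd.DataFrame, labels: pd.Series,
                          periods_per_year: Optional[int] = None) -> pd.DataFrame:
    """Return one row per ticker with its cluster and return volatility."""
    # Sample standard deviation (ddof=1) of each stock's log returns.
    vol = log_returns.std()
    if periods_per_year is not None:
        # Hypothetical option: annualize by the square root of time.
        vol = vol * np.sqrt(periods_per_year)
    return pd.DataFrame({
        "ticker": vol.index,
        "cluster": labels.reindex(vol.index).to_numpy(),
        "volatility": vol.to_numpy(),
    })
```

The long-form result can then be plotted with `sns.boxplot(data=df, x="cluster", y="volatility")`.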
Resulting clusters (k = 9):
- Cluster 1: HCLTECH.NS, INFY.NS, TCS.NS, TECHM.NS, WIPRO.NS
- Cluster 2: COALINDIA.NS, JIOFIN.NS, NTPC.NS, POWERGRID.NS, SBIN.NS
- Cluster 3: APOLLOHOSP.NS, BHARTIARTL.NS, CIPLA.NS, DRREDDY.NS, HDFCLIFE.NS, HEROMOTOCO.NS, SBILIFE.NS, SUNPHARMA.NS, TITAN.NS
- Cluster 4: INDUSINDBK.NS
- Cluster 5: ADANIENT.NS, ADANIPORTS.NS
- Cluster 6: AXISBANK.NS, BAJAJFINSV.NS, BAJFINANCE.NS, GRASIM.NS, HDFCBANK.NS, ICICIBANK.NS, KOTAKBANK.NS, M&M.NS, MARUTI.NS, RELIANCE.NS, ULTRACEMCO.NS
- Cluster 7: JSWSTEEL.NS, LT.NS, TATAMOTORS.NS, TATASTEEL.NS
- Cluster 8: BEL.NS, ONGC.NS
- Cluster 9: ASIANPAINT.NS, HINDUNILVR.NS, ITC.NS, NESTLEIND.NS, TATACONSUM.NS
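A membership list like this can be built by grouping tickers on their KMeans labels. This is a sketch with a hypothetical helper name; since scikit-learn's KMeans labels start at 0, the optional +1 offset is only there to produce names such as "Cluster 1".

```python
import pandas as pd


def cluster_members(tickers, labels, one_based: bool = True) -> dict:
    """Map each cluster number to the sorted list of tickers assigned to it."""
    offset = 1 if one_based else 0
    # Series of labels indexed by ticker, so grouping by value collects tickers.
    series = pd.Series(list(labels), index=list(tickers))
    return {int(label) + offset: sorted(group.index.tolist())
            for label, group in series.groupby(series)}


members = cluster_members(["TCS.NS", "INFY.NS", "ITC.NS"], [0, 0, 1])
for number, names in members.items():
    print(f"Cluster {number}: {', '.join(names)}")
```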
Cluster summary statistics:

| Cluster | Mean Return | Volatility |
|---|---|---|
| 0 | 0.000534 | 0.008012 |
| 1 | 0.000969 | 0.012181 |
| 2 | 0.000707 | 0.011313 |
| 3 | -0.001145 | NaN |
| 4 | 0.000451 | 0.007574 |
| 5 | 0.000719 | 0.010802 |
| 6 | 0.000442 | 0.010282 |
| 7 | 0.001612 | 0.011192 |
| 8 | 0.000041 | 0.008627 |
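A minimal sketch of computing a per-cluster summary like this, assuming `log_returns` is a dates-by-tickers DataFrame and `labels` is a Series mapping tickers to clusters. Here "Volatility" is taken as the average of the member stocks' return standard deviations; that is an assumption, since the project's exact definition is not stated. pandas' `std()` defaults to `ddof=1`, so a standard deviation taken over a single observation is NaN. This is consistent with a NaN row for a one-member cluster.

```python
import numpy as np
import pandas as pd


def cluster_stats(log_returns: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    """Average per-stock mean return and volatility within each cluster."""
    per_stock = pd.DataFrame({
        "mean_return": log_returns.mean(),  # average log return per stock
        "volatility": log_returns.std(),    # sample std (ddof=1) per stock
    })
    per_stock["cluster"] = labels  # aligned on the ticker index
    return per_stock.groupby("cluster")[["mean_return", "volatility"]].mean()


# With ddof=1, one observation gives no sample standard deviation.
print(pd.Series([0.5]).std())  # nan
```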
Key insights:
- Cluster 7 (Auto/Metals) shows the highest average return.
- Cluster 5 (Adani Group) has moderate returns but higher volatility.
- Cluster 9 (FMCG) offers low volatility and stable returns, making it well suited to defensive positioning.
- Cluster 4 (INDUSINDBK.NS) behaves as an outlier and forms a singleton cluster.
- Low correlation between the defensive cluster (FMCG) and the volatile clusters (Adani, Auto) supports diversification strategies.
- Correlation is high within the banking and infrastructure clusters (e.g., PSU stocks), as expected.
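One way to obtain cluster return series and the correlation between clusters is to average member returns per date (an equal-weighted cluster return, which is an assumption here) and correlate the resulting columns. The function name is hypothetical; `log_returns` is assumed to be dates by tickers and `labels` a Series indexed by ticker.

```python
import numpy as np
import pandas as pd


def cluster_return_correlation(log_returns: pd.DataFrame, labels: pd.Series):
    """Return (cluster return series, cluster-by-cluster correlation matrix)."""
    # Transpose so tickers are rows, group them by label, average each date,
    # then transpose back to a dates-by-clusters frame.
    cluster_returns = log_returns.T.groupby(labels).mean().T
    return cluster_returns, cluster_returns.corr()


x = np.array([0.01, -0.02, 0.03, 0.0])
lr = pd.DataFrame({"A": x, "B": x, "C": -x})
series, corr = cluster_return_correlation(lr, pd.Series({"A": 0, "B": 0, "C": 1}))
```

Because log returns add over time, `series.cumsum()` gives each cluster's cumulative log return for a time-series chart.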
Workflow:
- Fetch historical prices using yfinance
- Calculate log returns
- Apply PCA to reduce the data to the top 3 principal components
- Perform KMeans clustering on the PCA-transformed data
- Visualize clusters in 2D and 3D
- Analyze cluster volatility, correlation, and time-series returns
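The core of this workflow can be sketched as follows. This is not the project's own code. It assumes `prices` is a dates-by-tickers DataFrame of closing prices, and it treats each stock as one sample whose features are its standardized return series. That framing, the standardization step and the function name are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_stocks(prices: pd.DataFrame, n_components: int = 3,
                   n_clusters: int = 9, seed: int = 42) -> pd.DataFrame:
    """Cluster tickers (the columns of `prices`) by their log-return behavior."""
    # Log returns r_t = ln(P_t / P_{t-1}); the first row is NaN, so drop it.
    log_returns = np.log(prices / prices.shift(1)).dropna()
    # Standardize each stock's series so high-volatility names do not dominate,
    # then transpose: one row (sample) per stock, one column (feature) per date.
    standardized = (log_returns - log_returns.mean()) / log_returns.std()
    features = standardized.T.to_numpy()
    # Project the stocks onto the top principal components.
    components = PCA(n_components=n_components).fit_transform(features)
    # KMeans in the reduced space: several initializations, fixed seed.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(components)
    result = pd.DataFrame(components, index=prices.columns,
                          columns=[f"PC{i + 1}" for i in range(n_components)])
    result["cluster"] = labels
    return result
```

In the app, `prices` would come from yfinance (for example `yf.download(tickers, start=...)["Close"]`). The sketch takes the frame as an argument so it can run offline. The `PC1`/`PC2`/`PC3` columns are what a 2D or 3D scatter plot would use.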
Applications:
- Quantitative portfolio construction
- Risk-parity strategies
- Market regime segmentation
- Sector rotation analysis
- Factor clustering and feature engineering
- Portfolio diversification: construct cluster-neutral baskets
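As an illustration of a cluster-neutral basket, here is one simple reading of the term (an assumption, not the project's definition): give every cluster the same total weight and split it equally among that cluster's members. The function name is hypothetical; `labels` is a Series mapping tickers to clusters.

```python
import pandas as pd


def cluster_neutral_weights(labels: pd.Series) -> pd.Series:
    """Weight each ticker so that every cluster receives 1 / n_clusters in total."""
    n_clusters = labels.nunique()
    # Size of the cluster each ticker belongs to.
    cluster_sizes = labels.map(labels.value_counts())
    return 1.0 / (n_clusters * cluster_sizes)


weights = cluster_neutral_weights(pd.Series({"TCS.NS": 0, "INFY.NS": 0, "ITC.NS": 1}))
```

Here the two IT names get 0.25 each and ITC.NS gets 0.5, so a large cluster cannot dominate the basket the way it would under plain equal weighting.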
Installation:
pip install streamlit yfinance pandas numpy scikit-learn matplotlib seaborn
Run the app:
streamlit run main.py
You can also try the program on the hosted Streamlit app: Stock Clustering using PCA and KMeans.
Contributing:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push the branch and open a pull request