Analyze historical stock sector data spanning January 2011 to June 2011 to unlock investment insights, predicting which sector (primary, secondary, or tertiary) carries the highest investment risk and which is likely to provide the greatest rate of return.
Stock Sector Analysis is a data-driven project that provides in-depth analysis of stock behavior patterns and investment risks across various sectors. It leverages data analysis techniques, statistical assessments, and clustering algorithms to enhance understanding of the stock market.
- Obtaining Dataset: https://code.datasciencedojo.com/datasciencedojo/datasets/tree/master/Dow%20Jones%20Index
- Data Exploration: Preprocessed and visualized data from multiple sectors.
- Agglomerative Clustering: Used agglomerative clustering to create sector-specific stock performance clusters.
- Statistical Analysis: Conducted ANOVA and post-hoc tests to reveal significant differences in stock behavior patterns.
- Risk Assessment: Identified sector-specific risk profiles, enabling strategic investment decisions.
- Tools Used: Python, Pandas, Scikit-Learn, Seaborn, and Plotly for data analysis and visualization.
- Efficiency Enhancement: Reduced data processing time by 81.25% in just two days, reduced file size by 55%, and reduced errors by 78.6%.
- Alternative Modeling: Demonstrated an alternative analytic approach with an average model fit of -42%, highlighting limitations in traditional regression models.
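
The clustering and ANOVA steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the data is synthetic, the tickers, the `sectors` mapping, and the `weekly_std` values are made up, and the column name `percent_change_price` is assumed to match the Dow Jones Index data set. Post-hoc tests are not shown.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 25 weekly percent price changes
# for 9 hypothetical stocks, three per sector (all values are made up).
rng = np.random.default_rng(0)
sectors = {
    "primary": ["P1", "P2", "P3"],
    "secondary": ["S1", "S2", "S3"],
    "tertiary": ["T1", "T2", "T3"],
}
weekly_std = {"primary": 1.0, "secondary": 2.0, "tertiary": 4.0}

rows = [
    {"sector": sector, "stock": ticker, "percent_change_price": change}
    for sector, tickers in sectors.items()
    for ticker in tickers
    for change in rng.normal(0.0, weekly_std[sector], size=25)
]
df = pd.DataFrame(rows)

# Per-stock features: mean weekly change and its standard deviation.
features = df.groupby("stock")["percent_change_price"].agg(["mean", "std"])

# Agglomerative (hierarchical) clustering on the standardized features.
X = StandardScaler().fit_transform(features)
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
features["cluster"] = model.fit_predict(X)

# One-way ANOVA: does the size of weekly price moves differ across sectors?
groups = [
    g["percent_change_price"].abs().to_numpy()
    for _, g in df.groupby("sector")
]
f_stat, p_value = f_oneway(*groups)

print(features)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```

A small p-value here only says that at least one sector differs from the others; identifying which pairs differ is the job of a post-hoc test.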
- Enhanced risk assessment with significant variations in stock volatility.
- Provided sector-specific investment opportunities and risk profiles.
- Drew valuable insights from statistical assessments and clustering methods.
- Created sector-specific clusters for in-depth stock behavior understanding.
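
One simple way to express a sector risk profile is to compare the volatility (standard deviation) and the mean of weekly percent price changes per sector. The sketch below assumes that definition of risk, which the project may or may not use, and runs on a tiny hand-written table; the numbers and the `sector` column are hypothetical.

```python
import pandas as pd

# Hypothetical weekly percent price changes; values are made up for illustration.
data = pd.DataFrame({
    "sector": ["primary"] * 4 + ["secondary"] * 4 + ["tertiary"] * 4,
    "percent_change_price": [
        0.5, -0.4, 0.6, -0.3,   # primary: small swings
        2.0, -1.5, 2.5, -1.0,   # secondary: moderate swings
        4.0, -5.0, 6.0, -4.0,   # tertiary: large swings
    ],
})

# Mean weekly change as a return proxy, standard deviation as a risk proxy.
summary = data.groupby("sector")["percent_change_price"].agg(
    mean_return="mean",
    volatility="std",
)

highest_risk = summary["volatility"].idxmax()
best_return = summary["mean_return"].idxmax()

print(summary)
print("Highest volatility:", highest_risk)
print("Highest mean return:", best_return)
```

With these made-up numbers, the tertiary sector has the highest volatility and the secondary sector the highest mean weekly change; the real data may rank the sectors differently.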
- Clone this repository.
- Explore the project through Jupyter notebooks in the `notebooks` directory.
For questions or collaboration, please contact hafiyhp.
This data set is the Dow Jones Index Data Set from the Machine Learning Repository of the University of California, Irvine (UCI). The UCI page cites the following publication as the original source of the data set:
Brown, M. S., Pelosi, M. & Dirska, H. (2013). Dynamic-radius Species-conserving Genetic Algorithm for the Financial Forecasting of Dow Jones Index Stocks. Machine Learning and Data Mining in Pattern Recognition, 7988, 27-41.
This project is open-source and available under the MIT License.