Machine Learning Analysis on Housing Prices and Iris Dataset This repository contains Python notebooks and datasets for machine learning analysis, focusing on housing prices and Iris dataset classification. The project also includes a Principal Component Analysis (PCA) for dimensionality reduction and feature analysis.
Repository Structure
-
Datasets This folder includes two key datasets: Housing.csv: Dataset containing various features related to housing prices, such as area, number of rooms, and other property attributes. It is used to predict housing prices. IRIS.csv: The classic Iris dataset containing features of iris flowers (sepal length, petal length, etc.) and their corresponding species.
-
Notebooks This folder contains Jupyter notebooks for analysis and model building: HousingPrices.ipynb: A Jupyter notebook that explores the housing dataset, performs data preprocessing, and builds a model to predict housing prices based on the given features. IRIS.ipynb: A notebook that analyzes the Iris dataset, applies classification algorithms, and evaluates the accuracy of predicting Iris flower species. PCA_Analysis.ipynb: A notebook focused on Principal Component Analysis (PCA), which reduces the dimensionality of datasets to make analysis more efficient. The PCA is applied to understand the most important features contributing to variability in data, either on the Iris dataset or other data.
How to Run the Notebooks Install Dependencies: To run the notebooks, ensure you have the required libraries installed. You can install the necessary libraries with the following command:
pip install pandas numpy matplotlib scikit-learn seaborn Run Jupyter Notebook: Use the following command to start Jupyter Notebook and open any of the .ipynb files:
jupyter notebook You can then navigate to the notebook of interest and explore the code and results.
Files Overview Housing.csv: Dataset for housing price prediction. HousingPrices.ipynb: Analysis and model building on the Housing dataset. IRIS.csv: Dataset for Iris species classification. IRIS.ipynb: Classification model built on the Iris dataset. PCA_Analysis.ipynb: Principal Component Analysis on a dataset, highlighting feature reduction. Dependencies
The following Python libraries are required: pandas numpy matplotlib seaborn scikit-learn
To install these, you can run: pip install pandas numpy matplotlib seaborn scikit-learn
Usage HousingPrices.ipynb: Use this notebook to predict housing prices based on features such as area and number of rooms. The notebook covers data cleaning, visualization, and model training. IRIS.ipynb: This notebook is a comprehensive exploration of the Iris dataset, where you can learn how different machine learning algorithms perform on this classic dataset. PCA_Analysis.ipynb: This notebook introduces Principal Component Analysis (PCA) and demonstrates its application for dimensionality reduction. It is useful for improving model efficiency and understanding the most significant features.