Predictive Maintenance Using Machine Learning

This project develops machine learning models that predict machine faults from data recorded by an industrial vibration sensor. Several algorithms are implemented and evaluated, with the goal of predicting maintenance needs accurately so that machine failures can be prevented.

Setup and Environment

Platform: Databricks Machine Learning
Cluster: Created with the latest Databricks runtime
Dataset: faultDataset.csv

Data Exploration and Cleaning

Importing Dataset:
- Uploaded faultDataset.csv to the Databricks file system.
- Loaded the dataset both as an RDD (sc.textFile) and as a DataFrame (spark.read.csv).
Initial Data Cleaning:
- Removed the header row from the RDD using mapPartitionsWithIndex.
- Defined an explicit schema and converted the RDD to a DataFrame.
- Created temporary views for SQL analysis using createOrReplaceTempView.
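
The header-removal step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names drop_header and load_rows are hypothetical, and the sketch assumes the file has a single header line (which sc.textFile places at the start of the first partition) and no quoted commas.

```python
from itertools import islice


def drop_header(partition_index, rows):
    """Skip the first row of partition 0; pass other partitions through.

    Assumes the CSV has exactly one header line, which ends up at the
    start of the first partition when the file is read with sc.textFile.
    """
    if partition_index == 0:
        return islice(rows, 1, None)
    return rows


def load_rows(sc, path):
    """Read a CSV as an RDD of field lists, without the header line.

    `sc` is the SparkContext a Databricks notebook provides; `path` is
    wherever faultDataset.csv was uploaded (not stated in this README).
    The plain split(",") is an assumption: it breaks on quoted commas.
    """
    lines = sc.textFile(path)
    return lines.mapPartitionsWithIndex(drop_header).map(lambda line: line.split(","))
```

Because drop_header is a plain Python function, it can be checked on ordinary iterators before it is handed to Spark.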
Data Exploration:
- Computed the minimum, average, and maximum values in the dataset for each state of the fault_detected column.
- Checked for missing or inconsistent data.
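
A per-state summary like the one described above could be expressed as a Spark SQL query against the temporary view. The sketch below is an assumption-laden example: the view name fault_data and the column name vibration are hypothetical placeholders, since this README does not list the dataset's column names.

```python
def summary_query(view, feature, group_col="fault_detected"):
    """Build a Spark SQL query returning min/avg/max of one column per group."""
    return (
        f"SELECT {group_col}, "
        f"MIN({feature}) AS min_value, "
        f"AVG({feature}) AS avg_value, "
        f"MAX({feature}) AS max_value "
        f"FROM {view} GROUP BY {group_col} ORDER BY {group_col}"
    )


def summarize(spark, view, feature):
    """Run the summary against a view registered with createOrReplaceTempView.

    `spark` is the SparkSession a Databricks notebook provides.
    """
    return spark.sql(summary_query(view, feature))


# Hypothetical usage in a notebook cell:
#   summarize(spark, "fault_data", "vibration").show()
```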

Data Preprocessing

Data Transformation:
- Used RFormula to transform data into a format suitable for machine learning.
- The target variable (fault_detected) was assigned to the label column, and other features were included in the features column.
Data Splitting:
- Split data into training (70%) and test (30%) sets using randomSplit.
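
The transformation and split might look like the sketch below. The formula string follows from the description above (fault_detected as the label, all other columns as features); the function name prepare_and_split and the seed value are assumptions added for illustration.

```python
FORMULA = "fault_detected ~ ."  # label column ~ all remaining columns
SPLIT_WEIGHTS = [0.7, 0.3]      # training / test proportions


def prepare_and_split(df, seed=42):
    """Apply RFormula, then split into training and test DataFrames.

    The seed is an assumption; it only makes the split repeatable.
    """
    # Imported inside the function so the sketch loads without Spark installed.
    from pyspark.ml.feature import RFormula

    rformula = RFormula(formula=FORMULA, featuresCol="features", labelCol="label")
    prepared = rformula.fit(df).transform(df)
    train_df, test_df = prepared.randomSplit(SPLIT_WEIGHTS, seed=seed)
    return train_df, test_df
```

randomSplit assigns rows at random, so the resulting sets are only approximately 70% and 30% of the data.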

Machine Learning Models Developed

Decision Tree Classification Model

Training:
- Trained the Decision Tree model using the training dataset.
- Evaluated the model using MulticlassClassificationEvaluator to measure accuracy.
Results:
- Achieved 95.55% accuracy with default hyperparameters.
- Grid-search hyperparameter tuning raised accuracy to 95.56%, a gain of 0.01 percentage points.
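
The train-then-evaluate step can be sketched as below, assuming the training and test DataFrames carry the label and features columns produced by RFormula. The function names are hypothetical and the code is an illustration rather than the notebook's own.

```python
def fit_and_score(estimator, evaluator, train_df, test_df):
    """Fit on the training set and return the evaluator's metric on the test set."""
    model = estimator.fit(train_df)
    predictions = model.transform(test_df)
    return evaluator.evaluate(predictions)


def decision_tree_accuracy(train_df, test_df):
    """Train a default DecisionTreeClassifier and return its test accuracy."""
    # Imported inside the function so the sketch loads without Spark installed.
    from pyspark.ml.classification import DecisionTreeClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    tree = DecisionTreeClassifier(labelCol="label", featuresCol="features")
    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy"
    )
    return fit_and_score(tree, evaluator, train_df, test_df)
```

With metricName="accuracy" the evaluator returns a fraction between 0 and 1, so a reported 95.55% corresponds to a value of about 0.9555.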

Other Models

Linear SVC:
- Trained and evaluated using the same process as the Decision Tree model.
- Achieved an accuracy of 80.65%.
Logistic Regression:
- Trained and evaluated, achieving competitive accuracy.
Random Forest:
- Trained and evaluated, achieving higher accuracy than both Linear SVC (80.65%) and Logistic Regression.
Gradient-Boosted Tree:
- Achieved the highest accuracy of all models tested: 99.67%.
- Hyperparameter tuning was also performed to look for further gains.
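
Since every model goes through the same fit-and-evaluate process, the comparison can be written as one loop. The sketch below is an assumption about how that might be organized; the function names are hypothetical, and all estimators use default settings. Note that LinearSVC and GBTClassifier in Spark ML handle binary labels only, so this presumes fault_detected has two states.

```python
def candidate_models():
    """Return the four other classifiers, each with default hyperparameters."""
    # Imported inside the function so the sketch loads without Spark installed.
    from pyspark.ml.classification import (
        GBTClassifier,
        LinearSVC,
        LogisticRegression,
        RandomForestClassifier,
    )

    return {
        "Linear SVC": LinearSVC(),
        "Logistic Regression": LogisticRegression(),
        "Random Forest": RandomForestClassifier(),
        "Gradient-Boosted Tree": GBTClassifier(),
    }


def score_all(models, evaluator, train_df, test_df):
    """Fit each model on train_df and score its predictions on test_df."""
    return {
        name: evaluator.evaluate(model.fit(train_df).transform(test_df))
        for name, model in models.items()
    }


def rank_by_accuracy(scores):
    """Sort (model name, accuracy) pairs from best to worst."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```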

Hyperparameter Tuning

Grid Search:
- Specified grids of values for parameters such as maxDepth, maxBins, impurity, and minInstancesPerNode.
- Used TrainValidationSplit to find the optimal set of hyperparameters for each model.
Results:
- The Gradient-Boosted Tree model's accuracy remained at 99.67% after tuning; the grid search did not change it.
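
A grid search of this kind could be set up as in the sketch below, shown for the Decision Tree model. The parameter names come from the list above, but the grid values, the function names, and the trainRatio default are assumptions; this README does not state which values were actually tried.

```python
from itertools import product

# Hypothetical grid values; only the parameter names come from this README.
GRID = {
    "maxDepth": [3, 5, 10],
    "maxBins": [16, 32],
    "impurity": ["gini", "entropy"],
    "minInstancesPerNode": [1, 5],
}


def grid_size(grid):
    """Number of parameter combinations a full grid search will train."""
    return len(list(product(*grid.values())))


def tune_decision_tree(train_df, train_ratio=0.75):
    """Fit a TrainValidationSplit over GRID and return the best model found."""
    # Imported inside the function so the sketch loads without Spark installed.
    from pyspark.ml.classification import DecisionTreeClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

    tree = DecisionTreeClassifier(labelCol="label", featuresCol="features")
    builder = ParamGridBuilder()
    for name, values in GRID.items():
        builder = builder.addGrid(tree.getParam(name), values)

    tvs = TrainValidationSplit(
        estimator=tree,
        estimatorParamMaps=builder.build(),
        evaluator=MulticlassClassificationEvaluator(metricName="accuracy"),
        trainRatio=train_ratio,
    )
    return tvs.fit(train_df).bestModel
```

The number of models trained grows multiplicatively with each parameter added, which is why grid_size is worth checking before running the search on a cluster.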

Data Privacy, Ethical, and Legal Issues

Ensured that the dataset used was publicly available and licensed for educational and research purposes.
Adhered to data privacy and ethical guidelines throughout the project.

Conclusion

This project developed and evaluated multiple machine learning models that predict machine maintenance needs from sensor data. The Gradient-Boosted Tree model achieved the highest accuracy (99.67%), well ahead of Linear SVC (80.65%), which makes it the strongest candidate for this task. The project highlights the importance of data preprocessing, model selection, and hyperparameter tuning in building high-performance machine learning models.

Contact Information

Author: Dagogo Orifama
Email: [[email protected]]
