GitHub - samyaroy/Predicting-Wine-Quality-Robust-Regression: This project was carried out as part of the fulfilment of the B.Sc. (Hons.) Statistics degree at Sister Nivedita University. It explores the application of various linear regression techniques for predicting wine quality.

Predicting Wine Quality: A Comparison of Linear Regression Techniques on a Multicollinear and Outlier-Affected Dataset

Abstract

This project explores the application of various linear regression techniques for predicting wine quality, using a dataset characterised by multicollinearity and a significant presence of outliers. Wine quality plays an important role in the wine industry: it shapes consumer preferences, influences pricing strategies and guides decision-making in production. The dataset comprises 32 wine samples, each evaluated across 10 physicochemical attributes (including pH, total sulphur dioxide, anthocyanin concentration and colour density), with quality scores assigned on a 0-20 scale. An extensive Exploratory Data Analysis (EDA) revealed distributional skewness, strong inter-variable correlations and a high incidence of outliers. The initial application of Ordinary Least Squares (OLS) regression exposed limitations due to multicollinearity, as confirmed by elevated Variance Inflation Factors (VIFs). Stepwise regression (both-way) improved model parsimony, but heteroscedasticity and sensitivity to outliers persisted. To address these issues, robust regression approaches were adopted, including Huber’s M-estimator and the MM-estimator. Comparative analysis using metrics such as Adjusted R² and Root Mean Square Error (RMSE) demonstrated the MM-estimator’s superior resilience to data irregularities. Ultimately, the MM-estimator emerged as the most reliable and interpretable model, offering a robust framework for data-driven wine quality assessment and decision-making in viticulture. The project thus underscores the importance of robust techniques in real-world data environments and presents a generalisable modelling framework to support objective wine quality assessment in the wine industry.
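The multicollinearity check mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the repository's notebooks: the helper name `variance_inflation_factors` is hypothetical, Python with NumPy is an arbitrary choice for the illustration, and the data are synthetic (32 rows to mirror the sample size, with one deliberately near-collinear pair of columns). For each predictor, the VIF is 1 / (1 - R²), where R² comes from regressing that predictor on all the others.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X.

    R_j^2 is the coefficient of determination obtained by regressing
    column j on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = resid @ resid
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic illustration only (NOT the wine dataset).
rng = np.random.default_rng(0)
n = 32
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)             # unrelated predictor
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
print(np.round(vifs, 2))
```

In this synthetic setup the first two columns should show very large VIFs while the third stays close to 1. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity, although the threshold applied in the report itself is not stated here.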

Keywords

Wine quality rating; Exploratory Data Analysis; Ordinary Least Squares; Stepwise Regression; Robust Regression

Repository Structure

analysis.ipynb → Clean notebook aligned with the project report
explorations.ipynb → Additional experiments, alternative methods, and sampling tests
dataset/data.csv → Dataset used in the study
dataset/data_description.md → Description of dataset variables
project_report.pdf → Full project report (sample paper)
project_presentation.pdf → Summary presentation slides
requirements.txt → R package dependencies
README.md → This file

Conclusion

In this project, we explored the challenge of predicting wine quality using a small, multicollinear and outlier-affected dataset. Starting with Ordinary Least Squares (OLS) regression as a baseline, we uncovered limitations stemming from multicollinearity and sensitivity to outliers, which compromised the stability and interpretability of the model. Stepwise selection improved model parsimony, but residual diagnostics revealed persistent issues such as heteroscedasticity.

To overcome these challenges, we turned to robust regression techniques, particularly Huber’s M-estimator and the MM-estimator. Among the approaches compared, the MM-estimator emerged as the most effective method, achieving the best trade-off between predictive accuracy and resistance to data irregularities, as demonstrated by the lowest RMSE and a relatively high Adjusted R². This robust method proved especially valuable in handling small-sample data with violations of key linear regression assumptions.

The findings underscore the importance of choosing adaptive and assumption-resilient models in practical data science applications. Particularly in domains like oenology, where quality assessment can benefit from more objective, data-driven methods, such approaches offer a reliable framework for supporting decision-making and quality control.

While the dataset size and lack of metadata presented certain limitations, the modelling pipeline developed in this study can serve as a generalisable blueprint for future studies in wine quality assessment or similar regression problems involving complex real-world data. Further research could explore advanced regularisation techniques such as Robust Ridge Regression, expand the dataset for improved statistical power, and improve generalisability without sacrificing interpretability.
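To make the OLS versus robust-regression comparison concrete, here is a minimal sketch under stated assumptions. It is not the code from `analysis.ipynb`: the function names (`ols_fit`, `huber_fit`, `rmse`, `adjusted_r2`) are hypothetical, the data are synthetic, and only Huber's M-estimator is implemented, fitted by iteratively reweighted least squares with a MAD-based scale. The MM-estimator favoured in the report additionally starts from a high-breakdown S-estimate and uses a redescending loss, which this sketch does not attempt.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def huber_fit(X, y, c=1.345, max_iter=100, tol=1e-8):
    """Huber M-estimate via iteratively reweighted least squares (IRLS)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # OLS starting point
    for _ in range(max_iter):
        resid = y - design @ coef
        # Robust residual scale: normalised median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 inside the threshold c, c/u beyond it.
        w = c / np.maximum(u, c)
        sw = np.sqrt(w)
        new_coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef


def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def adjusted_r2(y, y_hat, p):
    """Adjusted R^2 for a model with p predictors (intercept excluded)."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - p - 1))


# Synthetic illustration only (NOT the wine dataset):
# y = 2 + 3x + noise, with 4 of 32 responses shifted upwards.
rng = np.random.default_rng(1)
n = 32
x = rng.uniform(0.0, 10.0, size=n)
true_coef = np.array([2.0, 3.0])
y = true_coef[0] + true_coef[1] * x + rng.normal(scale=0.3, size=n)
y[:4] += 15.0                       # injected outliers
clean = np.arange(n) >= 4           # rows without injected outliers

X = x.reshape(-1, 1)
coef_ols = ols_fit(X, y)
coef_huber = huber_fit(X, y)

ols_err = float(np.linalg.norm(coef_ols - true_coef))
huber_err = float(np.linalg.norm(coef_huber - true_coef))
rmse_ols_clean = rmse(y[clean], predict(coef_ols, X[clean]))
rmse_huber_clean = rmse(y[clean], predict(coef_huber, X[clean]))

print("coefficient error  OLS:", round(ols_err, 3), " Huber:", round(huber_err, 3))
print("RMSE on clean rows OLS:", round(rmse_ols_clean, 3), " Huber:", round(rmse_huber_clean, 3))
print("adjusted R^2 (all rows, Huber):", round(adjusted_r2(y, predict(coef_huber, X), p=1), 3))
```

The RMSE here is evaluated on the uncontaminated rows, which is a choice made for this sketch so that the metric reflects how well each fit recovers the underlying trend; the evaluation protocol used in the report may differ.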

Citation

If you use this project, please cite:

Roy, S. (2025). Predicting Wine Quality: A Comparison of Linear Regression Techniques on a Multicollinear and Outlier-Affected Dataset. ResearchGate. DOI: 10.13140/RG.2.2.16657.13926

License

This repository is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
You are free to use, modify, and distribute this work for academic, research, or commercial purposes, with proper attribution.
See the LICENSE file for details.

