This README gives an overview of Aziz Zina's end-of-semester Big Data project. The project applies big data techniques to analyze and visualize a real survey dataset of professionals working in computer science and data science.
-
1. Data Acquisition
The project started from an Excel file of survey responses. The survey's questions gathered information about professionals in the computer science and data science fields, and this dataset is the basis for all later analysis and visualization.
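A minimal sketch of how the workbook could be loaded and profiled with Pandas. It assumes the responses sit in the first sheet of `Data Professionals Survey.xlsx` (the file name comes from the repository listing; the sheet layout is an assumption), and the `load_survey` and `profile` helpers are hypothetical, not code taken from the project. Reading `.xlsx` files with `pd.read_excel` requires the optional `openpyxl` package.

```python
from pathlib import Path

import pandas as pd

# File name from the repository listing; adjust the path to your own checkout.
SURVEY_PATH = Path("Data Professionals Survey.xlsx")


def load_survey(path: Path = SURVEY_PATH, sheet_name=0) -> pd.DataFrame:
    """Read one sheet of the survey workbook (assumption: data is in the first sheet)."""
    return pd.read_excel(path, sheet_name=sheet_name)


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary: dtype, number of missing cells, number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })


# Only touch the file if it is actually present, so the script runs anywhere.
if SURVEY_PATH.exists():
    survey = load_survey()
    print(survey.shape)
    print(profile(survey))
```

A quick profile like this shows which columns have gaps or unexpected types before any cleaning decisions are made.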
-
2. Data Cleaning
To keep the analysis accurate and reliable, the raw data was cleaned first: missing values were handled, outliers were addressed, and data formats were standardized. The cleaned dataset was used in all later stages of the project.
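The three cleaning steps can be sketched in Pandas as below. This is an illustration under stated assumptions, not the project's actual procedure: the README does not say which outlier rule or imputation was used, so the sketch swaps in Tukey's IQR fences (clipping values outside `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]`) and median imputation. The `clean_survey` helper and the column names in the toy data are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def clean_survey(df: pd.DataFrame, iqr_factor: float = 1.5) -> pd.DataFrame:
    """Return a cleaned copy of a raw survey DataFrame (illustrative sketch)."""
    out = df.copy()

    # Standardize column names: trim, lower-case, runs of whitespace -> "_".
    out.columns = (
        out.columns.astype(str)
        .str.strip()
        .str.lower()
        .str.replace(r"\s+", "_", regex=True)
    )

    numeric_cols = [
        c for c in out.columns
        if is_numeric_dtype(out[c]) and not is_bool_dtype(out[c])
    ]

    # Trim stray whitespace in text answers; non-string cells are left as-is.
    for col in out.columns:
        if col not in numeric_cols:
            out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)

    # Drop rows with no answers at all, then exact duplicate responses.
    out = out.dropna(how="all").drop_duplicates().reset_index(drop=True)

    for col in numeric_cols:
        # Assumed outlier rule: clip to Tukey's fences around the interquartile range.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - iqr_factor * iqr, upper=q3 + iqr_factor * iqr)
        # Assumed imputation: fill remaining gaps with the column median.
        out[col] = out[col].fillna(out[col].median())

    return out


# Hypothetical toy responses, not rows from the real survey.
raw = pd.DataFrame({
    " Job Title ": [" Data Analyst", "Data Engineer ", None, "Data Analyst",
                    "Data Scientist", "Data Analyst", "ML Engineer"],
    "Salary USD": [60000, 80000, None, 60000, 1000000, 70000, None],
})
clean = clean_survey(raw)
print(clean)
```

Whitespace is trimmed before de-duplication so that answers differing only by stray spaces count as duplicates; clipping rather than dropping outliers keeps every respondent in the sample.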
-
3. Data Analysis and Visualization
-
3.1 Power BI Visualizations
Power BI was used to build interactive visualizations of the cleaned dataset. Charts, graphs, and dashboards highlight key trends, patterns, and relationships in the data, and Power BI's interactivity gives users a dynamic interface for exploring the survey results.
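Power BI Desktop can also run Python scripts as visuals: the fields added to a Python visual are passed to the script as a pandas DataFrame named `dataset`, and the Matplotlib figure the script shows becomes the visual. The sketch below only illustrates the shape of such a script; it is not the code stored in pfs.pbix, and the columns `job_title` and `salary_usd` and the stand-in data are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Inside a Power BI Python visual, Power BI provides `dataset` itself.
# Outside Power BI, build a small hypothetical stand-in so the script runs on its own.
if "dataset" not in globals():
    dataset = pd.DataFrame({
        "job_title": ["Data Analyst", "Data Engineer", "Data Analyst", "Data Scientist"],
        "salary_usd": [60000, 85000, 70000, 95000],
    })

# Average salary per job title, smallest first so the largest bar is drawn on top.
avg_salary = dataset.groupby("job_title")["salary_usd"].mean().sort_values()

fig, ax = plt.subplots(figsize=(8, 4))
avg_salary.plot.barh(ax=ax)
ax.set_xlabel("Average salary (USD)")
ax.set_ylabel("")
fig.tight_layout()
plt.show()  # Power BI renders the shown figure as the visual
```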
-
3.2 Python Visualizations
Alongside Power BI, Python was used for further analysis and visualization. The Pandas, Matplotlib, and Seaborn libraries produced additional plots and charts, deepening the analysis and offering a wider range of visual representations.
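As a minimal example of this kind of chart, the sketch below counts the answers to one survey question with Pandas and draws them as a horizontal bar chart with Matplotlib. The `plot_answer_counts` helper and the `favorite_language` column are hypothetical; Seaborn's `countplot` would be a drop-in alternative for the plotting step.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering, so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_answer_counts(df: pd.DataFrame, column: str, top_n: int = 10):
    """Horizontal bar chart of the most frequent answers in one survey column."""
    # value_counts() sorts descending; re-sort ascending so the biggest bar ends up on top.
    counts = df[column].value_counts().head(top_n).sort_values()
    fig, ax = plt.subplots(figsize=(8, 4))
    counts.plot.barh(ax=ax)
    ax.set_title(f"Top {len(counts)} answers: {column}")
    ax.set_xlabel("Number of respondents")
    ax.set_ylabel("")
    fig.tight_layout()
    return fig, ax, counts


# Hypothetical toy answers, not data from the real survey.
demo = pd.DataFrame({"favorite_language": ["Python", "R", "Python", "SQL", "Python", "R"]})
fig, ax, counts = plot_answer_counts(demo, "favorite_language")
```

To keep the chart, call `fig.savefig("favorite_language.png", dpi=150)`; returning `fig`, `ax`, and `counts` lets the caller restyle the plot or reuse the counts in a table.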
-
4. Results and Findings
The analysis of the survey data produced several significant findings. They are presented through a combination of Power BI dashboards and Python-generated visualizations, which together give a comprehensive picture of the surveyed population.
The project repository includes the following files:
- Data Professionals Survey.xlsx: the original Excel file containing the raw survey data.
- pfs.pbix: the Power BI file containing the interactive visualizations and the Python code for additional visualizations.
- Data Visualization.odp: a short presentation of the project, in OpenDocument Presentation format.
- README.md: this documentation file.
For a detailed walkthrough and explanation of the project, please refer to the accompanying YouTube video: Project Demo Video
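Before exploring the project, install Power BI Desktop and a Python environment with the libraries used here (Pandas, Matplotlib, and Seaborn). To render the Python visuals inside the report, point Power BI Desktop at that environment (File > Options and settings > Options > Python scripting). Then open pfs.pbix in Power BI Desktop to explore the interactive visualizations.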
Special thanks to Mr. Riadh Ghlala for his guidance and support throughout the project.
Feel free to reach out for any further clarifications or inquiries.
Happy exploring and analyzing!