Data Analysis

This project works through data analysis concepts and techniques from a course on www.suanlab.com. The materials cover core data analysis skills that apply to any field or industry.

Course Chapters

  1. Exploratory Data Analysis - Initial investigation into datasets. Counting values, finding distributions, spotting outliers, etc.
  2. Data Preprocessing - Formatting, cleaning and restructuring data before analysis.
  3. Data Cleaning - Identifying and fixing issues like missing values, duplicates, inconsistent formatting, etc.
  4. Data Integration - Combining two or more datasets. Requires ensuring the data can be merged, handling overlap, etc.
  5. Data Reduction - Using summarization, aggregation, dimension reduction, etc. to make datasets smaller and more manageable.
  6. Data Transformation - Changing the form, structure or scale of data to suit your needs. Log transforms, scaling, binning, etc.
  7. Feature Engineering - Creating new features from raw data to use in models.
  8. OpenRefine - Tool for data cleaning and transformation.
  9. NumPy - Library for scientific computing in Python. Used for statistics, matrices, and more.
  10. Pandas - Library for data analysis and manipulation in Python. Built on NumPy.
  11. Exploration and Visualization of the Titanic Dataset - Putting the skills into action on the Titanic Kaggle dataset.
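
As a rough illustration of how several of the chapters above fit together, here is a minimal sketch in Pandas and NumPy. The toy table, its column names, and the bin edges are all made up for this example and are not taken from the course materials or the real Titanic dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
passengers = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "age": [22.0, 38.0, 38.0, np.nan, 35.0],
    "fare": [7.25, 71.28, 71.28, 8.05, 53.10],
    "cabin_class": [3, 1, 1, 3, 1],
})
class_names = pd.DataFrame({
    "cabin_class": [1, 3],
    "class_name": ["first", "third"],
})

# Exploratory analysis: count how often each class appears.
class_counts = passengers["cabin_class"].value_counts()

# Cleaning: drop exact duplicate rows, fill missing ages with the median.
clean = passengers.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# Integration: attach readable class names from a second table.
merged = clean.merge(class_names, on="cabin_class", how="left")

# Transformation: log-scale the fare and bin ages into two groups.
merged["log_fare"] = np.log1p(merged["fare"])
merged["age_group"] = pd.cut(
    merged["age"], bins=[0, 30, 120], labels=["under_30", "30_plus"]
)

# Reduction: summarize the table down to one mean fare per class.
summary = merged.groupby("class_name")["fare"].mean()
print(summary)
```

Median imputation and the two age bins are only one possible choice here; the right strategy depends on the dataset and the question being asked.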

Code and Tools

The course uses: Colab, Python, NumPy, Pandas, Matplotlib, Seaborn, and OpenRefine.

The projects analyze and visualize various public datasets to demonstrate concepts and techniques.

I found this course very helpful for building a basic foundation in qualitative and quantitative data analysis using Python. Please let me know if you have any feedback or suggestions for improving my data analysis skills!

Acknowledgments

Please visit the original site (www.suanlab.com) for more in-depth tutorials and resources.