ICS 434:

1_Introduction.ipynb

This notebook introduces the basics of data science, its origins, and its key components. Data science is an area that brings together methods from statistics and computer science to handle, analyze, and get insights from data. The notebook covers the essential parts of data science: collecting data, preparing and cleaning it, exploring it to spot trends, building models to predict future trends, and visualizing data in clear and informative ways.

2_Intro_to_pandas_Python_package.ipynb

This notebook provides an overview of packages and modules in Python. It explains how packages are structured directories containing Python modules, which are individual Python files, and details how they can be imported and used to organize and reuse code efficiently in Python programming.
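
As a small illustration of the package/module distinction, the sketch below uses only the standard library. The choice of `json` (a package, i.e. a directory of modules) and `statistics` (a single module file) is ours and is not taken from the notebook.

```python
import json                   # a package: a directory containing __init__.py
import statistics             # a module: a single .py file
from json import decoder      # import a submodule from a package
from statistics import mean   # import one name from a module

# Packages expose a __path__ attribute pointing at their directory;
# plain modules do not.
json_is_package = hasattr(json, "__path__")
statistics_is_package = hasattr(statistics, "__path__")

print(json_is_package, statistics_is_package)
print(decoder.__name__)       # fully qualified name of the submodule
print(mean([1, 2, 3, 4]))
```

The same import forms apply to third-party packages once they are installed.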

3_intro_to_pandas.ipynb

This notebook introduces Pandas, the leading library for data wrangling. Specifically, the notebook introduces two pivotal data structures essential for data wrangling (Series and DataFrames), and provides an in-depth exploration of indexing techniques for efficient data handling.
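
A minimal sketch of the two structures and of label-based versus position-based indexing follows. The values and labels are made up for illustration and do not come from the notebook.

```python
import pandas as pd

# A Series is a one-dimensional array with an index of labels.
temps = pd.Series([25.1, 27.3, 22.8], index=["Mon", "Tue", "Wed"])

# A DataFrame is a table whose columns share one row index.
df = pd.DataFrame(
    {"name": ["x", "y", "z"], "value": [10, 20, 30]},
    index=["a", "b", "c"],
)

by_label = temps.loc["Tue"]      # .loc selects by label
by_position = temps.iloc[0]      # .iloc selects by integer position
cell = df.loc["b", "value"]      # row label, then column label
first_two = df.iloc[:2]          # first two rows by position
```

Keeping `.loc` for labels and `.iloc` for positions avoids ambiguity when the index itself holds integers.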

4_exploratory_data_analysis.ipynb

This notebook provides a comprehensive introduction to exploratory data analysis using Pandas. It starts with general dataset attributes, such as the number of rows and columns and the data type of each column. It then covers descriptive statistics operations, such as the mean and median, and explains the concept of an axis in Pandas operations. The notebook also describes how missing values are handled, shows how to sort data, and concludes with practical examples of basic Pandas plots for data visualization.
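
The operations described above might look like the following sketch. The tiny DataFrame is invented for illustration, and plotting is left out to keep the example short.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column.
df = pd.DataFrame(
    {"height": [1.6, 1.8, np.nan, 1.7], "weight": [60.0, 80.0, 70.0, np.nan]}
)

n_rows, n_cols = df.shape              # general dataset attributes
column_types = df.dtypes               # data type of each column

col_means = df.mean(axis=0)            # axis=0: one result per column
row_means = df.mean(axis=1)            # axis=1: one result per row

missing_per_column = df.isna().sum()   # count missing values per column
sorted_df = df.sort_values("height", ascending=False)  # NaN rows go last
```

By default the descriptive statistics skip missing values, which is why the column means are still defined here.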

5_arithmetic_ops_and_data_alignment.ipynb

This notebook provides a thorough overview of vectorization in Pandas and demonstrates the efficiency of vectorized operations over traditional loops, the concept of broadcasting in array manipulation, and how to apply arithmetic and comparison operations effectively in Pandas. Additionally, the notebook covers data querying and subsetting, highlighting the ease and speed of handling large datasets with these techniques.
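
A short sketch of these ideas, using invented values: a loop versus a vectorized operation, a scalar broadcast across a Series, a boolean mask for subsetting, and index alignment in arithmetic.

```python
import numpy as np
import pandas as pd

prices = pd.Series([10.0, 20.0, 30.0, 40.0])

# Loop version: one Python-level multiplication per element.
looped = pd.Series([p * 1.1 for p in prices])

# Vectorized version: the scalar 1.1 is broadcast across the whole Series.
vectorized = prices * 1.1

# Comparisons are vectorized as well and produce a boolean mask.
mask = prices > 15
subset = prices[mask]

# Arithmetic between two Series aligns on index labels, not positions.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])
aligned = a + b   # "x" has no partner in b, so the result there is NaN
```

Both versions give the same numbers; the vectorized one avoids the per-element Python overhead, which matters as the data grows.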

6_0_summary_statistics.ipynb

This notebook offers a concise overview of summary statistics, essential for data analysis. It covers key concepts like central tendency measures (mean, median, mode). The notebook also discusses measures of variability (range, variance, standard deviation) and quantiles (quartiles, percentiles) and highlights their role in describing data distribution.
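
The measures listed above can be computed with the standard library alone, as in this sketch. The data are made up, and the notebook itself may use Pandas or NumPy instead.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Variability
data_range = max(data) - min(data)
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # square root of the sample variance

# Quantiles: the three cut points that split the data into quarters
quartiles = statistics.quantiles(data, n=4)
```

Note that `statistics.quantiles` uses the "exclusive" method by default, so its quartiles can differ slightly from those reported by other tools.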

12_intro_probability.ipynb

This Jupyter Notebook serves as an introduction to basic probability concepts and terminology. It also introduces a simulation technique that illustrates the long-term frequency of events by exploring a simple problem.
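
As an illustration of long-run frequency, the sketch below simulates fair coin flips and tracks the running proportion of heads. The coin-flip problem and the function name `running_frequency` are our assumptions; the notebook's own problem may differ.

```python
import random

def running_frequency(n_flips, seed=0):
    """Return the proportion of heads after each of n_flips fair coin flips."""
    rng = random.Random(seed)      # seeded so the run is reproducible
    heads = 0
    freqs = []
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:     # heads with probability 0.5
            heads += 1
        freqs.append(heads / i)
    return freqs

freqs = running_frequency(10_000)
print(freqs[9], freqs[99], freqs[-1])   # after 10, 100, and 10,000 flips
```

Early proportions can be far from 0.5, while the later ones settle close to it.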

13_probability_distributions_binomial.ipynb

This Jupyter Notebook introduces the binomial probability distribution, providing a comprehensive exploration through practical examples.
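
For reference, the binomial probability mass function can be written directly from its definition. This is a minimal sketch with illustrative parameters (`n = 10`, `p = 0.5`), not code from the notebook.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The expected value computed from the PMF matches the closed form n * p.
mean = sum(k * pk for k, pk in enumerate(pmf))
```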

14_probability_distributions_gaussian.ipynb

This Jupyter Notebook introduces the Gaussian probability distribution, providing a comprehensive exploration through practical examples.
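
A quick way to experiment with this distribution is the standard library's `statistics.NormalDist`. The sketch below uses the standard normal (mean 0, standard deviation 1) as an illustrative choice.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)

density_at_mean = z.pdf(0.0)                  # peak of the bell curve
within_one_sd = z.cdf(1.0) - z.cdf(-1.0)      # probability within 1 sd of the mean
within_two_sd = z.cdf(2.0) - z.cdf(-2.0)      # probability within 2 sd of the mean

print(round(within_one_sd, 4), round(within_two_sd, 4))
```

The two probabilities are the familiar "68%" and "95%" figures of the empirical rule.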

15_kernel_density_estimation.ipynb

This Jupyter Notebook introduces kernel density estimation. It starts with an overview of histograms and their limitations, then moves on to the concept and application of kernel density estimation as a more effective method for estimating the probability density function of a random variable.
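
A minimal sketch of the idea, assuming a Gaussian kernel and a fixed bandwidth of 0.5 (both our choices): the estimate at a point is the average of kernel bumps centered on the observations. The sample values are made up.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, sample, bandwidth):
    """Kernel density estimate at x: average of scaled kernels, one per observation."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample) / (n * bandwidth)

sample = [1.0, 1.5, 2.0, 4.0, 4.2]
step = 0.01
grid = [i * step for i in range(-500, 1100)]     # evaluation points from -5 to about 11
density = [kde(x, sample, 0.5) for x in grid]

# A valid density integrates to 1; check with a simple Riemann sum.
area = sum(density) * step
```

Unlike a histogram, the result does not depend on where bin edges happen to fall, only on the kernel and the bandwidth.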

16_KDE_bandwidth.ipynb

This Jupyter Notebook focuses on the estimation of bandwidth in kernel density estimation, detailing the methodologies and considerations involved in selecting an optimal bandwidth to accurately approximate the probability density function of a dataset.
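
One common starting point is the normal reference rule of thumb, h = 1.06 · σ · n^(-1/5). The sketch below shows that rule only as an example; it is an assumption on our part, and the notebook may use a different selection method, such as cross-validation.

```python
import statistics

def rule_of_thumb_bandwidth(sample):
    """Normal reference rule: 1.06 * sample standard deviation * n^(-1/5)."""
    n = len(sample)
    sigma = statistics.stdev(sample)
    return 1.06 * sigma * n ** (-1 / 5)

h = rule_of_thumb_bandwidth([1, 2, 3, 4, 5])
h_scaled = rule_of_thumb_bandwidth([2, 4, 6, 8, 10])   # same shape, twice the spread
```

The rule scales with the spread of the data and shrinks slowly as the sample grows. It tends to oversmooth data that are far from normal, for example bimodal samples.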

17_probability_distributions_poisson.ipynb

This Jupyter Notebook introduces the Poisson probability distribution, providing a comprehensive exploration through practical examples.
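
For reference, the Poisson probability mass function written from its definition, with an illustrative rate of 3 events per interval (our choice, not the notebook's):

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0
pmf = [poisson_pmf(k, lam) for k in range(50)]   # the tail beyond 50 is negligible here

# For a Poisson distribution the mean equals the rate parameter.
mean = sum(k * pk for k, pk in enumerate(pmf))
```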

18_param_estimation_bootstrap.ipynb

This Jupyter Notebook covers parameter estimation with a focus on Bootstrap Confidence Intervals, explaining the process and techniques for estimating confidence intervals using the bootstrap method.
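
The sketch below shows one common variant, the percentile bootstrap, applied to the mean. The function name `bootstrap_ci`, its defaults, and the sample values are illustrative assumptions rather than the notebook's code.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 3.9]   # made-up measurements
low, high = bootstrap_ci(sample)
print(low, high)
```

Passing a different `stat`, such as `statistics.median`, reuses the same procedure for another parameter.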

19_param_esitmation_maximum_likelihood.ipynb

This Jupyter Notebook presents parameter estimation through maximum likelihood (ML). It provides a practical understanding of likelihood and delves into the concept and significance of log-likelihood in optimizing parameter estimates.
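
As a small worked example, the sketch below estimates the success probability of a Bernoulli variable by maximizing the log-likelihood over a grid. The data and the grid search are our illustrative choices; the notebook may use a different model or optimizer.

```python
import math

def log_likelihood(p, data):
    """Log-likelihood of success probability p for a list of 0/1 outcomes."""
    successes = sum(data)
    failures = len(data) - successes
    return successes * math.log(p) + failures * math.log(1 - p)

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]          # 7 successes in 10 trials (made up)
grid = [i / 1000 for i in range(1, 1000)]      # candidate values strictly between 0 and 1
p_hat = max(grid, key=lambda p: log_likelihood(p, data))
```

Working with the log turns a product of many small probabilities into a sum, which is easier to optimize and avoids numerical underflow. Here the maximizer coincides with the sample proportion.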

9_group_by.ipynb

This Jupyter Notebook explores the groupby method, focusing on the split-apply-combine strategy for data aggregation, transformation, filtering, and thinning within groups. It offers a concise examination of how to efficiently manage and analyze grouped data in Python.
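
The four kinds of group operations named above can be sketched as follows. The DataFrame is invented, and using `head(1)` to stand for "thinning" is our interpretation of the term.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "a", "b", "b", "b"], "score": [10, 20, 5, 15, 40]}
)
grouped = df.groupby("team")["score"]

# Aggregation: one value per group.
totals = grouped.sum()

# Transformation: same shape as the input, computed within each group.
centered = grouped.transform(lambda s: s - s.mean())

# Filtering: keep or drop whole groups based on a group-level condition.
big_teams = df.groupby("team").filter(lambda g: len(g) >= 3)

# Thinning: keep only some rows of each group (here, the first one).
first_per_team = df.groupby("team").head(1)
```

In each case the data are split by `team`, a function is applied to every piece, and the results are combined.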

10_hierarchical_indexes.ipynb

This Jupyter Notebook introduces Hierarchical Indexing, expanding upon its mention in our groupby discussions. It details how to implement multiple indexes on rows and/or columns. The concept of levels within a MultiIndex object is also explored, providing a foundational understanding of structured data manipulation and analysis.
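
A minimal sketch of a two-level row index, with made-up labels and values:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["2023", "2024"], ["spring", "fall"]], names=["year", "term"]
)
enrollment = pd.Series([30, 35, 28, 40], index=index)

n_levels = enrollment.index.nlevels               # number of levels in the MultiIndex
one_year = enrollment.loc["2024"]                 # select on the outer level
one_cell = enrollment.loc[("2023", "fall")]       # select with a full key
by_term = enrollment.groupby(level="term").sum()  # aggregate over one level
wide = enrollment.unstack(level="term")           # move a row level to the columns
```

`unstack` shows the link to column indexes: the `term` level becomes the column labels of a DataFrame.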

21_hypothesis_testing_normal.ipynb

This Jupyter Notebook introduces the concept of multiple testing using bootstrap methods. It guides you through building a background distribution via bootstrapping (sampling repeatedly with replacement) to estimate variability. The actual data are then compared against this distribution to distinguish statistically significant results from those that could occur by chance.

22_hypothesis_testing_multi_categories.ipynb

This Jupyter Notebook explores the technique of comparing proportions using bootstrap methods. It demonstrates how to create a simulated distribution of sample proportions through repeated bootstrapping, then compares these proportions to actual data to determine if observed differences are statistically significant or likely due to random variation.
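
One way to sketch this comparison is to resample from the pooled data under the assumption that both groups share a single proportion. The function name, its parameters, and the pooling step are our assumptions; the notebook's procedure may differ in detail.

```python
import random

def proportion_diff_p_value(successes_a, n_a, successes_b, n_b,
                            n_resamples=5000, seed=0):
    """Share of resampled differences at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(successes_a / n_a - successes_b / n_b)

    # Pool both groups: under the null hypothesis they share one proportion.
    total_successes = successes_a + successes_b
    pooled = [1] * total_successes + [0] * (n_a + n_b - total_successes)

    extreme = 0
    for _ in range(n_resamples):
        a = rng.choices(pooled, k=n_a)     # sample with replacement
        b = rng.choices(pooled, k=n_b)
        if abs(sum(a) / n_a - sum(b) / n_b) >= observed:
            extreme += 1
    return extreme / n_resamples

p_same = proportion_diff_p_value(50, 100, 50, 100)        # identical proportions
p_different = proportion_diff_p_value(90, 100, 10, 100)   # very different proportions
```

A small value suggests the observed difference is unlikely to arise from random variation alone.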

25_correlation.ipynb

This Jupyter Notebook introduces the concept of correlation analysis. It explains how to calculate and interpret correlation coefficients, helping you understand the strength and direction of relationships between two variables. It also explains how to interpret the R-squared statistic, which is common throughout machine learning models.
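
For illustration, the Pearson correlation coefficient can be computed from its definition as below. The data are made up. For a simple linear fit with one predictor, R-squared equals the square of this coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)     # sign gives direction, magnitude gives strength
r_squared = r ** 2      # share of variance in y explained by a straight-line fit on x
```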

26_linear_regression.ipynb

This Jupyter Notebook introduces the basics of linear regression. It walks you through the steps of fitting a linear model to data, helping you understand how to predict one variable based on another. The notebook includes simple, practical examples.
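
A minimal sketch of fitting a line by ordinary least squares, using the closed-form solution for one predictor. The data and the helper names `fit_line` and `predict` are illustrative; the notebook may rely on a library instead.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x_new, slope, intercept):
    """Predicted y for a new x under the fitted line."""
    return slope * x_new + intercept

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # made-up observations
slope, intercept = fit_line(x, y)
y_at_6 = predict(6, slope, intercept)
```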

27_non_linear_regression.ipynb

This Jupyter Notebook delves into non-linear regression, tailored for beginners in data science. It explains how to model relationships between variables that don't follow a straight line, using more complex functions. The notebook provides examples to illustrate the fitting of non-linear models to data, helping you grasp the basics of this important statistical technique.

28_time_series_regression_based.ipynb

This Jupyter Notebook introduces the fundamentals of time series regression. It guides you through identifying trends (linear or non-linear) and seasonal patterns in time series data, and then models these characteristics to make forecasts. The notebook offers step-by-step examples to clearly demonstrate how to analyze and model time-related data effectively.

29_exponential_smoothing.ipynb

This Jupyter Notebook explores exponential smoothing techniques, including single, double, and triple smoothing methods. It teaches how to apply these techniques to forecast data, adjusting for level, trend, and seasonality. The notebook provides straightforward examples to help you understand and implement exponential smoothing.
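
The sketch below covers only single exponential smoothing, which tracks the level of a series. The function name, the initialization with the first observation, and the data are our assumptions. Double and triple smoothing extend the same recursion with trend and seasonal terms.

```python
def simple_exponential_smoothing(series, alpha):
    """Each smoothed value blends the new observation with the previous smoothed value."""
    smoothed = [series[0]]                     # initialize with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10.0, 12.0, 11.0, 13.0]
smoothed = simple_exponential_smoothing(series, alpha=0.5)
forecast = smoothed[-1]      # the one-step-ahead forecast is the last smoothed level
```

A larger `alpha` reacts faster to recent observations, while a smaller one produces a smoother series.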

30_clustering.ipynb

This Jupyter Notebook introduces clustering techniques, focusing on k-means and hierarchical clustering. It explains how to group data based on similarities using simple Euclidean or non-Euclidean distances. The notebook includes practical examples to demonstrate both methods and introduces the silhouette coefficient to evaluate the quality of the clustering.
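
A compact sketch of k-means with Euclidean distance follows. The points are made up, and the initial centroids are passed in explicitly to keep the run reproducible. Hierarchical clustering and the silhouette coefficient are not shown.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = list(centroids)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

In practice the result depends on the starting centroids, so the algorithm is usually run from several random initializations.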

31_mixture_models.ipynb

This Jupyter Notebook explores the use of mixture models for clustering. It focuses on implementing the Expectation-Maximization (EM) algorithm to classify data into two clusters and discusses methods for extending this approach to more than two clusters.
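
A minimal sketch of EM for two clusters, assuming a one-dimensional mixture of two Gaussians. The model, starting values, function names, and data are our illustrative choices and may differ from the notebook's.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(data, mu, sigma, weight, n_iter=50):
    """mu, sigma: two-element lists; weight: mixing proportion of component 0."""
    mu, sigma = list(mu), list(sigma)
    for _ in range(n_iter):
        # E-step: probability that each point belongs to component 0.
        resp = []
        for x in data:
            p0 = weight * normal_pdf(x, mu[0], sigma[0])
            p1 = (1 - weight) * normal_pdf(x, mu[1], sigma[1])
            resp.append(p0 / (p0 + p1))
        # M-step: re-estimate parameters using the responsibilities as weights.
        n0 = sum(resp)
        n1 = len(data) - n0
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / n1
        var0 = sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0
        var1 = sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1
        sigma[0] = max(math.sqrt(var0), 1e-3)   # floor keeps a component from collapsing
        sigma[1] = max(math.sqrt(var1), 1e-3)
        weight = n0 / len(data)
    return mu, sigma, weight, resp

data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 4.9, 5.1]   # made-up, two clear groups
mu, sigma, weight, resp = em_two_gaussians(data, mu=[0.0, 6.0], sigma=[1.0, 1.0], weight=0.5)
labels = [0 if r > 0.5 else 1 for r in resp]   # hard assignment from soft responsibilities
```

Extending this to more than two clusters means keeping one responsibility per component for each point and normalizing across all components in the E-step.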

mahdi-b/ics434-s24
