Machine learning is the process of learning the (hidden or obvious) mapping of the world from a sample of data, using algorithms that optimize a model's parameters through training.
In supervised ML, the learned model uses the information carried by the features to reduce uncertainty when predicting the target label; in other words, it identifies data signals that remain relevant for new, unseen observations (sketched in the example below).
Machine learning is different from automation: it learns rules from data rather than executing rules that have been explicitly programmed.
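A minimal sketch of this supervised workflow, assuming scikit-learn and a small synthetic dataset (the features and data are made up for illustration, not taken from the demos below):

```python
# Minimal supervised-learning sketch: learn a mapping from features to a label
# on a training sample, then predict the label for new, unseen observations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Illustrative features, e.g., past purchases and time on site (synthetic)
X = rng.normal(size=(500, 2))
# Dichotomous target, loosely related to the features plus noise
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training optimizes the model's parameters against the data sample
model = LogisticRegression().fit(X_train, y_train)

# The learned mapping is applied to unseen data
print("Held-out accuracy:", model.score(X_test, y_test))
```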
Algorithm / model | Type | Use case | Online demo / example |
---|---|---|---|
Association Rules | Unsupervised/Supervised | To identify items frequently bought together in transactional data; to perform market basket / affinity analysis | Demo: Generating association rules with transactions data (*interactive*) |
Neural Network | Unsupervised/Supervised | To understand how similar products are in order to design a campaign | Example: R |
Deep Neural Network: Softmax | Unsupervised/Supervised | To capture personalized preferences for a latent factor model for recommendations; to detect fraudulent transactions | Example: see collaborative filtering |
Collaborative Filtering | Unsupervised | To recommend an item to a buyer because (a) similar buyers purchased it and (b) the buyer purchased similar item(s) | Examples: Python, R |
Content-Based Filtering | Unsupervised | To recommend an item to a buyer because the item strongly fits the buyer's preferences | Example: Illustration |
Clustering | Unsupervised | To understand the grouping of consumers with respect to their purchase habits | Examples: Python, R (see also the sketch after this table) |
PCA | Unsupervised | (a) To summarize data on a 2D map; (b) to reconstruct data using principal components (PCs) | Examples: Clojure, Python, R |
t-SNE | Unsupervised | To visualize data consisting of legitimate and fraudulent transactions on a 2D map | Examples: Python, R |
UMAP | Unsupervised | To visualize higher-dimensional data on a 2D map | Examples: Python, R |
Network Analysis | Unsupervised | To understand the dynamics of how purchasing one item may affect purchasing another | Examples: Python, R |
Bayesian/Probabilistic Networks | Supervised | To predict the chain of events leading to a greater likelihood of a consumer purchase | Example: R |
k-Nearest Neighbors | Supervised | To predict what product a new customer may like, given the customer's characteristics | Examples: Python, R |
Support Vector Machine (SVM) | Supervised | To predict a consumer's dichotomous purchasing decision | Examples: Python, R |
Naive Bayes | Supervised | To predict a consumer's dichotomous purchasing decision | Example: Python |
Linear Regression | Supervised | To explain sales via advertising budget | Examples: Python, R |
Logistic Regression | Supervised | To predict a consumer's dichotomous purchasing decision | (1) Example: Python; (2) Demo: Running logistic regression with retail data (*interactive*) |
Decision Tree | Supervised | To predict a consumer's purchasing decision | Example: Decision trees of consumer purchasing |
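To make one of the unsupervised rows concrete, here is a minimal clustering sketch for grouping consumers by purchase habits, assuming scikit-learn and synthetic data (feature names and values are illustrative, not from the linked examples):

```python
# Minimal k-means sketch: group consumers by purchase habits.
# The data and feature names below are synthetic and purely illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Illustrative habits: monthly spend and number of store visits for two segments
spend = np.concatenate([rng.normal(50, 10, 200), rng.normal(200, 30, 100)])
visits = np.concatenate([rng.normal(2, 1, 200), rng.normal(8, 2, 100)])
X = np.column_stack([spend, visits])

# Standardize so both habits contribute comparably to the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means with an assumed k=2; in practice k is chosen via elbow/silhouette
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

print("Cluster sizes:", np.bincount(kmeans.labels_))
print("Cluster centers (standardized units):", kmeans.cluster_centers_)
```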
Algorithm / model (selected) | Assumptions |
---|---|
Associate Rules (e.g., apriori) | 1. All subsets of a frequent itemset are frequent. |
Decision Trees | 1. The data can be described by features. 2. The class label can be predicted using the logic set of decisions in a decision tree. 3. Effectiveness can be achieved by finding a smaller tree with lower error. |
Neural Networks | As opposed to real neurons: 1. Nodes connect to each other sequentially via distinct layers. 2. Nodes within the same layer do not communicate with each other. 3. Nodes of the same layer have the same activation functions. 4. Input nodes only communicate indirectly with output nodes via the hidden layer. |
K-means clustering | 1. The clusters are spherical. 2. The clusters are of similar size. |
Naive Bayes | 1. Every pair of feature variables is independent of each other. 2. The contribution each feature makes to the target variable is equal. |
Logistic Regression | 1. DV is binary or ordinal. 2. Observations are independent of each other. 3. Little or no multicollinearity among the IV. 4. Linearity of IV (the X) and log odds (the z). 5. A large sample size. It needs at minimum of 10 cases with the least frequent DV for each IV. 6. There is no influential values (extreme values or outliers) in the continuous IV. |
Linear Regression | 1. Linearity: The relationship between X and Y is linear. 2. Independence: Residual -- Y is independent of the residuals. 3. Homoscedasticity: Residual -- variance of the residuals is the same for all values of X. 4. Normality: Residual -- residual is normally distributed. (2-4 are also known as IID: residuals are Independently, Identically Distributed as normal). 5. No or little multicollinearity among X's (for Multiple Linear Regression). |
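As a quick illustration of the linear regression assumptions, here is a minimal sketch assuming statsmodels/SciPy and synthetic advertising-and-sales data (all variable names are made up):

```python
# Sketch: fit sales ~ advertising and check the residual-based assumptions.
# The data are synthetic and purely illustrative.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)

advertising = rng.uniform(10, 100, 200)                 # IV (X)
sales = 5 + 0.8 * advertising + rng.normal(0, 5, 200)   # DV (Y) with noise

X = sm.add_constant(advertising)   # add the intercept term
model = sm.OLS(sales, X).fit()
resid = model.resid

# Normality of residuals (assumption 4): Shapiro-Wilk test
print("Shapiro-Wilk p-value:", stats.shapiro(resid).pvalue)

# Homoscedasticity (assumption 3): Breusch-Pagan test
_, bp_pvalue, _, _ = het_breuschpagan(resid, X)
print("Breusch-Pagan p-value:", bp_pvalue)

# Independence of residuals (assumption 2): Durbin-Watson statistic (~2 is good)
print("Durbin-Watson:", durbin_watson(resid))
```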
It depends on several factors, including (a) the nature of the data, (b) the goal of the analysis, (c) the relative performance of the algorithms, and (d) how feasibly the model can be integrated with business and operations.
Factors | Details |
---|---|
Nature of the data | Categorical, continuous, etc. |
Goal of analysis | * To describe, estimate, predict, cluster, classify, associate, explain, etc. * For example, decision trees are more readily interpretable than neural networks |
Algorithm performance / Model evaluation | * For classification, predictive power can be assessed via the area under the ROC curve (AUC) * For regression, there are a variety of choices, including R², AIC, and RMSE (see the metrics sketch after this table) |
Business integration | * Data availability * Model tuning vs. new model * Thinking through IT integration at the beginning of the project * Business end users' actual uses |
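A brief sketch of the evaluation metrics mentioned above, assuming scikit-learn and synthetic predictions (illustrative only; AIC is typically read off a fitted statsmodels model via its `aic` attribute):

```python
# Sketch: common evaluation metrics, AUC for classification and RMSE/R^2 for regression.
# Labels and predictions below are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score, mean_squared_error, r2_score

rng = np.random.default_rng(7)

# Classification: true binary labels and predicted probabilities
y_true_cls = rng.integers(0, 2, 100)
y_score = np.clip(0.5 * y_true_cls + rng.normal(0.25, 0.2, 100), 0, 1)
print("ROC AUC:", roc_auc_score(y_true_cls, y_score))

# Regression: true values and predictions
y_true_reg = rng.normal(100, 20, 100)
y_pred_reg = y_true_reg + rng.normal(0, 10, 100)
print("RMSE:", np.sqrt(mean_squared_error(y_true_reg, y_pred_reg)))
print("R^2:", r2_score(y_true_reg, y_pred_reg))
```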