Machine learning is the process of learning the (hidden or obvious) mapping of the world from a sample of data, using algorithms that optimize a model's parameters through training.
In supervised ML, the learned model uses the information carried by the features to reduce uncertainty when predicting the target label; in other words, it identifies data signals that remain relevant for new, unseen observations (sketched in the example below).
Machine learning is different from automation: it learns rules from data rather than executing rules that have been explicitly programmed.
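A minimal sketch of this supervised workflow, assuming scikit-learn and a small synthetic dataset (the features and data are made up for illustration, not taken from the demos below):

```python
# Minimal supervised-learning sketch: learn a mapping from features to a label
# on a training sample, then predict the label for new, unseen observations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Illustrative features, e.g., past purchases and time on site (synthetic)
X = rng.normal(size=(500, 2))
# Dichotomous target, loosely related to the features plus noise
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training optimizes the model's parameters against the data sample
model = LogisticRegression().fit(X_train, y_train)

# The learned mapping is applied to unseen data
print("Held-out accuracy:", model.score(X_test, y_test))
```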
Algorithm / model | Type | Use case | Online demo / example |
---|---|---|---|
Association Rules | Unsupervised/Supervised | To identify items frequently bought together in transactional data; to perform market basket / affinity analysis | Demo: Generating association rules with transactions data (*interactive*) |
Neural Network | Unsupervised/Supervised | To understand how similar products are in order to design a campaign | Example: R |
Deep Neural Network: Softmax | Unsupervised/Supervised | To capture personalized preferences for a latent factor model for recommendations; to detect fraudulent transactions | Example: see collaborative filtering |
Collaborative Filtering | Unsupervised | To recommend an item to a buyer because (a) similar buyers purchased it and (b) the buyer purchased similar item(s) | Examples: Python, R |
Content-Based Filtering | Unsupervised | To recommend an item to a buyer because the item strongly fits the buyer's preferences | Example: Illustration |
Clustering | Unsupervised | To understand the grouping of consumers with respect to their purchase habits | Examples: Python, R (see also the sketch after this table) |
PCA | Unsupervised | (a) To summarize data on a 2D map; (b) to reconstruct data using principal components (PCs) | Examples: Clojure, Python, R |
t-SNE | Unsupervised | To visualize data consisting of legitimate and fraudulent transactions on a 2D map | Examples: Python, R |
UMAP | Unsupervised | To visualize higher-dimensional data on a 2D map | Examples: Python, R |
Network Analysis | Unsupervised | To understand the dynamics of how purchasing one item may affect purchasing another | Examples: Python, R |
Bayesian/Probabilistic Networks | Supervised | To predict the chain of events leading to a greater likelihood of a consumer purchase | Example: R |
k-Nearest Neighbors | Supervised | To predict what product a new customer may like, given the customer's characteristics | Examples: Python, R |
Support Vector Machine (SVM) | Supervised | To predict a consumer's dichotomous purchasing decision | Examples: Python, R |
Naive Bayes | Supervised | To predict a consumer's dichotomous purchasing decision | Example: Python |
Linear Regression | Supervised | To explain sales via advertising budget | Examples: Python, R |
Logistic Regression | Supervised | To predict a consumer's dichotomous purchasing decision | (1) Example: Python; (2) Demo: Running logistic regression with retail data (*interactive*) |
Decision Tree | Supervised | To predict a consumer's purchasing decision | Example: Decision trees of consumer purchasing |
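To make one of the unsupervised rows concrete, here is a minimal clustering sketch for grouping consumers by purchase habits, assuming scikit-learn and synthetic data (feature names and values are illustrative, not from the linked examples):

```python
# Minimal k-means sketch: group consumers by purchase habits.
# The data and feature names below are synthetic and purely illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Illustrative habits: monthly spend and number of store visits for two segments
spend = np.concatenate([rng.normal(50, 10, 200), rng.normal(200, 30, 100)])
visits = np.concatenate([rng.normal(2, 1, 200), rng.normal(8, 2, 100)])
X = np.column_stack([spend, visits])

# Standardize so both habits contribute comparably to the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means with an assumed k=2; in practice k is chosen via elbow/silhouette
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

print("Cluster sizes:", np.bincount(kmeans.labels_))
print("Cluster centers (standardized units):", kmeans.cluster_centers_)
```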
Algorithm / model (selected) | Assumptions |
---|---|
Associate Rules (e.g., apriori) | 1. All subsets of a frequent itemset are frequent. |
Decision Trees | 1. The data can be described by features. 2. The class label can be predicted using the logic set of decisions in a decision tree. 3. Effectiveness can be achieved by finding a smaller tree with lower error. |
Neural Networks | As opposed to real neurons: 1. Nodes connect to each other sequentially via distinct layers. 2. Nodes within the same layer do not communicate with each other. 3. Nodes of the same layer have the same activation functions. 4. Input nodes only communicate indirectly with output nodes via the hidden layer. |
K-means clustering | 1. The clusters are spherical. 2. The clusters are of similar size. |
Naive Bayes | 1. Every pair of feature variables is independent of each other. 2. The contribution each feature makes to the target variable is equal. |
Logistic Regression | 1. DV is binary or ordinal. 2. Observations are independent of each other. 3. Little or no multicollinearity among the IV. 4. Linearity of IV (the X) and log odds (the z). 5. A large sample size. It needs at minimum of 10 cases with the least frequent DV for each IV. 6. There is no influential values (extreme values or outliers) in the continuous IV. |
Linear Regression | 1. Linearity: The relationship between X and Y is linear. 2. Independence: Residual -- Y is independent of the residuals. 3. Homoscedasticity: Residual -- variance of the residuals is the same for all values of X. 4. Normality: Residual -- residual is normally distributed. (2-4 are also known as IID: residuals are Independently, Identically Distributed as normal). 5. No or little multicollinearity among X's (for Multiple Linear Regression). |
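As a quick illustration of the linear regression assumptions, here is a minimal sketch assuming statsmodels/SciPy and synthetic advertising-and-sales data (all variable names are made up):

```python
# Sketch: fit sales ~ advertising and check the residual-based assumptions.
# The data are synthetic and purely illustrative.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)

advertising = rng.uniform(10, 100, 200)                 # IV (X)
sales = 5 + 0.8 * advertising + rng.normal(0, 5, 200)   # DV (Y) with noise

X = sm.add_constant(advertising)   # add the intercept term
model = sm.OLS(sales, X).fit()
resid = model.resid

# Normality of residuals (assumption 4): Shapiro-Wilk test
print("Shapiro-Wilk p-value:", stats.shapiro(resid).pvalue)

# Homoscedasticity (assumption 3): Breusch-Pagan test
_, bp_pvalue, _, _ = het_breuschpagan(resid, X)
print("Breusch-Pagan p-value:", bp_pvalue)

# Independence of residuals (assumption 2): Durbin-Watson statistic (~2 is good)
print("Durbin-Watson:", durbin_watson(resid))
```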
It depends on several factors, including (a) the nature of the data, (b) the goal of the analysis, (c) the relative performance of the algorithms, and (d) how feasibly the model can be integrated with business and operations.
Factors | Details |
---|---|
Nature of the data | Categorical, continuous, etc. |
Goal of analysis | * To describe, estimate, predict, cluster, classify, associate, explain, etc. * For example, decision trees are more readily interpretable than neural networks |
Algorithm performance / Model evaluation | * For classification, predictive power can be assessed via the area under the ROC curve (AUC) * For regression, there are a variety of choices, including R², AIC, and RMSE (see the metrics sketch after this table) |
Business integration | * Data availability * Model tuning vs. new model * Thinking through IT integration at the beginning of the project * Business end users' actual uses |
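A brief sketch of the evaluation metrics mentioned above, assuming scikit-learn and synthetic predictions (illustrative only; AIC is typically read off a fitted statsmodels model via its `aic` attribute):

```python
# Sketch: common evaluation metrics, AUC for classification and RMSE/R^2 for regression.
# Labels and predictions below are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score, mean_squared_error, r2_score

rng = np.random.default_rng(7)

# Classification: true binary labels and predicted probabilities
y_true_cls = rng.integers(0, 2, 100)
y_score = np.clip(0.5 * y_true_cls + rng.normal(0.25, 0.2, 100), 0, 1)
print("ROC AUC:", roc_auc_score(y_true_cls, y_score))

# Regression: true values and predictions
y_true_reg = rng.normal(100, 20, 100)
y_pred_reg = y_true_reg + rng.normal(0, 10, 100)
print("RMSE:", np.sqrt(mean_squared_error(y_true_reg, y_pred_reg)))
print("R^2:", r2_score(y_true_reg, y_pred_reg))
```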