Coursera UW Machine Learning Clustering & Retrieval

The course can be found on Coursera

Notebooks for quick search can be found on my blog SSQ

Videos are on Bilibili (to which I posted them)

  • Week 1 Intro

  • Week 2 Nearest Neighbor Search: Retrieving Documents

    • Implement nearest neighbor search for retrieval tasks
    • Contrast document representations (e.g., raw word counts, tf-idf,…)
      • Emphasize important words using tf-idf
    • Contrast methods for measuring similarity between two documents
      • Euclidean vs. weighted Euclidean
      • Cosine similarity vs. similarity via unnormalized inner product
    • Describe complexity of brute force search
    • Implement KD-trees for nearest neighbor search
    • Implement LSH for approximate nearest neighbor search
    • Compare pros and cons of KD-trees and LSH, and decide which is more appropriate for a given dataset
    • Choosing features and metrics for nearest neighbor search
    • Implementing Locality Sensitive Hashing from scratch (a minimal LSH sketch follows this list)
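
A minimal sketch of the Week 2 ideas, assuming NumPy: random-hyperplane LSH bins the vectors, and candidates from the query's bin are re-ranked by cosine distance. The random data stands in for tf-idf document vectors, and all names (compute_bin_indices, corpus, and so on) are illustrative, not the assignment's reference implementation.

```python
import numpy as np

def compute_bin_indices(data, random_vectors):
    """Hash each row of `data` to an integer bin via the signs of random projections."""
    bits = (data @ random_vectors) >= 0                # (n_points, n_bits) booleans
    powers = 1 << np.arange(random_vectors.shape[1])   # 1, 2, 4, ...
    return bits.astype(int) @ powers                   # integer bin id per point

rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 50))               # stand-in for tf-idf vectors
random_vectors = rng.standard_normal((50, 16))         # 16 random hyperplanes

bins = compute_bin_indices(corpus, random_vectors)

query = corpus[0]
query_bin = compute_bin_indices(query[None, :], random_vectors)[0]
candidates = np.flatnonzero(bins == query_bin)         # search only the query's bin

def cosine_distance(a, B):
    """1 - cosine similarity between vector `a` and each row of `B`."""
    return 1 - (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a))

order = candidates[np.argsort(cosine_distance(query, corpus[candidates]))]
print("approximate nearest neighbors of document 0:", order[:5])
```

Searching a few neighboring bins (flipping bits of the query's bin index) trades extra computation for higher recall, the usual accuracy/efficiency trade-off with LSH.
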
  • Week 3 Clustering with k-means

    • Describe potential applications of clustering
    • Describe the input (unlabeled observations) and output (labels) of a clustering algorithm
    • Determine whether a task is supervised or unsupervised
    • Cluster documents using k-means
    • Interpret k-means as a coordinate descent algorithm
    • Define data parallel problems
    • Explain Map and Reduce steps of MapReduce framework
    • Use existing MapReduce implementations to parallelize k-means, understanding what’s being done under the hood
    • Clustering text data with k-means (a minimal k-means sketch follows this list)
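
A minimal k-means sketch, assuming NumPy: the loop alternates the two coordinate-descent steps named above (assign each point to its nearest centroid, then move each centroid to the mean of its points). The random 2-D data and the function name are illustrative; the assignment clusters tf-idf document vectors instead.

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 1: with centroids fixed, assign each point to its nearest centroid.
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 2: with assignments fixed, move each centroid to the mean of its points.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):      # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels

data = np.random.default_rng(1).standard_normal((300, 2))
centroids, labels = kmeans(data, k=3)
print(centroids)
```

Each step can only decrease the within-cluster sum of squares, which is why the procedure converges, though only to a local optimum that depends on initialization.
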
  • Week 4 Mixture Models: Model-Based Clustering

    • Interpret a probabilistic model-based approach to clustering using mixture models
    • Describe model parameters
    • Motivate the utility of soft assignments and describe what they represent
    • Discuss issues related to how the number of parameters grows with the number of dimensions
      • Interpret diagonal covariance versions of mixtures of Gaussians
    • Compare and contrast mixtures of Gaussians and k-means
    • Implement an EM algorithm for inferring soft assignments and cluster parameters
      • Determine an initialization strategy
      • Implement a variant that helps avoid overfitting issues
    • Implementing EM for Gaussian mixtures (a minimal EM sketch follows this list)
    • Clustering text data with Gaussian mixtures
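
A minimal EM sketch for a mixture of Gaussians with diagonal covariances, assuming NumPy and SciPy. The random data, variable names, and the small variance floor (one simple guard against degenerate covariances) are illustrative assumptions, not the assignment's exact implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(data, k, n_iter=50, seed=0):
    n, d = data.shape
    rng = np.random.default_rng(seed)
    means = data[rng.choice(n, size=k, replace=False)]     # initialize means at random points
    covs = np.tile(np.var(data, axis=0), (k, 1))           # diagonal covariances, shape (k, d)
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibilities (soft assignments) of each point to each cluster.
        resp = np.column_stack([
            weights[j] * multivariate_normal.pdf(data, means[j], np.diag(covs[j]))
            for j in range(k)
        ])
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and diagonal covariances from responsibilities.
        nk = resp.sum(axis=0)                              # effective cluster sizes
        weights = nk / n
        means = (resp.T @ data) / nk[:, None]
        covs = np.stack([
            (resp[:, j, None] * (data - means[j]) ** 2).sum(axis=0) / nk[j]
            for j in range(k)
        ]) + 1e-6                                          # variance floor to avoid collapse
    return weights, means, covs, resp

data = np.random.default_rng(2).standard_normal((500, 2))
weights, means, covs, resp = em_gmm(data, k=3)
print(weights)
```

With full covariance matrices the parameter count grows quadratically in the number of dimensions; restricting to diagonal covariances keeps it linear, which is the dimensionality issue the objectives above refer to.
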
  • Week 5 Latent Dirichlet Allocation: Mixed Membership Modeling

    • Compare and contrast clustering and mixed membership models
    • Describe a document clustering model for the bag-of-words document representation
    • Interpret the components of the LDA mixed membership model
    • Analyze a learned LDA model
      • Topics in the corpus
      • Topics per document
    • Describe Gibbs sampling steps at a high level
    • Utilize Gibbs sampling output to form predictions or estimate model parameters
    • Implement collapsed Gibbs sampling for LDA
    • Modeling text topics with Latent Dirichlet Allocation (a collapsed Gibbs sampling sketch follows this list)
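
A minimal collapsed Gibbs sampling sketch for LDA on a tiny toy corpus of word-id lists, assuming NumPy. The corpus, the hyperparameters alpha and beta, and all variable names are illustrative; the assignment works with a much larger real corpus.

```python
import numpy as np

docs = [[0, 1, 2, 1], [2, 3, 3, 4], [0, 4, 4, 1]]     # toy documents as word-id lists
V, K, alpha, beta = 5, 2, 0.1, 0.01                   # vocab size, topics, Dirichlet priors
rng = np.random.default_rng(0)

# Random initial topic for every word slot, plus the count tables the sampler maintains.
z = [[rng.integers(K) for _ in doc] for doc in docs]
ndk = np.zeros((len(docs), K))                        # doc-topic counts
nkw = np.zeros((K, V))                                # topic-word counts
nk = np.zeros(K)                                      # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1

for _ in range(200):                                  # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[d][i]                           # remove this word's current assignment
            ndk[d, k_old] -= 1; nkw[k_old, w] -= 1; nk[k_old] -= 1
            # Conditional probability of each topic given all other assignments.
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k_new = rng.choice(K, p=p / p.sum())      # resample this word's topic
            z[d][i] = k_new
            ndk[d, k_new] += 1; nkw[k_new, w] += 1; nk[k_new] += 1

# Estimate topic-word distributions from the final counts.
phi = (nkw + beta) / (nk[:, None] + V * beta)
print(np.round(phi, 2))
```

The same count tables also yield per-document topic proportions, (ndk + alpha) normalized over topics, which is how the sampler's output is used to form predictions and parameter estimates.
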
  • Week 6 Hierarchical Clustering & Closing Remarks

    • Bonus content: Hierarchical clustering
      • Divisive clustering
      • Agglomerative clustering
        • The dendrogram for agglomerative clustering
        • Agglomerative clustering details
    • Hidden Markov models (HMMs): Another notion of “clustering”
    • Modeling text data with a hierarchy of clusters (an agglomerative clustering sketch follows this list)
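
A minimal agglomerative clustering sketch with SciPy's hierarchy module: build Ward-linkage merges over random points standing in for tf-idf vectors, then cut the dendrogram into a fixed number of flat clusters. The data, linkage method, and cluster count are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

data = np.random.default_rng(0).standard_normal((50, 4))  # stand-in for document vectors
Z = linkage(data, method="ward")                          # bottom-up (agglomerative) merges
labels = fcluster(Z, t=3, criterion="maxclust")           # cut the tree into 3 flat clusters
print(labels)
# dendrogram(Z)  # with matplotlib available, this draws the merge tree (the dendrogram)
```

Divisive clustering works top-down instead, for example by recursively splitting each cluster with k-means using k = 2.
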