Wikipedia Latent Semantic Analysis with PySpark

opentrainingcamp/Wikipedia_LSA

PySpark implementation of Chapter 6 of the book "Advanced Analytics with Spark: Patterns for Learning from Data at Scale" (Uri Laserson, Sean Owen, Sandy Ryza, Josh Wills), whose examples were originally written in Scala. The goal is to apply LSA (Latent Semantic Analysis) to a corpus of Wikipedia articles, using the Wikipedia data dumps as the dataset.
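
To give a rough idea of what LSA computes, here is a minimal pure-Python sketch, not the notebook's PySpark code. It builds a TF-IDF document-term matrix for three toy documents and approximates the leading "concept" (the top right singular vector) by power iteration. The toy corpus, the helper names (`tf_idf`, `top_concept`, `top_terms`) and the whitespace tokenizer are all assumptions made for illustration. A real run on Wikipedia would use Spark's distributed SVD and keep many concepts rather than one.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of term j in doc i."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    vocab = sorted({t for toks in tokenized for t in toks})
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(len(docs) / doc_freq[t]) for t in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # term frequency is normalised by document length
        matrix.append([counts[t] / len(toks) * idf[t] for t in vocab])
    return vocab, matrix


def top_concept(matrix, iterations=100):
    """Approximate the largest singular value and its right singular vector
    by power iteration on A^T A (a rank-1 truncated SVD)."""
    n_terms = len(matrix[0])
    v = [1.0 / math.sqrt(n_terms)] * n_terms
    for _ in range(iterations):
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # u = A v
        w = [sum(row[j] * u_i for row, u_i in zip(matrix, u))       # w = A^T u
             for j in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    doc_scores = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in doc_scores))  # sigma = ||A v||
    return sigma, v


def top_terms(vocab, concept, k=3):
    """Return the k terms with the largest absolute weight in a concept vector."""
    ranked = sorted(zip(vocab, concept), key=lambda pair: abs(pair[1]), reverse=True)
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus standing in for Wikipedia articles.
docs = [
    "spark cluster computing spark",
    "spark distributed computing engine",
    "cat dog pet",
]
vocab, matrix = tf_idf(docs)
sigma, concept = top_concept(matrix)
print(sigma, top_terms(vocab, concept))
```

Because TF-IDF favours rare terms, the strongest concept in this toy corpus is the one spanned by the terms that occur only in the third document. On a real corpus, each retained concept groups terms and documents that tend to co-occur.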

Languages

  • Jupyter Notebook 100.0%