Skip to content

Uses a custom search algorithm to quickly find a path between two Wikipedia pages.

Notifications You must be signed in to change notification settings

matt-mekha/WikipediaSpeedrunner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Algorithm Basics

Wikipedia pages have lots of links, so simply using something like a breadth-first search would take way too long. If you were to search every linked page recursively in a BFS, that would result in x^y total searches where x is the number of links per page (let's say 50 on average) and y is the depth of the search. After 4 layers, you would already be at over six million searches.

The solution to this problem is to determine which links on a page are most likely to result in a shorter path, and only search the first few. To determine this, the algorithm checks how long the title is (the assumption here is that shorter titles tend to be more general) along with some other factors. After searching the those pages, the algorithm then determines which page is best to be the new "parent" page using factors like content length and number of links. The algorithm will very quickly begin to reach general pages such as United States. Now, however, a new problem arises. Although we can easily travel from specific pages to general pages, once we're there, it is nearly impossible to determine how to narrow back down into specific pages like Adam Sandler without some sort of human knowledge (or a Neural Network).

The solution to the new problem is to generalize both ways. Using Wikipedia's What links here tool, we can go from specific to general pages in reverse, starting at the end page. Now that we can generalize in both directions, all we need is a "midpoint" page for both searches to meet in the middle. I chose to use the United States page because it is one of the most commonly linked pages on Wikipedia, and the likelihood that a page is connected to United States in the imaginary node graph of Wikipedia pages is very high.

About

Uses a custom search algorithm to quickly find a path between two Wikipedia pages.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published