-
Notifications
You must be signed in to change notification settings - Fork 0
LaoLiulaoliu/udacity-search-engine
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
udacity-search-engine ################################## This search engine is separated into two parts. 1. engine_spider.py 2. engine_search.py We can crawl new things to enlarge the data, and search word without a crawling process. engine_spider: Crawling web pages using thread pool model. At the same time, indexing all the words with crawled page(word stemming in it). After crawling and indexing, based on all pages' outgoing link, compute every page's ranking. Dump the ranking and indexing variables into the hard disk. Load the ranking and indexing variables from old data on hard disk. Whenever need to stop the spider,use Ctrl+C signal, program will dump the variables from memory to disk automatically. Auto detecte the webpages' encoding, and decode them. engine_search: Load ranking and indexing variables from hard disk. Search the words or a sentence. Inputing words from the console, then stemming and searching. Showing the result in smart way.(quick sort and count the NO. of different word in one url) Caching the query result.
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published