A concurrent web scraper that downloads all linked files from a given website, using multiple threads for an efficient implementation. The assignment is split into three parts, each to be completed in turn; the final part builds on the first two to complete the web scraper.
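For a sense of the intended shape, here is a minimal sketch of the concurrent part, assuming POSIX threads: a fixed pool of workers repeatedly pops URLs from a shared queue and fetches them. The queue_t type and the queue_pop()/http_get() declarations are placeholders standing in for the real interfaces declared in the header files, not the actual assignment API.

    #include <pthread.h>
    #include <stdlib.h>

    #define NUM_WORKERS 4

    typedef struct queue queue_t;        /* opaque; implemented in queue.c */
    char *queue_pop(queue_t *q);         /* placeholder: blocks, NULL = done */
    char *http_get(const char *url);     /* placeholder: body or NULL on error */

    static void *worker(void *arg)
    {
        queue_t *q = arg;
        char *url;
        while ((url = queue_pop(q)) != NULL) {
            char *body = http_get(url); /* a NULL response is tolerated */
            if (body != NULL) {
                /* parse links, enqueue new URLs, save the file, ... */
                free(body);
            }
            free(url);
        }
        return NULL;
    }

    int start_workers(queue_t *q)
    {
        pthread_t tid[NUM_WORKERS];
        int created = 0;
        while (created < NUM_WORKERS &&
               pthread_create(&tid[created], NULL, worker, q) == 0)
            created++;
        for (int i = 0; i < created; i++) /* join whatever was started */
            pthread_join(tid[i], NULL);
        return created == NUM_WORKERS ? 0 : -1;
    }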
Guidelines:
No global variables (at all)
No more than 3 levels of nesting in any function, and no function longer than 40 lines
Use the minimum amount of code required (no more than 200 lines for any part of the assignment, and preferably fewer than 100)
The program should not crash, hang, or exit unexpectedly as a result of a bad response from an external website
Instead, errors should be handled in the way described by the interfaces in the header files (see the fetch sketch after this list)
Implement all code in C; do not use C++ or invoke external programs
Code outside of http.c, crawler.c, and queue.c will not be marked (though you may make small bug fixes elsewhere provided the interface stays the same)
All memory allocated during the program's execution should be free()'d, and all resources (files, sockets) should be closed before the program finishes (a cleanup sketch follows this list)
Comment code as necessary; excessive commenting is not required, but anything non-obvious should be documented
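To illustrate the error-handling guideline, the sketch below fetches a page while checking every system-call result, so an unreachable host or truncated response produces an error return rather than a crash, and the socket and buffer are released on every path. The connect_to() helper and the HTTP_OK/HTTP_ERROR codes are hypothetical stand-ins for whatever the real header files specify.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define HTTP_OK    0
    #define HTTP_ERROR (-1)

    int connect_to(const char *host);    /* hypothetical: returns fd or -1 */

    int fetch(const char *host, const char *path, char **out)
    {
        *out = NULL;
        int fd = connect_to(host);
        if (fd < 0)
            return HTTP_ERROR;           /* bad host: report, don't crash */

        char request[1024];
        int n = snprintf(request, sizeof request,
                         "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
        if (n < 0 || n >= (int)sizeof request || write(fd, request, n) != n) {
            close(fd);                   /* always release the socket */
            return HTTP_ERROR;
        }

        size_t cap = 4096, len = 0;
        char *buf = malloc(cap);
        if (buf == NULL) {
            close(fd);
            return HTTP_ERROR;
        }
        ssize_t r;
        while ((r = read(fd, buf + len, cap - len - 1)) > 0) {
            len += (size_t)r;
            if (cap - len < 2) {         /* grow; check realloc's result */
                char *tmp = realloc(buf, cap * 2);
                if (tmp == NULL) {
                    free(buf);
                    close(fd);
                    return HTTP_ERROR;
                }
                buf = tmp;
                cap *= 2;
            }
        }
        close(fd);
        if (r < 0) {                     /* read error: clean up and report */
            free(buf);
            return HTTP_ERROR;
        }
        buf[len] = '\0';
        *out = buf;                      /* caller is responsible for free() */
        return HTTP_OK;
    }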
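One common way to meet the cleanup guideline in C is a single-exit "goto cleanup" pattern, where every allocation and handle is released exactly once regardless of which step fails. The save_page() function and its arguments are purely illustrative, not part of the assignment interface.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int save_page(const char *dir, const char *name, const char *body)
    {
        int ret = -1;            /* assume failure until everything succeeds */
        char *path = NULL;
        FILE *fp = NULL;

        size_t len = strlen(dir) + strlen(name) + 2;
        path = malloc(len);
        if (path == NULL)
            goto cleanup;
        snprintf(path, len, "%s/%s", dir, name);

        fp = fopen(path, "w");
        if (fp == NULL)
            goto cleanup;
        if (fputs(body, fp) == EOF)
            goto cleanup;

        ret = 0;                 /* success */
    cleanup:
        if (fp != NULL)
            fclose(fp);
        free(path);              /* free(NULL) is a safe no-op */
        return ret;
    }

Note that this pattern also stays within the guidelines above: no globals, at most 3 levels of nesting, and well under 40 lines per function.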