
wiki-list-patterns

BSc project template on information extraction from structured lists

Why: Many different kinds of documents contain lists: they are a simple way of enumerating several related items, and they allow readers to look up and compare items. Often, every item in the list has some kind of structure. For example, here are a few of Keira Knightley's film roles:

Each item contains different parts: the title of the film, the year, the character she played and sometimes a comment. These parts are delineated by punctuation, but often also by specific words or phrases. It would be useful to extract information from these kinds of structures and add it to Knowledge Bases, so that it can be integrated automatically with other data.
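To make this concrete, here is a minimal, hand-written Python sketch of the kind of pattern the project would like to infer automatically. It assumes items of the hypothetical form `Title (Year) as Character; comment`; the name `ROLE_PATTERN`, the helper `parse_role` and the example strings are illustrative only and not part of this repository. Real Wikipedia lists vary far more than this.

```python
import re

# Hypothetical hand-written pattern for items shaped like
# "Title (Year) as Character; optional comment".
# The parts are delimited by punctuation ("(", ")", ";") and by the word "as".
ROLE_PATTERN = re.compile(
    r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)\s+as\s+(?P<character>[^;]+?)"
    r"(?:;\s*(?P<comment>.+))?$"
)


def parse_role(item):
    """Return the item's parts as a dict, or None if the item does not fit the pattern."""
    match = ROLE_PATTERN.match(item)
    return match.groupdict() if match else None


# Example strings typed by hand for this sketch, not copied from Wikipedia.
print(parse_role("Atonement (2007) as Cecilia Tallis"))
# {'title': 'Atonement', 'year': '2007', 'character': 'Cecilia Tallis', 'comment': None}
print(parse_role(
    "Pride & Prejudice (2005) as Elizabeth Bennet; "
    "nominated for the Academy Award for Best Actress"
))
```

Writing such a pattern by hand works for one list, but every list has its own layout, which is why the project aims to infer the patterns instead.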

What: This project aims to develop a system to automatically infer patterns in the structure of lists on Wikipedia.
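As a rough illustration of what inferring a pattern could mean, here is a naive baseline sketch in Python, not the project's method. It assumes that tokens occurring in every item of a list act as fixed delimiters; other runs of words are collapsed into `TEXT` slots, four-digit numbers become `YEAR`, and the most frequent resulting shape is turned back into a regular expression. All function names and example items are hypothetical, and the sketch ignores the links to Wikipedia pages entirely.

```python
import re
from collections import Counter

# Split an item into 4-digit numbers, words and single punctuation marks.
TOKEN = re.compile(r"\d{4}|\w+|[^\w\s]")


def tokenize(item):
    return TOKEN.findall(item)


def find_anchors(items):
    """Non-numeric tokens that occur in every item; treated as fixed delimiters."""
    counts = Counter()
    for item in items:
        counts.update(set(tokenize(item)))  # count each token once per item
    return {tok for tok, n in counts.items() if n == len(items) and not tok.isdigit()}


def shape(item, anchors):
    """Abstract an item: keep anchors, map years to YEAR, collapse other runs to TEXT."""
    out = []
    for tok in tokenize(item):
        if tok in anchors:
            out.append(tok)
        elif re.fullmatch(r"\d{4}", tok):
            out.append("YEAR")
        elif not out or out[-1] != "TEXT":
            out.append("TEXT")
    return " ".join(out)


def shape_to_regex(shape_str):
    """Turn a shape back into a regex with one capture group per TEXT/YEAR slot."""
    parts = []
    for sym in shape_str.split():
        if sym == "TEXT":
            parts.append(r"(.+?)")
        elif sym == "YEAR":
            parts.append(r"(\d{4})")
        else:
            parts.append(re.escape(sym))
    return re.compile("^" + r"\s*".join(parts) + "$")


# Hypothetical example items, hand-typed for this sketch (not scraped from Wikipedia).
items = [
    "Bend It Like Beckham (2002) as Jules Paxton",
    "Pirates of the Caribbean: The Curse of the Black Pearl (2003) as Elizabeth Swann",
    "Pride & Prejudice (2005) as Elizabeth Bennet; nominated for the Academy Award for Best Actress",
    "Atonement (2007) as Cecilia Tallis",
]

anchors = find_anchors(items)
pattern_shape, support = Counter(shape(i, anchors) for i in items).most_common(1)[0]
regex = shape_to_regex(pattern_shape)
print(pattern_shape, support)
for item in items:
    print(regex.match(item).groups())
```

Note that the comment in the third item ends up inside the last `TEXT` slot together with the character name: requiring a delimiter to occur in every item is too strict for optional parts, which is one of the cases a real system would have to handle.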

How: The starting point of this project is a collection of thousands of lists from Wikipedia. These lists contain links to Wikipedia pages, so you can also draw on background knowledge from Knowledge Graphs that are linked to those pages. You will evaluate and compare several approaches against an existing Python library.

Supervisor: Benno Kruit ([email protected])

See also: