BSc project template on information extraction from structured lists

SpazzSpud/wiki-list-patterns

 
 

wiki-list-patterns

Why: Many kinds of documents contain lists. Lists are a simple way of enumerating related items, and they let readers look up and compare those items. Often, every item in a list shares some kind of structure. For example, here are a few of Keira Knightley's film roles:

Each item contains several parts: the title of a film, the year, the character she played, and sometimes a comment. These parts are delimited by punctuation, and often also by specific words or phrases. Extracting information from these structures would make it possible to add it to knowledge bases and automatically integrate it with other data.
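To make the idea of parts delimited by punctuation and keywords concrete, here is a minimal sketch that splits one item into named fields with a hand-written regular expression. The item layout (`<title> (<year>) as <character>[, <comment>]`), the placeholder items, and the names `ITEM_PATTERN` and `parse_item` are assumptions made for illustration; the actual lists may be formatted differently.

```python
import re

# Assumed (hypothetical) layout: "<title> (<year>) as <character>[, <comment>]".
# Parentheses and commas are the punctuation delimiters; "as" is a keyword delimiter.
ITEM_PATTERN = re.compile(
    r"^(?P<title>.+?)\s+\((?P<year>\d{4})\)\s+as\s+"
    r"(?P<character>[^,]+)(?:,\s*(?P<comment>.+))?$"
)


def parse_item(item):
    """Return the named parts of one list item, or None if it does not match."""
    match = ITEM_PATTERN.match(item.strip())
    if match is None:
        return None
    # Optional groups that did not take part in the match stay None.
    return {
        name: value.strip() if value is not None else None
        for name, value in match.groupdict().items()
    }


# Placeholder items, not taken from any real list.
items = [
    "Example Film (2001) as Jane Doe",
    "Another Film (2005) as Mary Major, uncredited",
]
for item in items:
    print(parse_item(item))
```

A hand-written pattern like this only covers one list layout, which is why inferring such patterns automatically is the interesting part.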

What: This project aims to develop a system to automatically infer patterns in the structure of lists on Wikipedia.
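One very simple baseline for pattern inference, sketched below, is to abstract every item into a template and count which templates recur. This is only an illustrative assumption about how one might start, not the approach the project prescribes; the slot names `<NAME>` and `<NUM>`, the functions `item_template` and `most_common_templates`, and the placeholder items are all hypothetical.

```python
import re
from collections import Counter

# One pass over the item: runs of capitalised words become a <NAME> slot,
# digit runs become a <NUM> slot. Punctuation and lowercase words are kept,
# so they act as the fixed part of the template.
SLOT = re.compile(r"(?P<name>[A-Z]\w*(?:\s+[A-Z]\w*)*)|(?P<num>\d+)")


def item_template(item):
    """Abstract one list item into a template string."""
    return SLOT.sub(lambda m: "<NAME>" if m.group("name") else "<NUM>", item)


def most_common_templates(items, n=3):
    """Count templates over all items and return the n most frequent ones."""
    return Counter(item_template(item) for item in items).most_common(n)


# Placeholder items, not taken from any real list.
items = [
    "Example Film (2001) as Jane Doe",
    "Another Film (2005) as Mary Major",
    "Third Film (2010) as Ann Other, uncredited",
]
print(most_common_templates(items))
```

Capitalisation is a crude signal for where a slot begins and ends; the links in the Wikipedia lists would likely give more reliable slot boundaries.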

How: The starting point of this project is a collection of thousands of lists from Wikipedia. These lists contain links to Wikipedia pages, so you can use background knowledge from Knowledge Graphs that are linked to those pages. You will evaluate and compare several approaches against an existing Python library.

Supervisor: Benno Kruit

See also:
