Data Engineering Projects

Group projects for the course Data Engineering held by professor Paolo Merialdo at Roma Tre University.

  • Project 1 - Scraping & Data extraction:

    Downloading scientific papers from arXiv (in HTML format) and extracting information about their tables using XPath expressions.

  • Project 2 - Paper Search Engine:

    Search engine over the scientific papers extracted in the previous project.

    Server built with Apache Lucene and Spring Boot (Java).

    Client built with TypeScript and Bootstrap.

  • Project 3 - Table Search Engine + Semantic Search:

    Continuation of Project 2 that adds a search engine for tables, plus semantic search with an evaluation of different models (e.g., BERT, All MiniLM v2) and different embedding methods.

  • Project 4 - Table data extraction and understanding:

  • Project 5 - Data Integration of Airline flights:
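To illustrate the kind of XPath-based table extraction described in Project 1, here is a minimal sketch using only Python's standard library. It assumes the arXiv page is well-formed XHTML and that tables are wrapped in `<figure class="ltx_table">` elements with a `<figcaption>`, as in LaTeXML-style output; the `SAMPLE` page and the `extract_tables` helper are hypothetical and not taken from the repository. Note that `xml.etree.ElementTree` supports only a subset of XPath (enough for paths and attribute predicates), and real pages may also carry an XHTML namespace or multiple class names, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet of an arXiv HTML page (assumed to be well-formed XHTML).
SAMPLE = """<html><body>
<figure class="ltx_table" id="S4.T1">
  <figcaption>Table 1: Results</figcaption>
  <table class="ltx_tabular">
    <tr><th>Model</th><th>F1</th></tr>
    <tr><td>A</td><td>0.81</td></tr>
  </table>
</figure>
</body></html>"""


def extract_tables(xhtml: str) -> list[dict]:
    """Return the id, caption and cell text of every table figure in the page."""
    root = ET.fromstring(xhtml)
    tables = []
    # ElementTree's limited XPath supports descendant paths and [@attr='value'] predicates.
    for fig in root.findall(".//figure[@class='ltx_table']"):
        caption = fig.find("figcaption")
        rows = []
        for tr in fig.findall(".//tr"):
            # Collect header and data cells of this row in document order.
            rows.append(
                ["".join(cell.itertext()).strip() for cell in tr if cell.tag in ("th", "td")]
            )
        tables.append({
            "id": fig.get("id"),
            "caption": "".join(caption.itertext()).strip() if caption is not None else "",
            "rows": rows,
        })
    return tables
```

For real-world pages that are not valid XML, a lenient HTML parser with full XPath 1.0 support would be the more robust choice; the standard library is used here only to keep the sketch self-contained.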
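The paper search engine in Project 2 relies on Apache Lucene, whose core data structure is an inverted index mapping each term to the documents that contain it. The sketch below is a toy, in-memory illustration of that idea, not Lucene's API: the `InvertedIndex` class, the regex `tokenize` function and the simple TF-IDF scoring are all assumptions made for illustration. Lucene's actual analyzers are far richer, and its default similarity is BM25 rather than this formula.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (a crude stand-in for an analyzer).
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical toy index: term -> {doc_id: term frequency}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        n = len(self.docs)
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rarer terms get a higher inverse document frequency weight.
            idf = math.log(1 + n / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by document id for a stable order.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

A usage example: after `idx.add("p1", "Table extraction from arXiv papers")`, a query such as `idx.search("table")` returns `p1` with a positive score, while a query with no indexed terms returns an empty list.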
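Semantic search as in Project 3 typically embeds the query and the indexed items with the same model and ranks items by cosine similarity. The sketch below shows only that ranking step plus mean pooling, one common way to turn per-token vectors into a single embedding. The function names and the tiny hand-written vectors are hypothetical stand-ins for real model outputs (e.g. from BERT or All MiniLM v2), and this is not necessarily how the project computed or compared its embeddings.

```python
import math


def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average per-token vectors into one fixed-size embedding (one possible method)."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def semantic_search(query_vec: list[float], corpus: dict[str, list[float]], k: int = 3):
    """Rank hypothetical precomputed item embeddings by similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]
```

Comparing embedding methods then amounts to swapping how the vectors are produced (different models, or pooling strategies such as `mean_pool`) while keeping the same ranking and evaluation procedure.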