Group projects for the Data Engineering course taught by Professor Paolo Merialdo at Roma Tre University.
-
Project 1 - Scraping & Data extraction:
Downloading scientific papers from arXiv (in HTML format) and extracting information about their tables using XPath expressions.
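
As a rough illustration of the extraction step, the sketch below pulls the caption and cell text of each table out of an HTML document with XPath-style queries. It is not the project's code: the sample markup (a `figure` wrapping a `figcaption` and a `table`), the `extract_tables` helper, and the output dictionary keys are all assumptions made for the example. It uses the limited XPath subset of Python's standard `xml.etree.ElementTree`, which only accepts well-formed markup; real arXiv pages would likely need a tolerant HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a downloaded paper.
SAMPLE_HTML = """
<html><body>
  <figure id="T1">
    <figcaption>Table 1: Results</figcaption>
    <table>
      <tr><th>Model</th><th>F1</th></tr>
      <tr><td>A</td><td>0.81</td></tr>
      <tr><td>B</td><td>0.77</td></tr>
    </table>
  </figure>
</body></html>
"""


def extract_tables(html_text):
    """Return one dict per <figure> that contains a <table>."""
    root = ET.fromstring(html_text)
    tables = []
    # ".//figure" selects every <figure> descendant of the root.
    for figure in root.findall(".//figure"):
        table = figure.find(".//table")
        if table is None:
            continue  # a figure without a table (e.g. an image)
        caption = figure.findtext("figcaption", default="").strip()
        rows = []
        for row in table.findall(".//tr"):
            # Keep the text of header and data cells only.
            cells = [
                "".join(cell.itertext()).strip()
                for cell in row
                if cell.tag in ("th", "td")
            ]
            rows.append(cells)
        tables.append({"id": figure.get("id"), "caption": caption, "rows": rows})
    return tables


tables = extract_tables(SAMPLE_HTML)
```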
-
Project 2 - Paper Search Engine:
Search engine for the scientific papers extracted in the previous project.
Server built with Apache Lucene and Spring Boot (Java).
Client built with TypeScript and Bootstrap.
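
Full-text engines such as Lucene are built around an inverted index, a map from each term to the documents that contain it. The toy sketch below illustrates that idea only; it does not use or mirror Lucene's API, and the `PAPERS` data, the `tokenize`, `build_index`, and `search` helpers, and the AND-only query semantics are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing ALL query terms."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical paper titles used only for this illustration.
PAPERS = {
    "p1": "Table extraction from scientific papers",
    "p2": "Semantic search with sentence embeddings",
    "p3": "Search engines for scientific tables",
}

index = build_index(PAPERS)
hits = search(index, "scientific papers")
```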
-
Project 3 - Table Search Engine + Semantic Search:
Continuation of Project 2 that adds a table search engine. Semantic search with an evaluation of different models (e.g., BERT, All MiniLM v2) and different embedding methods.
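
Semantic search over embeddings usually comes down to ranking items by vector similarity to the query. The minimal sketch below ranks by cosine similarity, which is a common choice but an assumption here, since the README does not say which metric the project used. The three-dimensional vectors in `TOY_EMBEDDINGS` are hand-written stand-ins, not the output of BERT or MiniLM, and the `rank` helper is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs, best match first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hand-written toy vectors; a real system would get these from a model.
TOY_EMBEDDINGS = {
    "table_a": [0.9, 0.1, 0.0],
    "table_b": [0.1, 0.9, 0.0],
    "table_c": [0.7, 0.3, 0.1],
}

query = [1.0, 0.0, 0.0]
results = rank(query, TOY_EMBEDDINGS, top_k=2)
```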
-
Project 4 - Table data extraction and understanding:
-
Project 5 - Data Integration of Airline flights: