Skip to content

MrMimic/covid-19-kaggle

Repository files navigation

Kaggle: COVID-19 Challenge

This library provides tools aiming to find different opinions in the scientific litterature regarding the user query.

The Kaggle notebook can be find here.

Birielfy:

  • It loads all articles into an SQLite DB.
  • Sentences are pre-processed.
  • Word2vec and TF-IDF are trained.
  • Sentences are vectorised.
  • The query is pre-processed and vectorised.
  • The distance between query and sentences is computed.
  • The top-k sentences are kept.
  • A clustering is applied on these sentences.
  • A ranking regarding its proximity to the centroid and authors of the papers.

Installation

Simply use:

pip install -q git+https://github.com/MrMimic/covid-19-kaggle

An then the library can be imported with:

from c19 import parameters, database_utilities, text_preprocessing, embedding, query_matching, clusterise_sentences, plot_clusters, display_output

Usage

Create the database

Please use this script to create the local database.

Query the DB

Please use this one to query the trained DB.

Re-train the W2V and TF-IDF

This script allows to re-train the W2V and TF-IDF to re-generate the parquet file.

Usage

All queries from the Kaggle challenge have been reformulated here. They have then been processed with the tool presented here.

Results are visible on Kaggle.

About

Library to be installed on the notebook.

Resources

License

Stars

Watchers

Forks

Packages

No packages published