Kaggle: COVID-19 Challenge

This library provides tools aiming to find different opinions in the scientific literature on a user's query.

The Kaggle notebook can be found here.

Briefly:

It loads all articles into an SQLite DB.
Sentences are pre-processed.
Word2vec and TF-IDF are trained.
Sentences are vectorised.
The query is pre-processed and vectorised.
The distance between query and sentences is computed.
The top-k sentences are kept.
Clustering is applied to these sentences.
The sentences are ranked according to their proximity to their cluster centroid and the authors of their papers.
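The steps above can be sketched in pure Python. This is a hedged illustration, not the `c19` implementation: the 2-D word vectors and IDF weights are toy values, a sentence vector is assumed to be the IDF-weighted mean of its word vectors, similarity is cosine, the clustering is a plain k-means (k=2) with farthest-point seeding, and the author-based part of the ranking is left out. The function names (`preprocess`, `vectorise`, `top_k`, `kmeans`, `rank_by_centroid`) are hypothetical and do not belong to the library.

```python
import math
import re

# Hypothetical stop-word list; the real pre-processing may differ.
STOP_WORDS = {"the", "of", "a", "in", "is", "and", "to"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def vectorise(tokens, word_vectors, idf):
    """IDF-weighted mean of the known word vectors (assumed weighting scheme)."""
    dim = len(next(iter(word_vectors.values())))
    total, weight_sum = [0.0] * dim, 0.0
    for token in tokens:
        if token in word_vectors:
            weight = idf.get(token, 1.0)
            total = [acc + weight * x for acc, x in zip(total, word_vectors[token])]
            weight_sum += weight
    return [x / weight_sum for x in total] if weight_sum else total


def cosine(u, v):
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def top_k(query_vec, sentence_vecs, k):
    """Indices of the k sentences with the highest cosine similarity."""
    order = sorted(range(len(sentence_vecs)),
                   key=lambda i: cosine(query_vec, sentence_vecs[i]), reverse=True)
    return order[:k]


def kmeans(points, k, iterations=10):
    """Plain k-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid, then recompute centroids.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def rank_by_centroid(points, labels, centroids):
    """Per cluster, point indices sorted by distance to the cluster centroid."""
    ranking = {}
    for i, (p, label) in enumerate(zip(points, labels)):
        ranking.setdefault(label, []).append((math.dist(p, centroids[label]), i))
    return {label: [i for _, i in sorted(items)] for label, items in ranking.items()}


# Toy 2-D "word2vec" vectors and IDF weights, made up for the example.
word_vectors = {
    "virus": [1.0, 0.0], "transmission": [0.9, 0.1], "airborne": [0.8, 0.2],
    "vaccine": [0.0, 1.0], "trial": [0.1, 0.9], "efficacy": [0.2, 0.8],
}
idf = {"virus": 0.5}  # tokens missing from this dict get a weight of 1.0
sentences = [
    "The virus transmission is airborne.",
    "Airborne transmission of the virus.",
    "A vaccine trial in mice.",
    "Vaccine efficacy and trial design.",
]
sentence_vecs = [vectorise(preprocess(s), word_vectors, idf) for s in sentences]
query_vec = vectorise(preprocess("Is the virus airborne?"), word_vectors, idf)

kept = top_k(query_vec, sentence_vecs, k=3)
points = [sentence_vecs[i] for i in kept]
labels, centroids = kmeans(points, k=2)
for label, members in sorted(rank_by_centroid(points, labels, centroids).items()):
    print(label, [sentences[kept[i]] for i in members])
```

In this sketch, clustering only the top-k sentences keeps the k-means step cheap, and each cluster can be read as one group of similar statements, with its most central sentence listed first.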

Installation

Simply use:

pip install -q git+https://github.com/MrMimic/covid-19-kaggle

The library can then be imported with:

from c19 import parameters, database_utilities, text_preprocessing, embedding, query_matching, clusterise_sentences, plot_clusters, display_output

Usage

Create the database

Please use this script to create the local database.
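The database-creation script is not reproduced here. To give a rough idea of what a local sentence store can look like, here is a minimal sketch with Python's built-in `sqlite3` module. The table name `sentences`, its columns and the JSON-encoded `vector` column are hypothetical: this is not the schema produced by `c19.database_utilities`.

```python
import json
import sqlite3

# Hypothetical schema: one row per pre-processed sentence, with its vector
# stored as a JSON string. The real c19 database layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sentences (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    paper_id TEXT NOT NULL,
    raw_sentence TEXT NOT NULL,
    vector TEXT
)
"""


def create_db(path):
    """Open (or create) the SQLite file and make sure the table exists."""
    connection = sqlite3.connect(path)
    connection.execute(SCHEMA)
    connection.commit()
    return connection


def insert_sentence(connection, paper_id, sentence, vector):
    """Store one sentence and its vector (serialised as JSON)."""
    connection.execute(
        "INSERT INTO sentences (paper_id, raw_sentence, vector) VALUES (?, ?, ?)",
        (paper_id, sentence, json.dumps(vector)),
    )
    connection.commit()


def load_vectors(connection):
    """Return (id, sentence, vector) tuples for every stored sentence."""
    rows = connection.execute(
        "SELECT id, raw_sentence, vector FROM sentences ORDER BY id")
    return [(i, s, json.loads(v)) for i, s, v in rows]


# ":memory:" keeps the example self-contained; pass a file path to persist it.
db = create_db(":memory:")
insert_sentence(db, "paper-1", "The virus transmission is airborne.", [0.88, 0.12])
insert_sentence(db, "paper-2", "A vaccine trial in mice.", [0.05, 0.95])
print(load_vectors(db))
```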

Query the DB

Please use this one to query the trained DB.

Re-train the W2V and TF-IDF

This script re-trains the W2V and TF-IDF models and regenerates the parquet file.
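Training the Word2Vec model needs a dedicated library, but the TF-IDF half boils down to counting document frequencies. The sketch below computes smoothed IDF weights with the standard library only. The formula idf(t) = ln((1 + N) / (1 + df(t))) + 1 is scikit-learn's `TfidfVectorizer` default; whether `c19` uses the same smoothing is an assumption, and writing the weights to parquet is left out. `train_idf` is a hypothetical helper name.

```python
import math
from collections import Counter


def train_idf(tokenised_docs):
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1 (assumed, scikit-learn default)."""
    n_docs = len(tokenised_docs)
    # Document frequency: each token is counted at most once per document.
    df = Counter(token for doc in tokenised_docs for token in set(doc))
    return {token: math.log((1 + n_docs) / (1 + count)) + 1
            for token, count in df.items()}


docs = [
    ["virus", "transmission", "airborne"],
    ["airborne", "transmission", "virus"],
    ["vaccine", "trial", "mice"],
]
idf = train_idf(docs)
print(round(idf["virus"], 3), round(idf["mice"], 3))  # → 1.288 1.693
```

Rarer tokens such as "mice" get a higher weight than tokens shared by several documents, which is what lets them dominate a weighted sentence vector.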

Usage

All queries from the Kaggle challenge have been reformulated here and then processed with the tool presented in this repository.

Results are visible on Kaggle.
