
CONTENT Search Engine

Brief


Python files are provided for the query model, along with a static folder and a templates folder. The static folder contains the CSS files, while the templates folder contains the HTML files for both the content and the query pages. The model is taken from https://github.com/UKPLab/sentence-transformers; it was placed remotely in a model folder, and the pretrained model parameters were downloaded separately. The project report can be accessed from MODEL REPORT.
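Since the app is served with Flask (installed in the setup steps below), the static and templates folders follow Flask's default layout. As a rough sketch only, with query.html as an assumed file name rather than the repo's actual template:

    # Minimal Flask layout sketch; query.html is an assumed name, not necessarily the repo's.
    from flask import Flask, render_template

    app = Flask(__name__)  # serves ./templates and ./static by default

    @app.route('/')
    def query_page():
        return render_template('query.html')  # HTML from templates/, styled by CSS in static/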

model

The files for the TextRank + T5 text summarizer + BERT embedding model by Akshay are:

  1. data_provider_2.py
  2. api_2.py
  3. model_and_summarizer.py

Setup Instructions for the Content Query


    1. Install pip3: sudo apt-get install python3-pip
    2. Python 3.6 is required to run PyTorch and the sentence_transformers library, but it is not readily available on Ubuntu 16.04, so install it indirectly via the deadsnakes PPA (steps 3-6):
    3. sudo apt-get install software-properties-common
    4. sudo add-apt-repository ppa:deadsnakes/ppa
    5. sudo apt-get update
    6. sudo apt-get install python3.6
    7. Install the virtualenv package: sudo pip3 install virtualenv
    8. Use a virtual environment to install all the packages; many packages are needed, this avoids changing the system libraries, and if anything goes wrong during installation you can simply delete the folder and start over.
    9. Create the environment: virtualenv --python=/usr/bin/python3.6 venv
    10. Activate the environment: source venv/bin/activate
    11. To leave the environment once your work is done, type deactivate (do not type it now unless you want to return to the system libraries).
    12. pip3 --no-cache-dir install torchvision, which installs torchvision along with future, numpy, pillow, and torch
    13. pip3 install sentence_transformers, which installs threadpoolctl, scipy, scikit-learn, nltk, and sentence-transformers
    14. pip3 install mysql-connector-python-rf
    15. pip3 install pandas
    16. pip3 install -U textblob
    17. pip3 install Flask
    18. pip3 install BeautifulSoup4
    19. pip3 install transformers
    20. pip3 install --upgrade gensim
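As a quick check that the environment is complete, the following imports should all succeed inside the activated virtualenv (this script is just a sanity check, not part of the repo):

    # Quick sanity check that the virtualenv has everything the project needs.
    import torch, torchvision, sentence_transformers, mysql.connector
    import pandas, textblob, flask, bs4, transformers, gensim

    print('torch', torch.__version__, '| transformers', transformers.__version__)
    print('all imports succeeded')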

The following applies to the content query.

Build Question Model


Please update the SQL connection settings before running the files so that they connect to your database.

For the model:

Run data_provider_2.py, which builds the final_content.csv file by adding two new columns: the word count and the sentences ranked by TextRank. It connects to SQL via mysql.connector.connect(host='localhost', database={enter the name of the questions database}, user={enter the user}, password={enter the password}).
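For reference, that flow can be sketched roughly as below, using gensim's extractive summarize function (available in gensim versions before 4.0) for the TextRank step; the table and column names here are placeholders, not the repo's actual schema:

    # Rough sketch of the data_provider_2.py flow; table and column names are placeholders.
    import mysql.connector
    import pandas as pd
    from gensim.summarization import summarize  # extractive TextRank; removed in gensim 4.x

    conn = mysql.connector.connect(host='localhost',
                                   database='your_questions_db',
                                   user='your_user',
                                   password='your_password')
    df = pd.read_sql('SELECT id, article_text FROM content', conn)

    # New column 1: word count of each article.
    df['word_count'] = df['article_text'].str.split().str.len()

    # New column 2: the article's sentences ranked/filtered by TextRank.
    def textrank_sentences(text):
        try:
            return summarize(text, ratio=0.3)  # keep the top ~30% of sentences
        except ValueError:                     # summarize() rejects very short texts
            return text

    df['text_ranked'] = df['article_text'].apply(textrank_sentences)
    df.to_csv('final_content.csv', index=False)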

Build Content Model


This applies only to the model.

After running data_provider_2.py, run model_and_summarizer.py. It creates summary_content.csv, which holds the summary for every article, and it also creates an embedding for each article summary and saves the embeddings in a file called sentence_encoder_content. Generating the summaries for all the articles takes around 3 hours on a GPU, so if new articles are added, summarize only those and append them rather than rerunning everything; run the model embedding step after that.
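In outline, that pipeline might look like the sketch below; the exact model names (t5-base, bert-base-nli-mean-tokens), the truncation length, and the pickle format are assumptions, not taken from the repo:

    # Sketch of the model_and_summarizer.py flow; model names and file format are assumed.
    import pickle
    import pandas as pd
    from transformers import pipeline
    from sentence_transformers import SentenceTransformer

    df = pd.read_csv('final_content.csv')

    # T5 abstractive summaries over the TextRanked sentences: the slow, GPU-bound step.
    # Inputs are crudely truncated here to stay under the model's max input length.
    summarizer = pipeline('summarization', model='t5-base')
    df['summary'] = [summarizer(text[:2000], max_length=150, min_length=30)[0]['summary_text']
                     for text in df['text_ranked']]
    df.to_csv('summary_content.csv', index=False)

    # BERT sentence embeddings of each summary, saved for api_2.py to load at startup.
    model = SentenceTransformer('bert-base-nli-mean-tokens')
    embeddings = model.encode(df['summary'].tolist())
    with open('sentence_encoder_content', 'wb') as f:
        pickle.dump(embeddings, f)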

Running API Server


For the model:

Run api_2.py and set the server to localhost or any other host as needed. Starting it takes around 30 seconds, and after that displaying the results takes around 3-5 seconds. It requires the following files:

  1. summary_content.csv

  2. sentence_encoder_content

along with the database connection configured for your setup:

    mysql.connector.connect(host='localhost',
                            database='parentlane',
                            user='root',
                            password='*****')
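As a rough sketch of what the query flow inside api_2.py could look like (the endpoint name, port, top-k count, and model name here are assumptions, not the repo's actual code):

    # Hypothetical sketch of the query flow in api_2.py; names here are assumptions.
    import pickle
    import pandas as pd
    from flask import Flask, request, jsonify
    from scipy.spatial.distance import cdist
    from sentence_transformers import SentenceTransformer

    app = Flask(__name__)

    # Loaded once at startup: this is the ~30 second part.
    model = SentenceTransformer('bert-base-nli-mean-tokens')
    content = pd.read_csv('summary_content.csv')
    with open('sentence_encoder_content', 'rb') as f:
        corpus_embeddings = pickle.load(f)

    @app.route('/search')
    def search():
        query = request.args.get('q', '')
        query_embedding = model.encode([query])
        # Cosine distance between the query and every article summary.
        distances = cdist(query_embedding, corpus_embeddings, metric='cosine')[0]
        top = distances.argsort()[:5]  # indices of the 5 closest summaries
        return jsonify(content.iloc[top].astype(str).to_dict(orient='records'))

    if __name__ == '__main__':
        app.run(host='localhost', port=5000)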
    
