Search in your own papers

NaimAI is a Python package that (1) searches effeciently in papers and (2) generates an automatic review. It does do by structures each scientific paper using their abstract into 3 categories : objectives methods and results. Hence, when searching, the results will be showed by category. The results can then be reviewed and a review text will be automatically generated along with the references.
All the features are deployed on the NaimAI's website, where millions of paper are processed.
A Medium article goes more in depth with naimai's features of the web app.

Search in your own papers

You can either give a directory of the folder with articles in PDF format or a csv file with abstracts and other meta data as showed here.
The processing, the results and searching for relevent papers are explained in this colab.

Search in millions of papers

To search in the millions of papers already processed, you can use the naimai website. I might open source this part too if needed.

Structure your abstract

If you already have an abstract and want to test the segmentor (naimai's algorithm that structures abstract into Background, Objectives, Methods and Results), this colab walks you through the necessary steps.

Example of structured abstract :

Features to improve

Review Generation

The review generation needs more enhancement. The actual method consists of only rephrasing the objective phrase of each paper. I've some idea to go further and improve the review generation part. Let me know if you're interested and we'll do it together!

Besides the generated text, the references generation still can be brushed up to meet with many references style, and also to export it to other formats (BibTeX..).

Semantic search

The search is mainly based on a v0 semantic algorithm (using TfIdf model mainly). In a previous version, I've finetuned bert model for each field and the results were pretty interesting. The problem is that, with 10 fields on the web app, I ended up having 10 fine-tuned model. So the usage was pretty slow and the models were heavy. If you have any idea and/or want to contribute in this part, I'll be happy to talk to you!

Data papers

I've used about 10 millions open access abstracts I found here and there on the internet. If you've any source that could be useful, or even better, if we can process much more papers together to get more informations for the users, that'd be cool!

References

For abbreviations purposes, I used this code.
For PDF processing, I used Grobid.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Name		Name	Last commit message	Last commit date
Latest commit History 512 Commits
.github/workflows		.github/workflows
naimai		naimai
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bomr_classif.jpg		bomr_classif.jpg
logo.png		logo.png
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Search in your own papers

Search in millions of papers

Structure your abstract

Features to improve

Review Generation

Semantic search

Data papers

References

About

Releases 1

Packages

Languages

License

yassinekdi/naimai

Folders and files

Latest commit

History

Repository files navigation

Search in your own papers

Search in millions of papers

Structure your abstract

Features to improve

Review Generation

Semantic search

Data papers

References

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages