
yassinekdi/naimai



NaimAI is a Python package that (1) searches efficiently in papers and (2) generates an automatic review. It does so by structuring each scientific paper, based on its abstract, into 3 categories: objectives, methods and results. Hence, when searching, the results are shown by category. The results can then be reviewed, and a review text is automatically generated along with the references.
All the features are deployed on NaimAI's website, where millions of papers are processed.
A Medium article goes into more depth on naimai's features and the web app.

Search in your own papers

You can either give a directory of the folder with articles in PDF format or a csv file with abstracts and other metadata, as shown here.
The processing, the results and searching for relevant papers are explained in this colab.
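As an illustration, a minimal CSV of abstracts and metadata can be put together with pandas. The column names below (`doi`, `title`, `abstract`, `year`) are placeholders, not necessarily the exact layout naimai expects; check the linked example for that.

```python
import pandas as pd

# Illustrative CSV of abstracts plus metadata. The column names are
# assumptions for this sketch; see the linked example for the real schema.
papers = pd.DataFrame(
    {
        "doi": ["10.1000/example.1", "10.1000/example.2"],
        "title": ["Paper one", "Paper two"],
        "abstract": [
            "We study X. We propose method Y. Results show Z.",
            "This work aims to A. We use B. We find C.",
        ],
        "year": [2020, 2021],
    }
)
papers.to_csv("my_papers.csv", index=False)
```

A file like this can then be pointed at in place of a folder of PDFs.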

Search in millions of papers

To search in the millions of papers already processed, you can use the naimai website. I might open source this part too if needed.

Structure your abstract

If you already have an abstract and want to test the segmentor (naimai's algorithm that structures abstracts into Background, Objectives, Methods and Results), this colab walks you through the necessary steps.
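To illustrate the kind of output the segmentor produces, here is a toy, rule-based sketch that tags each sentence of an abstract with one of the four categories. naimai's actual segmentor is a learned model; the cue words below are purely illustrative.

```python
import re

# Toy cue words per category. These are illustrative only: the real
# segmentor is a trained model, not a keyword lookup.
CUES = {
    "objectives": ("aim", "objective", "purpose", "we propose", "this study"),
    "methods": ("we use", "method", "measured", "dataset", "experiment"),
    "results": ("we find", "results show", "observed", "increase", "decrease"),
}

def segment_abstract(abstract: str) -> dict:
    """Split an abstract into sentences and bucket each one by cue words.

    Sentences matching no cue fall back to "background".
    """
    labels = {k: [] for k in ("background", "objectives", "methods", "results")}
    for sentence in re.split(r"(?<=[.!?])\s+", abstract.strip()):
        low = sentence.lower()
        for label, cues in CUES.items():
            if any(cue in low for cue in cues):
                labels[label].append(sentence)
                break
        else:
            labels["background"].append(sentence)
    return labels
```

For example, `segment_abstract("Coral reefs are declining. This study aims to map X. We use sensor B. Results show a clear increase.")` puts one sentence in each bucket.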

Example of a structured abstract:

(figure: classified abstract)

Features to improve

Review Generation 

The review generation needs more enhancement. The current method consists of only rephrasing the objective phrase of each paper. I have some ideas to go further and improve the review generation part. Let me know if you're interested and we'll do it together!

Besides the generated text, the references generation can still be brushed up to support multiple reference styles, and to export references to other formats (BibTeX, ...).
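As a starting point for BibTeX export, a sketch like the following could render an `@article` entry from paper metadata. The field names (`authors`, `title`, `journal`, `year`) are assumptions for this example, not naimai's internal schema.

```python
def to_bibtex(ref: dict) -> str:
    """Render a minimal @article BibTeX entry from paper metadata.

    The citation key is built from the first author's last name plus the
    year, a common convention. Field names here are illustrative.
    """
    key = ref["authors"].split(",")[0].split()[-1].lower() + str(ref["year"])
    return (
        f"@article{{{key},\n"
        f"  author  = {{{ref['authors']}}},\n"
        f"  title   = {{{ref['title']}}},\n"
        f"  journal = {{{ref['journal']}}},\n"
        f"  year    = {{{ref['year']}}}\n"
        f"}}"
    )
```

A real exporter would also need to escape special characters and handle multiple authors, but the shape of the output is the same.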

Semantic search 

The search is mainly based on a v0 semantic algorithm (using a TF-IDF model mainly). In a previous version, I fine-tuned a BERT model for each field and the results were pretty interesting. The problem is that, with 10 fields on the web app, I ended up with 10 fine-tuned models, so the usage was pretty slow and the models were heavy. If you have any idea and/or want to contribute to this part, I'll be happy to talk to you!
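For reference, a minimal TF-IDF search of the kind described can be assembled with scikit-learn; the corpus and query below are toy examples, not naimai's actual pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus of abstracts standing in for the processed papers.
abstracts = [
    "Deep learning for medical image segmentation.",
    "Groundwater quality assessment using GIS.",
    "Transformer models for text classification.",
]

# Fit the TF-IDF model once over the corpus.
vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(abstracts)

def search(query: str, top_k: int = 2):
    """Rank abstracts by cosine similarity to the query in TF-IDF space."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    ranked = scores.argsort()[::-1][:top_k]
    return [(abstracts[i], float(scores[i])) for i in ranked]
```

This captures the v0 approach's trade-off: one lightweight model serves every field, at the cost of purely lexical matching, whereas per-field BERT models match meaning better but multiply the deployment weight.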

Data papers 

I've used about 10 million open-access abstracts I found here and there on the internet. If you have any source that could be useful, or even better, if we can process many more papers together to provide more information to the users, that'd be cool!

References

  • For handling abbreviations, I used this code.
  • For PDF processing, I used Grobid.

CC BY-NC-SA 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
