Add Support for Importing Bibtex Files #1

Open
josepaiva94 opened this issue Apr 8, 2021 · 4 comments

Comments

@josepaiva94

I have a list of relevant publications in a BibTeX file. How can I start the snowballing process from such a list?

@blochberger
Owner

You could use bibtexparser to process the file in Python and create the necessary objects (Author and Publication), see for example:

for result in results:
    # Add authors to database
    authors: List[Author] = []
    for name in result.authors:
        author, created = Author.objects.get_or_create(name=name)
        if created:
            self.log_success(f"Added author: {author}")
        else:
            self.log_info(f"Author '{author}' already known")
        authors.append(author)

    # Add publication to database
    publication, created = Publication.objects.get_or_create(
        cite_key=result.cite_key,
        title=result.title,
        year=result.year,
        peer_reviewed=result.is_peer_reviewed,
        first_page=result.first_page,
        last_page=result.last_page,
        doi=result.doi,
    )
    if created:
        self.log_success(f"Added publication: {publication}")
    else:
        self.log_info(f"Publication '{publication}' already known")
    publications.append(publication)

    # Assign authors, keeping their order via the position field
    for position, author in enumerate(authors):
        publication_author, created = PublicationAuthor.objects.get_or_create(
            author=author,
            publication=publication,
            position=position,
        )
        if created:
            self.log_success(f"Assigned author '{author}' to publication '{publication}' at position {position}")
        else:
            self.log_info(f"Author '{author}' already assigned to publication '{publication}' at position '{position}'")
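
To feed BibTeX entries into a loop like the one above, each entry first has to be mapped to those fields. The following is only a rough sketch, not part of this project: it assumes entries arrive as plain dictionaries with an 'ID' key and lowercase field names (as far as I know, this is the shape that bibtexparser 1.x returns from `bibtexparser.loads(text).entries`, but check your version). `ParsedEntry`, `split_pages`, and `to_parsed_entry` are hypothetical names, and `is_peer_reviewed` is left out because BibTeX has no such field.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class ParsedEntry(NamedTuple):
    """Hypothetical container mirroring the attributes used in the import loop."""
    cite_key: str
    title: str
    year: int
    authors: List[str]
    first_page: Optional[int]
    last_page: Optional[int]
    doi: Optional[str]


def split_pages(pages: str) -> Tuple[Optional[int], Optional[int]]:
    """Turn a BibTeX pages field such as '71--80' into (71, 80)."""
    parts = [p.strip() for p in pages.replace("--", "-").split("-") if p.strip()]
    if not parts or not all(p.isdigit() for p in parts):
        # Missing or non-numeric pages (e.g. article numbers) are left unset.
        return None, None
    return int(parts[0]), int(parts[-1])


def to_parsed_entry(entry: Dict[str, str]) -> ParsedEntry:
    """Map one BibTeX entry dict to the fields the import loop expects."""
    # BibTeX separates author names with the word 'and'.
    authors = [a.strip() for a in entry.get("author", "").split(" and ") if a.strip()]
    first_page, last_page = split_pages(entry.get("pages", ""))
    return ParsedEntry(
        cite_key=entry["ID"],
        title=entry["title"].strip("{}"),  # drop outer protective braces
        year=int(entry["year"]),
        authors=authors,
        first_page=first_page,
        last_page=last_page,
        doi=entry.get("doi") or None,
    )


# Made-up example entry, for illustration only.
example = {
    "ID": "Doe2020",
    "title": "{An Example Title}",
    "author": "Doe, Jane and Roe, Richard",
    "year": "2020",
    "pages": "71--80",
    "doi": "10.1000/example",
}
parsed = to_parsed_entry(example)
print(parsed)
```

The resulting objects expose `cite_key`, `title`, `year`, `authors`, `first_page`, `last_page`, and `doi`, so they could stand in for `result` in the loop, provided `peer_reviewed` is set some other way.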

If you already use DBLP cite keys in your file, you can simply import the publications with

./manage.py dblpimport 'DBLP:conf/ease/PetersenFMM08' 'DBLP:conf/ccs/EgeleBFK13'

You can provide as many cite keys as you like in a single command. Add the --use-api flag if you do not have a DBLP dump and want to fetch a live version from the server. Be aware, however, that one request is made for each key provided.
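
If the cite keys have to be collected from a larger BibTeX file first, the command line can be assembled programmatically. This is a minimal sketch under the assumption that DBLP keys are recognizable by their 'DBLP:' prefix, as in the example above; `dblpimport_command` is a hypothetical helper, not something the project provides.

```python
import shlex
from typing import Iterable, List


def dblpimport_command(cite_keys: Iterable[str], use_api: bool = False) -> List[str]:
    """Build the argument list for ./manage.py dblpimport from BibTeX cite keys."""
    # Keep only DBLP keys and drop duplicates while preserving order, since
    # with --use-api every key costs one request.
    keys = list(dict.fromkeys(k for k in cite_keys if k.startswith("DBLP:")))
    command = ["./manage.py", "dblpimport"]
    if use_api:
        command.append("--use-api")
    return command + keys


command = dblpimport_command(
    [
        "DBLP:conf/ease/PetersenFMM08",
        "Doe2020",  # made-up non-DBLP key, gets skipped
        "DBLP:conf/ccs/EgeleBFK13",
        "DBLP:conf/ease/PetersenFMM08",  # duplicate, gets dropped
    ],
    use_api=True,
)
# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(command))
```

The list form can also be passed directly to `subprocess.run(command, check=True)` from the project directory.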

Note that the Semantic Scholar API integration that is used for snowballing works based on the DOI (or the Semantic Scholar paper ID). If your BibTeX entries contain a DOI, it should work out of the box; otherwise you might need to assign the IDs yourself. You could search DBLP for the publication title to semi-automatically identify the matching DBLP entries (or, more specifically, their DOIs). You could do the same with the Semantic Scholar API.
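
As a rough illustration of that triage step, the sketch below splits entries into those that already carry a DOI and those that still need a manual or semi-automatic lookup. It assumes entries are plain dictionaries with 'ID', 'title', and optionally 'doi' keys, and it assumes the public DBLP publication search endpoint at https://dblp.org/search/publ/api with `q` and `format` parameters. It only builds the search URLs and does not send any requests; all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

# Common ways a DOI is written in BibTeX files.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Strip resolver prefixes so that only the bare DOI remains."""
    if not raw:
        return None
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def dblp_search_url(title: str) -> str:
    """Build a DBLP title search URL (assumed endpoint, nothing is fetched)."""
    return "https://dblp.org/search/publ/api?" + urlencode({"q": title, "format": "json"})


def split_by_doi(
    entries: List[Dict[str, str]],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (cite key, DOI) pairs and (cite key, search URL) pairs."""
    ready, needs_lookup = [], []
    for entry in entries:
        doi = normalize_doi(entry.get("doi"))
        if doi:
            ready.append((entry["ID"], doi))
        else:
            needs_lookup.append((entry["ID"], dblp_search_url(entry.get("title", ""))))
    return ready, needs_lookup


# Made-up entries, for illustration only.
ready, needs_lookup = split_by_doi([
    {"ID": "Doe2020", "title": "An Example Title", "doi": "https://doi.org/10.1000/example"},
    {"ID": "Roe2019", "title": "Some Title"},
])
print(ready)
print(needs_lookup)
```

Search results still need a manual check, because titles are not unique and a title match does not guarantee the same publication.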

blochberger changed the title from "How to start from a primary list of publications?" to "Add Support for Importing Bibtex Files" on Apr 8, 2021
@josepaiva94
Author

I have converted the BibTeX entries into an SQL insert statement on sok_publications. Then I ran ./manage.py repair, and I can now run ./manage.py snowball :-)
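
For readers who want to try the same route, here is a minimal sketch of the idea using an in-memory SQLite database. Only the table name sok_publications comes from this thread; the columns shown are an assumed subset named after the Publication fields quoted earlier, and the real table likely has more (possibly required) columns, so inspect the actual schema before adapting this.

```python
import sqlite3

# Stand-in database; in practice this would be the project's own database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE sok_publications ("
    "id INTEGER PRIMARY KEY, "
    "cite_key TEXT UNIQUE, "
    "title TEXT, "
    "year INTEGER, "
    "doi TEXT)"  # assumed columns, not the real schema
)

# Made-up rows as they might come out of a parsed BibTeX file.
rows = [
    ("Doe2020", "An Example Title", 2020, "10.1000/example"),
    ("Roe2019", "Some Title", 2019, "10.1000/other"),
]

# Parameterized inserts avoid quoting problems with titles containing quotes.
connection.executemany(
    "INSERT INTO sok_publications (cite_key, title, year, doi) VALUES (?, ?, ?, ?)",
    rows,
)
connection.commit()

count = connection.execute("SELECT COUNT(*) FROM sok_publications").fetchone()[0]
print(count)
```

Inserting rows directly bypasses whatever the Django models would normally do on save, which is presumably why ./manage.py repair is run afterwards.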

@blochberger
Owner

I think the use case is still interesting and worth supporting, so I will keep the issue open as a reminder.

@blochberger blochberger reopened this Apr 8, 2021
@josepaiva94
Author

josepaiva94 commented Apr 8, 2021

Here is the small script for anyone facing the same use case: https://gist.github.com/josepaiva94/c91834935923e8394aa19ed766d8fa51 (DOIs are mandatory!)
