Authors' data improvement #18
Replies: 2 comments
-
Hello @gtarasconi, thanks for your remarks. It's great to have such a precise report of these errors.

Issue tracker 🖲
This can be partly solved case by case. In this case, our approach is the following:
Feel free to have a look at it and build on it. We will be glad to receive your pull request and collaborate with you!

Improve Grobid model 🎯
Another approach is to train the Grobid model (the parsing library we rely on) on systematic errors (see issue #14 for an example). The idea is then to parse and consolidate the flagged citations once again. That's certainly something we will do in 2020. We might well create a labelling app to crowd-source this important task. Any suggestions are welcome.

Other ideas
Also, since you seem to have a particular interest in authors, you might be interested in getting ORCID identifiers. We have not added them yet, but they are available in Crossref. Note that Crossref bulk dumps are available online (see https://github.com/greenelab/crossref) and the baseline schema to ingest the database on BigQuery is available in schema/.

Hope it helps,
Cheers
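On the ORCID idea above, here is a minimal sketch of pulling ORCID iDs out of one Crossref work record. It assumes the record is already loaded as a Python dict with the `author`, `given`, `family` and `ORCID` fields that Crossref metadata uses. The function name `author_orcids` and the sample record are made up for illustration and are not part of this repository.

```python
import re

# An ORCID iD is four dash-separated groups of four characters; the last
# character may be the checksum letter X. Crossref usually stores it as a URL.
ORCID_RE = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def author_orcids(work):
    """Return (name, orcid) pairs for the authors of one Crossref-style record."""
    pairs = []
    for author in work.get("author", []):
        raw = author.get("ORCID")
        if not raw:
            continue  # most authors carry no ORCID at all
        match = ORCID_RE.search(raw.strip())
        if not match:
            continue  # skip malformed values instead of guessing
        parts = (author.get("given"), author.get("family"))
        name = " ".join(p for p in parts if p)
        pairs.append((name, match.group(1)))
    return pairs


# Made-up record, for illustration only (not real data).
sample_work = {
    "DOI": "10.0000/example",
    "author": [
        {"given": "Jane", "family": "Doe",
         "ORCID": "http://orcid.org/0000-0001-2345-6789"},
        {"given": "John", "family": "Smith"},
    ],
}

print(author_orcids(sample_work))
```

Stripping the URL prefix keeps only the bare identifier, which is probably easier to join on if the values end up in a BigQuery table.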
-
Hello,
-
Feature description
First of all, congratulations on the wonderful work!
I'd just like to highlight some possible 'easy' improvements in data quality:
I could prepare some code in the coming weeks (months?) and share it if you think it would be useful.
--