Skip to content

Commit

Permalink
fix copyright class naming
Browse files Browse the repository at this point in the history
  • Loading branch information
kermitt2 committed Feb 1, 2024
1 parent a74c05c commit 0f9a1dc
Show file tree
Hide file tree
Showing 3 changed files with 8 additions and 7 deletions.
7 changes: 4 additions & 3 deletions Readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,8 +33,9 @@ The following functionalities are available:
- __Consolidation/resolution of the extracted bibliographical references__ using the [biblio-glutton](https://github.com/kermitt2/biblio-glutton) service or the [CrossRef REST API](https://github.com/CrossRef/rest-api-doc). In both cases, DOI/PMID resolution performance is higher than 0.95 F1-score from PDF extraction.
- __Extraction and parsing of patent and non-patent references in patent__ publications.
- __Extraction of Funders and funding information__ with optional matching of extracted funders with the CrossRef Funder Registry.
- __Identification of copyrights' owner and license associated to the document__, e.g. publisher or authors copyrights, CC-BY/CC-BY-NC/etc. license.

In a complete PDF processing, GROBID manages 55 final labels used to build relatively fine-grained structures, from traditional publication metadata (title, author first/last/middle names, affiliation types, detailed address, journal, volume, issue, pages, DOI, PMID, etc.) to full text structures (section title, paragraph, reference markers, head/foot notes, figure captions, etc.).
In a complete PDF processing, GROBID manages 68 final labels used to build relatively fine-grained structures, from traditional publication metadata (title, author first/last/middle names, affiliation types, detailed address, journal, volume, issue, pages, DOI, PMID, etc.) to full text structures (section title, paragraph, reference markers, head/foot notes, figure captions, etc.).

GROBID includes a comprehensive [web service API](https://grobid.readthedocs.io/en/latest/Grobid-service/), [Docker images](https://grobid.readthedocs.io/en/latest/Grobid-docker/), [batch processing](https://grobid.readthedocs.io/en/latest/Grobid-batch/), a JAVA API, a generic [training and evaluation framework](https://grobid.readthedocs.io/en/latest/Training-the-models-of-Grobid/) (precision, recall, etc., n-fold cross-evaluation), systematic [end-to-end benchmarking](https://grobid.readthedocs.io/en/latest/Benchmarking/) on thousand documents and the semi-automatic generation of training data.

Expand Down Expand Up @@ -108,7 +109,7 @@ A series of additional modules have been developed for performing __structure aw
- [grobid-quantities](https://github.com/kermitt2/grobid-quantities): recognition and normalization of physical quantities/measurements
- [grobid-superconductors](https://github.com/lfoppiano/grobid-superconductors): recognition of superconductor material and properties in scientific literature
- [entity-fishing](https://github.com/kermitt2/entity-fishing), a tool for extracting Wikidata entities from text and document, which can also use Grobid to pre-process scientific articles in PDF, leading to more precise and relevant entity extraction and the capacity to annotate the PDF with interactive layout
- [dataseer-ml](https://github.com/dataseer/dataseer-ml): identification of sections and sentences introducing datasets in a scientific article, and classification of the type of these datasets
- [datastet](https://github.com/kermitt2/datastet): identification of sections and sentences introducing datasets in a scientific article, identification of dataset names (implict and named datasets) and classification of the type of these datasets
- [grobid-ner](https://github.com/kermitt2/grobid-ner): named entity recognition
- [grobid-astro](https://github.com/kermitt2/grobid-astro): recognition of astronomical entities in scientific papers
- [grobid-bio](https://github.com/kermitt2/grobid-bio): a toy bio-entity tagger using BioNLP/NLPBA 2004 dataset
Expand Down Expand Up @@ -143,7 +144,7 @@ If you want to cite this work, please refer to the present GitHub project, toget
title = {GROBID},
howpublished = {\url{https://github.com/kermitt2/grobid}},
publisher = {GitHub},
year = {2008--2023},
year = {2008--2024},
archivePrefix = {swh},
eprint = {1:dir:dab86b296e3c3216e2241968f0d63b68e8209d3c}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ public enum License {
CCBYSA ("CC-BY-SA"),
CCBYNCSA ("CC-BY-NC-SA"),
CCBYND ("CC-BY-ND"),
COPYRIGHTS ("strict-copyrights"),
COPYRIGHT ("strict-copyrights"),
OTHER ("other"),
UNDECIDED ("undecided");

Expand All @@ -55,7 +55,7 @@ public String getName() {
};

public static List<String> licenses =
Arrays.asList("CC-0", "CC-BY", "CC-BY-NC", "CC-BY-NC-ND", "CC-BY-SA", "CC-BY-NC-SA", "CC-BY-ND", "copyrights", "other", "undecided");
Arrays.asList("CC-0", "CC-BY", "CC-BY-NC", "CC-BY-NC-ND", "CC-BY-SA", "CC-BY-NC-SA", "CC-BY-ND", "copyright", "other", "undecided");

private CopyrightsOwner copyrightsOwner;
private double copyrightsOwnerProb;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ protected static List<CopyrightsLicense> extractResults(String copyrightOwnerAsJ
double scoreUndecided = 0.0;
int rank = 0;
for (Double scoreField : scoreFields) {
if (scoreField>0.5 && scoreField >= bestProb) {
if (scoreField>0.5 && scoreField > bestProb) {
owner = CopyrightsOwner.valueOf(owners.get(rank).toUpperCase());
bestProb = scoreField;
}
Expand Down Expand Up @@ -156,7 +156,7 @@ protected static List<CopyrightsLicense> extractResults(String copyrightOwnerAsJ
License license = null;
rank = 0;
for (Double scoreField : scoreFields) {
if (scoreField>0.5 && scoreField >= bestProb) {
if (scoreField>0.5 && scoreField > bestProb) {
String valueLicense = licenses.get(rank);
valueLicense = valueLicense.replace("-", "");
license = License.valueOf(valueLicense.toUpperCase());
Expand Down

0 comments on commit 0f9a1dc

Please sign in to comment.