Abstract Meaning Representation for Brazilian Portuguese

This repository contains all AMR-annotated corpora developed by the Interinstitutional Center for Computational Linguistics (NILC).

For more information about what AMR is and its specific notation, see the AMR guidelines GitHub repository.

Organization

This repository is organized into subdirectories, each containing one corpus. All corpora are distributed under the CC-BY-NC-SA license.

OpiSums-PT-AMR

This corpus contains opinion texts from the OpiSums-PT corpus, manually annotated in AMR.

AMRNews

This corpus contains news texts from the Folha de São Paulo newspaper, manually annotated in AMR.

AMR-LittlePrince

This corpus contains sentences from The Little Prince, annotated in AMR by alignment with the English version and subsequently revised manually.

For more detailed information about each corpus, please read the README file in the specific corpus directory.

Corpus notation

The corpora follow a standard notation that makes the files easy to read. A corpus file contains multiple sentences, each preceded by metadata lines that start with a hash sign followed by two colons (# ::) and a keyword (id, snt, alignment, ...). The AMR graph of the sentence follows, written in PENMAN notation. An example is shown below:

# ::id Fala-Serio-Mae.Documento_32.1
# ::snt Amei esse livro .
(a / amar-01
      :ARG0 (e2 / eu)
      :ARG1 (l / livro
            :mod (e3 / esse)))

Sentences are separated by a blank line.
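The notation above is simple enough to read with a few lines of code. The following is a minimal Python sketch, not part of this repository: the names `AMREntry`, `read_amr_corpus`, and `SAMPLE` are hypothetical, it assumes one keyword per metadata line, and it keeps each graph as raw PENMAN text rather than decoding it.

```python
import re
from dataclasses import dataclass, field

# Metadata lines look like "# ::id Fala-Serio-Mae.Documento_32.1".
# Assumption (hypothetical reader): one keyword per metadata line.
META_RE = re.compile(r"^#\s*::(\S+)\s*(.*)$")


@dataclass
class AMREntry:
    """One annotated sentence: its metadata and its raw PENMAN graph."""
    metadata: dict = field(default_factory=dict)  # e.g. {"id": ..., "snt": ...}
    graph: str = ""  # PENMAN text, not parsed further


def read_amr_corpus(text: str) -> list:
    """Split corpus text into entries; blank lines separate sentences."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = AMREntry()
        graph_lines = []
        for line in block.splitlines():
            match = META_RE.match(line)
            if match:
                entry.metadata[match.group(1)] = match.group(2)
            elif line.startswith("#"):
                continue  # other comment lines carry no sentence data
            elif line.strip():
                graph_lines.append(line)
        entry.graph = "\n".join(graph_lines)
        if entry.metadata or entry.graph:
            entries.append(entry)
    return entries


# The example entry shown above, used for illustration.
SAMPLE = """# ::id Fala-Serio-Mae.Documento_32.1
# ::snt Amei esse livro .
(a / amar-01
      :ARG0 (e2 / eu)
      :ARG1 (l / livro
            :mod (e3 / esse)))
"""

if __name__ == "__main__":
    for entry in read_amr_corpus(SAMPLE):
        # prints: Fala-Serio-Mae.Documento_32.1 -> Amei esse livro .
        print(entry.metadata["id"], "->", entry.metadata["snt"])
```

For anything beyond quick inspection, a dedicated PENMAN parser such as the `penman` Python package is more robust, since it also decodes the graph structure into triples.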

Statistics

Statistics for each corpus can be obtained by running the stats_amr.py script as follows:

python stats_amr.py <corpus_path>   # for example: AMRNews/unsplit/amr.txt
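The figures actually reported by stats_amr.py are defined by that script. As a rough illustration of the kind of counts that can be derived from the notation, the following hypothetical Python sketch (`corpus_stats` is not part of this repository) counts sentences, whitespace-separated tokens in the `::snt` line, concepts, and relation labels using regular expressions. It is an approximation, not a full PENMAN parser: quoted constants containing role-like text (a colon followed by a letter) would be miscounted.

```python
import re
import sys
from collections import Counter

# Metadata lines such as "# ::snt Amei esse livro ."
META_RE = re.compile(r"^#\s*::(\S+)\s*(.*)$")
# "(var / concept": captures the concept label of each instance.
CONCEPT_RE = re.compile(r"\(\s*[^\s/()]+\s*/\s*([^\s()]+)")
# ":ARG0", ":mod", ":op1", ":ARG1-of", ...: every role label in a graph.
ROLE_RE = re.compile(r":[A-Za-z][\w-]*")


def corpus_stats(text: str) -> dict:
    """Approximate counts over a corpus file in the notation described above."""
    stats = Counter()
    for block in re.split(r"\n\s*\n", text.strip()):
        snt, graph_lines = None, []
        for line in block.splitlines():
            match = META_RE.match(line)
            if match:
                if match.group(1) == "snt":
                    snt = match.group(2)
            elif line.strip() and not line.startswith("#"):
                graph_lines.append(line)
        if not graph_lines:
            continue  # block without a graph: nothing to count
        graph = "\n".join(graph_lines)
        stats["sentences"] += 1
        # The corpus sentences are tokenized, so splitting on spaces
        # approximates the token count.
        stats["tokens"] += len(snt.split()) if snt else 0
        stats["concepts"] += len(CONCEPT_RE.findall(graph))
        # Re-entrant variables (":ARG0 e2") count as relations, not concepts.
        stats["relations"] += len(ROLE_RE.findall(graph))
    return dict(stats)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(corpus_stats(handle.read()))
```

Applied to the example entry in the Corpus notation section, this sketch yields one sentence, four tokens, four concepts (amar-01, eu, livro, esse), and three relations (:ARG0, :ARG1, :mod).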

Publications

Both OpiSums-PT-AMR and AMRNews are presented and compared in more detail in the following paper, which has been accepted for publication in DELTA and is currently available as a preprint:

@techreport{InacioEtAl2022,
  type = {Preprint},
  title = {The {{AMR-PT}} Corpus and the Semantic Annotation of Challenging Sentences from Journalistic and Opinion Texts},
  author = {In{\'a}cio, Marcio Lima and Cabezudo, Marco Antonio Sobrevilla and Ramisch, Renata and Di Felippo, Ariani and Pardo, Thiago Alexandre Salgueiro},
  year = {2022},
  month = aug,
  doi = {10.1590/1678-460x202255159},
  url = {https://preprints.scielo.org/index.php/scielo/preprint/view/4652/version/4928},
  urldate = {2022-08-31},
  copyright = {All rights reserved}
}

The AMR-LittlePrince corpus is described in:

@inproceedings{anchieta-pardo-2018-towards,
    title = "Towards {AMR}-{BR}: A {S}em{B}ank for {B}razilian {P}ortuguese Language",
    author = "Anchi{\^e}ta, Rafael  and
      Pardo, Thiago",
    booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)",
    month = may,
    year = "2018",
    address = "Miyazaki, Japan",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://www.aclweb.org/anthology/L18-1157",
}
