
# electra-ka

## Introduction

electra-ka is an open-source model for the Georgian language.

The model is available on the Hugging Face Hub as `jnz/electra-ka`.

The model was trained on 33 GB of Georgian text collected from 4,854,621 pages of the Common Crawl archive.
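The base checkpoint can be loaded directly with the `transformers` library. The sketch below is one possible way to use it; loading it through `ElectraModel` for plain feature extraction is an assumption about typical usage, not something prescribed by this repository:

```python
from transformers import ElectraModel, ElectraTokenizerFast

# Load the pre-trained (not fine-tuned) electra-ka checkpoint for feature extraction
tokenizer = ElectraTokenizerFast.from_pretrained("jnz/electra-ka")
model = ElectraModel.from_pretrained("jnz/electra-ka")

inputs = tokenizer("your text goes here...", return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state holds the contextual token embeddings
```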

The fine-tuned sequence-classification model (`jnz/electra-ka-discrediting`) is also available on the hub:

```python
from transformers import ElectraForSequenceClassification, ElectraTokenizerFast

# Load the fine-tuned classifier together with the base tokenizer
model = ElectraForSequenceClassification.from_pretrained("jnz/electra-ka-discrediting")
tokenizer = ElectraTokenizerFast.from_pretrained("jnz/electra-ka")

inputs = tokenizer("your text goes here...", return_tensors="pt")
predictions = model(**inputs)
```
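The call returns raw classification logits; to turn them into class probabilities one can apply a softmax. This is a minimal follow-up sketch assuming the default PyTorch backend:

```python
import torch

# Convert the raw logits into a probability distribution over the classes
probabilities = torch.softmax(predictions.logits, dim=-1)
print(probabilities)
```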

Under the hood, the ELECTRA model uses the same architecture as BERT, but to avoid misuse it can only serve as a discriminator, which makes it much harder to use for text generation.
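As an illustration of the discriminator behaviour, the pre-training head can be queried through `ElectraForPreTraining`, which scores every token as original or replaced. This is only a sketch; whether the published checkpoint actually ships the pre-training head weights is an assumption:

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Load the discriminator (assumes the checkpoint includes pre-training head weights)
tokenizer = ElectraTokenizerFast.from_pretrained("jnz/electra-ka")
discriminator = ElectraForPreTraining.from_pretrained("jnz/electra-ka")

inputs = tokenizer("your text goes here...", return_tensors="pt")
outputs = discriminator(**inputs)

# Higher scores mean the discriminator considers a token replaced (fake)
token_scores = torch.sigmoid(outputs.logits)
print(token_scores)
```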


To read more about ELECTRA, please refer to the paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555).

In case of any questions or comments, please feel free to reach out at djanezashvili[at]gmail.com.