"Intelligent Filtering: Detecting Nuance and Context with Machine Learning."
Moving beyond keyword matching, this project introduces a machine learning-powered profanity filter. By analyzing context and linguistic patterns, it aims to identify and filter out offensive language more accurately and intelligently, even when subtle variations or creative spellings are used.
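To illustrate the gap a context-aware classifier is meant to close, here is a minimal sketch of the keyword-matching baseline failing on a creative respelling (the banned-word list and example strings are hypothetical):

```python
def keyword_filter(text, banned=("badword",)):
    """Naive substring matching: flags text only on exact matches."""
    lowered = text.lower()
    return any(word in lowered for word in banned)

# An exact match is caught...
print(keyword_filter("that is a badword"))   # True
# ...but a creative respelling slips straight through.
print(keyword_filter("that is a b4dw0rd"))   # False
```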
Korcen: original before innovation.
Korcen-13M-EXAONE: another failure, but a better one.
Total samples: 2,000,000
Training samples: 1,800,000
Validation samples: 200,000
Tokenizer: SKT-AI/KoGPT2
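The counts above correspond to a 90/10 train/validation split. A minimal sketch of reproducing such a split deterministically (the fixed seed and list-based corpus are assumptions, not the project's actual pipeline):

```python
import random

def train_val_split(samples, val_ratio=0.1, seed=42):
    """Shuffle once with a fixed seed, then carve off a validation slice."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]

corpus = list(range(2_000_000))  # stand-in for the 2M labeled samples
train, val = train_val_split(corpus)
print(len(train), len(val))  # 1800000 200000
```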
| Model | korean-malicious-comments-dataset | Curse-detection-data | kmhas_korean_hate_speech | Korean Extremist Website Womad Hate Speech Data | LGBT-targeted HateSpeech Comments Dataset (Korean) |
|---|---|---|---|---|---|
| korcen | 0.7121 | 0.8415 | 0.6800 | 0.6305 | 0.4479 |
| TF VDCNN_KOGPT2 (23.06.15) | 0.7545 | 0.7824 | 0.7055 | 0.6875 | |
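The table reports one score per benchmark dataset (the exact metric is not stated here). Assuming accuracy-style scoring, a minimal sketch of how such per-dataset numbers are computed (labels and predictions below are placeholders, not real benchmark data):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    assert len(y_true) == len(y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Placeholder labels/predictions; a real evaluation would run the model
# over each benchmark dataset listed in the table.
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 0]
print(round(accuracy(gold, pred), 4))  # 0.8
```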
```python
# py: 3.10, tf: 2.10
import pickle

import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

maxlen = 1000
model_path = "vdcnn_model.h5"
tokenizer_path = "tokenizer.pickle"

# Load the trained VDCNN model and the KoGPT2 tokenizer saved at training time.
model = tf.keras.models.load_model(model_path)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)

def preprocess_text(text):
    """Lowercase the input to match the training-time normalization."""
    return text.lower()

def predict_text(text):
    """Return the model's abuse probability for a single sentence."""
    sentence = preprocess_text(text)
    encoded_sentence = tokenizer.encode_plus(
        sentence,
        max_length=maxlen,
        padding="max_length",
        truncation=True,
    )["input_ids"]
    # Pad/truncate defensively in case the tokenizer output length differs.
    sentence_seq = pad_sequences([encoded_sentence], maxlen=maxlen, truncating="post")
    prediction = model.predict(sentence_seq)[0][0]
    return prediction

while True:
    text = input("Enter the sentence you want to test: ")
    result = predict_text(text)
    if result >= 0.5:
        print("This sentence contains abusive language.")
    else:
        print("It's a normal sentence.")
```
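The hard-coded 0.5 cutoff in the loop above can be factored into a small helper so the decision threshold is tunable (the alternative threshold value below is illustrative only):

```python
def label_from_score(score, threshold=0.5):
    """Map the model's sigmoid output to a human-readable label."""
    return "abusive" if score >= threshold else "normal"

print(label_from_score(0.73))                 # abusive
# A stricter threshold trades recall for precision.
print(label_from_score(0.73, threshold=0.8))  # normal
```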