
ArrowTextClassifier is a simple text classification tool written in PyTorch that lets you train, summarize, and use text classification models for various tasks.


Bhargav230m/ArrowTextClassifier


ArrowTextClassifier

ArrowTextClassifier is a Python package for text classification that lets you train models, summarize them, and classify text using a convolutional neural network (CNN) architecture.

Installation

You can install ArrowTextClassifier via pip:

pip install ArrowTextClassifier

How it Works

ArrowTextClassifier implements a convolutional neural network (CNN) architecture for text classification. It tokenizes input text, embeds the tokens, applies convolutional filters over the embedded tokens to extract features, and then classifies the text into predefined categories.
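To make the pipeline concrete, here is a dependency-free sketch of one forward pass in plain Python. It assumes a single kernel size, ReLU, and max-pooling over time, which is a common design for text CNNs. The package itself is built on PyTorch and this README does not document its exact layers, so the function names, sizes, and random weights below are illustrative only.

```python
import random

def random_matrix(rows, cols, rng):
    # Hypothetical initialisation: small random weights stand in for learned ones.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def cnn_forward(token_ids, embedding, filters, filter_bias,
                out_weights, out_bias, kernel_size):
    # 1. Embed: look up one vector per token id.
    embedded = [embedding[t] for t in token_ids]
    # 2. Convolve: each filter spans `kernel_size` consecutive embeddings.
    #    Taking the max of 0.0 and every window's activation applies ReLU
    #    and max-pooling over time, giving one feature per filter.
    #    (Inputs shorter than kernel_size have no windows; pad them first.)
    features = []
    for weights, bias in zip(filters, filter_bias):
        best = 0.0
        for start in range(len(embedded) - kernel_size + 1):
            window = [x for vec in embedded[start:start + kernel_size] for x in vec]
            activation = sum(w * x for w, x in zip(weights, window)) + bias
            best = max(best, activation)
        features.append(best)
    # 3. Classify: a linear layer maps the pooled features to one score per class.
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(out_weights, out_bias)]

rng = random.Random(0)
vocab_size, embed_dim, num_filters, kernel_size, num_classes = 10, 4, 3, 2, 2
embedding = random_matrix(vocab_size, embed_dim, rng)
filters = random_matrix(num_filters, kernel_size * embed_dim, rng)
filter_bias = [0.0] * num_filters
out_weights = random_matrix(num_classes, num_filters, rng)
out_bias = [0.0] * num_classes

logits = cnn_forward([2, 5, 7, 1], embedding, filters, filter_bias,
                     out_weights, out_bias, kernel_size)
print(logits)  # one raw score per class; the highest score is the predicted class
```

In a PyTorch implementation the same three steps would be handled by embedding, convolution, and linear layers, with the weights learned during training rather than drawn at random.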

Usage

Training

To train a text classification model, use the train_model method of the Model class:

from ArrowTextClassifier import Model

model = Model(name="your_model_name")
model.train_model(dataset)  # dataset: your training data (see "How to make a dataset" below)
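Before a CNN can train on text, the words and labels have to be mapped to integer ids, and the summarize and classify methods both take a vocabulary_path. The sketch below shows one common way to build a vocabulary and a label-to-id mapping from training rows. The tokenizer, the <pad>/<unk> entries, and every function name here are assumptions for illustration, not the package's actual internals.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens

def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(examples, min_freq=1):
    # Ids 0 and 1 are reserved for padding and unknown words.
    counts = Counter(tok for text in examples for tok in tokenize(text))
    vocab = {PAD: 0, UNK: 1}
    for tok, n in sorted(counts.items()):
        if n >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def build_label_map(labels):
    # Sorted so the same set of labels always gets the same class ids.
    return {label: i for i, label in enumerate(sorted(set(labels)))}

def encode(text, vocab, min_len):
    # Unseen words map to <unk>; short inputs are padded so every
    # convolution window has enough tokens.
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)]
    return ids + [vocab[PAD]] * max(0, min_len - len(ids))

rows = [
    {"label": "normal", "example": "Hey there!"},
    {"label": "normal", "example": "Hi!"},
    {"label": "toxic", "example": "You suck!"},
]
vocab = build_vocabulary(r["example"] for r in rows)
label_map = build_label_map(r["label"] for r in rows)
print(encode("Hey you!", vocab, min_len=3))  # [2, 6, 0]
```

Sorting the tokens and labels keeps the ids deterministic, which matters if the vocabulary is saved during training and read back later for classification.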

How to make a dataset

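To create your own training dataset, write a Parquet file with a label column and an example column, one row per training example. The rows below are shown as JSON for readability; the file itself must be saved in Parquet format.

Example rows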

{"label":"normal","example":"Hey there!"}
{"label":"normal","example":"Hi!"}
{"label":"toxic","example":"You suck!"}

After you have created the Parquet file in the format above, provide it as the dataset for train_model to start training the model.
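A minimal sketch of preparing such a file, assuming pandas and a Parquet engine (pyarrow or fastparquet) are installed. The helper names and the dataset.parquet file name are hypothetical, and this README does not say whether train_model expects a file path or already-loaded data, so check the documentation for that.

```python
from collections import Counter

REQUIRED_COLUMNS = ("label", "example")  # column names from the format above

def validate_rows(rows):
    """Check every row has non-empty string fields; return per-label counts."""
    for i, row in enumerate(rows):
        for col in REQUIRED_COLUMNS:
            value = row.get(col)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"row {i}: missing or empty {col!r}")
    return Counter(row["label"] for row in rows)

def write_parquet(rows, path):
    # Imported lazily: pandas plus a Parquet engine are only needed here.
    import pandas as pd
    pd.DataFrame(rows, columns=list(REQUIRED_COLUMNS)).to_parquet(path, index=False)

rows = [
    {"label": "normal", "example": "Hey there!"},
    {"label": "normal", "example": "Hi!"},
    {"label": "toxic", "example": "You suck!"},
]
label_counts = validate_rows(rows)
print(label_counts)  # Counter({'normal': 2, 'toxic': 1})
```

Once the rows validate, write_parquet(rows, "dataset.parquet") saves them to disk. Printing the label counts first is a quick way to spot a heavily imbalanced dataset before training.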

Summarization

To summarize a trained model, you can use the summarize method:

model.summarize(
    model_path="path_to_your_model",
    hyperparams_path="path_to_hyperparameters_file",
    vocabulary_path="path_to_vocabulary_file",
    modelSummary_write_path="path_to_write_model_summary"
)
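A model summary typically reports how many parameters each layer holds. For a text CNN like the one described under How it Works, these counts follow directly from the hyperparameters: the embedding has vocab_size × embed_dim weights, each convolution with kernel size k has num_filters × (k × embed_dim + 1) weights including biases, and the output layer has num_classes × (total filters + 1). The sketch below does that arithmetic; the hyperparameter names and values are hypothetical, and the package's own summary may be laid out differently.

```python
def count_parameters(vocab_size, embed_dim, num_filters, kernel_sizes, num_classes):
    # Embedding table: one embed_dim vector per vocabulary entry.
    embedding = vocab_size * embed_dim
    # One convolution per kernel size: weights k * embed_dim, plus one bias, per filter.
    conv = sum(num_filters * (k * embed_dim + 1) for k in kernel_sizes)
    # Output layer: every pooled feature connects to every class, plus one bias per class.
    linear = num_classes * (num_filters * len(kernel_sizes) + 1)
    return {"embedding": embedding, "conv": conv, "linear": linear,
            "total": embedding + conv + linear}

summary = count_parameters(vocab_size=5000, embed_dim=100, num_filters=100,
                           kernel_sizes=(3, 4, 5), num_classes=2)
for layer, n in summary.items():
    print(f"{layer:>10}: {n:,}")
```

With these example values the embedding dominates (500,000 of 620,902 parameters), which is typical for text CNNs and is why vocabulary size has the largest effect on model size.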

Classification

To classify text with a trained model, use the classify method:

result = model.classify(
    model_path="path_to_your_model",
    hyperparams_path="path_to_hyperparameters_file",
    text="your_input_text",
    vocabulary_path="path_to_vocabulary_file"
)
print(result)
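The format of result is not documented here. Classifiers of this kind typically produce one raw score per class and report the class with the highest softmax probability; the sketch below shows that final step with hypothetical scores and label names, and is not the package's own code.

```python
import math

def softmax(logits):
    # Subtract the maximum score first so math.exp cannot overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits, labels):
    # Return the most probable label and its probability.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

label, confidence = predict_label([0.2, 1.5], ["normal", "toxic"])
print(label, round(confidence, 3))  # toxic 0.786
```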

Getting Started

This package provides tools for text classification tasks, and you can explore and customize it to suit your requirements. Refer to the documentation for detailed usage instructions. We also provide a Colab notebook that shows how to train a custom offensive-language classifier with this package.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Contact

For any questions or feedback, please contact [email protected] or reach me on Discord at techpowerb.
