ArrowTextClassifier is a Python package for text classification tasks, offering functionalities to train, summarize, and classify text using convolutional neural network (CNN) architecture.
You can install ArrowTextClassifier via pip:
pip install ArrowTextClassifier
ArrowTextClassifier implements a convolutional neural network (CNN) architecture for text classification. It tokenizes input text, embeds the tokens, applies convolutional filters over the embedded tokens to extract features, and then classifies the text into predefined categories.
To train a text classification model, you can utilize the train_model
method provided by the Model
class:
from ArrowTextClassifier import Model
model = Model(name="your_model_name")
model.train_model(dataset)
To make your own custom dataset for training you need to create a parquet file with the following format:
Example Parquet File
{"label":"normal","example":"Hey there!"}
{"label":"normal","example":"Hi!"}
{"label":"toxic","example":"You suck!"}
After you have created the parquet file with the data in the format above, you can provide to the dataset to start training the model.
To summarize a trained model, you can use the summarize
method:
model.summarize(
model_path="path_to_your_model",
hyperparams_path="path_to_hyperparameters_file",
vocabulary_path="path_to_vocabulary_file",
modelSummary_write_path="path_to_write_model_summary"
)
For classifying text using the trained model:
result = model.classify(
model_path="path_to_your_model",
hyperparams_path="path_to_hyperparameters_file",
text="your_input_text",
vocabulary_path="path_to_vocabulary_file"
)
print(result)
This package provides tools for text classification tasks. You can explore and customize it according to your requirements. Refer to the documentation for detailed usage instructions. We have also made our own colab notebook to help you train a custom offensive language classifier using this.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or feedback, please contact [email protected] or you can contact me at discord - techpowerb.