Persian Formality Style Transfer

This project performs formality style transfer for Persian using the T5 transformer architecture.
A demo of the Formality Style Transfer model is available on HuggingFace Spaces. Demos for the other models will be available soon.

Available Models on HuggingFace

Available Tasks

| Task | Description |
| --- | --- |
| Style Transfer | Convert an informal text/document to a formal style |
| Style Classification | Classify the style of an input text/document |

Used Datasets

For the informal dataset we used a dataset of Persian product reviews from Digikala, an Iranian e-commerce company.
For the formal dataset we used the TaPaCo dataset, in which every instance is paired with a paraphrase generated by our T5-based paraphraser.

1. How to Run

- Install the dependencies with pip install -r requirements.txt.
- Modify the hyperparameters in config.py if needed.
- Run prep.sh to download and install the required datasets and packages.
- Run main.py with the following arguments:
  - Task: the task to perform. Pass transfer for style transfer (change the style of the input text(s) to formal) or classify to classify the style of the input text(s).
  - Mode: pass train or test to train or test the models.
  - Input: the input for either task in test mode, passed via --input. It can be a single line of text or the path to a file with one sentence per line. This argument is used only in test mode; to change the training data, see section 2, Custom Datasets.
Example:
```
# Transfer a single input
python main.py transfer test --input 'من این بچه رو دوست دارم'
# Classify a whole document
python main.py classify test --input doc.txt
```
Note: You must change BASE_CONFIG/local_model_path in config.py to point to your own directory (a local folder, Google Drive, etc.).
Note 2: The output labels of the classification task might need to be swapped after each training run.
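
The command-line interface described above (a task, a mode, and an optional --input that is either a sentence or a file path) could be parsed roughly as follows. This is a minimal sketch and not the actual contents of main.py: the function names build_parser and resolve_inputs are hypothetical, and the rule "treat the value as a file if such a file exists, otherwise as a single sentence" is an assumption about how the input is interpreted.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring `python main.py <task> <mode> --input ...` (hypothetical)."""
    parser = argparse.ArgumentParser(description="Persian formality style transfer (sketch)")
    parser.add_argument("task", choices=["transfer", "classify"], help="task to perform")
    parser.add_argument("mode", choices=["train", "test"], help="train or test the models")
    parser.add_argument("--input", default=None, help="a sentence or a path to a text file (test mode only)")
    return parser


def resolve_inputs(value: str) -> list[str]:
    """Return the sentences to process.

    Assumption: if `value` names an existing file, read one sentence per line
    (skipping blank lines); otherwise treat `value` itself as a single sentence.
    """
    try:
        is_file = Path(value).is_file()
    except (OSError, ValueError):
        # Very long sentences or unusual characters are not valid paths.
        is_file = False
    if is_file:
        with open(value, encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    return [value]


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.mode == "test" and args.input is not None:
        for sentence in resolve_inputs(args.input):
            print(args.task, "->", sentence)
```

Resolving the input in one helper keeps the single-sentence and whole-document cases on the same code path, so the rest of the program only ever sees a list of sentences.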

2. Custom Datasets

Put your custom datasets under the data directory. The data folder also contains example files. To change the paths to your custom datasets, modify the values of TRAIN_CONFIG in config.py.

2.1 Style Transfer

For the transfer task you need a text file (paraphrase_data.txt) in which each line contains two comma-separated instances that are paraphrases of each other. No labels are required.
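
As an illustration of this file format, the sketch below reads such a file into (source, paraphrase) pairs. It is not the project's own loader: the function name read_paraphrase_pairs is hypothetical, and it assumes the separator is the first ASCII comma on each line, so sentences that themselves contain an ASCII comma would need a different delimiter or escaping. The Persian comma (،) is left untouched.

```python
def read_paraphrase_pairs(path):
    """Read a file of comma-separated paraphrase pairs (hypothetical helper).

    Assumption: each non-empty line is `sentence,paraphrase`, split on the
    first ASCII comma only.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            source, sep, target = line.partition(",")
            source, target = source.strip(), target.strip()
            if not sep or not source or not target:
                raise ValueError(f"line {line_no}: expected two comma-separated sentences")
            pairs.append((source, target))
    return pairs
```

Failing loudly on malformed lines makes it easier to catch a wrong delimiter before training starts.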

2.2 Style Classification

For the classification task you need two text files (informal_data.txt and formal_data.txt), each containing one instance per line. No labels are required here either.
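
Because each file holds a single style, a label can be derived from the file an instance came from. The sketch below shows one way to do that; it is an assumption about how the two files might be combined, not the project's actual loader. The function name load_classification_data and the label ids (0 for informal, 1 for formal) are hypothetical.

```python
def load_classification_data(informal_path, formal_path):
    """Combine the two style files into (text, label) examples (hypothetical helper).

    Assumption: label 0 = informal, label 1 = formal. The mapping is arbitrary
    and only needs to be used consistently.
    """
    examples = []
    for path, label in ((informal_path, 0), (formal_path, 1)):
        with open(path, encoding="utf-8") as f:
            # One instance per line; blank lines are ignored.
            examples.extend((line.strip(), label) for line in f if line.strip())
    return examples
```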

3. Evaluation

TBI
