Telugu-POS-Tagger-and-Chunker

This repository contains both Telugu pos tagger and chunker.
Both the models are CRF based (Please install CRF++ toolkit. All the necessary details can be accesed from the below website: https://taku910.github.io/crfpp/).
Consists of 2 steps.
- In the first step, extract features from the raw sentences and run the POS tagger.
- In the second step, the chunker model is run on the output of the POS tagger, here it only depends on the pos tags, not tokens.
How to run the models:
- sh run_telugu_pos_and_chunk_models_and_save_to_file.sh input_file pos_model chunk_model output_file
- You can use the sample file in the repositoty to get the output: sh run_telugu_pos_and_chunk_models_and_save_to_file.sh telugu-input.txt telugu-pos-model.m telugu-chunk-model.m telugu-output.txt
The raw sentences need to be tokenized, for Indian language tokenization use our tokenization repository (https://github.com/Pruthwik/Tokenizer_for_Indian_Languages)

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
LICENSE		LICENSE
README.md		README.md
create_features_for_pos_tagging.py		create_features_for_pos_tagging.py
run_telugu_pos_and_chunk_models_and_save_to_file.sh		run_telugu_pos_and_chunk_models_and_save_to_file.sh
telugu-chunk-model.m		telugu-chunk-model.m
telugu-input.txt		telugu-input.txt
telugu-output.txt		telugu-output.txt
telugu-pos-model.m		telugu-pos-model.m

Provide feedback