KaTo: Tokenization, Normalization, and Lemmatization CLI-Based NLP Tool

Overview

KaTo is a command-line interface (CLI) tool designed specifically for Tokenization, Normalization, and Lemmatization in Natural Language Processing (NLP). Built entirely in Haskell using functional programming principles, KaTo provides a straightforward and efficient solution for processing text input.

Logic Diagram

Below is a simplified logic diagram illustrating the flow of data in KaTo:

```
+------------------+
|                  |
| User Input Text  |
|                  |
+--------+---------+
         |
         v
+--------+---------+
|                  |
|   Tokenization   | <-----+
|                  |       |
+--------+---------+       |
         |                 |
         v                 |
+--------+---------+       |
|                  |       |
|  Normalization   |       |
|                  |       |
+--------+---------+       |
         |                 |
         v                 |
+--------+---------+       |
|                  |       |
|  Lemmatization   |       |
|                  |       |
+--------+---------+       |
         |                 |
         v                 |
+--------+---------+       |
|                  |       |
| Display Results  |       |
|                  |       |
+--------+---------+       |
         |                 |
         v                 |
+--------+---------+       |
|                  |       |
|  User Receives   |       |
| Processed Output |       |
|                  |       |
+------------------+       |
```

Components of the Logic Diagram

User Input Text: The process begins with the user providing text input through the command line.
Tokenization: The text is split into individual tokens or words for further processing.
Normalization: Each token is standardized (e.g., lowercased, punctuation removed) to ensure consistency.
Lemmatization: The normalized tokens are transformed into their base forms (lemmas).
Display Results: The results of the tokenization, normalization, and lemmatization are formatted and prepared for display.
User Receives Processed Output: Finally, the user sees the processed output in the terminal.
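The stages above can be pictured as a chain of pure functions. The sketch below is a minimal illustration under stated assumptions, not KaTo's actual source: `tokenize` splits on whitespace with `words`, `normalize` lowercases each token and strips non-alphanumeric characters, and `lemmatize` looks each token up in a tiny hypothetical lemma table (`lemmaTable`), leaving unknown tokens unchanged. All function names here are hypothetical.

```haskell
import Data.Char (isAlphaNum, toLower)
import qualified Data.Map.Strict as Map

-- Hypothetical tokenization stage: split the input on whitespace.
-- Punctuation stays attached to its word (e.g. "quickly.").
tokenize :: String -> [String]
tokenize = words

-- Hypothetical normalization stage: lowercase each token and drop
-- non-alphanumeric characters; tokens that end up empty (e.g. "!") are removed.
normalize :: [String] -> [String]
normalize = filter (not . null) . map (map toLower . filter isAlphaNum)

-- Hypothetical lemma table; a real tool would need a far larger
-- dictionary and/or morphological rules.
lemmaTable :: Map.Map String String
lemmaTable = Map.fromList
  [ ("children", "child")
  , ("are", "be")
  , ("is", "be")
  , ("was", "be")
  , ("running", "run")
  , ("quickly", "quick")
  ]

-- Hypothetical lemmatization stage: replace each token by its lemma
-- when the table knows it, otherwise keep the token as-is.
lemmatize :: [String] -> [String]
lemmatize = map (\t -> Map.findWithDefault t t lemmaTable)

-- Run the whole pipeline, keeping every intermediate result for display.
process :: String -> ([String], [String], [String])
process input = (toks, norms, lemmas)
  where
    toks   = tokenize input
    norms  = normalize toks
    lemmas = lemmatize norms

main :: IO ()
main = do
  let (toks, norms, lemmas) = process "The children are running quickly."
  putStrLn ("Tokens: " ++ show toks)
  putStrLn ("Normalized Tokens: " ++ show norms)
  putStrLn ("Lemmatized Tokens: " ++ show lemmas)
```

Note that `filter isAlphaNum` also removes apostrophes and hyphens inside words ("don't" becomes "dont"), which may or may not match KaTo's own normalization rules.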

Setting Up KaTo

Clone the Repository:

```
git clone <repository-url>
cd <repository-name>
```

Install Dependencies: Make sure GHC (the Haskell compiler) and Cabal are installed, then run:
```
cabal update
cabal install --only-dependencies
```
Build the Project:
```
cabal build
```

Using KaTo

Run the Application:
```
cabal run
```
Enter Text for Processing: When prompted, type or paste the text you want to analyze and press Enter.
View Results: After processing, KaTo will display the tokens, normalized tokens, and lemmatized tokens.
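The prompt-and-display cycle described above can be sketched as follows, assuming line-based input read from stdin. The names `formatList` and `stages` are hypothetical, and `stages` is only a placeholder (whitespace splitting and lowercasing, with the "lemmas" left equal to the normalized tokens) standing in for KaTo's real processing.

```haskell
import Data.Char (toLower)
import Data.List (intercalate)
import System.IO (hFlush, stdout)

-- Render a token list as ["a", "b"], with a space after each comma as in
-- the example output; Haskell's `show` alone would print ["a","b"].
formatList :: [String] -> String
formatList ts = "[" ++ intercalate ", " (map show ts) ++ "]"

-- Placeholder stages (assumption): `words` for tokenization, lowercasing
-- for normalization, and no real lemmatization at all.
stages :: String -> ([String], [String], [String])
stages input = (toks, norms, norms)
  where
    toks  = words input
    norms = map (map toLower) toks

main :: IO ()
main = do
  putStrLn "Please enter the text you want to process:"
  putStr "> "
  hFlush stdout -- show the prompt before blocking on input
  input <- getLine
  let (toks, norms, lemmas) = stages input
  putStrLn ("Tokens: " ++ formatList toks)
  putStrLn ("Normalized Tokens: " ++ formatList norms)
  putStrLn ("Lemmatized Tokens: " ++ formatList lemmas)
```

The `hFlush stdout` call matters because GHC line-buffers stdout on a terminal, so a prompt without a trailing newline might not appear before `getLine` waits for input. Also, `show` escapes non-ASCII characters (`show "é"` yields `"\233"`), so a tool aimed at non-English text may prefer to quote tokens itself.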

Example Usage

```
$ kato
Welcome to KaTo: A Tokenization, Normalization, and Lemmatization CLI-Based NLP Tool!
Please enter the text you want to process:
> The children are running quickly.
Tokens: ["The", "children", "are", "running", "quickly."]
Normalized Tokens: ["the", "children", "are", "running", "quickly"]
Lemmatized Tokens: ["the", "child", "be", "run", "quick"]
Process completed successfully.
```
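The lemmas in this example mix irregular forms ("children" to "child", "are" to "be") with regular ones ("running" to "run", "quickly" to "quick"). One common way to cover both, sketched here as an assumption rather than as KaTo's documented approach, is to consult a small exception dictionary first and otherwise strip a known suffix, collapsing a doubled final consonant left behind ("runn" to "run"). The exception list, suffix list, and minimum stem length below are all hypothetical.

```haskell
import Data.List (isSuffixOf)
import qualified Data.Map.Strict as Map

-- Hypothetical exception list for irregular forms no suffix rule can recover.
irregulars :: Map.Map String String
irregulars = Map.fromList
  [ ("children", "child"), ("are", "be"), ("is", "be")
  , ("was", "be"), ("were", "be"), ("mice", "mouse") ]

-- Hypothetical suffix rules, tried in order; the first applicable one wins.
suffixes :: [String]
suffixes = ["ing", "ed", "ly", "s"]

-- A stem must keep at least this many characters after stripping,
-- so short words such as "sing" are left alone.
minStem :: Int
minStem = 3

-- Collapse a doubled final consonant ("runn" -> "run"), except for letters
-- often doubled in base forms ("fall", "miss", "buzz").
undouble :: String -> String
undouble w = case reverse w of
  (a : b : rest) | a == b, a `notElem` "aeiouylsz" -> reverse (b : rest)
  _ -> w

-- Look up irregular forms first, then fall back to suffix stripping.
lemmatize :: String -> String
lemmatize w = case Map.lookup w irregulars of
  Just lemma -> lemma
  Nothing    -> tryRules suffixes
  where
    tryRules [] = w
    tryRules (s : ss)
      | s `isSuffixOf` w && length w - length s >= minStem =
          undouble (take (length w - length s) w)
      | otherwise = tryRules ss

main :: IO ()
main = print (map lemmatize ["the", "children", "are", "running", "quickly"])
```

Rules this crude also over-strip some words ("class" becomes "clas", "making" becomes "mak"), which is why dictionary-backed lemmatizers are generally preferred over pure suffix stripping.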

Contributing

We welcome contributions! Please refer to the Contribution Guidelines (replace with actual link) for details on how to get involved.

License

This project is licensed under the MIT License (replace with actual link).

Repository: gluppler/KaTo