Text-categorization-with-WEKA

This repository contains the data used in the experiments conducted for the paper Text categorization with WEKA: a survey by Donatella Merlini (email: [email protected]) and Martina Rossini (email: [email protected]).

In particular, all the multilingual recipes used for our Language Identification experiments can be found in the Recipes folder. A separate test set in ARFF format can be found here; it was used to estimate how well our models could recognize the language of a generic piece of text that has nothing to do with cooking. Note that, as stated in the paper, these short sentences are extracted from the Leipzig Text Corpora.
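
For readers who want to build a similar held-out test set, the following is a minimal Python sketch of an ARFF file with one string attribute and one nominal class attribute. The attribute names (`text`, `language`), the relation name, and the language codes in the usage note are illustrative assumptions, not the ones used in the actual test set.

```python
def quote_arff(text):
    """Quote a string value for an ARFF data row (single quotes, backslash escapes)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace("'", "\\'")
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )
    return "'" + escaped + "'"


def build_arff(relation, labels, rows):
    """Build ARFF text with one string attribute and one nominal class attribute.

    The attribute names "text" and "language" are hypothetical placeholders.
    """
    lines = [
        "@relation " + relation,
        "",
        "@attribute text string",
        "@attribute language {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    for text, label in rows:
        # Reject labels that were not declared in the nominal attribute.
        if label not in labels:
            raise ValueError("unknown label: " + label)
        lines.append(quote_arff(text) + "," + label)
    return "\n".join(lines) + "\n"
```

Calling `build_arff("lang_test", ["en", "it"], [("It's late.", "en")])` returns the file contents as a string, which can then be written to disk and opened in WEKA. The sketch assumes that the relation name and the labels are simple tokens without spaces or commas.
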
Moreover, stopword_list.txt contains the stopwords used for all six languages we examined. The file contains one word per line, as required by the WordsFromFile stopwordsHandler in WEKA.
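
If you extend the stopword list, a quick check of the one-word-per-line layout can catch formatting slips before the file is loaded into WEKA. The helper below is a minimal sketch, not part of this repository: the function name is hypothetical, and it assumes the file is UTF-8 encoded. It flags blank lines and lines with more than one token, which may be stricter than WEKA itself requires.

```python
from pathlib import Path


def check_stopword_file(path):
    """Return (line_number, line) pairs that break the one-word-per-line layout."""
    problems = []
    # Assumption: the stopword file is encoded as UTF-8.
    text = Path(path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        # A valid line holds exactly one whitespace-free token.
        if len(line.split()) != 1:
            problems.append((number, line))
    return problems
```

An empty result from `check_stopword_file("stopword_list.txt")` means every line holds exactly one word.
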

Lastly, the second text categorization example shown in the paper focuses on detecting the type of dish a certain recipe is about. The dataset used for this part can be found in the Dishes folder.

All our experiments were conducted using WEKA version 3.8.4.

Announcement:

As of 16-04-2021, our paper is published in Elsevier's journal Machine Learning with Applications and can be accessed here.
