This repository contains the data used for our paper 'Text categorization with WEKA: a survey'.

mwritescode/text-categorization-with-WEKA

Text-categorization-with-WEKA

This repository contains the data used in the experiments conducted for the paper Text categorization with WEKA: a survey by Donatella Merlini (email: [email protected]) and Martina Rossini (email: [email protected]).

In particular, all the multilingual recipes used for our Language Identification experiments can be found in the Recipes folder. A separate test set in ARFF format can be found here; it was used to estimate how well our models recognize the language of a generic piece of text that has nothing to do with cooking. Note that, as stated in the paper, these short sentences are extracted from the Leipzig Text Corpora.
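For readers unfamiliar with ARFF, the following is a minimal sketch of how labelled sentences could be serialized into that format. The attribute names `text` and `language`, the relation name, and the helper functions are assumptions made for illustration; they are not taken from the actual test set, whose schema may differ.

```python
def quote_arff(value):
    """Wrap a value in single quotes, escaping backslashes and quotes.

    Assumes the text is a single line (no embedded newlines).
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def to_arff(relation, samples):
    """Build ARFF text from (sentence, label) pairs.

    Hypothetical schema: one string attribute and one nominal class
    attribute whose values are the distinct labels, sorted.
    Labels are assumed to contain no spaces or commas.
    """
    labels = sorted({label for _, label in samples})
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute language {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    for sentence, label in samples:
        lines.append(f"{quote_arff(sentence)},{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [("It's a short sentence.", "en"), ("Guten Tag", "de")]
    print(to_arff("language_identification", demo))
```

A string attribute like this one would typically still need to be converted to word features (for example with WEKA's StringToWordVector filter) before most classifiers can use it.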
Moreover, stopword_list.txt contains the stopwords used for all six languages we examined. The file contains one word per line, as required by the WordsFromFile stopwordsHandler in WEKA.
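To illustrate the one-word-per-line layout outside of WEKA, here is a small sketch that loads such a file and filters a token list. It is not WEKA's implementation; the function names are hypothetical, and lowercasing both the stopwords and the tokens is an assumption about how matching should behave.

```python
from pathlib import Path


def load_stopwords(path):
    """Read a one-word-per-line stopword file into a set.

    Blank lines are skipped; words are lowercased (an assumption).
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stopword set."""
    return [token for token in tokens if token.lower() not in stopwords]
```

With the repository checked out, a call such as `load_stopwords("stopword_list.txt")` would return the stopwords for all languages as a single set, since the file does not separate them by language.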

Lastly, the second text categorization example shown in the paper focuses on detecting the type of dish a certain recipe is about. The dataset used for this part can be found in the Dishes folder.

All our experiments were conducted using WEKA version 3.8.4.


Announcement:

As of 16-04-2021, our paper is published in Elsevier's journal Machine Learning with Applications and can be accessed here.
