
Overview

This is a naive implementation of a binary naive Bayes text classifier in PostgreSQL. The data set is the SMS Spam Collection, downloaded from http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection.

The ts_stat function extracts the count of each word by class (spam or ham). The crosstab function is used as a convenience to pivot the spam and ham counts into adjacent columns. The per-class word frequency distributions are then used to classify documents.
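The counting and pivoting steps can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the sms table, its label and body columns, the sample rows, the word_counts table, and the choice of the 'simple' text search configuration are all assumptions made for the example. crosstab is provided by the tablefunc extension, which must be available on the server.

```sql
-- Hypothetical table and sample rows; the repository's real schema may differ.
CREATE TEMP TABLE sms (label text, body text);
INSERT INTO sms VALUES
  ('spam', 'win a free prize now'),
  ('spam', 'free entry win cash'),
  ('ham',  'are we still meeting now'),
  ('ham',  'call me when you are free');

-- ts_stat runs the query given as text and returns one row per word with
-- ndoc (documents containing the word) and nentry (total occurrences).
-- The 'simple' configuration (an assumption) keeps words unstemmed.
CREATE TEMP TABLE word_counts AS
SELECT 'spam'::text AS label, word, nentry
FROM ts_stat($$SELECT to_tsvector('simple', body) FROM sms WHERE label = 'spam'$$)
UNION ALL
SELECT 'ham'::text AS label, word, nentry
FROM ts_stat($$SELECT to_tsvector('simple', body) FROM sms WHERE label = 'ham'$$);

-- crosstab lives in the tablefunc extension.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Pivot the per-class counts into adjacent spam and ham columns.
-- Words missing from a class come back as NULL, so coalesce them to 0.
SELECT word,
       coalesce(spam, 0) AS spam,
       coalesce(ham, 0)  AS ham
FROM crosstab(
       $$SELECT word, label, nentry FROM word_counts ORDER BY 1, 2$$,
       $$VALUES ('spam'::text), ('ham'::text)$$
     ) AS ct(word text, spam integer, ham integer)
ORDER BY word;
```

The two-argument form of crosstab is used here so that a word appearing in only one class still lands in the correct column. How the resulting counts are turned into class probabilities (for example, whether any smoothing is applied) is not specified above, so that step is left out of the sketch.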

Usage

To train on 30% of the data set (and test on the remaining 70%), run

select test_naive_bayes(0.3);

Sources

Almeida, T.A., Gómez Hidalgo, J.M., Yamakami, A. Contributions to the Study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG'11), Mountain View, CA, USA, 2011.
