Skip to content

Hierarchical bloom classifier for tagging text with a structured word list

Notifications You must be signed in to change notification settings

christopherdebeer/hBloom

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hBloom

Hierarchical bloom classifier for tagging text with a structured word list.

Build Status

This has prpbably been solved before by smater people than myself. Given a structured tree of categories and corresponding words/strings, hBloom will create a bloom filter for each level of depth in the data, which returns true/false for all sub categories, in the case of 'true' it will return the matching categories.

An example is available at example/index.js. The example tests 5000 tweets against 16000 tag words and takes on average 4000 milliseconds. (0.8 ms per tweet)

##Install

npm install hbloom

##Usage

var hBloom = require('../hbloom');

var myBloom = hBloom( {STRUCTURED DATA} );
var txt = "This post is about celtic and rangers, but mentions villa.";

myBloom.classifyText(txt, function(result){
	console.log(result);
});

// logs: ['football', 'celtic', 'rangers', 'aston villa']

##Methods

hBloom.classifyText( text, callback )

hBloom.classify( word );

##Structured Data?

The data passed to hBloom({DATA}) should follow the example below. Where keys are tags/categories and strings in arrays or matching words.

{
	"racing": {
		"asscot": ["asscot", "ass", "the big race"]
	},
	"football": {
		"manchester united": ["manu", "man united", "mufc", "manchester united", "manufc"],
		"aston villa": ["aston villa", "villa" , "villafc"],
		"manchester city": ["mancity", "manchester city", "cityfc", "man city", "mancityfc"],
		"scottish league": {
			"dundee united": ["dundee", "dundee united", "dundeefc"],
			"rangers": ["rangers","rangersfc"],
			"celtic": ["celtic", "celticfc"]
		}
	}
}

##License

MIT

Copyright 2012 Christopher de beer

About

Hierarchical bloom classifier for tagging text with a structured word list

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published