category

Categorical transformation for data science

Installation

Install the library from PyPI with pip:

pip install category

Single Category

# pure-Python core
>>> from category import Category
# Rust core (faster); use this import instead of the one above
>>> from category.fast import Category
>>> book = Category(['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j'], pad_mst=False)
>>> book.i2c[2]
'c'

>>> book.c2i[['Category_d','f']]
array([3, 5])
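The two lookups above can be pictured as a pair of plain Python structures. The following is only a minimal sketch of that idea, not the library's implementation: the function name to_indices is hypothetical, and plain lists stand in for the numpy arrays the library returns.

```python
categories = ['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j']

# index -> category: the position in the list is the index
i2c = list(categories)
# category -> index: a dict built once, so each lookup is O(1)
c2i = {c: i for i, c in enumerate(categories)}

def to_indices(tokens):
    """Look up every token in the dict (hypothetical helper, not the library API)."""
    return [c2i[t] for t in tokens]

print(i2c[2])                           # c
print(to_indices(['Category_d', 'f']))  # [3, 5]
```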

Set pad_mst to True to handle missing tokens: index 0 is then reserved for the missing token, and categories that were never seen map to it.

# pure-Python core
>>> from category import Category
# Rust core (faster); use this import instead of the one above
>>> from category.fast import Category
>>> book = Category(['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j'], pad_mst=True)
>>> book.i2c[2] # index 0 is now the missing token, so every category shifts by one
'b'
>>> book.c2i[['Stranger','Category_d','Unknown','f']]
array([0, 4, 0, 6])
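The shift caused by the missing token can be reproduced with a dict that falls back to index 0. This is a sketch under stated assumptions rather than the library's code: the placeholder string "[MST]" and the helpers build_vocab and to_indices are hypothetical names chosen for illustration.

```python
MISSING = "[MST]"  # hypothetical placeholder; the library's actual token is not shown here

def build_vocab(categories, pad_mst=True):
    """Return (i2c, c2i); with pad_mst the missing token occupies index 0."""
    i2c = ([MISSING] if pad_mst else []) + list(categories)
    c2i = {c: i for i, c in enumerate(i2c)}
    return i2c, c2i

def to_indices(tokens, c2i, pad_mst=True):
    """Map tokens to indices; with pad_mst, unknown tokens fall back to 0."""
    if pad_mst:
        return [c2i.get(t, 0) for t in tokens]
    return [c2i[t] for t in tokens]  # raises KeyError on an unknown token

i2c, c2i = build_vocab(['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j'])
print(i2c[2])                                                    # b
print(to_indices(['Stranger', 'Category_d', 'Unknown', 'f'], c2i))  # [0, 4, 0, 6]
```

Reserving index 0 keeps the fallback cheap: a single dict.get with a default covers every unseen value.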

Multi-Category

# pure-Python core
>>> from category import (Category, MultiCategory)
# Rust core (faster); use this import instead of the one above
>>> from category.fast import (Category, MultiCategory)
>>> cates = [f"category{i}" for i in range(1000)]
>>> multi_cate = MultiCategory(Category(cates, pad_mst = True))
>>> multi_cate.string_to_index("category42, category108")
array([ 43, 109])

You can also convert a list of strings containing multi-category information (a common input format in tabular data) to an n-hot encoded array, and back.

>>> nhot = multi_cate.batch_strings_to_nhot(["category42, category108","category999"])
>>> multi_cate.nhot_to_list(nhot)[0]
["category42", "category108"]
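To make the n-hot round trip concrete, here is a dependency-free sketch of the idea. It is not the library's implementation: the helper names split_to_indices, encode_nhot and decode_nhot are hypothetical, nested lists stand in for numpy arrays, and the comma separator with whitespace trimming and the "[MST]" placeholder are assumptions based on the example strings above.

```python
cates = [f"category{i}" for i in range(1000)]
i2c = ["[MST]"] + cates  # index 0 reserved for the missing token (placeholder name is an assumption)
c2i = {c: i for i, c in enumerate(i2c)}

def split_to_indices(text, sep=","):
    """Split on the separator, trim whitespace, and look up each token."""
    tokens = (tok.strip() for tok in text.split(sep))
    return [c2i.get(tok, 0) for tok in tokens if tok]

def encode_nhot(texts):
    """Encode each string as a 0/1 row with one column per vocabulary entry."""
    rows = []
    for text in texts:
        row = [0] * len(i2c)
        for idx in split_to_indices(text):
            row[idx] = 1
        rows.append(row)
    return rows

def decode_nhot(nhot):
    """Decode each 0/1 row back to the category names it marks."""
    return [[i2c[i] for i, flag in enumerate(row) if flag] for row in nhot]

nhot = encode_nhot(["category42, category108", "category999"])
print(decode_nhot(nhot)[0])  # ['category42', 'category108']
```

Each row has as many columns as the vocabulary (including the missing token), so rows of different lengths in the input become a rectangular array that a model can consume directly.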

Performance

The running speed of this library depends mostly on Python dictionary and numpy operations. Although Python is a 'slow' language, this kind of workload is quite fast. Our own Rust alternative is faster, but not by a huge margin.
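For a rough local feel of how cheap dictionary lookups are, a timing sketch like the one below can help. It measures only a plain dict on synthetic data, not this library or its Rust counterpart, and the names lookup_all and queries are hypothetical; the numbers depend entirely on your machine.

```python
import timeit

cates = [f"category{i}" for i in range(1000)]
c2i = {c: i for i, c in enumerate(cates)}
queries = cates * 100  # 100,000 lookups per run

def lookup_all():
    """Map every query string to its index with a plain dict lookup."""
    return [c2i[q] for q in queries]

runs = 10
elapsed = timeit.timeit(lookup_all, number=runs)
print(f"{len(queries) * runs / elapsed:,.0f} lookups per second")
```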

Here we compare this library with the Rust implementation.

References

GitHub: raynardj/category
PyPI package
Rust implementation
Used in Tai-Chi engine, a versatile, user-friendly deep learning library
