This repository has been archived by the owner on Oct 8, 2019. It is now read-only.

Binarize labels for Positive Negative instances

Makoto YUI edited this page Mar 16, 2016 · 5 revisions

Expanding aggregated label counts into individual samples can improve accuracy in some cases. binarize_label explodes a record that holds the counts of positive and negative samples into that many individual labeled rows. For example,

| positive | negative | features |
|:-:|:-:|:-:|
| 2 | 3 | "[a:1, b:2]" |

is converted into

| features | label |
|:-:|:-:|
| "[a:1, b:2]" | 1 |
| "[a:1, b:2]" | 1 |
| "[a:1, b:2]" | 0 |
| "[a:1, b:2]" | 0 |
| "[a:1, b:2]" | 0 |

Caution: Don't forget to shuffle the converted training instances into random order, e.g., by CLUSTER BY rand().
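The expansion emits rows for one record back to back, so without shuffling a learner that is sensitive to input order would see runs of identical rows. The following is a minimal Python sketch of that shuffling step, not the Hive mechanism itself: `shuffle_rows` and the `seed` parameter are hypothetical names introduced for illustration, and the sample rows assume that positive samples carry label 1.

```python
import random

def shuffle_rows(rows, seed=None):
    """Return a new list with the rows in random order (hypothetical helper)."""
    rng = random.Random(seed)   # a seed makes the order reproducible
    shuffled = list(rows)       # copy, so the input is left untouched
    rng.shuffle(shuffled)
    return shuffled

# Expanded rows as they might come out of the conversion, grouped by label.
# Assumption: positive samples are labeled 1, negative samples 0.
expanded = [("[a:1, b:2]", 1)] * 2 + [("[a:1, b:2]", 0)] * 3
shuffled = shuffle_rows(expanded, seed=42)
```

Shuffling changes only the order: the same five rows come back, so the label distribution is preserved.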

binarize_label(int/long positive, int/long negative, ANY arg1, ANY arg2, ..., ANY argN) returns (ANY arg1, ANY arg2, ..., ANY argN, int label) where label is 0 or 1.
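To illustrate the signature above, here is a minimal Python sketch of the expansion semantics. It is an emulation for explanation only, not the actual UDTF code. It assumes that positive samples receive label 1 and negative samples label 0, and that the positive rows are emitted first; neither detail is stated on this page.

```python
from typing import Any, Iterator, Tuple

def binarize_label(positive: int, negative: int, *args: Any) -> Iterator[Tuple[Any, ...]]:
    """Yield one row per counted sample: the pass-through args plus a 0/1 label."""
    # Assumption: positive samples get label 1, negative samples get label 0.
    for _ in range(positive):
        yield (*args, 1)
    for _ in range(negative):
        yield (*args, 0)

# One record with 2 positive and 3 negative samples becomes 5 rows.
rows = list(binarize_label(2, 3, "[a:1, b:2]"))

# Any number of extra arguments is passed through unchanged before the label.
multi = list(binarize_label(1, 1, "[a:1, b:2]", "user42"))
```

Under these assumptions, `rows` holds five tuples of the form `(features, label)`, and `multi` holds two tuples of the form `(features, "user42", label)`.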
