This repository has been archived by the owner on Jul 9, 2020. It is now read-only.

Imbalance dataset #4

Open
Huy-Ngo opened this issue Nov 20, 2019 · 4 comments

Comments

@Huy-Ngo
Collaborator

Huy-Ngo commented Nov 20, 2019

The dataset has more training examples labeled false than true. Modify the dataset to resolve this.

@trahoa
Collaborator

trahoa commented Nov 20, 2019

One solution is to double the true examples: the json2tsv script should make a copy of every true example and shuffle it into a random position. This applies to the training set only!
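A minimal Python sketch of this duplication step follows. It assumes each training example is a hypothetical `(text, label)` pair with a boolean label; the real json2tsv record format is not shown in this thread, and `oversample_true` is an illustrative name, not the script's actual code. The function copies every true example once and inserts each copy at a random index, leaving the input list unchanged. It should be called on the training split only, so evaluation data is not skewed by duplicates.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical record layout: (text, label). The real json2tsv format may differ.
Example = Tuple[str, bool]


def oversample_true(examples: List[Example], seed: Optional[int] = None) -> List[Example]:
    """Return a new list in which every example labeled True appears twice.

    Each duplicate is inserted at a uniformly random position, so copies are
    scattered through the list instead of being appended in one block.
    Apply this to the training split only.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible output
    examples = list(examples)  # materialize in case an iterator is passed
    result = list(examples)    # copy so the caller's list is not mutated
    for example in examples:
        if example[1]:
            # randint is inclusive on both ends; inserting at len(result) appends.
            result.insert(rng.randint(0, len(result)), example)
    return result


if __name__ == "__main__":
    train = [("a", True), ("b", False), ("c", False)]
    balanced = oversample_true(train, seed=0)
    print(sum(label for _, label in balanced))  # 2 true examples after doubling
```

Inserting each copy at a random index, rather than appending all copies at the end, avoids a long run of identically labeled examples if the data is later read in order. Passing a fixed `seed` makes the resulting file reproducible.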

@Huy-Ngo
Collaborator Author

Huy-Ngo commented Nov 21, 2019

I've just done that. The training set now has 6350 true and 6835 false examples. That seems balanced enough, so I'm closing this issue.

Huy-Ngo closed this as completed Nov 21, 2019
@McSinyx
Owner

McSinyx commented Nov 22, 2019

Since no benchmark has been run on this naïve solution, I'm reopening this issue to discuss better alternatives if necessary.

McSinyx reopened this Nov 22, 2019
@McSinyx
Owner

McSinyx commented Nov 26, 2019

Apparently it is better handled by cbc4a09.
