Training numbers #2

Open
impactcolor opened this issue Oct 18, 2017 · 14 comments

Comments

@impactcolor

This is probably outside the scope of "issues", but I figured I'd ask.
I notice it doesn't handle numbers. Is there a way to add numbers to the XML datasets so it can generate them too?

@Grzego
Owner

Grzego commented Oct 18, 2017

You should be able to generate numbers like:

python generate.py --text="1 2 3 4 5 " --noinfo --bias=4.

although the quality will probably be quite bad (too few examples in the dataset).

You can add your own examples in .xml format, but they will have to match the ones already in the dataset (the content should contain tags like <Transcription>, <Text> and <StrokeSet>, structured as in the dataset).
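
As a rough illustration of that matching step, here is a minimal sketch that checks whether a candidate file contains the three tags named above. The sample document, its root element, and the `Stroke`/`Point` nesting are made up for the example and are not taken from the dataset; the real files may nest these tags differently, so compare against an actual dataset file before relying on it.

```python
import xml.etree.ElementTree as ET

# Tag names mentioned in the comment above.
REQUIRED_TAGS = ("Transcription", "Text", "StrokeSet")


def missing_tags(xml_text):
    """Return the required tag names that appear nowhere in the document."""
    root = ET.fromstring(xml_text.strip())
    present = {element.tag for element in root.iter()}
    return [tag for tag in REQUIRED_TAGS if tag not in present]


# Hypothetical sample: root name, nesting, and attributes are illustrative only.
SAMPLE = """
<Example>
  <Transcription>
    <Text>1 2 3</Text>
  </Transcription>
  <StrokeSet>
    <Stroke>
      <Point x="10" y="20"/>
      <Point x="12" y="24"/>
    </Stroke>
  </StrokeSet>
</Example>
"""

if __name__ == "__main__":
    print(missing_tags(SAMPLE))
```

This only checks that the tags exist; it says nothing about whether the points and labels inside them are laid out the way the loader expects.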

Alternatively, if you have data with consecutive points representing how to draw numbers (with labels), you could create your own dataset.

So depending on the format of your dataset, it might be easier or harder. :)

@impactcolor
Author

I'm really new to this so I'm not sure how to go about creating a dataset. Do you have any articles or direction you can point me to?

@Grzego
Owner

Grzego commented Oct 20, 2017

Sorry for the delay. I get the feeling you have no data, which is problematic. Could you please elaborate a little bit more on what you are trying to achieve? :)

@impactcolor
Author

impactcolor commented Oct 20, 2017 via email

@Grzego
Owner

Grzego commented Oct 21, 2017

Ok, is this dataset publicly available? I can look into it to see if there is a way to make it compatible with my code. :)

@impactcolor
Author

impactcolor commented Oct 21, 2017 via email

@Grzego
Owner

Grzego commented Oct 23, 2017

Unfortunately, those datasets represent numbers as images. For handwriting generation you would need a list of consecutive points showing how each digit is written, so those datasets cannot be used here.
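
To make the difference concrete, here is a small sketch of what "consecutive points" data could look like and how it might be turned into pen offsets. The `(dx, dy, pen_lifted)` encoding is an assumption borrowed from common handwriting-generation setups, not something confirmed for this repository, and the function name and sample strokes are hypothetical.

```python
def strokes_to_offsets(strokes):
    """Flatten strokes into (dx, dy, pen_lifted) triples.

    `strokes` is a list of strokes, each a list of (x, y) points in drawing
    order. `pen_lifted` is 1 on the last point of a stroke, otherwise 0.
    """
    offsets = []
    prev_x, prev_y = 0.0, 0.0
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            pen_lifted = 1 if i == len(stroke) - 1 else 0
            offsets.append((x - prev_x, y - prev_y, pen_lifted))
            prev_x, prev_y = x, y
    return offsets


if __name__ == "__main__":
    # Hypothetical two-stroke shape: a top bar, then a shorter middle bar.
    two_strokes = [[(0, 0), (4, 0)], [(1, 3), (3, 3)]]
    print(strokes_to_offsets(two_strokes))
```

An image only stores which pixels are dark, so the drawing order and pen lifts that this encoding depends on cannot be read off it directly.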

@impactcolor
Author

impactcolor commented Oct 23, 2017 via email

@Grzego
Owner

Grzego commented Oct 23, 2017

This one might work. :) Can you give some examples of sequences you want to generate? I just want to figure out what kind of augmentation of the dataset might be needed.

@impactcolor
Author

impactcolor commented Oct 23, 2017 via email

@Grzego
Owner

Grzego commented Nov 8, 2017

Sorry for the very late response. I tried this dataset and unfortunately it doesn't work well :/ The results are even worse than with the original IAM dataset. If by any chance I find a better dataset for this task, I will post it here.

@impactcolor
Author

impactcolor commented Nov 8, 2017 via email

@Grzego
Owner

Grzego commented Dec 8, 2017

Well, it's been a while, but I was kind of interested in this problem and created an MNIST handwriting dataset. If you still need to generate numbers, you may find it useful. One simple solution is to just pick the needed digits from this dataset and concatenate them together. :)
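
The pick-and-concatenate idea could be sketched roughly as below. The layout of `samples` (one list of strokes per digit character, each stroke a list of `(x, y)` points) is an assumption made for the example and may not match how the dataset actually stores its digits; `concatenate_digits` and `gap` are hypothetical names.

```python
def concatenate_digits(text, samples, gap=1.0):
    """Place one stored sample per character of `text` side by side.

    `samples` maps a digit character to a list of strokes; each stroke is a
    list of (x, y) points. Each digit is shifted right so it starts where the
    previous one ended, plus `gap`.
    """
    strokes = []
    cursor = 0.0
    for ch in text:
        digit = samples[ch]
        xs = [x for stroke in digit for x, _ in stroke]
        min_x, max_x = min(xs), max(xs)
        shift = cursor - min_x
        strokes.extend([(x + shift, y) for x, y in stroke] for stroke in digit)
        cursor += (max_x - min_x) + gap
    return strokes


if __name__ == "__main__":
    # Hypothetical, hand-made digit shapes standing in for dataset samples.
    samples = {
        "1": [[(0, 0), (0, 4)]],
        "7": [[(0, 0), (2, 0), (1, 4)]],
    }
    print(concatenate_digits("17", samples))
```

Picking a random sample per digit instead of a fixed one would give more varied output, at the cost of less consistent spacing and size between neighbouring digits.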

@impactcolor
Author

@Grzego THANK YOU!
