
Multiple dev sets. #76

Open
francisr opened this issue Oct 12, 2016 · 5 comments

@francisr
Contributor

francisr commented Oct 12, 2016

What should I do if I have multiple dev sets of unequal sizes but want them to contribute equally to the optimisation? For example, if one set is 10 times bigger, I don't want it to be 10 times more important than the other ones.

@danpovey
Owner

You could just repeat the more-important dev data.
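
A minimal sketch of that workaround (not part of the thread; the file names, the word-count heuristic, and the assumption that the toolkit reads one combined dev file are all mine): repeat each smaller dev set enough times that every set contributes roughly the same number of words before concatenating them.

```python
# Sketch: balance several dev sets by repetition before concatenating them
# into one dev file. File names and the balancing heuristic are assumptions,
# not anything the toolkit prescribes.
import math

dev_files = ["dev_broadcast.txt", "dev_telephone.txt", "dev_web.txt"]  # hypothetical

def word_count(path):
    """Count whitespace-separated tokens in a text file."""
    with open(path, encoding="utf-8") as f:
        return sum(len(line.split()) for line in f)

counts = {path: word_count(path) for path in dev_files}
target = max(counts.values())

# Repeat each set ceil(target / count) times so all sets end up with
# roughly the same number of words in the combined dev text.
with open("dev.txt", "w", encoding="utf-8") as out:
    for path in dev_files:
        repeats = math.ceil(target / counts[path])
        with open(path, encoding="utf-8") as f:
            lines = f.readlines()
        for _ in range(repeats):
            out.writelines(lines)
```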


@francisr
Contributor Author

Not the most convenient solution when I have many sources with varying amounts of data, but it'll probably do.
What is the impact of the dev set on the final model? As in, for the same training text, how much do you expect perplexities to vary with different dev sets?

@danpovey
Owner

The dev set will definitely affect the interpolation weights of the different data sources, which will make a difference in some applications. And of course the perplexity on the dev set itself will be highly dependent on the nature of the dev set (word length, domain, etc.).
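
For intuition on why the dev set drives the weights: with fixed per-source models, the interpolation weights that maximise dev-set likelihood can be estimated by EM, so whichever source the dev text resembles most gets the largest weight. Below is a generic sketch of that estimation (illustrative only; it is not pocolm's actual metaparameter optimisation, and the toy probabilities are made up).

```python
# Generic EM for linear-interpolation weights on a dev set (illustrative only).
# p_sources[i][j] = probability that source model i assigns to dev word j.
def estimate_weights(p_sources, iters=50):
    n = len(p_sources)                 # number of training sources
    w = [1.0 / n] * n                  # start from uniform weights
    for _ in range(iters):
        acc = [0.0] * n
        for j in range(len(p_sources[0])):
            mix = sum(w[i] * p_sources[i][j] for i in range(n))
            for i in range(n):
                acc[i] += w[i] * p_sources[i][j] / mix   # posterior of source i
        total = sum(acc)
        w = [a / total for a in acc]   # re-normalise so the weights sum to 1
    return w

# Toy example: the dev words are better predicted by the second source,
# so its weight ends up larger.
print(estimate_weights([[0.1, 0.2, 0.1], [0.4, 0.3, 0.5]]))
```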


@francisr
Contributor Author

What if there is only one data source?


@danpovey
Owner

If there is one data source I doubt very much that the dev set would make any difference.

