We need a way to limit how much memory the toolkit will use.
The only operation that uses a lot of memory is 'sort', and I believe this is only called in two places: when generating counts, and in ARPA generation.
In both cases, the way we can control it is the --buffer-size=X option to 'sort', e.g. --buffer-size=10G.
The tricky thing here is that we'd like to be able to pass a --max-memory=X option to the top-level scripts, such as train_lm.py, and have them just do the right thing, bearing in mind that some of the scripts may invoke 'sort' multiple times in parallel. In those cases this would involve dividing the memory requirement by a certain number, e.g. changing 100G to 25G. [You can just treat any letter at the end as an arbitrary string; please don't assume there is a letter, since a plain numeric argument should be treated as a number of bytes.]
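One possible shape for this, as a minimal Python sketch (the helper name divide_memory_spec and its exact behavior are assumptions for illustration, not existing toolkit code): parse the --max-memory value as a number plus an optional trailing letter, divide by the number of concurrent 'sort' invocations, and reassemble the result for --buffer-size.

```python
def divide_memory_spec(max_memory, num_jobs):
    """Return a string suitable for 'sort --buffer-size=' that is roughly
    max_memory / num_jobs.  A trailing letter (e.g. 'G' or 'K') is treated
    as an opaque suffix and kept as-is; a bare number is treated as bytes.
    (Hypothetical helper; a fuller version might convert to a smaller unit
    when the quotient rounds down to zero.)"""
    suffix = ''
    numeric_part = max_memory
    if max_memory and not max_memory[-1].isdigit():
        numeric_part, suffix = max_memory[:-1], max_memory[-1]
    value = int(numeric_part)
    share = max(1, value // num_jobs)
    return '{0}{1}'.format(share, suffix)


if __name__ == '__main__':
    # e.g. --max-memory=100G with 4 concurrent sorts -> --buffer-size=25G
    print(divide_memory_spec('100G', 4))      # -> 25G
    print(divide_memory_spec('4000000', 2))   # -> 2000000 (bytes)
    # building a shell command inside a data-prep script:
    cmd = "sort --buffer-size={0}".format(divide_memory_spec('100G', 4))
    print(cmd)
```

A usage note: each script that runs sorts in parallel would call something like this with its own job count, so the sum of the per-sort buffer sizes stays within the user-specified --max-memory.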
@keli
Dan