I've been reading your paper; interesting work.
I have a question about how you compute perplexities, especially over datasets that are already tokenized (e.g., WikiText-103). I understand that your encoding can assign probabilities to any string, but I'd expect the LM to do poorly when fed pre-tokenized input. For example, a tokenized WikiText-103 line looks like:

`M @-@ 82 begins at a junction with M @-@ 120 and B @-@ 96 west of Fremont .`
How do you report perplexity in this case?
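To make the question concrete, here is a rough sketch of what I would have guessed: detokenize the Moses-style markers first, then normalize the total negative log-likelihood by the original word count so the number stays comparable to word-level models. The `total_nll` hook and the detokenization rules below are just my assumptions, not anything taken from your code:

```python
import math
import re

def detokenize(line: str) -> str:
    """Undo the Moses-style tokenization in the raw WikiText-103 files.

    These rules are guesses based on the markers visible in the data
    (e.g. " @-@ " joins hyphenated tokens) and are not exhaustive.
    """
    line = line.replace(" @-@ ", "-")
    line = line.replace(" @,@ ", ",")
    line = line.replace(" @.@ ", ".")
    # Reattach punctuation to the neighboring word.
    line = re.sub(r" ([,.;:!?')\]])", r"\1", line)
    line = re.sub(r"([(\[]) ", r"\1", line)
    return line

def word_level_ppl(lines, total_nll):
    """exp(total NLL / reference word count).

    Normalizing by the original whitespace token count keeps the
    perplexity comparable across different tokenizations.
    `total_nll(text)` is a hypothetical hook that returns the model's
    negative log-likelihood of `text` in nats.
    """
    nll = sum(total_nll(detokenize(line)) for line in lines)
    n_words = sum(len(line.split()) for line in lines)
    return math.exp(nll / n_words)
```

Is that roughly what you do, or do you score the pre-tokenized text directly and normalize some other way?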