XLNet cached memory/recurrence on segments for fine-tuning #1474
Unanswered
PaulTran47 asked this question in Q&A
Replies: 0 comments
Hi there,
Apologies if this is a dumb question, but I was curious whether fine-tuning XLNet makes use of the cached memory (recurrence) feature for long sequences.
My preliminary understanding is that XLNet gets around the fixed segment length limitation that BERT has, and this discussion in this repo seems to show that pretraining was done with cached memory.
But when using XLNet through simpletransformers, is there a particular setting I need to enable for cached memory to be used during fine-tuning, or is it already used by default?
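For reference, this is roughly what I imagined the setting might look like, assuming the `config` dict in the model args gets passed through to the underlying Hugging Face `XLNetConfig` (where `mem_len` and `use_mems_train` appear to control the cached memory). Please correct me if this isn't the right mechanism:

```python
from simpletransformers.classification import ClassificationModel

# Hypothetical overrides: mem_len / use_mems_train are XLNetConfig fields in
# Hugging Face Transformers, but I'm not sure whether simpletransformers
# forwards them, or whether fine-tuning uses the cached memory at all by default.
model_args = {
    "config": {
        "mem_len": 512,          # number of hidden states cached from the previous segment
        "use_mems_train": True,  # reuse the cached memory during training/fine-tuning
    },
}

model = ClassificationModel("xlnet", "xlnet-base-cased", args=model_args, use_cuda=False)
```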
Thanks!