Save average document length after fitting #1

dunefox · 2022-02-10T08:20:44Z

I think avgdl should be saved as an attribute after fitting so it's not estimated again if transform is called for one document instead of the 'training' corpus.

So

fit(X).transform(X)

makes sense because all documents in X use the same avgdl but

fit(X).transform(X)
transform(other_document)

then re-estimates avgdl from this one document alone.
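
To make the proposal concrete, here is a minimal sketch of what I mean. It is not the code from this repository: the class name `BM25Sketch`, the attribute names `avgdl_` and `idf_`, the default `k1` and `b` values, and the dense list-of-lists input are all assumptions for illustration. The only point is that `fit` stores avgdl and `transform` reuses it instead of recomputing it from its own input.

```python
import math


class BM25Sketch:
    """Hypothetical BM25 weighting over dense term-count rows (illustration only)."""

    def __init__(self, k1=2.0, b=0.75):
        # Assumed defaults; the real transformer may use different values.
        self.k1 = k1
        self.b = b

    def fit(self, X):
        """Learn idf and the average document length from a non-empty corpus X."""
        n_docs = len(X)
        n_terms = len(X[0])
        # Document frequency: number of documents in which each term occurs.
        df = [sum(1 for row in X if row[j] > 0) for j in range(n_terms)]
        # One common smoothed idf variant (assumed here, always positive).
        self.idf_ = [math.log((n_docs - d + 0.5) / (d + 0.5) + 1.0) for d in df]
        # Save avgdl once, at fit time, so transform never re-estimates it.
        self.avgdl_ = sum(sum(row) for row in X) / n_docs
        return self

    def transform(self, X):
        """Weight term counts using the avgdl saved by fit, not the one of X."""
        out = []
        for row in X:
            dl = sum(row)  # length of this document
            norm = self.k1 * (1.0 - self.b + self.b * dl / self.avgdl_)
            out.append([
                idf * tf * (self.k1 + 1.0) / (tf + norm)
                for tf, idf in zip(row, self.idf_)
            ])
        return out


corpus = [[1, 0, 2], [0, 3, 1], [2, 2, 2]]
model = BM25Sketch().fit(corpus)
full = model.transform(corpus)
single = model.transform([corpus[0]])  # same weights as full[0]
```

With avgdl stored this way, transforming one document on its own gives the same weights that document got as part of the 'training' corpus.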

arosh · 2022-02-10T08:34:00Z

You are completely right. It is terrible that this issue has been overlooked for several years...

arosh added the bug label Feb 10, 2022
