We have recently been trying to reproduce the experimental results in your paper:
Lee, Bruce W., Yoo Sung Jang, and Jason Lee. "Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features." Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021.
We just found that the OSKF method in lingfeat returns exactly the same 16 feature names as WBKF. Please see the example below:
from lingfeat import extractor
text = "When you see the word Amazon, what’s the first thing that springs to mind – the world’s biggest forest, the longest river or the largest internet retailer – and which do you consider most important?"
LingFeat = extractor.pass_text(text)
LingFeat.preprocess()
WBKF = LingFeat.WBKF_() # WeeBit Corpus Knowledge Features
OSKF = LingFeat.OSKF_() # OneStopEng Corpus Knowledge Features
print('WeeBit Corpus Knowledge Features:', WBKF)
print('OneStopEng Corpus Knowledge Features:', OSKF)
Hi Bruce,
The solution to this bug is easy.
In the file _AdvancedSemantic/OSKF.py, from line 90 onward, it is necessary to replace:
"BRich" with "ORich", "BClar" with "OClar", "BNois" with "ONois" and "BTopc" with "OTopc"
Obviously you know this, but I wrote the solution to help others.
Thank you.
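For anyone who cannot patch the installed package, the same renaming can be applied to the returned features after extraction. This is only a minimal sketch: it assumes OSKF_() returns a dict keyed by feature name, the helper name rename_oskf_keys is hypothetical (it is not part of lingfeat), and the sample values are placeholders rather than real extractor output.

```python
# Prefix replacements taken from the fix described above.
PREFIX_MAP = {
    "BRich": "ORich",
    "BClar": "OClar",
    "BNois": "ONois",
    "BTopc": "OTopc",
}


def rename_oskf_keys(features):
    """Return a copy of `features` with WeeBit-style prefixes swapped for OneStopEng ones.

    Hypothetical helper, not part of lingfeat. Assumes `features` is a dict
    mapping feature names to values.
    """
    renamed = {}
    for name, value in features.items():
        for old, new in PREFIX_MAP.items():
            if name.startswith(old):
                name = new + name[len(old):]
                break
        renamed[name] = value
    return renamed


# Placeholder values for illustration only, not real extractor output.
sample = {"BRich05_S": 0.5, "BClar10_S": 0.1}
fixed_sample = rename_oskf_keys(sample)
print(fixed_sample)
```

In practice this would be called as rename_oskf_keys(LingFeat.OSKF_()) before merging the result with the other feature groups. Names that do not start with one of the four prefixes are passed through unchanged.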
Hello, thanks for this great project!
According to Appendix B of the above paper, the feature names in OSKF should start with 'O', e.g. 'ORich05_S', 'ORich10_S', etc.
This bug yields 239 distinct feature names (not 255 features as introduced in the paper). Accordingly, in another open-source project of this paper:
https://github.com/brucewlee/pushingonreadability_traditional_ML
The csv files in Research_Data include only 239 linguistic features, which we believe is caused by these duplicate feature names.
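The count of 239 is consistent with 16 colliding names being dropped from 255. The sketch below illustrates one plausible mechanism, assuming the feature groups are merged into a single dict (or into columns keyed by name), so that a repeated key overwrites the earlier entry. The feature names here are hypothetical placeholders; only the totals (255 features, 16 per corpus) come from this thread.

```python
N_TOTAL = 255       # features described in the paper
N_PER_CORPUS = 16   # names returned by each of WBKF_() and OSKF_()


def count_merged(oskf_prefix):
    """Count distinct names after merging all groups into one dict.

    Placeholder names only: "B.." stands in for the WBKF names, "X..." for
    every other feature group, and `oskf_prefix` selects the OSKF naming.
    """
    wbkf = {f"B{i:02d}": 0.0 for i in range(N_PER_CORPUS)}
    oskf = {f"{oskf_prefix}{i:02d}": 0.0 for i in range(N_PER_CORPUS)}
    other = {f"X{i:03d}": 0.0 for i in range(N_TOTAL - 2 * N_PER_CORPUS)}
    # Later keys overwrite earlier ones, so identical names collapse.
    merged = {**other, **wbkf, **oskf}
    return len(merged)


buggy_count = count_merged("B")  # OSKF reuses the WBKF names
fixed_count = count_merged("O")  # OSKF uses its own names
print(buggy_count, fixed_count)
```

With identical names the 16 OSKF values overwrite the 16 WBKF values, leaving 239 distinct keys. With distinct prefixes all 255 survive.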