Course materials (tools) for an introduction to NLP for linguists.
- portable tools (we thus aim for a Docker container)
- prioritize didactics over performance (we don't aim for SOTA solutions but for those that relate to what linguists already know; in particular, a substantial share of the techniques is symbolic)
- don't target programmers (we don't expect participants to have even rudimentary programming skills before the middle of the course)
- don't overload (one tool per level of description)
- crawling: wget
- converters: pandoc, etc.
- preprocessing with regular expressions: just use `perl -pe "..."`
- ?XML (xmllint, xsltproc)
- SFST (morphology): the documentation is mediocre, but SFST is quite intuitive for linguists once the notation is understood
- CFG (phrase structure parsing): a light wrapper for the NLTK `nltk.parse` module
- Parse (dependency parsing): UDPipe 1
- ?PyTorch
- scikit-learn?, cf. https://www.datacamp.com/tutorial/svm-classification-scikit-learn-python
- ?CoNLL-Merge
- ?HMMs (morphosyntax): hmmlearn? (the older scikit-learn HMM module is deprecated), maybe NLTK?, cf. https://spotintelligence.com/2023/01/05/hidden-markov-model-hmm-nlp/, https://www.inf.ed.ac.uk/teaching/courses/fnlp/Tutorials/3_HMMs/lab3.pdf, https://www.kaggle.com/code/akshat0007/parts-of-speech-tagging-using-hmm, http://damir.cavar.me/pynotebooks/Python_Tutorial_HMM.html
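The CFG item foresees a light wrapper around NLTK's `nltk.parse` module. The following is a minimal sketch of what such a wrapper might look like, assuming NLTK 3 is installed. The function name `parse_sentence` is hypothetical, and the toy grammar is the prepositional-phrase-attachment example from the NLTK book:

```python
import nltk

# Toy grammar: the PP-attachment example from the NLTK book (illustration only).
GRAMMAR = """
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
"""


def parse_sentence(grammar_text, sentence):
    """Hypothetical wrapper: return all CFG parse trees for a sentence.

    The sentence is tokenized on whitespace. The result is an empty list if
    all words are known but no analysis exists; NLTK raises a ValueError if
    a word is missing from the grammar.
    """
    grammar = nltk.CFG.fromstring(grammar_text)
    parser = nltk.ChartParser(grammar)
    return list(parser.parse(sentence.split()))


trees = parse_sentence(GRAMMAR, "I shot an elephant in my pajamas")
for tree in trees:
    print(tree)  # bracketed notation, one analysis per tree
```

Because the grammar lets *in my pajamas* attach either to the verb phrase or to the noun phrase, the chart parser returns two trees. This is the structural ambiguity linguists already know from syntax classes, so students only have to learn the grammar notation, not the parsing machinery.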
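UDPipe 1, listed for dependency parsing, writes its analyses in the CoNLL-U format: ten tab-separated columns per word, comment lines starting with `#`, and blank lines between sentences. Below is a minimal standard-library sketch for reading such output, assuming well-formed input. The function name `read_conllu` and the sample sentence are invented for illustration and are not actual UDPipe output:

```python
# Column names as defined by the CoNLL-U format (Universal Dependencies).
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text):
    """Yield each sentence as a list of token dicts (one dict per word)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (8.1)
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        # HEAD is "_" when no syntactic analysis was produced.
        token["head"] = None if token["head"] == "_" else int(token["head"])
        sentence.append(token)
    if sentence:  # last sentence without a trailing blank line
        yield sentence


# Invented example sentence, not actual UDPipe output.
rows = [
    ["1", "Linguists", "linguist", "NOUN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "parse", "parse", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "sentences", "sentence", "NOUN", "_", "_", "2", "obj", "_", "SpaceAfter=No"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = "# text = Linguists parse sentences.\n" + "\n".join("\t".join(r) for r in rows) + "\n\n"

for sent in read_conllu(SAMPLE):
    for tok in sent:
        if tok["head"] is None:
            continue  # no dependency analysis for this word
        # Word IDs start at 1, so the head with ID h is sent[h - 1]; 0 means root.
        head = "ROOT" if tok["head"] == 0 else sent[tok["head"] - 1]["form"]
        print(f'{tok["form"]} --{tok["deprel"]}--> {head}')
```

Printing each word with its relation and head (`Linguists --nsubj--> parse`, `parse --root--> ROOT`, and so on) turns the tabular output back into the head-dependent arcs familiar from dependency grammar. The same reader is also a starting point for inspecting or merging CoNLL-style files by hand.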
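For the scikit-learn item, here is a sketch of SVM classification on a linguistically motivated toy task, assuming a recent scikit-learn: language identification from character n-grams with a linear SVM (`LinearSVC`). The eight training sentences are invented, so the model illustrates the workflow rather than being a usable classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny invented training set: is a sentence English ("en") or German ("de")?
train_texts = [
    "the cat sleeps on the mat",
    "where is the station",
    "I would like a coffee",
    "this is a small house",
    "die Katze schläft auf der Matte",
    "wo ist der Bahnhof",
    "ich möchte einen Kaffee",
    "das ist ein kleines Haus",
]
train_labels = ["en"] * 4 + ["de"] * 4

# Character 1- to 3-grams inside word boundaries, weighted by TF-IDF,
# are fed into a linear support vector machine.
classifier = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LinearSVC(),
)
classifier.fit(train_texts, train_labels)

# Predict labels for two unseen sentences.
print(classifier.predict(["the house is small", "das Haus ist klein"]))
```

Character n-grams keep the features interpretable for linguists (e.g. *th* versus *sch*), and the pipeline makes the two steps explicit: feature extraction and classification.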
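For the HMM item, the following is a from-scratch sketch of Viterbi decoding for part-of-speech tagging in plain Python, independent of hmmlearn or NLTK. The three tags, the vocabulary, and all probabilities are invented for illustration. Unseen words receive a small fixed emission score (`unk`), which is a crude assumption rather than proper smoothing:

```python
import math

# Invented toy HMM: tags, words, and probabilities are made up for illustration.
STATES = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}  # P(first tag)
TRANS = {  # P(next tag | tag)
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {  # P(word | tag)
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "barks": 0.1, "walks": 0.1},
    "VERB": {"barks": 0.5, "walks": 0.5},
}


def viterbi(words, unk=1e-6):
    """Return the most probable tag sequence for a non-empty list of words."""

    def log_emit(tag, word):
        # Unseen (word, tag) pairs get a small fixed probability (crude assumption).
        return math.log(EMIT[tag].get(word, unk))

    # best[i][t]: log probability of the best tag path for words[:i+1] ending in t
    best = [{t: math.log(START[t]) + log_emit(t, words[0]) for t in STATES}]
    back = [{}]  # back[i][t]: predecessor of t on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in STATES:
            prev = max(STATES, key=lambda p: best[i - 1][p] + math.log(TRANS[p][t]))
            best[i][t] = best[i - 1][prev] + math.log(TRANS[prev][t]) + log_emit(t, words[i])
            back[i][t] = prev
    # Start from the best final tag and follow the back-pointers.
    path = [max(STATES, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi("the dog barks".split()))  # → ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow on longer sentences. In class, the three tables could instead be filled with relative frequencies counted from a small tagged corpus, which connects the model to corpus-linguistic practice.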