Code for the Text Engineering seminar (see the seminar plan)

| Package | Content | Resources/Dependencies | Literature |
|---|---|---|---|
| basic | Corpus, linear search, term-document matrix | Shakespeare | IIR ch. 1 |
| boole | Inverted index, list intersection, preprocessing, positional index, PositionalIntersect | | IIR ch. 1 + 2 |
| ranked | Ranked retrieval: term weighting, vector space model | | IIR ch. 6 + 7 |
| evaluation | Evaluation: precision, recall, F-measure | | IIR ch. 8 |
| lucene | Lucene: indexer and searcher | lucene-core, lucene-queryparser, lucene-analyzers-common | Lucene in Action |
| web | Crawler, WebDocument | commons-io, nekohtml, jrobotx | IIR ch. 19 + 20 |
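The `basic` package covers linear search over a corpus and the term-document matrix. The following Java sketch shows both ideas on a tiny in-memory corpus. The class `TermDocumentMatrix`, its methods and the letter-based tokenizer are hypothetical illustrations, not the seminar's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the seminar's own classes.
class TermDocumentMatrix {

    // Assumed tokenizer: lower-case, split on runs of non-letters, drop empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Linear search: scan every document and collect the indices of those containing the term.
    static List<Integer> linearSearch(List<String> docs, String term) {
        List<Integer> hits = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (int d = 0; d < docs.size(); d++) {
            if (tokenize(docs.get(d)).contains(needle)) {
                hits.add(d);
            }
        }
        return hits;
    }

    // Term-document (incidence) matrix: row[d] is true iff the term occurs in document d.
    // A TreeMap keeps the vocabulary sorted alphabetically.
    static Map<String, boolean[]> build(List<String> docs) {
        Map<String, boolean[]> matrix = new TreeMap<>();
        for (int d = 0; d < docs.size(); d++) {
            for (String t : tokenize(docs.get(d))) {
                matrix.computeIfAbsent(t, k -> new boolean[docs.size()])[d] = true;
            }
        }
        return matrix;
    }
}
```

Each row is a term's incidence vector, so a query such as `brutus AND caesar` amounts to a bitwise AND of the two rows (IIR ch. 1).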
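For `boole`, here is a minimal sketch of an inverted index with sorted postings lists and the linear-time merge for AND queries, following the intersection algorithm described in IIR ch. 1. `InvertedIndex`, `postings`, `intersect` and `and` are hypothetical names. Preprocessing is reduced to lower-casing and splitting on non-letters, and the positional index is omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the seminar's own classes.
class InvertedIndex {
    // term -> postings list of document IDs in ascending order
    private final Map<String, List<Integer>> index = new TreeMap<>();

    InvertedIndex(List<String> docs) {
        for (int docId = 0; docId < docs.size(); docId++) {
            // Assumed preprocessing: lower-case, split on runs of non-letters.
            for (String term : docs.get(docId).toLowerCase(Locale.ROOT).split("\\P{L}+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Documents are visited in ascending ID order, so a repeated term in the
                // same document can only match the last entry of the list.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
    }

    // Returns an immutable copy of the postings list (empty if the term is unknown).
    List<Integer> postings(String term) {
        return List.copyOf(index.getOrDefault(term.toLowerCase(Locale.ROOT), List.of()));
    }

    // Merges two sorted postings lists in O(|p1| + |p2|) steps.
    static List<Integer> intersect(List<Integer> p1, List<Integer> p2) {
        List<Integer> answer = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < p1.size() && j < p2.size()) {
            int a = p1.get(i);
            int b = p2.get(j);
            if (a == b) {
                answer.add(a);
                i++;
                j++;
            } else if (a < b) {
                i++;
            } else {
                j++;
            }
        }
        return answer;
    }

    // Conjunctive query: intersects the postings of all terms, shortest list first.
    List<Integer> and(String... terms) {
        if (terms.length == 0) {
            return List.of();
        }
        List<List<Integer>> lists = new ArrayList<>();
        for (String term : terms) {
            lists.add(postings(term));
        }
        lists.sort(Comparator.comparingInt(List::size));
        List<Integer> result = lists.get(0);
        for (int k = 1; k < lists.size() && !result.isEmpty(); k++) {
            result = intersect(result, lists.get(k));
        }
        return result;
    }
}
```

Processing the shortest postings list first keeps intermediate results small, which is the heuristic IIR recommends for conjunctive queries.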
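For `ranked`, this sketch shows tf-idf weighting and cosine ranking in the vector space model. It assumes the log-frequency variant `w = (1 + log10(tf)) * log10(N / df)`, which is one of several weighting schemes discussed in IIR ch. 6. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the seminar's own classes.
class VectorSpaceModel {
    private final List<Map<String, Double>> docVectors = new ArrayList<>();
    private final Map<String, Integer> df = new HashMap<>(); // document frequency per term
    private final int n; // number of documents N

    VectorSpaceModel(List<String> docs) {
        n = docs.size();
        List<Map<String, Integer>> tfs = new ArrayList<>();
        for (String doc : docs) {
            Map<String, Integer> tf = termFrequencies(doc);
            tfs.add(tf);
            for (String term : tf.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        // Weights need the complete df table, so they are computed in a second pass.
        for (Map<String, Integer> tf : tfs) {
            docVectors.add(weight(tf));
        }
    }

    // Assumed tokenizer: lower-case, split on runs of non-letters.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    // w(t, d) = (1 + log10 tf) * log10(N / df); terms unknown to the collection are dropped.
    Map<String, Double> weight(Map<String, Integer> tf) {
        Map<String, Double> v = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            Integer d = df.get(e.getKey());
            if (d != null) {
                v.put(e.getKey(), (1 + Math.log10(e.getValue())) * Math.log10((double) n / d));
            }
        }
        return v;
    }

    // Cosine of the angle between two sparse vectors; 0 if either has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        double na = 0;
        double nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double w : b.values()) {
            nb += w * w;
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Cosine score of the query against every document, indexed by document ID.
    double[] scores(String query) {
        Map<String, Double> q = weight(termFrequencies(query));
        double[] s = new double[n];
        for (int d = 0; d < n; d++) {
            s[d] = cosine(q, docVectors.get(d));
        }
        return s;
    }
}
```

With this scheme, a term that occurs in every document gets idf 0 and therefore has no influence on the ranking.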
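For `evaluation`, this computes set-based precision, recall and the F-measure `F = (beta^2 + 1) * P * R / (beta^2 * P + R)` as defined in IIR ch. 8. The class `Evaluation` is a hypothetical sketch that assumes documents are identified by integer IDs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; documents are identified by integer IDs (assumption).
class Evaluation {
    // Number of documents that are both retrieved and relevant.
    private static int truePositives(Set<Integer> retrieved, Set<Integer> relevant) {
        Set<Integer> hits = new HashSet<>(retrieved);
        hits.retainAll(relevant);
        return hits.size();
    }

    // Fraction of retrieved documents that are relevant; 0 for an empty result (convention assumed here).
    static double precision(Set<Integer> retrieved, Set<Integer> relevant) {
        return retrieved.isEmpty() ? 0.0 : (double) truePositives(retrieved, relevant) / retrieved.size();
    }

    // Fraction of relevant documents that were retrieved; 0 if there are no relevant documents.
    static double recall(Set<Integer> retrieved, Set<Integer> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) truePositives(retrieved, relevant) / relevant.size();
    }

    // Weighted harmonic mean of P and R; beta = 1 gives the balanced F1. Assumes beta > 0.
    static double fMeasure(double p, double r, double beta) {
        if (p == 0.0 && r == 0.0) {
            return 0.0;
        }
        double b2 = beta * beta;
        return (b2 + 1) * p * r / (b2 * p + r);
    }
}
```

For example, retrieving documents {1, 2, 3, 4} when {2, 4, 5} are relevant gives P = 1/2, R = 2/3 and F1 = 4/7.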

| Package | Content | Resources/Dependencies | Literature |
|---|---|---|---|
| document | Document, Topics, TermIndex, FeatureVector | | |
| corpus | Corpus, DB, DocumentIndex, Crawler | db4o, crawler (see package ir.web) | |
| classification | TextClassifier, Naive Bayes | | |
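For `classification`, the table only names Naive Bayes. The sketch below assumes the multinomial variant with add-one (Laplace) smoothing from IIR ch. 13 and sums log probabilities to avoid floating-point underflow. `NaiveBayes`, `train` and `classify` are hypothetical names and do not reflect the seminar's `TextClassifier` interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of multinomial Naive Bayes; not the seminar's TextClassifier.
class NaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Assumed tokenizer: lower-case, split on runs of non-letters, drop empty tokens.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Adds one labelled training document to the counts.
    void train(String text, String label) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, l -> new HashMap<>());
        for (String term : tokenize(text)) {
            counts.merge(term, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(term);
        }
    }

    // Returns the class maximizing log P(c) + sum of log P(t|c); null if nothing was trained.
    String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : docsPerClass.entrySet()) {
            String c = entry.getKey();
            double score = Math.log((double) entry.getValue() / totalDocs); // log prior
            Map<String, Integer> counts = termCounts.get(c);
            int denominator = tokensPerClass.getOrDefault(c, 0) + vocabulary.size();
            for (String term : tokens) {
                if (!vocabulary.contains(term)) {
                    continue; // terms never seen in training are ignored
                }
                // add-one smoothing: (count of term in c + 1) / (tokens in c + |V|)
                score += Math.log((counts.getOrDefault(term, 0) + 1.0) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Trained on the small China/Japan example from IIR ch. 13 (three documents labelled `china`, one labelled otherwise), it assigns `Chinese Chinese Chinese Tokyo Japan` to `china`, because the frequent term `chinese` outweighs the two Japan-specific terms.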