Correct corpus size to 450 texts per corpus (724 unique), not 400#205

Merged
DanilSko merged 1 commit into main from fix/corpus-size-400-to-450 on Jun 29, 2026

Conversation

@DanilSko

Collaborator

Each of the two corpora contains 450 texts (50 per decade × 9 decades, 1810s–1890s). Running the filtering notebook confirms this: Korpus I = 450, Korpus II = 450, overlap = 176, which gives 450 + 450 − 176 = 724 unique texts. This matches the "724" figure already stated in the semantic-field analysis.

  • analysis.md: "pro Korpus 400 Datenpunkte" ("400 data points per corpus") → 450 (one point per text).
  • nlp-annotation.ipynb: the annotation-time estimate said "jeweils 400 Texte" ("400 texts each") / "800 Texten" ("800 texts"). It is now based on the 724 unique texts actually annotated (the two samples overlap, so the union is annotated once): 724 × 15 s = 10,860 s = 181 min, so the "~3 Stunden" ("~3 hours") conclusion holds.
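
The arithmetic behind both corrections can be re-checked with a short sketch. The constant names below are illustrative and do not come from the repository; the values are the ones quoted in this description, and the 15 s per text is the estimate used in the notebook.

```python
# Sanity check of the figures quoted in this PR description.
# All names here are hypothetical; only the numbers come from the PR text.
TEXTS_PER_DECADE = 50
DECADES = 9            # 1810s through 1890s
CORPUS_I = 450         # size reported by the filtering notebook
CORPUS_II = 450
OVERLAP = 176          # texts present in both samples
SECONDS_PER_TEXT = 15  # per-text annotation estimate

per_corpus = TEXTS_PER_DECADE * DECADES

# Inclusion-exclusion: texts in both samples are counted once.
unique_texts = CORPUS_I + CORPUS_II - OVERLAP

total_seconds = unique_texts * SECONDS_PER_TEXT
total_minutes = total_seconds / 60
total_hours = total_minutes / 60

print(f"per corpus: {per_corpus}")
print(f"unique texts: {unique_texts}")
print(f"annotation time: {total_seconds} s = {total_minutes:.0f} min = {total_hours:.2f} h")
```

Because the overlapping texts are annotated only once, the estimate scales with the union of the two samples (724) rather than with the sum of their sizes (900).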

@DanilSko DanilSko merged commit e955826 into main Jun 29, 2026
4 checks passed