Improve phrasing in the main article

scriptin · Nov 10, 2023 · f6894f5
1 parent e23d251
Showing 1 changed file with 7 additions and 8 deletions.
diff --git a/src/pages/index.mdx b/src/pages/index.mdx
@@ -107,15 +107,14 @@ appears in ~26% of documents.
 
 ## Time distribution
 
-The style of text, grammar, vocabulary, and usage of certain kanji
-may depend on when a particular text was written. Also, texts which
+Texts from different epochs may have different kanji usage patterns
+due to differences in vocabulary and grammar rules. Also, texts which
 discuss events of a certain time period may have statistical biases,
 e.g. newspapers from 2020-2022 use COVID- and medicine-related
 words and kanji more often compared to previous years.
 
-That's why it's important to collect texts which are distributed
-across wider time periods to avoid biases and have representative
-datasets.
+It's important to collect texts which are distributed
+across wider time periods to avoid these biases.
 
 - **Aozora**: most texts are in public domain due to
   expiration of copyright terms, which is currently
@@ -153,8 +152,8 @@ The data in the old version was collected from the following sources:
 - Twitter (now known as X)
 
 However, this first attempt lacked sufficient research and technical effort,
-and the resulting dataset had multiple issues, described in the
-[attached readme](https://github.com/scriptin/kanji-frequency/tree/master/data2015/README.md).
+and the resulting dataset had multiple issues, described in the attached
+[readme](https://github.com/scriptin/kanji-frequency/tree/master/data2015/README.md).
 
 ### Current version
 
@@ -165,7 +164,7 @@ but unfortunately has some new problems:
   - Twitter API no longer has a free tier
   - Changes in the organization management and staff layoffs at Twitter
     resulted in insufficient content moderation.
-    I wanted to avoid including any hate speech in the data
+    I preferred to avoid including any hate speech in the data
 - **News dataset is much smaller**:
   - Most news on popular websites are now behind paywalls,
     making it impractical and illegal to create crawlers/scrapers