Fix some typos in seminar & hw in week01

yandexdataschool · Sep 24, 2023 · 4f83fb8
1 parent 274c43c

Showing 2 changed files with 6 additions and 6 deletions.
diff --git a/week01_embeddings/homework.ipynb b/week01_embeddings/homework.ipynb
@@ -38,7 +38,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Frament of the Swadesh list for some slavic languages\n",
+    "### Fragment of the Swadesh list for some slavic languages\n",
     "\n",
     "The Swadesh list is a lexicostatistical stuff. It's named after American linguist Morris Swadesh and contains basic lexis. This list are used to define subgroupings of languages, its relatedness.\n",
     "\n",
@@ -419,7 +419,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Now we are ready to make simple word-based translator: for earch word in source language in shared embedding space we find the nearest in target language.\n"
+    "Now we are ready to make simple word-based translator: for each word in source language in shared embedding space we find the nearest in target language.\n"
    ]
   },
   {

diff --git a/week01_embeddings/seminar.ipynb b/week01_embeddings/seminar.ipynb
@@ -6,7 +6,7 @@
    "source": [
     "## Seminar 1: Fun with Word Embeddings (3 points)\n",
     "\n",
-    "Today we gonna play with word embeddings: train our own little embedding, load one from   gensim model zoo and use it to visualize text corpora.\n",
+    "Today we gonna play with word embeddings: train our own little embeddings, load one from gensim model zoo and use it to visualize text corpora.\n",
     "\n",
     "This whole thing is gonna happen on top of embedding dataset.\n",
     "\n",
@@ -46,10 +46,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "__Tokenization:__ a typical first step for an nlp task is to split raw data into words.\n",
+    "__Tokenization:__ a typical first step for an NLP task is to split raw data into words.\n",
     "The text we're working with is in raw format: with all the punctuation and smiles attached to some words, so a simple str.split won't do.\n",
     "\n",
-    "Let's use __`nltk`__ - a library that handles many nlp tasks like tokenization, stemming or part-of-speech tagging."
+    "Let's use __`nltk`__ - a library that handles many NLP tasks like tokenization, stemming or part-of-speech tagging."
    ]
   },
   {
@@ -555,7 +555,7 @@
     "* Try running TSNE on all data, not just 1000 phrases\n",
     "* See what other embeddings are there in the model zoo: `gensim.downloader.info()`\n",
     "* Take a look at [FastText](https://github.com/facebookresearch/fastText) embeddings\n",
-    "* Optimize find_nearest with locality-sensitive hashing: use [nearpy](https://github.com/pixelogik/NearPy) or `sklearn.neighbors`."
+    "* Optimize `find_nearest` with locality-sensitive hashing: use [nearpy](https://github.com/pixelogik/NearPy) or `sklearn.neighbors`."
    ]
   }
  ],