
Commit

Fix some typos in seminar & hw in week01
NikitaChampion committed Sep 24, 2023
1 parent 274c43c commit 4f83fb8
Showing 2 changed files with 6 additions and 6 deletions.
4 changes: 2 additions & 2 deletions week01_embeddings/homework.ipynb
@@ -38,7 +38,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"### Frament of the Swadesh list for some slavic languages\n",
+"### Fragment of the Swadesh list for some slavic languages\n",
"\n",
"The Swadesh list is a lexicostatistical stuff. It's named after American linguist Morris Swadesh and contains basic lexis. This list are used to define subgroupings of languages, its relatedness.\n",
"\n",
@@ -419,7 +419,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Now we are ready to make simple word-based translator: for earch word in source language in shared embedding space we find the nearest in target language.\n"
+"Now we are ready to make simple word-based translator: for each word in source language in shared embedding space we find the nearest in target language.\n"
]
},
{
8 changes: 4 additions & 4 deletions week01_embeddings/seminar.ipynb
@@ -6,7 +6,7 @@
"source": [
"## Seminar 1: Fun with Word Embeddings (3 points)\n",
"\n",
-"Today we gonna play with word embeddings: train our own little embedding, load one from gensim model zoo and use it to visualize text corpora.\n",
+"Today we gonna play with word embeddings: train our own little embeddings, load one from gensim model zoo and use it to visualize text corpora.\n",
"\n",
"This whole thing is gonna happen on top of embedding dataset.\n",
"\n",
@@ -46,10 +46,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"__Tokenization:__ a typical first step for an nlp task is to split raw data into words.\n",
+"__Tokenization:__ a typical first step for an NLP task is to split raw data into words.\n",
"The text we're working with is in raw format: with all the punctuation and smiles attached to some words, so a simple str.split won't do.\n",
"\n",
-"Let's use __`nltk`__ - a library that handles many nlp tasks like tokenization, stemming or part-of-speech tagging."
+"Let's use __`nltk`__ - a library that handles many NLP tasks like tokenization, stemming or part-of-speech tagging."
]
},
{
@@ -555,7 +555,7 @@
"* Try running TSNE on all data, not just 1000 phrases\n",
"* See what other embeddings are there in the model zoo: `gensim.downloader.info()`\n",
"* Take a look at [FastText](https://github.com/facebookresearch/fastText) embeddings\n",
-"* Optimize find_nearest with locality-sensitive hashing: use [nearpy](https://github.com/pixelogik/NearPy) or `sklearn.neighbors`."
+"* Optimize `find_nearest` with locality-sensitive hashing: use [nearpy](https://github.com/pixelogik/NearPy) or `sklearn.neighbors`."
]
}
],
