Commit

correcting mistake
todd-cook committed Dec 17, 2021
1 parent 0ce9e71 commit 3af08ba
Showing 1 changed file with 5 additions and 13 deletions.
@@ -159,7 +159,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"## Closely inspecting the DataFrame reveals that some rows are near duplicates with differing amounts in the occurence counts and description fields. We wish to retain the higher of the two occurence counts, and otherwise colasce multiple rows."
+"## Closely inspecting the DataFrame reveals that some rows are near duplicates with differing amounts in the occurence counts and description fields. We wish to retain the higher of the two occurence counts, and otherwise coalesce multiple rows."
]
},
{
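The coalescing rule described in the markdown cell above (keep the higher of the two occurrence counts, merge the rest) might look roughly like the sketch below. It uses plain dictionaries instead of the notebook's DataFrame so that it stays self-contained. The column names `occupation`, `occurrences`, and `description` are hypothetical, and so is the choice to treat rows as near duplicates when their lowercased names match and to keep the first non-empty description.

```python
def coalesce_rows(rows, key="occupation", count="occurrences", desc="description"):
    """Merge rows that share a normalized key (hypothetical column names).

    Keeps the highest occurrence count and the first non-empty description.
    """
    merged = {}
    for row in rows:
        # Assumption: near duplicates differ only by case/whitespace in the key.
        norm = row[key].strip().lower()
        if norm not in merged:
            merged[norm] = dict(row)  # copy so the input rows are not mutated
            continue
        kept = merged[norm]
        # Retain the higher of the two occurrence counts.
        kept[count] = max(kept[count], row[count])
        # Fill in a description only if the kept row does not have one yet.
        if not kept.get(desc) and row.get(desc):
            kept[desc] = row[desc]
    return list(merged.values())


rows = [
    {"occupation": "Baker", "occurrences": 12, "description": ""},
    {"occupation": "baker", "occurrences": 15, "description": "bakes bread"},
    {"occupation": "Cooper", "occurrences": 3, "description": "makes barrels"},
]
result = coalesce_rows(rows)
```

Here the two "baker" rows collapse into one row carrying the count 15 and the non-empty description, while "Cooper" passes through unchanged. On a real DataFrame the same idea is usually expressed with a group-by and a per-column aggregation.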
@@ -831,7 +831,6 @@
"source": [
"# Strategies for Deciding Which Items to Label\n",
"* random selections\n",
-"* Kmeans clustering (on embeddings)\n",
"* suffix analysis"
]
},
@@ -850,19 +849,12 @@
" fout.write(f'\"{word}\",\\n')"
]
},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## KMeans\n",
-"KMeans clustering can increase the chances that you'll a representative sampling by selecting from clusters in the feature space. This is very helpful with multidimensional data. However, with single string occupation labeling such as we are doing, sampling from character features--such as via suffix analysis, is likely more helpful."
-]
-},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Suffix Analysis\n",
+"With single string occupation labeling such as we are doing, sampling from character features--such as via suffix analysis, is likely helpful.\n",
"A large amount of domain information is typically encoded into the suffix of word. Let's group by the word endings and take a look."
]
},
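The suffix analysis cell above proposes grouping words by their endings and sampling from those groups to decide what to label. A minimal sketch of that idea follows, using only the standard library. The function names, the fixed suffix length of 2, and the example word list are assumptions made for illustration, not taken from the notebook.

```python
import random
from collections import defaultdict


def group_by_suffix(words, suffix_len=2):
    """Bucket words by their last `suffix_len` characters.

    Words shorter than `suffix_len` are bucketed under the whole word.
    """
    groups = defaultdict(list)
    for word in words:
        normalized = word.strip().lower()
        groups[normalized[-suffix_len:]].append(normalized)
    return dict(groups)


def sample_per_suffix(groups, per_group=1, seed=0):
    """Pick up to `per_group` words from every suffix bucket."""
    rng = random.Random(seed)  # seeded so the selection is repeatable
    picks = []
    for suffix in sorted(groups):
        members = groups[suffix]
        picks.extend(rng.sample(members, min(per_group, len(members))))
    return picks


# Hypothetical occupation strings, for illustration only.
words = ["baker", "brewer", "miller", "carpenter",
         "physician", "musician", "florist", "dentist"]
groups = group_by_suffix(words, suffix_len=2)
picks = sample_per_suffix(groups, per_group=1, seed=0)
```

With this word list the buckets are `er`, `an`, and `st`, and one word is drawn from each, so every ending is represented in the items chosen for labeling. A longer suffix gives finer buckets at the cost of more singletons.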
@@ -2021,7 +2013,7 @@
],
"metadata": {
"kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -2035,9 +2027,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.7.0"
+"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
