Commit 20dcb51

Rebuilt docs
1 parent 422d00c commit 20dcb51

11 files changed, +51 -1 lines changed
7 Bytes
Binary file not shown.
-832 Bytes
Binary file not shown.
-798 Bytes
Binary file not shown.

docs/_build/html/_sources/api_reference.rst.txt

Lines changed: 3 additions & 0 deletions
@@ -5,3 +5,6 @@ API reference
 
 .. automodule:: neofuzz.process
    :members:
+
+.. automodule:: neofuzz.tokenization
+   :members:

docs/_build/html/_sources/metadata.rst.txt

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@ Sometimes it is, however beneficial to be able to access metadata about the entr
 The most sensible way to handle this is to store your metadata in a table that is in the same order as the corpus.
 
 .. code-block:: python
+
     import pandas as pd
 
     corpus: list[str] = [...]
@@ -19,6 +20,7 @@ The most sensible way to handle this is to store your metadata in a table that i
 Then you can use the query() method to retrieve indices and distances instead of passages:
 
 .. code-block:: python
+
     from neofuzz import Process
 
     process = Process(...)
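
The hunks above document keeping metadata in the same order as the corpus, so that indices returned by a query select matching rows in both. A minimal sketch of that lookup follows, under stated assumptions: the corpus, the metadata records, and the `indices` values are all hypothetical stand-ins, a plain list of dicts replaces the pandas table used in the docs, and nothing here calls neofuzz itself.

```python
# Hypothetical stand-in data; the documented example uses a pandas DataFrame
# for the metadata, a plain list of dicts is used here to stay dependency-free.
corpus = ["apple pie", "banana bread", "cherry tart", "apple crumble"]
metadata = [
    {"id": 101, "category": "pie"},
    {"id": 102, "category": "bread"},
    {"id": 103, "category": "tart"},
    {"id": 104, "category": "crumble"},
]

# Stand-in for the `indices` array that query() is documented to return,
# shaped (len(search_terms), limit). These values are hard-coded, not computed.
indices = [[0, 3], [1, 2]]


def lookup(row):
    """Pair each retrieved passage with the metadata stored at the same position."""
    return [(corpus[i], metadata[i]) for i in row]


results_for_term1 = lookup(indices[0])
results_for_term2 = lookup(indices[1])
```

The lookup only works while both containers keep the same ordering, which is why the documentation stresses storing the table in corpus order.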

docs/_build/html/_sources/persistence.rst.txt

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@ Neofuzz can serialize indexed Process objects for you using `joblib`.
 You can save indexed processes like so:
 
 .. code-block:: python
+
     from neofuzz import char_ngram_process
     from neofuzz.tokenization import SubWordVectorizer
 
@@ -19,6 +20,7 @@ You can save indexed processes like so:
 And then load them in a production environment:
 
 .. code-block:: python
+
     from neofuzz import Process
 
     process = Process.from_disk("process.joblib")

docs/_build/html/metadata.html

Lines changed: 25 additions & 0 deletions
@@ -216,7 +216,32 @@ <h1>Handling Metadata<a class="headerlink" href="#handling-metadata" title="Perm
 <p>Neofuzz makes it easy to do fuzzy search in text corpora.
 Sometimes it is, however beneficial to be able to access metadata about the entries retrieved in fuzzy search.</p>
 <p>The most sensible way to handle this is to store your metadata in a table that is in the same order as the corpus.</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
+
+<span class="n">corpus</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="o">...</span><span class="p">]</span>
+<span class="n">metadata</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
+
+<span class="c1"># The tenth element in both corresponds to the same entry</span>
+<span class="n">tenth_text</span> <span class="o">=</span> <span class="n">corpus</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span>
+<span class="n">tenth_metadata_entry</span> <span class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span>
+</pre></div>
+</div>
 <p>Then you can use the query() method to retrieve indices and distances instead of passages:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">neofuzz</span> <span class="kn">import</span> <span class="n">Process</span>
+
+<span class="n">process</span> <span class="o">=</span> <span class="n">Process</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
+<span class="n">process</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="n">corpus</span><span class="p">)</span>
+
+<span class="c1"># Both results will be arrays shaped (len(search_terms), limit)</span>
+<span class="n">indices</span><span class="p">,</span> <span class="n">distances</span> <span class="o">=</span> <span class="n">process</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="n">search_terms</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;Search term 1&quot;</span><span class="p">,</span> <span class="s2">&quot;Search term 2&quot;</span><span class="p">],</span> <span class="n">limit</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
+
+<span class="n">results_for_term1</span> <span class="o">=</span> <span class="p">[</span><span class="n">corpus</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span> <span class="k">for</span> <span class="n">idx</span> <span class="ow">in</span> <span class="n">indices</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
+<span class="n">metadata_for_term1</span> <span class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">indices</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
+
+<span class="n">results_for_term2</span> <span class="o">=</span> <span class="p">[</span><span class="n">corpus</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span> <span class="k">for</span> <span class="n">idx</span> <span class="ow">in</span> <span class="n">indices</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span>
+<span class="n">metadata_for_term2</span> <span class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">indices</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span>
+</pre></div>
+</div>
 </section>
 
 </article>

docs/_build/html/persistence.html

Lines changed: 14 additions & 0 deletions
@@ -216,7 +216,21 @@ <h1>Persistence<a class="headerlink" href="#persistence" title="Permalink to thi
 <p>You might want to persist processes to disk and reuses them in production pipelines.
 Neofuzz can serialize indexed Process objects for you using <cite>joblib</cite>.</p>
 <p>You can save indexed processes like so:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">neofuzz</span> <span class="kn">import</span> <span class="n">char_ngram_process</span>
+<span class="kn">from</span> <span class="nn">neofuzz.tokenization</span> <span class="kn">import</span> <span class="n">SubWordVectorizer</span>
+
+<span class="n">process</span> <span class="o">=</span> <span class="n">char_ngram_process</span><span class="p">()</span>
+<span class="n">process</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="n">corpus</span><span class="p">)</span>
+
+<span class="n">process</span><span class="o">.</span><span class="n">to_disk</span><span class="p">(</span><span class="s2">&quot;process.joblib&quot;</span><span class="p">)</span>
+</pre></div>
+</div>
 <p>And then load them in a production environment:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">neofuzz</span> <span class="kn">import</span> <span class="n">Process</span>
+
+<span class="n">process</span> <span class="o">=</span> <span class="n">Process</span><span class="o">.</span><span class="n">from_disk</span><span class="p">(</span><span class="s2">&quot;process.joblib&quot;</span><span class="p">)</span>
+</pre></div>
+</div>
 </section>
 
 </article>

docs/_build/html/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/metadata.rst

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@ Sometimes it is, however beneficial to be able to access metadata about the entr
 The most sensible way to handle this is to store your metadata in a table that is in the same order as the corpus.
 
 .. code-block:: python
+
     import pandas as pd
 
     corpus: list[str] = [...]
@@ -19,6 +20,7 @@ The most sensible way to handle this is to store your metadata in a table that i
 Then you can use the query() method to retrieve indices and distances instead of passages:
 
 .. code-block:: python
+
     from neofuzz import Process
 
     process = Process(...)
