Commit 3505afe

Merge pull request #28 from JuliaAI/dev
For a 0.2.2 release
2 parents e73694b + f44ad05 commit 3505afe

5 files changed (+30, -24 lines)

.github/codecov.yml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+coverage:
+  status:
+    project:
+      default:
+        threshold: 0.5%
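The new `threshold: 0.5%` setting relaxes Codecov's project status check. As a rough model, assuming Codecov's default `target: auto` (the head commit is compared with the base commit's coverage) and reading `0.5%` as percentage points, the check behaves like the hypothetical function below. It illustrates the setting and is not Codecov code.

```julia
# Hypothetical model of Codecov's project status with `threshold: 0.5%`.
# Assumes `target: auto` (compare head coverage with base coverage) and that
# the threshold is measured in percentage points.
function project_status_passes(base_pct::Real, head_pct::Real; threshold_pct::Real = 0.5)
    # Succeed as long as coverage dropped by no more than the threshold.
    return head_pct >= base_pct - threshold_pct
end

project_status_passes(92.0, 91.7)  # drop of 0.3 points: passes
project_status_passes(92.0, 91.2)  # drop of 0.8 points: fails
```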

.github/workflows/ci.yml

Lines changed: 4 additions & 3 deletions
@@ -17,7 +17,7 @@ jobs:
       fail-fast: false
       matrix:
         version:
-          - '1.3'
+          - '1.6'
           - '1' # automatically expands to the latest stable 1.x release of Julia.
         os:
           - ubuntu-latest
@@ -44,6 +44,7 @@ jobs:
         env:
           JULIA_NUM_THREADS: 2
       - uses: julia-actions/julia-processcoverage@v1
-      - uses: codecov/codecov-action@v1
+      - uses: codecov/codecov-action@v3
         with:
-          file: lcov.info
+          files: lcov.info
+

Project.toml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 name = "MLJText"
 uuid = "5e27fcf9-6bac-46ba-8580-b5712f3d6387"
 authors = ["Chris Alexander <[email protected]>, Anthony D. Blaom <[email protected]>"]
-version = "0.2.1"
+version = "0.2.2"
 
 [deps]
 CorpusLoaders = "214a0ac2-f95b-54f7-a80b-442ed9c2c9e8"
@@ -18,7 +18,7 @@ MLJModelInterface = "1.4"
 ScientificTypes = "2.2.2, 3"
 ScientificTypesBase = "2.2.0, 3"
 TextAnalysis = "0.7.3"
-julia = "1.3"
+julia = "1.6"
 
 [extras]
 MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
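Under Pkg's default caret semantics for `[compat]` entries, `julia = "1.6"` admits every Julia version from 1.6.0 up to, but excluding, 2.0.0, so this bump drops Julia 1.3 to 1.5, in line with the CI matrix change from `'1.3'` to `'1.6'`. A comma-separated entry such as `ScientificTypes = "2.2.2, 3"` admits the union of its parts. The helpers below (`caret_range`, `compat_admits`) are hypothetical illustrations of those two rules for plain entries only (no `^`, `~`, `=` or hyphen specifiers); real tooling should rely on Pkg itself.

```julia
# Hypothetical sketch of Pkg's caret rule for one plain [compat] spec such as "1.6".
# Returns the half-open interval [lower, upper) of admitted versions.
function caret_range(spec::AbstractString)
    parts = parse.(Int, split(strip(spec), '.'))
    padded = vcat(parts, zeros(Int, 3 - length(parts)))
    lower = VersionNumber(padded...)
    # Bump the first non-zero component given; if all given components are zero, bump the last.
    i = something(findfirst(!iszero, parts), length(parts))
    bumped = vcat(padded[1:i-1], padded[i] + 1, zeros(Int, 3 - i))
    return lower, VersionNumber(bumped...)
end

# A comma-separated entry such as "2.2.2, 3" admits the union of its caret ranges.
function compat_admits(entry::AbstractString, v::VersionNumber)
    return any(split(entry, ',')) do spec
        lo, hi = caret_range(spec)
        lo <= v < hi
    end
end

compat_admits("1.6", v"1.5.4")       # false: Julia 1.5 is no longer admitted
compat_admits("1.6", v"1.9.0")       # true
compat_admits("2.2.2, 3", v"3.1.0")  # true, via the "3" part
```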

src/bm25_transformer.jl

Lines changed: 10 additions & 10 deletions
@@ -119,7 +119,7 @@ The transformer converts a collection of documents, tokenized or pre-parsed as b
 words/ngrams, to a matrix of [Okapi BM25 document-word
 statistics](https://en.wikipedia.org/wiki/Okapi_BM25). The BM25 scoring function uses both
 term frequency (TF) and inverse document frequency (IDF, defined below), as in
-[`TfidfTransformer`](ref), but additionally adjusts for the probability that a user will
+[`TfidfTransformer`](@ref), but additionally adjusts for the probability that a user will
 consider a search result relevant based, on the terms in the search query and those in
 each document.
 
@@ -137,21 +137,21 @@ In MLJ or MLJBase, bind an instance `model` to data with
 
     mach = machine(model, X)
 
-$DOC_IDF
+$DOC_TRANSFORMER_INPUTS
 
 Train the machine using `fit!(mach, rows=...)`.
 
 # Hyper-parameters
 
-- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider.
-  Terms that occur in `> max_doc_freq` documents will not be considered by the
-  transformer. For example, if `max_doc_freq` is set to 0.9, terms that are in more than
-  90% of the documents will be removed.
+- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider. Terms
+  that occur in `> max_doc_freq` documents will not be considered by the transformer. For
+  example, if `max_doc_freq` is set to 0.9, terms that are in more than 90% of the
+  documents will be removed.
 
-- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider.
-  Terms that occur in `< max_doc_freq` documents will not be considered by the
-  transformer. A value of 0.01 means that only terms that are at least in 1% of the
-  documents will be included.
+- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider. Terms
+  that occur in `< max_doc_freq` documents will not be considered by the transformer. A
+  value of 0.01 means that only terms that are at least in 1% of the documents will be
+  included.
 
 - `κ=2`: The term frequency saturation characteristic. Higher values represent slower
   saturation. What we mean by saturation is the degree to which a term occurring extra
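The BM25 scoring described in this docstring can be illustrated with the standard Okapi BM25 weight from the linked Wikipedia article. This is a hedged sketch, not MLJText's implementation: `bm25_weights` is a hypothetical name, the IDF is the Wikipedia variant and may differ from the IDF the docstring defines, and the length-normalization parameter `β` (defaulting here to 0.75, the value commonly quoted for BM25's `b`) is an assumption not visible in this diff. Only `κ` corresponds to the hyper-parameter documented in the hunk.

```julia
# Sketch of Okapi BM25 document-term weights (Wikipedia formulation, not MLJText's code).
# `counts[d, t]` is the raw count of term `t` in document `d`.
# `κ` controls term-frequency saturation; `β` (length normalization) is an assumed parameter.
function bm25_weights(counts::AbstractMatrix{<:Real}; κ::Real = 2, β::Real = 0.75)
    n_docs, n_terms = size(counts)
    doc_len = vec(sum(counts; dims = 2))   # |D|: total term count of each document
    avg_len = sum(doc_len) / n_docs        # average document length
    df = vec(sum(counts .> 0; dims = 1))   # number of documents containing each term
    idf = log.((n_docs .- df .+ 0.5) ./ (df .+ 0.5) .+ 1)
    W = zeros(Float64, n_docs, n_terms)
    for d in 1:n_docs, t in 1:n_terms
        tf = counts[d, t]
        # Longer-than-average documents get a larger constant, so each occurrence counts for less.
        k = κ * (1 - β + β * doc_len[d] / avg_len)
        # Grows with tf but is bounded by idf * (κ + 1); larger κ means slower saturation.
        W[d, t] = idf[t] * tf * (κ + 1) / (tf + k)
    end
    return W
end

counts = [2 0 1;
          0 1 1]
W = bm25_weights(counts)
```

The explicit loop keeps the correspondence with the per-term formula visible; a broadcast expression would compute the same matrix.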

src/count_transformer.jl

Lines changed: 9 additions & 9 deletions
@@ -94,15 +94,15 @@ Train the machine using `fit!(mach, rows=...)`.
 
 # Hyper-parameters
 
-- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider.
-  Terms that occur in `> max_doc_freq` documents will not be considered by the
-  transformer. For example, if `max_doc_freq` is set to 0.9, terms that are in more than
-  90% of the documents will be removed.
-
-- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider.
-  Terms that occur in `< max_doc_freq` documents will not be considered by the
-  transformer. A value of 0.01 means that only terms that are at least in 1% of the
-  documents will be included.
+- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider. Terms
+  that occur in `> max_doc_freq` documents will not be considered by the transformer. For
+  example, if `max_doc_freq` is set to 0.9, terms that are in more than 90% of the
+  documents will be removed.
+
+- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider. Terms
+  that occur in `< max_doc_freq` documents will not be considered by the transformer. A
+  value of 0.01 means that only terms that are at least in 1% of the documents will be
+  included.
 
 # Operations
 
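The two hyper-parameter bullets describe a simple document-frequency filter on the vocabulary. A minimal sketch, assuming a term's document frequency is the fraction of documents that contain it and reading the lower bound as `min_doc_freq` (the bullet text compares against `max_doc_freq`, which appears to be a typo carried through this commit). `filter_vocabulary` is a hypothetical helper, not MLJText's internal API.

```julia
# Hypothetical sketch of vocabulary restriction by document frequency.
# `docs` is a vector of tokenized documents; a term's document frequency is the
# fraction of documents that contain it at least once.
function filter_vocabulary(docs::Vector{Vector{String}};
                           max_doc_freq::Real = 1.0, min_doc_freq::Real = 0.0)
    n_docs = length(docs)
    doc_count = Dict{String,Int}()
    for doc in docs, term in unique(doc)
        doc_count[term] = get(doc_count, term, 0) + 1
    end
    # Keep terms with min_doc_freq <= frequency <= max_doc_freq.
    kept = [term for (term, c) in doc_count if min_doc_freq <= c / n_docs <= max_doc_freq]
    return sort(kept)
end

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
filter_vocabulary(docs; max_doc_freq = 0.9)  # drops "the", which is in 100% of documents
filter_vocabulary(docs; min_doc_freq = 0.5)  # keeps only terms in at least half the documents
```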
