Add absolute confidence metric based on unique and most common ngrams #419

Merged 93 commits on Feb 11, 2025
137e71e
Enhance model for Afrikaans
pemistahl Dec 31, 2024
b758b54
Enhance model for Arabic
pemistahl Dec 31, 2024
2f1d525
Enhance model for Azerbaijani
pemistahl Dec 31, 2024
ac5feab
Enhance model for Belarusian
pemistahl Dec 31, 2024
f420a7b
Enhance model for Bulgarian
pemistahl Dec 31, 2024
2b5504f
Enhance model for Bengali
pemistahl Dec 31, 2024
5b98117
Enhance model for Bosnian
pemistahl Dec 31, 2024
48447ad
Enhance model for Catalan
pemistahl Dec 31, 2024
efcac94
Enhance model for Czech
pemistahl Dec 31, 2024
21dc1bb
Enhance model for Welsh
pemistahl Dec 31, 2024
2af82df
Enhance model for Danish
pemistahl Dec 31, 2024
4fe0221
Enhance model for German
pemistahl Dec 31, 2024
87e7b85
Enhance model for Greek
pemistahl Dec 31, 2024
39e0b8d
Enhance model for English
pemistahl Dec 31, 2024
f151d82
Enhance model for Esperanto
pemistahl Dec 31, 2024
7017457
Enhance model for Spanish
pemistahl Dec 31, 2024
cd05e25
Enhance model for Estonian
pemistahl Dec 31, 2024
519f6b2
Enhance model for Basque
pemistahl Dec 31, 2024
165f184
Enhance model for Persian
pemistahl Dec 31, 2024
60427fa
Enhance model for Finnish
pemistahl Dec 31, 2024
5d10b71
Enhance model for French
pemistahl Dec 31, 2024
ca88c95
Enhance model for Irish
pemistahl Dec 31, 2024
bf76575
Enhance model for Gujarati
pemistahl Dec 31, 2024
ffed05b
Enhance model for Hebrew
pemistahl Dec 31, 2024
00cd1a2
Enhance model for Hindi
pemistahl Dec 31, 2024
d2ea706
Enhance model for Croatian
pemistahl Dec 31, 2024
bb5a022
Enhance model for Hungarian
pemistahl Dec 31, 2024
297c112
Enhance model for Armenian
pemistahl Dec 31, 2024
2138f8a
Enhance model for Indonesian
pemistahl Dec 31, 2024
44970d9
Enhance model for Icelandic
pemistahl Dec 31, 2024
f1e8cb7
Enhance model for Italian
pemistahl Dec 31, 2024
743a741
Enhance model for Japanese
pemistahl Dec 31, 2024
ad26100
Enhance model for Georgian
pemistahl Dec 31, 2024
e5a3eac
Enhance model for Kazakh
pemistahl Dec 31, 2024
f0e7ccd
Enhance model for Korean
pemistahl Dec 31, 2024
570b995
Enhance model for Latin
pemistahl Dec 31, 2024
5eb5227
Enhance model for Ganda
pemistahl Dec 31, 2024
0f47f02
Enhance model for Lithuanian
pemistahl Dec 31, 2024
e193b8f
Enhance model for Latvian
pemistahl Dec 31, 2024
f2eb156
Enhance model for Maori
pemistahl Dec 31, 2024
f3e926c
Enhance model for Macedonian
pemistahl Dec 31, 2024
14d5b18
Enhance model for Mongolian
pemistahl Dec 31, 2024
7eb0da9
Enhance model for Marathi
pemistahl Dec 31, 2024
3e84974
Enhance model for Malay
pemistahl Dec 31, 2024
8ce97fe
Enhance model for Bokmal
pemistahl Dec 31, 2024
8b3ca7e
Enhance model for Dutch
pemistahl Dec 31, 2024
d4104ec
Enhance model for Nynorsk
pemistahl Dec 31, 2024
0848e05
Enhance model for Punjabi
pemistahl Dec 31, 2024
e38bd2a
Enhance model for Polish
pemistahl Dec 31, 2024
8c256c4
Enhance model for Portuguese
pemistahl Dec 31, 2024
ce4bd6b
Enhance model for Romanian
pemistahl Dec 31, 2024
3ab0235
Enhance model for Russian
pemistahl Dec 31, 2024
e84795e
Enhance model for Slovak
pemistahl Dec 31, 2024
864166d
Enhance model for Slovene
pemistahl Dec 31, 2024
d793e0f
Enhance model for Shona
pemistahl Dec 31, 2024
c48302c
Enhance model for Somali
pemistahl Dec 31, 2024
f70ed90
Enhance model for Albanian
pemistahl Dec 31, 2024
27f1335
Enhance model for Serbian
pemistahl Dec 31, 2024
2c8315f
Enhance model for Sotho
pemistahl Dec 31, 2024
24fd288
Enhance model for Swedish
pemistahl Dec 31, 2024
42e2715
Enhance model for Swahili
pemistahl Dec 31, 2024
a3ce31c
Enhance model for Tamil
pemistahl Dec 31, 2024
d806ee6
Enhance model for Telugu
pemistahl Dec 31, 2024
28e43cc
Enhance model for Thai
pemistahl Dec 31, 2024
18d0cfd
Enhance model for Tagalog
pemistahl Dec 31, 2024
bb8ac34
Enhance model for Tswana
pemistahl Dec 31, 2024
12baa1b
Enhance model for Turkish
pemistahl Dec 31, 2024
226deca
Enhance model for Tsonga
pemistahl Dec 31, 2024
5c9948a
Enhance model for Ukrainian
pemistahl Dec 31, 2024
ff2489f
Enhance model for Urdu
pemistahl Dec 31, 2024
9dae0eb
Enhance model for Vietnamese
pemistahl Dec 31, 2024
7af5b6c
Enhance model for Xhosa
pemistahl Dec 31, 2024
d4ddc2c
Enhance model for Yoruba
pemistahl Dec 31, 2024
36fbb7c
Enhance model for Chinese
pemistahl Dec 31, 2024
e7f7cd4
Enhance model for Zulu
pemistahl Dec 31, 2024
30c67ee
Add `Language::all_with_single_unique_script()`
pemistahl Dec 31, 2024
347aa3e
Remove struct `TestDataLanguageModel`
pemistahl Jan 2, 2025
449bb70
Refactor language model serialization
pemistahl Jan 3, 2025
93d53a0
Replace `RwLock<HashMap>` with `DashMap`
pemistahl Jan 9, 2025
a3e410c
Implement logic in detector
pemistahl Jan 14, 2025
4061396
Apply changes to builder
pemistahl Jan 14, 2025
c0c4367
Fix WASM tests
pemistahl Jan 15, 2025
8f3968b
Fix Python tests
pemistahl Jan 16, 2025
a1f839a
Refactor accuracy reports script
pemistahl Jan 28, 2025
623eafa
Merge branch 'main' into issue-413-absolute-confidence-metric
pemistahl Jan 28, 2025
d15ade9
Update lock file
pemistahl Jan 28, 2025
bb09ab3
Add single-language detectors to accuracy reports script
pemistahl Jan 30, 2025
25e9ee0
Add clap to accuracy reports script
pemistahl Feb 5, 2025
1b3ac19
Merge remote-tracking branch 'origin/main' into issue-413-absolute-co…
pemistahl Feb 5, 2025
cdf29ab
Update lock file
pemistahl Feb 5, 2025
6da6755
Sort dataframe correctly
pemistahl Feb 10, 2025
266569e
Extend GitHub workflow with binary tests
pemistahl Feb 10, 2025
2666f23
Simplify GitHub workflow
pemistahl Feb 11, 2025
16 changes: 8 additions & 8 deletions .github/workflows/rust-build.yml
@@ -70,18 +70,18 @@ jobs:
             target/
           key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
 
-      - name: Build library in debug mode
-        run: cargo build --locked
+      - name: Build and test library
+        run: cargo test --locked
 
-      - name: Build binary in debug mode
-        run: cargo build --locked --bin accuracy_reports --features accuracy-reports
+      - name: Build and test binary
+        run: cargo test --locked --bin accuracy_reports --features accuracy-reports
 
-      - name: Test in debug mode
-        run: cargo test
-
-      - name: Check Clippy lints
+      - name: Check Clippy lints in library
         run: cargo clippy -- -Dwarnings
 
+      - name: Check Clippy lints in binary
+        run: cargo clippy --bin accuracy_reports --features accuracy-reports -- -Dwarnings
+
   wasm-build:
     name: WASM Build
     needs: rust-build