AltMorph expands Norwegian text by finding valid morphological alternatives for each word (for example, katta/katten). It combines the Ordbank morphological API with BERT-based part-of-speech tagging and acceptability scoring, so the alternatives it proposes fit the surrounding context.
- Context-sensitive filtering: Uses BERT-based acceptability scoring for ambiguous cases
- Lemma coverage: Finds morphological forms across multiple lemmas
- Position-specific analysis: Looks at each word in its syntactic context
- Caching: Persistent file-based caching to improve performance
- Multiple verbosity levels: From silent operation to detailed pipeline insights
- Language support: Norwegian Bokmål (`nob`) and Nynorsk (`nno`)
- POS-aware: Uses NbAiLab BERT models for part-of-speech tagging
- Parallel processing: Runs API calls concurrently
- Python 3.8+
- Ordbank API key (free registration at Ordbank)
```bash
pip install -r requirements.txt
```
- Register at https://www.ordbank.no/
- Obtain your API key from your account dashboard
- Set the environment variable:

```bash
export ORDBANK_API_KEY="your_api_key_here"
```

Or pass the key directly with the `--api_key` flag.
```bash
python altmorph.py --sentence "Katta ligger på matta." --lang nob
```

Output:

```
"{Katta, Katten} ligger på {matta, matten}."
```
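Downstream tools such as AltWER consume this braces-and-commas format, so it can be useful to parse it back into structured data. The sketch below assumes the format is exactly as shown above: alternatives wrapped in `{...}` and separated by `, `, with no nested or escaped braces. The function name `parse_alternatives` is hypothetical and not part of AltMorph.

```python
import re
from typing import List

# Matches either a {...} group of alternatives or a run of plain text.
# Assumption: braces are never nested or escaped in AltMorph output.
_SEGMENT = re.compile(r"\{([^{}]*)\}|([^{}]+)")


def parse_alternatives(text: str) -> List[List[str]]:
    """Split AltMorph-style output into segments.

    Each {a, b} group becomes a list of alternatives; plain text between
    groups becomes a one-element list so the sentence can be rebuilt.
    """
    segments: List[List[str]] = []
    for match in _SEGMENT.finditer(text):
        group, plain = match.groups()
        if group is not None:
            segments.append([alt.strip() for alt in group.split(",")])
        else:
            segments.append([plain])
    return segments


if __name__ == "__main__":
    for seg in parse_alternatives("{Katta, Katten} ligger på {matta, matten}."):
        print(seg)
```

Keeping the plain-text segments means every variant of the sentence can be regenerated by picking one entry from each segment.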
```bash
python altmorph.py \
  --sentence "Katta ligger på matta." \
  --lang nob \
  --api_key "your_api_key_here"
```
The tool takes sentence context into account.

Simple example:

```bash
python altmorph.py --sentence "Katta ligger på matta." --lang nob
# Output: "{Katta, Katten} ligger på {matta, matten}."
# Shows different morphological forms of the same words
```

Complex context:

```bash
python altmorph.py --sentence "Katta ligger på matta i stua." --lang nob
# Output: "{Katta, Katten} ligger på {matta, matten} i stua."
# BERT-based filtering keeps only alternatives that work in the sentence
```

Position-specific analysis:

```bash
python altmorph.py --sentence "Katta ligger på matta." --lang nob
# Each word occurrence is analyzed in its specific syntactic context
```
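This README does not spell out exactly how `--logit-threshold` is applied. One plausible interpretation, sketched below purely as an assumption, is to score each candidate form at its position in the sentence (for example, with a masked-language-model logit) and keep the candidates whose score is no more than `threshold` below the best-scoring one. Scores are passed in as a plain dictionary so the logic can be shown without loading a BERT model; `filter_by_threshold` and the score values are hypothetical.

```python
from typing import Dict, List


def filter_by_threshold(scores: Dict[str, float], threshold: float = 3.0) -> List[str]:
    """Keep candidates whose score is within `threshold` of the best one.

    Assumption: a higher score means the form is more acceptable in context
    (e.g. a BERT logit for the candidate at the target position). The 3.0
    default mirrors --logit-threshold, but this is not AltMorph's own code.
    """
    if not scores:
        return []
    best = max(scores.values())
    # Preserve input order so the original form can stay first.
    return [word for word, score in scores.items() if best - score <= threshold]


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(filter_by_threshold({"katta": 9.1, "katten": 8.4, "kattene": 2.0}))
    # ['katta', 'katten']
```

A relative threshold like this adapts to sentences where all candidates score low; an absolute cut-off would be the other obvious design, and the README alone does not say which one AltMorph uses.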
| Option | Default | Description |
|---|---|---|
| `--sentence` | required (unless `--delete-cache` is used) | Input sentence to process |
| `--lang` | `nob` | Language code (`nob` or `nno`) |
| `--api_key` | `$ORDBANK_API_KEY` | Ordbank API key |
| `--verbosity` | `0` | Verbosity level (0-3) |
| `--logit-threshold` | `3.0` | BERT acceptability threshold |
| `--timeout` | `6.0` | HTTP timeout per request (seconds) |
| `--max_workers` | `4` | Maximum number of parallel API requests |
| `--no-cache` | `False` | Disable caching |
| `--delete-cache` | `False` | Clear cache and exit |
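The option table maps naturally onto `argparse`. The sketch below is a hypothetical reconstruction, not AltMorph's actual parser: it reproduces the flag names and defaults from the table, lets `--api_key` fall back to the `ORDBANK_API_KEY` environment variable, and only requires `--sentence` when `--delete-cache` is not given (matching the `python altmorph.py --delete-cache` usage in the testing section).

```python
import argparse
import os
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented options and defaults.
    p = argparse.ArgumentParser(description="Find morphological alternatives")
    # Not argparse-required, so that --delete-cache can run on its own.
    p.add_argument("--sentence", help="Input sentence to process")
    p.add_argument("--lang", default="nob", choices=["nob", "nno"])
    # Environment fallback is read when the parser is built.
    p.add_argument("--api_key", default=os.environ.get("ORDBANK_API_KEY"))
    p.add_argument("--verbosity", type=int, default=0, choices=[0, 1, 2, 3])
    p.add_argument("--logit-threshold", type=float, default=3.0)
    p.add_argument("--timeout", type=float, default=6.0)  # seconds
    p.add_argument("--max_workers", type=int, default=4)
    p.add_argument("--no-cache", action="store_true")
    p.add_argument("--delete-cache", action="store_true")
    return p


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.delete_cache and args.sentence is None:
        parser.error("--sentence is required unless --delete-cache is given")
    return args
```

Note that `argparse` turns hyphenated flags into underscored attributes, so `--logit-threshold` is read as `args.logit_threshold` and `--no-cache` as `args.no_cache`.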
```bash
python altmorph.py --sentence "Katta ligger på matta." --verbosity 0
```

Output: just the final result

```
"{Katta, Katten} ligger på {matta, matten}."
```
```bash
python altmorph.py --sentence "Katta ligger på matta." --verbosity 1
```

Output: basic progress information

```
2025-XX-XX 12:00:00 INFO Loading POS tagger...
2025-XX-XX 12:00:02 INFO POS tagger loaded
"{Katta, Katten} ligger på {matta, matten}."
```
```bash
python altmorph.py --sentence "Katta ligger på matta." --verbosity 2
```

Output: processing details (POS tags, API lookups, alternatives found)

```
PROCESSING: Katta ligger på matta.
WORDS: ['katta', 'ligger', 'på', 'matta']
POS TAGS:
katta: NOUN
ligger: VERB
på: ADP
matta: NOUN
API LOOKUP: katta (POS: NOUN)
katta: 2 alternatives: ['katta', 'katten']
...
RESULT: "{Katta, Katten} ligger på {matta, matten}."
```
```bash
python altmorph.py --sentence "Katta ligger på matta." --verbosity 3
```

Output: everything, including cache operations, lemma analysis, and BERT filtering

```
PROCESSING: Katta ligger på matta.
FOUND 2 LEMMAS for katta
CACHE HIT: lemmas for 'katta' (POS: NOUN)
ACCEPTABILITY FILTERING (threshold: 3.00)
ANALYZING: katta (position 0)
Context: [Katta] ligger på matta.
Alternatives: ['katta', 'katten']
CACHE STATS: 8 hits, 0 misses (100.0% hit rate)
...
```
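The level 1 output resembles Python's standard `logging` format. The sketch below is an assumption about how such levels could be wired up, not AltMorph's code: it configures `logging` to emit timestamped INFO lines from level 1 upwards and uses a hypothetical `vprint` helper to gate the more detailed level 2 and level 3 messages.

```python
import logging


def configure_logging(verbosity: int) -> None:
    # Level 0: warnings only. Level >= 1: timestamped INFO lines in the form
    # "YYYY-MM-DD HH:MM:SS INFO Loading POS tagger..."
    level = logging.INFO if verbosity >= 1 else logging.WARNING
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
        force=True,  # replace previously configured handlers (Python 3.8+)
    )


def vprint(verbosity: int, min_level: int, message: str) -> bool:
    """Print `message` only if the current verbosity reaches `min_level`.

    Returns True if the message was printed, which makes it easy to test.
    """
    if verbosity >= min_level:
        print(message)
        return True
    return False


if __name__ == "__main__":
    verbosity = 2
    configure_logging(verbosity)
    logging.info("Loading POS tagger...")
    vprint(verbosity, 2, "POS TAGS:")          # shown at levels 2 and 3
    vprint(verbosity, 3, "CACHE STATS: ...")   # shown only at level 3
```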
AltMorph includes caching to improve performance:
- Cache location: `~/.ordbank_cache/`
- Cache types: Lemma searches and inflection data
- Performance: ~95%+ hit rate for repeated usage
- Management:
  - `--no-cache`: Disable caching
  - `--delete-cache`: Clear all cache files
Performance impact:
- First run: ~3-4 seconds (API calls)
- Cached runs: ~0.5 seconds
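A persistent file-based cache like the one described above can be built from the standard library alone. The sketch below is hypothetical: the key layout (`kind`, word, POS), the SHA-256 file naming, and the JSON format are assumptions, not AltMorph's actual on-disk format. The directory is a parameter so it can point at a temporary location instead of `~/.ordbank_cache/`.

```python
import hashlib
import json
import shutil
from pathlib import Path
from typing import Any, Optional


class FileCache:
    """Minimal JSON-on-disk cache with hit/miss counters (hypothetical)."""

    def __init__(self, directory: Path, enabled: bool = True) -> None:
        self.directory = Path(directory).expanduser()
        self.enabled = enabled
        self.hits = 0
        self.misses = 0

    def _path(self, kind: str, word: str, pos: str) -> Path:
        # Hash the key so arbitrary words (æ, ø, å, punctuation) map to safe file names.
        key = f"{kind}|{word}|{pos}".encode("utf-8")
        return self.directory / f"{hashlib.sha256(key).hexdigest()}.json"

    def get(self, kind: str, word: str, pos: str) -> Optional[Any]:
        if not self.enabled:
            return None
        path = self._path(kind, word, pos)
        if path.exists():
            self.hits += 1
            return json.loads(path.read_text(encoding="utf-8"))
        self.misses += 1
        return None

    def put(self, kind: str, word: str, pos: str, value: Any) -> None:
        if not self.enabled:
            return
        self.directory.mkdir(parents=True, exist_ok=True)
        self._path(kind, word, pos).write_text(
            json.dumps(value, ensure_ascii=False), encoding="utf-8"
        )

    def clear(self) -> None:
        # Same idea as --delete-cache: remove every cached file.
        shutil.rmtree(self.directory, ignore_errors=True)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return 100.0 * self.hits / total if total else 0.0
```

With `enabled=False` the cache behaves like `--no-cache`: nothing is read or written, so every lookup would go to the API.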
Complete Code Walkthrough - Detailed technical explanation of how AltMorph works for developers who need implementation details.
- Input Processing: Tokenization preserving whitespace and punctuation
- POS Tagging: NbAiLab/nb-bert-base-pos for accurate grammatical analysis
- Lemma Discovery: Comprehensive search across all relevant Ordbank lemmas
- Inflection Analysis: Full morphological paradigm extraction
- Acceptability Scoring: NbAiLab/nb-bert-base for context-sensitive filtering
- Output Generation: Case-preserving alternative presentation
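The first and last pipeline stages can be illustrated together. The sketch below assumes a simple regular-expression tokenizer (runs of word characters, punctuation, or whitespace, so joining the tokens reproduces the input exactly) and a hypothetical `alternatives` mapping from lowercased words to their forms. It copies the capitalization of the original token onto each alternative, which reproduces the `{Katta, Katten}` style shown earlier. AltMorph's own tokenizer and casing rules may differ.

```python
import re
from typing import Dict, List

# Words, runs of punctuation, or runs of whitespace: nothing is dropped.
_TOKEN = re.compile(r"\w+|[^\w\s]+|\s+")


def tokenize(text: str) -> List[str]:
    return _TOKEN.findall(text)


def match_case(template: str, word: str) -> str:
    # Copy the capitalization pattern of the original token.
    if template.isupper() and len(template) > 1:
        return word.upper()
    if template[:1].isupper():
        return word[:1].upper() + word[1:]
    return word


def render(text: str, alternatives: Dict[str, List[str]]) -> str:
    out = []
    for tok in tokenize(text):
        alts = alternatives.get(tok.lower())
        if alts and len(alts) > 1:
            out.append("{" + ", ".join(match_case(tok, a) for a in alts) + "}")
        else:
            out.append(tok)  # whitespace, punctuation, or no alternatives
    return "".join(out)


if __name__ == "__main__":
    alts = {"katta": ["katta", "katten"], "matta": ["matta", "matten"]}
    print(render("Katta ligger på matta.", alts))
    # {Katta, Katten} ligger på {matta, matten}.
```

Python's `\w` is Unicode-aware by default, so Norwegian letters such as å stay inside their words.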
- POS Tagging: `NbAiLab/nb-bert-base-pos`
- Acceptability: `NbAiLab/nb-bert-base`
- API: Ordbank, a Norwegian morphological database
- Comprehensive lemma matching: Finds all lemmas containing the target word
- Position-specific analysis: Each word occurrence is analyzed in its own context
- Logit-based filtering: Acceptability thresholding (default: 3.0, set with `--logit-threshold`)
- Prioritization: Balances morphological coverage with contextual fit
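Lemma matching followed by feature-based selection can be sketched over an in-memory paradigm table. The data below is invented for illustration and does not reflect Ordbank's response format: each lemma maps to (form, features) pairs, and the hypothetical `find_alternatives` collects every form, across all lemmas that contain the target word, that carries the same feature set as the target.

```python
from typing import Dict, FrozenSet, List, Tuple

Paradigm = List[Tuple[str, FrozenSet[str]]]

# Invented sample data mirroring the katta/katten example:
# lemma -> list of (inflected form, feature set).
PARADIGMS: Dict[str, Paradigm] = {
    "katt": [
        ("katt", frozenset({"Sing", "Ind"})),
        ("katten", frozenset({"Sing", "Def"})),
        ("katta", frozenset({"Sing", "Def"})),
        ("katter", frozenset({"Plur", "Ind"})),
        ("kattene", frozenset({"Plur", "Def"})),
    ],
}


def find_alternatives(word: str, paradigms: Dict[str, Paradigm]) -> List[str]:
    """Return all forms sharing `word`'s features in any lemma containing it."""
    word = word.lower()
    found = set()
    for forms in paradigms.values():
        # Feature sets under which this lemma produces `word` (may be several).
        feature_sets = [feats for form, feats in forms if form == word]
        for form, feats in forms:
            if feats in feature_sets:
                found.add(form)
    return sorted(found)


if __name__ == "__main__":
    print(find_alternatives("katta", PARADIGMS))  # ['katta', 'katten']
```

Because every matching lemma is scanned, a word that belongs to several lemmas contributes alternatives from all of them; the acceptability filter is then what narrows them down in context.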
- Single sentence: 0.5-4 seconds (depending on cache state)
- Cache hit rate: Typically 95%+ for repeated usage
- API efficiency: Parallel requests with batching
- Memory usage: ~500 MB (loaded BERT models)
- Concurrent requests: Configurable via `--max_workers`
- Timeout handling: Robust error recovery with retries
- Rate limiting: Respectful API usage patterns
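Concurrent lookups bounded by `--max_workers`, with retries on failure, can be sketched with `concurrent.futures`. The `fetch` callable stands in for the real Ordbank request: no endpoint or response format is assumed here, and the per-request timeout belongs inside `fetch` (for example `requests.get(..., timeout=6.0)`). The retry count and exponential backoff are illustrative choices, not AltMorph's documented behavior.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[str], T], word: str, retries: int = 3,
                 backoff: float = 0.5) -> Optional[T]:
    """Call fn(word), retrying on any exception with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn(word)
        except Exception:
            if attempt == retries - 1:
                return None  # give up; the caller can leave the word unchanged
            time.sleep(backoff * (2 ** attempt))
    return None


def lookup_all(words: Iterable[str], fetch: Callable[[str], T],
               max_workers: int = 4, **retry_kwargs) -> Dict[str, Optional[T]]:
    """Look up each distinct word once, using at most `max_workers` threads."""
    unique = list(dict.fromkeys(words))  # deduplicate while keeping order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda w: with_retries(fetch, w, **retry_kwargs), unique)
        return dict(zip(unique, results))
```

Deduplicating first means a word that occurs twice in a sentence costs one API call; a bounded pool keeps the number of simultaneous requests to the Ordbank API predictable.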
AltMorph includes additional tools for batch processing and testing:
- `tools/process_jsonl.py`: Batch process JSONL files by adding morphological alternatives to text fields
- `tools/pos_tester.py`: Compare POS tagging across multiple Norwegian NLP models

See `tools/README.md` for detailed documentation and usage examples.
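The idea behind `tools/process_jsonl.py` can be sketched as a streaming loop over JSON Lines. This is not the tool's code: the field names `text` and `text_alternatives` and the `expand` callable are hypothetical, and `tools/README.md` documents the real arguments and output schema.

```python
import json
from typing import IO, Callable


def process_jsonl(src: IO[str], dst: IO[str], expand: Callable[[str], str],
                  field: str = "text", out_field: str = "text_alternatives") -> int:
    """Copy JSON Lines from src to dst, adding expand(record[field]).

    Returns the number of records written. Blank lines are skipped, records
    without `field` are passed through unchanged, and non-ASCII characters
    (æ, ø, å) are written as-is rather than escaped.
    """
    count = 0
    for line in src:
        if not line.strip():
            continue
        record = json.loads(line)
        if field in record:
            record[out_field] = expand(record[field])
        dst.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count
```

Processing one line at a time keeps memory use flat regardless of file size; open both files with `encoding="utf-8"` so Norwegian characters round-trip correctly.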
```
altmorph/
├── altmorph.py # Main application
├── tools/
│   ├── README.md # Tools documentation
│   ├── process_jsonl.py # JSONL batch processor
│   └── pos_tester.py # POS tagging comparison tool
├── data/
│   └── sample_input.jsonl # Sample data for testing
├── README.md # Main documentation
├── setup.py # Legacy packaging
├── pyproject.toml # Modern packaging
└── requirements.txt # Dependencies
```

The cache directory `~/.ordbank_cache/` is created automatically in your home directory, outside the repository.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure code follows existing style
- Submit a pull request
```bash
# Test basic functionality
python altmorph.py --sentence "Katta ligger på matta." --lang nob

# Test cache functionality
python altmorph.py --delete-cache
python altmorph.py --sentence "Katta ligger på matta." --lang nob --verbosity 3

# Test without cache
python altmorph.py --sentence "Katta ligger på matta." --lang nob --no-cache

# Test POS comparison tool
python tools/pos_tester.py --text "Katta ligger på matta."

# Test batch processing with sample data
python tools/process_jsonl.py --input_file data/sample_input.jsonl --output_file test_output.jsonl --verbosity 2
```
- AltWER: Depends on AltMorph's output format for Norwegian text evaluation
- Ordbank Team: For providing the comprehensive Norwegian morphological API
- Clarino/UiB: For hosting the API infrastructure
- NbAiLab: For the Norwegian BERT models
- AltMorph: Idea and coding by Magnus Breder Birkenes and Per Egil Kummervold