Commit 968e4cc

Update German stemmer to the latest standard
Basically, the German2 variant is recommended now. This feature was present before, but was turned off by default. Now German2 is the default.
1 parent a051e12 commit 968e4cc
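
For callers, the practical effect of this commit is that the flag starts out as true, so code that wants the pre-2.3.0 behavior has to opt out explicitly. The sketch below illustrates only that flag handling. It uses a hypothetical stand-in class (`german_stem_flag_sketch`) rather than the real `german_stem` template, performs no stemming, and adds an accessor (`is_transliterating_umlauts`) that is an assumption made purely so the state can be inspected.

```cpp
#include <cassert>

namespace flag_sketch
{
    // Hypothetical stand-in for the flag handling in german_stem.
    // It does not stem anything; it only mirrors the default and the setter.
    class german_stem_flag_sketch
    {
    public:
        // Same name and signature as the setter shown in the diff.
        void should_transliterate_umlauts(const bool transliterate_umlauts)
        { m_transliterate_umlauts = transliterate_umlauts; }

        // Hypothetical accessor, added for this sketch only.
        bool is_transliterating_umlauts() const noexcept
        { return m_transliterate_umlauts; }

    private:
        // Default member initializer, as introduced by this commit.
        bool m_transliterate_umlauts{ true };
    };

    // Small self-check showing the default and the opt-out.
    inline void check_flag_defaults()
    {
        german_stem_flag_sketch stemmer;
        assert(stemmer.is_transliterating_umlauts());   // new default
        stemmer.should_transliterate_umlauts(false);    // opt out
        assert(!stemmer.is_transliterating_umlauts());
    }
}
```

Because the default now lives on the member itself, the user-provided constructor that set the flag to false is no longer needed, which is why the diff removes it.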

2 files changed: 15 additions and 4 deletions

Changelog.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ## Change Log

 ### 2023.1 Release
-- Updated the Swedish stemmer to the latest Snowball standard.
+- Updated the Swedish and German stemmers to the latest Snowball 2.3 standard.

 ### 2023 Release
 - Updated Spanish, Russian, Italian, and French stemmers to the latest Snowball standard.

src/german_stem.h

Lines changed: 14 additions & 3 deletions
@@ -27,6 +27,16 @@ namespace stemming

 @par Algorithm:

+<b>Step 0:</b>
+
+- Replace ß with ss, ae with ä, oe with ö, ue with ü (unless preceded by q).
+
+The rules here for ae, oe and ue were
+added in Snowball 2.3.0, but were previously present as a variant of the
+algorithm termed "german2". The condition on the replacement of ue prevents
+the unwanted changing of quelle. Also note that feuer is not modified
+because the first part of the rule changes it to feUer, so ue is not found.
+
 <b>Step 1:</b>

 Search for the longest among the following suffixes:
@@ -70,8 +80,9 @@ namespace stemming
 class german_stem final : public stem<string_typeT>
 {
 public:
-german_stem() noexcept : m_transliterate_umlauts(false) {}
-/** @brief Set to true to use the variant algorithm that expands "ä" to "ae", etc...
+/** @brief Set to @c true (the default) to use the algorithm that expands "ä" to "ae", etc...
+@details This should only be @c false if preferring to use the German algorithm prior
+to the Snowball 2.3.0 standard.
 @param transliterate_umlauts Whether to transliterate umlauted vowels.*/
 void should_transliterate_umlauts(const bool transliterate_umlauts)
 { m_transliterate_umlauts = transliterate_umlauts; }
@@ -361,7 +372,7 @@ namespace stemming
 }
 }
 }
-bool m_transliterate_umlauts;
+bool m_transliterate_umlauts{ true };
 };
 }

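
The Step 0 rule documented in the header comment can be sketched roughly as follows. This is an illustration only, not the code in `src/german_stem.h`: the names `step0` and `is_vowel` and the namespace are hypothetical, the word is assumed to be lower case already, and the vowel set (a e i o u y ä ö ü) is an assumption. The pre-pass that upper-cases u and y between vowels is inferred from the comment's remark that feuer first becomes feUer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace step0_sketch
{
    // Assumed vowel set for this sketch: a e i o u y ä ö ü.
    inline bool is_vowel(const wchar_t ch)
    {
        return std::wstring(L"aeiouy\u00E4\u00F6\u00FC").find(ch) !=
               std::wstring::npos;
    }

    // Hypothetical helper illustrating Step 0; expects lower-case input.
    inline std::wstring step0(std::wstring word)
    {
        // Pre-pass: mark u and y that sit between two vowels as upper case,
        // which shields them from the "ue" rule below (feuer -> feUer).
        for (std::size_t i = 1; i + 1 < word.size(); ++i)
        {
            if ((word[i] == L'u' || word[i] == L'y') &&
                is_vowel(word[i - 1]) && is_vowel(word[i + 1]))
            { word[i] = (word[i] == L'u') ? L'U' : L'Y'; }
        }

        std::wstring out;
        out.reserve(word.size() + 2);
        for (std::size_t i = 0; i < word.size(); ++i)
        {
            const wchar_t ch = word[i];
            const wchar_t next = (i + 1 < word.size()) ? word[i + 1] : L'\0';
            if (ch == L'\u00DF')                      // ß -> ss
            { out += L"ss"; }
            else if (ch == L'a' && next == L'e')      // ae -> ä
            { out += L'\u00E4'; ++i; }
            else if (ch == L'o' && next == L'e')      // oe -> ö
            { out += L'\u00F6'; ++i; }
            else if (ch == L'u' && next == L'e' &&
                     (i == 0 || word[i - 1] != L'q')) // ue -> ü, unless after q
            { out += L'\u00FC'; ++i; }
            else
            { out += ch; }
        }
        return out;
    }

    // Self-check against the two examples named in the header comment.
    inline void check_examples()
    {
        assert(step0(L"quelle") == L"quelle"); // ue after q is left alone
        assert(step0(L"feuer") == L"feUer");   // u was shielded by the pre-pass
    }
}
```

A real stemmer would later restore the shielded U and Y to lower case; that step is outside what this sketch covers.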