Commit 0038e10

2 parents fba73f3 + 08800d0 commit 0038e10

1 file changed: 39 additions, 0 deletions

@@ -0,0 +1,39 @@
function RemoveAccents {
    <#
    .SYNOPSIS
        Strip accents from text by round-tripping it through a Cyrillic encoding
    .NOTES
        Warning: this is a simple method, but it is lossy. Any character the
        encoding cannot map (an emoji, for example) is replaced with '?',
        whether or not it carries an accent.
        iso-8859-5 is a single-byte encoding.
    .LINK
        https://en.wikipedia.org/wiki/Windows-1251
    #>
    param( [string] $Text )
    $enc = [Text.Encoding]::GetEncoding('iso-8859-5')
    $enc.GetString( $enc.GetBytes( $Text ) )
}

RemoveAccents 'foo bår'
# output: foo bar

RemoveAccents 'foo 🐒 bar'
# output: foo ?? bar
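
The second example shows the cost of the encoding round-trip: anything the code page cannot map comes back as `?`. For comparison, here is a minimal sketch of a different technique, Unicode decomposition followed by dropping combining marks. It is not part of this commit; the function name `Remove-Diacritics` is hypothetical and chosen only for illustration.

```powershell
# Hypothetical alternative (not the commit's method): decompose, then drop combining marks.
function Remove-Diacritics {
    param( [string] $Text )

    # FormD splits a precomposed letter such as 'å' into 'a' + U+030A (combining ring above)
    $decomposed = $Text.Normalize( [Text.NormalizationForm]::FormD )

    $sb = [Text.StringBuilder]::new()
    foreach ( $ch in $decomposed.ToCharArray() ) {
        $category = [Globalization.CharUnicodeInfo]::GetUnicodeCategory( $ch )
        # Keep everything except non-spacing marks (the detached accents)
        if ( $category -ne [Globalization.UnicodeCategory]::NonSpacingMark ) {
            [void] $sb.Append( $ch )
        }
    }

    # Recompose whatever remains so the result is back in composed form
    $sb.ToString().Normalize( [Text.NormalizationForm]::FormC )
}

Remove-Diacritics 'foo bår'
# output: foo bar

Remove-Diacritics 'foo 🐒 bar'
# output: foo 🐒 bar
```

The trade-off runs the other way from the encoding trick: unmappable characters survive, but letters with no canonical decomposition (such as 'ø') are left unchanged rather than approximated.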

<#
I'm not 100% sure this is the best Cyrillic encoding to use; there are a few:

Pwsh> [Text.Encoding]::GetEncodings() | ? DisplayName -Match 'cyr|cry'

CodePage Name           DisplayName
-------- ----           -----------
   20880 IBM880         IBM EBCDIC (Cyrillic Russian)
     866 cp866          Cyrillic (DOS)
   21866 koi8-u         Cyrillic (KOI8-U)
    1251 windows-1251   Cyrillic (Windows)
   10007 x-mac-cyrillic Cyrillic (Mac)
   28595 iso-8859-5     Cyrillic (ISO)
   20866 koi8-r         Cyrillic (KOI8-R)
     855 IBM855         OEM Cyrillic
   21025 cp1025         IBM EBCDIC (Cyrillic Serbian-Bulgarian)

#>
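
The open question in the note above, which Cyrillic code page is the best choice, can be probed empirically by running the same round-trip through each candidate. This is a rough sketch, assuming the encoding names from the listing resolve on the current runtime; any that do not are reported rather than thrown. `$sample` and `$results` are names picked here for illustration.

```powershell
# Names copied from the GetEncodings() listing in the script's closing comment
$names = 'IBM880', 'cp866', 'koi8-u', 'windows-1251', 'x-mac-cyrillic',
         'iso-8859-5', 'koi8-r', 'IBM855', 'cp1025'

$sample = 'foo bår'   # hypothetical sample; swap in your own text

$results = foreach ( $name in $names ) {
    try {
        $enc = [Text.Encoding]::GetEncoding( $name )
        [pscustomobject]@{
            Name   = $name
            # Same round-trip as RemoveAccents, just with a different code page
            Result = $enc.GetString( $enc.GetBytes( $sample ) )
        }
    }
    catch {
        # The encoding may not be available on this platform or runtime
        [pscustomobject]@{ Name = $name; Result = '<unavailable>' }
    }
}

$results | Format-Table -AutoSize
```

Comparing the `Result` column on text representative of your own data is probably the quickest way to see whether the choice of code page matters for it.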
