How to match Cyrillic characters with a regular expression

Regex, Unicode, Character Properties

Regex Problem Overview


How do I match French and Russian Cyrillic alphabet characters with a regular expression? I only want to match the alphabetic characters, no numbers or special characters. Right now I have:

[A-Za-z]

Regex Solutions


Solution 1 - Regex

If your regex flavor supports Unicode blocks ([\p{IsCyrillic}]), you can match Cyrillic characters with:

[\p{IsCyrillic}] or [\p{Cyrillic}]

Otherwise, try spelling out the range U+0400–U+04FF with your flavor's code point escapes, for example:

[\u0400-\u04FF]

For PHP (PCRE, with the u modifier) use:

[\x{0400}-\x{04FF}]

Explanation:

[\p{IsCyrillic}] matches a single character from the Unicode block “Cyrillic” (U+0400–U+04FF).

Note:

See the list of Unicode characters and numeric HTML entities for U+0400–U+04FF.
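
As a rough illustration of the code point range approach, here is a minimal Python sketch. It assumes Python 3, whose standard re module does not support \p{IsCyrillic}, so the explicit range is used instead. The names CYRILLIC_BLOCK and cyrillic_chars are made up for this example.

```python
import re

# Character class covering the Unicode "Cyrillic" block, U+0400 to U+04FF.
CYRILLIC_BLOCK = re.compile(r"[\u0400-\u04FF]")


def cyrillic_chars(text):
    """Return every character of `text` that lies in the Cyrillic block."""
    return CYRILLIC_BLOCK.findall(text)


# Latin letters, spaces and digits are skipped; only Cyrillic is returned.
print(cyrillic_chars("abc \u0433\u0434\u0435 123"))
```

The class matches one character at a time; add a quantifier such as + to match whole runs of Cyrillic text.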

Solution 2 - Regex

It depends on your regex flavor. If it supports Unicode character classes (like .NET, for instance), \p{L} matches any letter (in any script), which covers both French and Cyrillic letters.
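
Not every flavor has \p{L}. Python's standard re module is one that lacks it, so the sketch below swaps in the unicodedata module to test for the same Unicode "Letter" general category. This is a stand-in for \p{L}, not a regex; the function names is_letter and only_letters are invented for the example.

```python
import unicodedata


def is_letter(ch):
    """True if the single character `ch` has a Unicode letter category (L*)."""
    return unicodedata.category(ch).startswith("L")


def only_letters(text):
    """True if `text` is non-empty and consists solely of letters."""
    return bool(text) and all(is_letter(ch) for ch in text)


print(only_letters("\u041f\u0440\u0438\u0432\u0435\u0442"))  # Cyrillic word
print(only_letters("d\u00e9j\u00e0"))                        # French word with accents
print(only_letters("abc1"))                                  # contains a digit
```

Because the check is per category rather than per range, accented French letters and Cyrillic letters pass without listing either alphabet.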

Solution 3 - Regex

To match only Russian Cyrillic characters use:

[\u0401\u0451\u0410-\u044f]

which is the equivalent of:

[ЁёА-я]

where А is Cyrillic, not Latin. (Despite looking the same, they have different code points.)

\p{IsCyrillic}, \p{Cyrillic}, and [\u0400-\u04FF], which others suggested, will match all variants of Cyrillic, not only Russian.
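
The difference between the Russian-only class and the whole block can be checked with a short Python sketch. It assumes Python 3's re module; the sample words and the names RUSSIAN and CYRILLIC are chosen just for this illustration.

```python
import re

# Russian alphabet only: А-я (U+0410 to U+044F) plus Ё (U+0401) and ё (U+0451).
RUSSIAN = re.compile(r"[\u0401\u0451\u0410-\u044F]+")
# Entire Cyrillic block, which also covers other Cyrillic-script languages.
CYRILLIC = re.compile(r"[\u0400-\u04FF]+")

russian_word = "\u0451\u043b\u043a\u0430"    # a Russian word starting with ё
ukrainian_word = "\u0457\u0436\u0430\u043a"  # a Ukrainian word starting with ї (U+0457)

for word in (russian_word, ukrainian_word):
    print(
        word,
        RUSSIAN.fullmatch(word) is not None,
        CYRILLIC.fullmatch(word) is not None,
    )
```

The Ukrainian letter ї lies inside the Cyrillic block but outside the Russian alphabet, so only the block-wide class accepts the second word.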

Solution 4 - Regex

If you use a modern PHP version, just use:

preg_match('/^\p{L}+$/u', $string); // $string is the text to validate

Don't forget the u flag for Unicode support!

Solution 5 - Regex

Regex to match Cyrillic letters together with English letters and common special characters:

^[A-Za-z.!@?#"$%&:;() *\+,\/;\-=[\\\]\^_{|}<>\u0400-\u04FF]*$

It matches special characters, Cyrillic letters, and English letters. Digits are not included.
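
A character class this long is easy to get wrong by hand. One possible way to build the same set in Python is sketched below; it assumes Python 3 and lets re.escape handle the escaping of the punctuation. SPECIALS, ALLOWED and is_allowed are names invented for this example.

```python
import re

# Punctuation and symbols to allow, taken from the character class above.
SPECIALS = '.!@?#"$%&:;() *+,/-=[\\]^_{|}<>'

# Latin letters + escaped specials + the Cyrillic block U+0400 to U+04FF.
ALLOWED = re.compile("[A-Za-z" + re.escape(SPECIALS) + "\u0400-\u04FF]*")


def is_allowed(text):
    """True if every character of `text` is in the allowed set."""
    return ALLOWED.fullmatch(text) is not None


print(is_allowed("\u041f\u0440\u0438\u0432\u0435\u0442, world!"))
print(is_allowed("\u041f\u0440\u0438\u0432\u0435\u0442 123"))
```

Using fullmatch plays the role of the ^ and $ anchors, and because of the * quantifier an empty string is accepted as well.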

Solution 6 - Regex

Various regex dialects use [:alpha:] for any alphabetic character in the current locale. (You may need to put that inside a bracket expression, e.g. [[:alpha:]].)

Solution 7 - Regex

This worked for me:

[a-z\u0400-\u04FF]

Solution 8 - Regex

If you use Elixir:

String.match?(string, ~r/^\p{Cyrillic}*$/u)

You need to add the u flag for Unicode support.

Solution 9 - Regex

For modern PHP:

$string = 'тест тест Тест Обязателльно Stackoverflow >!<';
var_dump(preg_replace('/[\x{0410}-\x{042F}]+.*[\x{0410}-\x{042F}]+/iu', '', $string));

Solution 10 - Regex

In Java, to match Cyrillic letters and whitespace, use the following pattern:

^[\p{InCyrillic}\s]+$

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Greg Finzer | View Question on Stack Overflow
Solution 1 - Regex | Pedro Lobito | View Answer on Stack Overflow
Solution 2 - Regex | Tim Pietzcker | View Answer on Stack Overflow
Solution 3 - Regex | CITBL | View Answer on Stack Overflow
Solution 4 - Regex | Олег Всильдеревьев | View Answer on Stack Overflow
Solution 5 - Regex | Dipti Ghumbre | View Answer on Stack Overflow
Solution 6 - Regex | Roger Pate | View Answer on Stack Overflow
Solution 7 - Regex | lili.b | View Answer on Stack Overflow
Solution 8 - Regex | Marvin Rabe | View Answer on Stack Overflow
Solution 9 - Regex | Robert Sinclair | View Answer on Stack Overflow
Solution 10 - Regex | Tony Thanuvelil | View Answer on Stack Overflow