Replacing accented characters php

PhpStringPreg ReplaceNon Ascii-Characters

Php Problem Overview


I am trying to replace accented characters with the normal replacements. Below is what I am currently doing.

    $string = "Éric Cantona";
    $strict = strtolower($string);

    echo "After Lower: ".$strict;

    $patterns[0] = '/[á|â|à|å|ä]/';
    $patterns[1] = '/[ð|é|ê|è|ë]/';
    $patterns[2] = '/[í|î|ì|ï]/';
    $patterns[3] = '/[ó|ô|ò|ø|õ|ö]/';
    $patterns[4] = '/[ú|û|ù|ü]/';
    $patterns[5] = '/æ/';
    $patterns[6] = '/ç/';
    $patterns[7] = '/ß/';
    $replacements[0] = 'a';
    $replacements[1] = 'e';
    $replacements[2] = 'i';
    $replacements[3] = 'o';
    $replacements[4] = 'u';
    $replacements[5] = 'ae';
    $replacements[6] = 'c';
    $replacements[7] = 'ss';

    $strict = preg_replace($patterns, $replacements, $strict);
    echo "Final: ".$strict;

This gives me:

    After Lower: éric cantona
    Final: ric cantona

The above gives me ric cantona I want the output to be eric cantona.

can anyone help me with where I am going wrong?

Php Solutions


Solution 1 - Php

I have tried all sorts based on the variations listed in the answers, but the following worked:

$unwanted_array = array(    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
                            'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
                            'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
                            'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
                            'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );

Solution 2 - Php

To remove the diacritics, use iconv:

$val = iconv('ISO-8859-1','ASCII//TRANSLIT',$val);

or

$val = iconv('UTF-8','ASCII//TRANSLIT',$val);

note that php has some weird bug in that it (sometimes?) needs to have a locale set to make these conversions work, using setlocale().

edit tested, it gets all of your diacritics out of the box:

$val = "á|â|à|å|ä ð|é|ê|è|ë í|î|ì|ï ó|ô|ò|ø|õ|ö ú|û|ù|ü æ ç ß abc ABC 123";
echo iconv('UTF-8','ASCII//TRANSLIT',$val); 

output (updated 2019-12-30)

a|a|a|a|a d|e|e|e|e i|i|i|i o|o|o|o|o|o u|u|u|u ae c ss abc ABC 123

Note that ð is correctly transliterated to d instead of o, as in the accepted answer.

Solution 3 - Php

I just came accross the answer from Lizard which is extremely helpful - especially when you do some sorting. Isn't is beautiful how many chars we need to say mostly the same ;)

If anyone else if looking for a all-in solution (as far as the comments above tell), here's the copy&paste:

/**
 * Replace language-specific characters by ASCII-equivalents.
 * @param string $s
 * @return string
 */
public static function normalizeChars($s) {
	$replace = array(
	 	'ъ'=>'-', 'Ь'=>'-', 'Ъ'=>'-', 'ь'=>'-',
		'Ă'=>'A', 'Ą'=>'A', 'À'=>'A', 'Ã'=>'A', 'Á'=>'A', 'Æ'=>'A', 'Â'=>'A', 'Å'=>'A', 'Ä'=>'Ae',
		'Þ'=>'B',
		'Ć'=>'C', 'ץ'=>'C', 'Ç'=>'C',
		'È'=>'E', 'Ę'=>'E', 'É'=>'E', 'Ë'=>'E', 'Ê'=>'E',
		'Ğ'=>'G',
		'İ'=>'I', 'Ï'=>'I', 'Î'=>'I', 'Í'=>'I', 'Ì'=>'I',
		'Ł'=>'L',
		'Ñ'=>'N', 'Ń'=>'N',
		'Ø'=>'O', 'Ó'=>'O', 'Ò'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'Oe',
		'Ş'=>'S', 'Ś'=>'S', 'Ș'=>'S', 'Š'=>'S',
		'Ț'=>'T',
		'Ù'=>'U', 'Û'=>'U', 'Ú'=>'U', 'Ü'=>'Ue',
		'Ý'=>'Y',
		'Ź'=>'Z', 'Ž'=>'Z', 'Ż'=>'Z',
		'â'=>'a', 'ǎ'=>'a', 'ą'=>'a', 'á'=>'a', 'ă'=>'a', 'ã'=>'a', 'Ǎ'=>'a', 'а'=>'a', 'А'=>'a', 'å'=>'a', 'à'=>'a', 'א'=>'a', 'Ǻ'=>'a', 'Ā'=>'a', 'ǻ'=>'a', 'ā'=>'a', 'ä'=>'ae', 'æ'=>'ae', 'Ǽ'=>'ae', 'ǽ'=>'ae',
		'б'=>'b', 'ב'=>'b', 'Б'=>'b', 'þ'=>'b',
		'ĉ'=>'c', 'Ĉ'=>'c', 'Ċ'=>'c', 'ć'=>'c', 'ç'=>'c', 'ц'=>'c', 'צ'=>'c', 'ċ'=>'c', 'Ц'=>'c', 'Č'=>'c', 'č'=>'c', 'Ч'=>'ch', 'ч'=>'ch',
		'ד'=>'d', 'ď'=>'d', 'Đ'=>'d', 'Ď'=>'d', 'đ'=>'d', 'д'=>'d', 'Д'=>'D', 'ð'=>'d',
		'є'=>'e', 'ע'=>'e', 'е'=>'e', 'Е'=>'e', 'Ə'=>'e', 'ę'=>'e', 'ĕ'=>'e', 'ē'=>'e', 'Ē'=>'e', 'Ė'=>'e', 'ė'=>'e', 'ě'=>'e', 'Ě'=>'e', 'Є'=>'e', 'Ĕ'=>'e', 'ê'=>'e', 'ə'=>'e', 'è'=>'e', 'ë'=>'e', 'é'=>'e',
		'ф'=>'f', 'ƒ'=>'f', 'Ф'=>'f',
		'ġ'=>'g', 'Ģ'=>'g', 'Ġ'=>'g', 'Ĝ'=>'g', 'Г'=>'g', 'г'=>'g', 'ĝ'=>'g', 'ğ'=>'g', 'ג'=>'g', 'Ґ'=>'g', 'ґ'=>'g', 'ģ'=>'g',
		'ח'=>'h', 'ħ'=>'h', 'Х'=>'h', 'Ħ'=>'h', 'Ĥ'=>'h', 'ĥ'=>'h', 'х'=>'h', 'ה'=>'h',
		'î'=>'i', 'ï'=>'i', 'í'=>'i', 'ì'=>'i', 'į'=>'i', 'ĭ'=>'i', 'ı'=>'i', 'Ĭ'=>'i', 'И'=>'i', 'ĩ'=>'i', 'ǐ'=>'i', 'Ĩ'=>'i', 'Ǐ'=>'i', 'и'=>'i', 'Į'=>'i', 'י'=>'i', 'Ї'=>'i', 'Ī'=>'i', 'І'=>'i', 'ї'=>'i', 'і'=>'i', 'ī'=>'i', 'ij'=>'ij', 'IJ'=>'ij',
		'й'=>'j', 'Й'=>'j', 'Ĵ'=>'j', 'ĵ'=>'j', 'я'=>'ja', 'Я'=>'ja', 'Э'=>'je', 'э'=>'je', 'ё'=>'jo', 'Ё'=>'jo', 'ю'=>'ju', 'Ю'=>'ju',
		'ĸ'=>'k', 'כ'=>'k', 'Ķ'=>'k', 'К'=>'k', 'к'=>'k', 'ķ'=>'k', 'ך'=>'k',
		'Ŀ'=>'l', 'ŀ'=>'l', 'Л'=>'l', 'ł'=>'l', 'ļ'=>'l', 'ĺ'=>'l', 'Ĺ'=>'l', 'Ļ'=>'l', 'л'=>'l', 'Ľ'=>'l', 'ľ'=>'l', 'ל'=>'l',
		'מ'=>'m', 'М'=>'m', 'ם'=>'m', 'м'=>'m',
		'ñ'=>'n', 'н'=>'n', 'Ņ'=>'n', 'ן'=>'n', 'ŋ'=>'n', 'נ'=>'n', 'Н'=>'n', 'ń'=>'n', 'Ŋ'=>'n', 'ņ'=>'n', 'ʼn'=>'n', 'Ň'=>'n', 'ň'=>'n',
		'о'=>'o', 'О'=>'o', 'ő'=>'o', 'õ'=>'o', 'ô'=>'o', 'Ő'=>'o', 'ŏ'=>'o', 'Ŏ'=>'o', 'Ō'=>'o', 'ō'=>'o', 'ø'=>'o', 'ǿ'=>'o', 'ǒ'=>'o', 'ò'=>'o', 'Ǿ'=>'o', 'Ǒ'=>'o', 'ơ'=>'o', 'ó'=>'o', 'Ơ'=>'o', 'œ'=>'oe', 'Œ'=>'oe', 'ö'=>'oe',
		'פ'=>'p', 'ף'=>'p', 'п'=>'p', 'П'=>'p',
		'ק'=>'q',
		'ŕ'=>'r', 'ř'=>'r', 'Ř'=>'r', 'ŗ'=>'r', 'Ŗ'=>'r', 'ר'=>'r', 'Ŕ'=>'r', 'Р'=>'r', 'р'=>'r',
		'ș'=>'s', 'с'=>'s', 'Ŝ'=>'s', 'š'=>'s', 'ś'=>'s', 'ס'=>'s', 'ş'=>'s', 'С'=>'s', 'ŝ'=>'s', 'Щ'=>'sch', 'щ'=>'sch', 'ш'=>'sh', 'Ш'=>'sh', 'ß'=>'ss',
		'т'=>'t', 'ט'=>'t', 'ŧ'=>'t', 'ת'=>'t', 'ť'=>'t', 'ţ'=>'t', 'Ţ'=>'t', 'Т'=>'t', 'ț'=>'t', 'Ŧ'=>'t', 'Ť'=>'t', '™'=>'tm',
		'ū'=>'u', 'у'=>'u', 'Ũ'=>'u', 'ũ'=>'u', 'Ư'=>'u', 'ư'=>'u', 'Ū'=>'u', 'Ǔ'=>'u', 'ų'=>'u', 'Ų'=>'u', 'ŭ'=>'u', 'Ŭ'=>'u', 'Ů'=>'u', 'ů'=>'u', 'ű'=>'u', 'Ű'=>'u', 'Ǖ'=>'u', 'ǔ'=>'u', 'Ǜ'=>'u', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'У'=>'u', 'ǚ'=>'u', 'ǜ'=>'u', 'Ǚ'=>'u', 'Ǘ'=>'u', 'ǖ'=>'u', 'ǘ'=>'u', 'ü'=>'ue',
		'в'=>'v', 'ו'=>'v', 'В'=>'v',
		'ש'=>'w', 'ŵ'=>'w', 'Ŵ'=>'w',
		'ы'=>'y', 'ŷ'=>'y', 'ý'=>'y', 'ÿ'=>'y', 'Ÿ'=>'y', 'Ŷ'=>'y',
		'Ы'=>'y', 'ž'=>'z', 'З'=>'z', 'з'=>'z', 'ź'=>'z', 'ז'=>'z', 'ż'=>'z', 'ſ'=>'z', 'Ж'=>'zh', 'ж'=>'zh'
	);
	return strtr($s, $replace);
}

Note some slight changes regarding the German umlauts (ä => ae)

Edit: Included more characters based on the posting from user3682119 (except for the copyright symbol) and the comment from daker.

Solution 4 - Php

In PHP 5.4 the intl extension provides a new class named Transliterator.

I believe that's the best way to remove diacritics for two reasons:

  1. Transliterator is based on ICU, so you're using the tables of the ICU library. ICU is a great project, developed over the year to provide comprehensive tables and functionalities. Whatever table you want to write yourself, it will never be as complete as the one from ICU.

  2. In UTF-8, characters could be represented differently. For example, the character ñ could be saved as a single (multi-byte) character, or as the combination of characters ˜ (multibyte) and n. In addition to this, some characters in Unicode are homograph: they look the same while having different codepoints. For this reason it's also important to normalize the string.

Here's a sample code, taken from an old answer of mine:

<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
	$normalized = $transliterator->transliterate($e);
	echo $e. ' --> '.$normalized."\n";
}
?>

Result:

abcd --> abcd
èe --> ee--> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto

The first argument for the Transliterator class performs the removal of diacritics as well as the normalization of the string.

Solution 5 - Php

An updated answer based on @BurninLeo's answer

function replace_spec_char($subject) {
    $char_map = array(
        "ъ" => "-", "ь" => "-", "Ъ" => "-", "Ь" => "-",
        "А" => "A", "Ă" => "A", "Ǎ" => "A", "Ą" => "A", "À" => "A", "Ã" => "A", "Á" => "A", "Æ" => "A", "Â" => "A", "Å" => "A", "Ǻ" => "A", "Ā" => "A", "א" => "A",
        "Б" => "B", "ב" => "B", "Þ" => "B",
        "Ĉ" => "C", "Ć" => "C", "Ç" => "C", "Ц" => "C", "צ" => "C", "Ċ" => "C", "Č" => "C", "©" => "C", "ץ" => "C",
        "Д" => "D", "Ď" => "D", "Đ" => "D", "ד" => "D", "Ð" => "D",
        "È" => "E", "Ę" => "E", "É" => "E", "Ë" => "E", "Ê" => "E", "Е" => "E", "Ē" => "E", "Ė" => "E", "Ě" => "E", "Ĕ" => "E", "Є" => "E", "Ə" => "E", "ע" => "E",
        "Ф" => "F", "Ƒ" => "F",
        "Ğ" => "G", "Ġ" => "G", "Ģ" => "G", "Ĝ" => "G", "Г" => "G", "ג" => "G", "Ґ" => "G",
        "ח" => "H", "Ħ" => "H", "Х" => "H", "Ĥ" => "H", "ה" => "H",
        "I" => "I", "Ï" => "I", "Î" => "I", "Í" => "I", "Ì" => "I", "Į" => "I", "Ĭ" => "I", "I" => "I", "И" => "I", "Ĩ" => "I", "Ǐ" => "I", "י" => "I", "Ї" => "I", "Ī" => "I", "І" => "I",
        "Й" => "J", "Ĵ" => "J",
        "ĸ" => "K", "כ" => "K", "Ķ" => "K", "К" => "K", "ך" => "K",
        "Ł" => "L", "Ŀ" => "L", "Л" => "L", "Ļ" => "L", "Ĺ" => "L", "Ľ" => "L", "ל" => "L",
        "מ" => "M", "М" => "M", "ם" => "M",
        "Ñ" => "N", "Ń" => "N", "Н" => "N", "Ņ" => "N", "ן" => "N", "Ŋ" => "N", "נ" => "N", "ʼn" => "N", "Ň" => "N",
        "Ø" => "O", "Ó" => "O", "Ò" => "O", "Ô" => "O", "Õ" => "O", "О" => "O", "Ő" => "O", "Ŏ" => "O", "Ō" => "O", "Ǿ" => "O", "Ǒ" => "O", "Ơ" => "O",
        "פ" => "P", "ף" => "P", "П" => "P",
        "ק" => "Q",
        "Ŕ" => "R", "Ř" => "R", "Ŗ" => "R", "ר" => "R", "Р" => "R", "®" => "R",
        "Ş" => "S", "Ś" => "S", "Ș" => "S", "Š" => "S", "С" => "S", "Ŝ" => "S", "ס" => "S",
        "Т" => "T", "Ț" => "T", "ט" => "T", "Ŧ" => "T", "ת" => "T", "Ť" => "T", "Ţ" => "T",
        "Ù" => "U", "Û" => "U", "Ú" => "U", "Ū" => "U", "У" => "U", "Ũ" => "U", "Ư" => "U", "Ǔ" => "U", "Ų" => "U", "Ŭ" => "U", "Ů" => "U", "Ű" => "U", "Ǖ" => "U", "Ǜ" => "U", "Ǚ" => "U", "Ǘ" => "U",
        "В" => "V", "ו" => "V",
        "Ý" => "Y", "Ы" => "Y", "Ŷ" => "Y", "Ÿ" => "Y",
        "Ź" => "Z", "Ž" => "Z", "Ż" => "Z", "З" => "Z", "ז" => "Z",
        "а" => "a", "ă" => "a", "ǎ" => "a", "ą" => "a", "à" => "a", "ã" => "a", "á" => "a", "æ" => "a", "â" => "a", "å" => "a", "ǻ" => "a", "ā" => "a", "א" => "a",
        "б" => "b", "ב" => "b", "þ" => "b",
        "ĉ" => "c", "ć" => "c", "ç" => "c", "ц" => "c", "צ" => "c", "ċ" => "c", "č" => "c", "©" => "c", "ץ" => "c",
        "Ч" => "ch", "ч" => "ch",
        "д" => "d", "ď" => "d", "đ" => "d", "ד" => "d", "ð" => "d",
        "è" => "e", "ę" => "e", "é" => "e", "ë" => "e", "ê" => "e", "е" => "e", "ē" => "e", "ė" => "e", "ě" => "e", "ĕ" => "e", "є" => "e", "ə" => "e", "ע" => "e",
        "ф" => "f", "ƒ" => "f",
        "ğ" => "g", "ġ" => "g", "ģ" => "g", "ĝ" => "g", "г" => "g", "ג" => "g", "ґ" => "g",
        "ח" => "h", "ħ" => "h", "х" => "h", "ĥ" => "h", "ה" => "h",
        "i" => "i", "ï" => "i", "î" => "i", "í" => "i", "ì" => "i", "į" => "i", "ĭ" => "i", "ı" => "i", "и" => "i", "ĩ" => "i", "ǐ" => "i", "י" => "i", "ї" => "i", "ī" => "i", "і" => "i",
        "й" => "j", "Й" => "j", "Ĵ" => "j", "ĵ" => "j",
        "ĸ" => "k", "כ" => "k", "ķ" => "k", "к" => "k", "ך" => "k",
        "ł" => "l", "ŀ" => "l", "л" => "l", "ļ" => "l", "ĺ" => "l", "ľ" => "l", "ל" => "l",
        "מ" => "m", "м" => "m", "ם" => "m",
        "ñ" => "n", "ń" => "n", "н" => "n", "ņ" => "n", "ן" => "n", "ŋ" => "n", "נ" => "n", "ʼn" => "n", "ň" => "n",
        "ø" => "o", "ó" => "o", "ò" => "o", "ô" => "o", "õ" => "o", "о" => "o", "ő" => "o", "ŏ" => "o", "ō" => "o", "ǿ" => "o", "ǒ" => "o", "ơ" => "o",
        "פ" => "p", "ף" => "p", "п" => "p",
        "ק" => "q",
        "ŕ" => "r", "ř" => "r", "ŗ" => "r", "ר" => "r", "р" => "r", "®" => "r",
        "ş" => "s", "ś" => "s", "ș" => "s", "š" => "s", "с" => "s", "ŝ" => "s", "ס" => "s",
        "т" => "t", "ț" => "t", "ט" => "t", "ŧ" => "t", "ת" => "t", "ť" => "t", "ţ" => "t",
        "ù" => "u", "û" => "u", "ú" => "u", "ū" => "u", "у" => "u", "ũ" => "u", "ư" => "u", "ǔ" => "u", "ų" => "u", "ŭ" => "u", "ů" => "u", "ű" => "u", "ǖ" => "u", "ǜ" => "u", "ǚ" => "u", "ǘ" => "u",
        "в" => "v", "ו" => "v",
        "ý" => "y", "ы" => "y", "ŷ" => "y", "ÿ" => "y",
        "ź" => "z", "ž" => "z", "ż" => "z", "з" => "z", "ז" => "z", "ſ" => "z",
        "™" => "tm",
        "@" => "at",
        "Ä" => "ae", "Ǽ" => "ae", "ä" => "ae", "æ" => "ae", "ǽ" => "ae",
        "ij" => "ij", "IJ" => "ij",
        "я" => "ja", "Я" => "ja",
        "Э" => "je", "э" => "je",
        "ё" => "jo", "Ё" => "jo",
        "ю" => "ju", "Ю" => "ju",
        "œ" => "oe", "Œ" => "oe", "ö" => "oe", "Ö" => "oe",
        "щ" => "sch", "Щ" => "sch",
        "ш" => "sh", "Ш" => "sh",
        "ß" => "ss",
        "Ü" => "ue",
        "Ж" => "zh", "ж" => "zh",
    );
    return strtr($subject, $char_map);
}

$string = "Ħí ŧħə®ë, юßť å test!";
echo replace_spec_char($string);

Ħí ŧħə®ë, юßť å test! => Hi there, jusst a test!

This does not mix up upper and lower case chars except for longer chars (eg: ss,ch, sch) , added @ ® ©

Also if you want to build regex matching regardless to special chars :

rss => '[rŕřŘŗŖרŔРр](?:[sșсŜšśסşСŝ][sșсŜšśסşСŝ]|[ß])'

A vala implementation of this : https://code.launchpad.net/~jeremy-munsch/synapse-project/ascii-smart/+merge/277477

Here is the base list you could work with, with regex replacing (in sublime text) or small script you can build anything from this array to fill your needs.

"-" => "ъьЪЬ",
"A" => "АĂǍĄÀÃÁÆÂÅǺĀא",
"B" => "БבÞ",
"C" => "ĈĆÇЦצĊČ©ץ",
"D" => "ДĎĐדÐ",
"E" => "ÈĘÉËÊЕĒĖĚĔЄƏע",
"F" => "ФƑ",
"G" => "ĞĠĢĜГגҐ",
"H" => "חĦХĤה",
"I" => "IÏÎÍÌĮĬIИĨǏיЇĪІ",
"J" => "ЙĴ",
"K" => "ĸכĶКך",
"L" => "ŁĿЛĻĹĽל",
"M" => "מМם",
"N" => "ÑŃНŅןŊנʼnŇ",
"O" => "ØÓÒÔÕОŐŎŌǾǑƠ",
"P" => "פףП",
"Q" => "ק",
"R" => "ŔŘŖרР®",
"S" => "ŞŚȘŠСŜס",
"T" => "ТȚטŦתŤŢ",
"U" => "ÙÛÚŪУŨƯǓŲŬŮŰǕǛǙǗ",
"V" => "Вו",
"Y" => "ÝЫŶŸ",
"Z" => "ŹŽŻЗז",
"a" => "аăǎąàãáæâåǻāא",
"b" => "бבþ",
"c" => "ĉćçцצċč©ץ",
"ch" => "ч",
"d" => "дďđדð",
"e" => "èęéëêеēėěĕєəע",
"f" => "фƒ",
"g" => "ğġģĝгגґ",
"h" => "חħхĥה",
"i" => "iïîíìįĭıиĩǐיїīі",
"j" => "йĵ",
"k" => "ĸכķкך",
"l" => "łŀлļĺľל",
"m" => "מмם",
"n" => "ñńнņןŋנʼnň",
"o" => "øóòôõоőŏōǿǒơ",
"p" => "פףп",
"q" => "ק",
"r" => "ŕřŗרр®",
"s" => "şśșšсŝס",
"t" => "тțטŧתťţ",
"u" => "ùûúūуũưǔųŭůűǖǜǚǘ",
"v" => "вו",
"y" => "ýыŷÿ",
"z" => "źžżзזſ",
"tm" => "™",
"at" => "@",
"ae" => "ÄǼäæǽ",
"ch" => "Чч",
"ij" => "ijIJ",
"j" => "йЙĴĵ",
"ja" => "яЯ",
"je" => "Ээ",
"jo" => "ёЁ",
"ju" => "юЮ",
"oe" => "œŒöÖ",
"sch" => "щЩ",
"sh" => "шШ",
"ss" => "ß",
"tm" => "™",
"ue" => "Ü",
"zh" => "Жж"

Solution 6 - Php

So I found this on php.net page for preg_replace function

// replace accented chars

$string = "Zacarías Ferreíra"; // my definition for string variable
$accents = '/&([A-Za-z]{1,2})(grave|acute|circ|cedil|uml|lig);/';

$string_encoded = htmlentities($string,ENT_NOQUOTES,'UTF-8');

$string = preg_replace($accents,'$1',$string_encoded);

If you have encoding issues you may get someting like this "Zacarías Ferreíra", just decode the string and use said code above

$string = utf8_decode("Zacarías Ferreíra");

Solution 7 - Php

I found this way to be a good one, without having to worry too much about charsets and arrays, or iconv:

function replace_accents($str) {
   $str = htmlentities($str, ENT_COMPAT, "UTF-8");
   $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|ring);/','$1',$str);
   return html_entity_decode($str);
}

Solution 8 - Php

This worked for me:

<?php
setlocale(LC_ALL, "en_US.utf8"); 
$val = iconv('UTF-8','ASCII//TRANSLIT',$val);
?>

Solution 9 - Php

if you have http://php.net/manual/en/book.intl.php available, this will solve your problem:

$string = "Éric Cantona";
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);

EDIT

To install the php extension in ubuntu:

apt-get install php-intl

Don't forget, in composer, to require the extension ext-intl to ensure it properly fits into deployed systems.

Solution 10 - Php

protected $_convertTable = array(
    '&amp;' => 'and',   '@' => 'at',    '©' => 'c', '®' => 'r', 'À' => 'a',
    'Á' => 'a', 'Â' => 'a', 'Ä' => 'a', 'Å' => 'a', 'Æ' => 'ae','Ç' => 'c',
    'È' => 'e', 'É' => 'e', 'Ë' => 'e', 'Ì' => 'i', 'Í' => 'i', 'Î' => 'i',
    'Ï' => 'i', 'Ò' => 'o', 'Ó' => 'o', 'Ô' => 'o', 'Õ' => 'o', 'Ö' => 'o',
    'Ø' => 'o', 'Ù' => 'u', 'Ú' => 'u', 'Û' => 'u', 'Ü' => 'u', 'Ý' => 'y',
    'ß' => 'ss','à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'å' => 'a',
    'æ' => 'ae','ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
    'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ò' => 'o', 'ó' => 'o',
    'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u',
    'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'þ' => 'p', 'ÿ' => 'y', 'Ā' => 'a',
    'ā' => 'a', 'Ă' => 'a', 'ă' => 'a', 'Ą' => 'a', 'ą' => 'a', 'Ć' => 'c',
    'ć' => 'c', 'Ĉ' => 'c', 'ĉ' => 'c', 'Ċ' => 'c', 'ċ' => 'c', 'Č' => 'c',
    'č' => 'c', 'Ď' => 'd', 'ď' => 'd', 'Đ' => 'd', 'đ' => 'd', 'Ē' => 'e',
    'ē' => 'e', 'Ĕ' => 'e', 'ĕ' => 'e', 'Ė' => 'e', 'ė' => 'e', 'Ę' => 'e',
    'ę' => 'e', 'Ě' => 'e', 'ě' => 'e', 'Ĝ' => 'g', 'ĝ' => 'g', 'Ğ' => 'g',
    'ğ' => 'g', 'Ġ' => 'g', 'ġ' => 'g', 'Ģ' => 'g', 'ģ' => 'g', 'Ĥ' => 'h',
    'ĥ' => 'h', 'Ħ' => 'h', 'ħ' => 'h', 'Ĩ' => 'i', 'ĩ' => 'i', 'Ī' => 'i',
    'ī' => 'i', 'Ĭ' => 'i', 'ĭ' => 'i', 'Į' => 'i', 'į' => 'i', 'İ' => 'i',
    'ı' => 'i', 'IJ' => 'ij','ij' => 'ij','Ĵ' => 'j', 'ĵ' => 'j', 'Ķ' => 'k',
    'ķ' => 'k', 'ĸ' => 'k', 'Ĺ' => 'l', 'ĺ' => 'l', 'Ļ' => 'l', 'ļ' => 'l',
    'Ľ' => 'l', 'ľ' => 'l', 'Ŀ' => 'l', 'ŀ' => 'l', 'Ł' => 'l', 'ł' => 'l',
    'Ń' => 'n', 'ń' => 'n', 'Ņ' => 'n', 'ņ' => 'n', 'Ň' => 'n', 'ň' => 'n',
    'ʼn' => 'n', 'Ŋ' => 'n', 'ŋ' => 'n', 'Ō' => 'o', 'ō' => 'o', 'Ŏ' => 'o',
    'ŏ' => 'o', 'Ő' => 'o', 'ő' => 'o', 'Œ' => 'oe','œ' => 'oe','Ŕ' => 'r',
    'ŕ' => 'r', 'Ŗ' => 'r', 'ŗ' => 'r', 'Ř' => 'r', 'ř' => 'r', 'Ś' => 's',
    'ś' => 's', 'Ŝ' => 's', 'ŝ' => 's', 'Ş' => 's', 'ş' => 's', 'Š' => 's',
    'š' => 's', 'Ţ' => 't', 'ţ' => 't', 'Ť' => 't', 'ť' => 't', 'Ŧ' => 't',
    'ŧ' => 't', 'Ũ' => 'u', 'ũ' => 'u', 'Ū' => 'u', 'ū' => 'u', 'Ŭ' => 'u',
    'ŭ' => 'u', 'Ů' => 'u', 'ů' => 'u', 'Ű' => 'u', 'ű' => 'u', 'Ų' => 'u',
    'ų' => 'u', 'Ŵ' => 'w', 'ŵ' => 'w', 'Ŷ' => 'y', 'ŷ' => 'y', 'Ÿ' => 'y',
    'Ź' => 'z', 'ź' => 'z', 'Ż' => 'z', 'ż' => 'z', 'Ž' => 'z', 'ž' => 'z',
    'ſ' => 'z', 'Ə' => 'e', 'ƒ' => 'f', 'Ơ' => 'o', 'ơ' => 'o', 'Ư' => 'u',
    'ư' => 'u', 'Ǎ' => 'a', 'ǎ' => 'a', 'Ǐ' => 'i', 'ǐ' => 'i', 'Ǒ' => 'o',
    'ǒ' => 'o', 'Ǔ' => 'u', 'ǔ' => 'u', 'Ǖ' => 'u', 'ǖ' => 'u', 'Ǘ' => 'u',
    'ǘ' => 'u', 'Ǚ' => 'u', 'ǚ' => 'u', 'Ǜ' => 'u', 'ǜ' => 'u', 'Ǻ' => 'a',
    'ǻ' => 'a', 'Ǽ' => 'ae','ǽ' => 'ae','Ǿ' => 'o', 'ǿ' => 'o', 'ə' => 'e',
    'Ё' => 'jo','Є' => 'e', 'І' => 'i', 'Ї' => 'i', 'А' => 'a', 'Б' => 'b',
    'В' => 'v', 'Г' => 'g', 'Д' => 'd', 'Е' => 'e', 'Ж' => 'zh','З' => 'z',
    'И' => 'i', 'Й' => 'j', 'К' => 'k', 'Л' => 'l', 'М' => 'm', 'Н' => 'n',
    'О' => 'o', 'П' => 'p', 'Р' => 'r', 'С' => 's', 'Т' => 't', 'У' => 'u',
    'Ф' => 'f', 'Х' => 'h', 'Ц' => 'c', 'Ч' => 'ch','Ш' => 'sh','Щ' => 'sch',
    'Ъ' => '-', 'Ы' => 'y', 'Ь' => '-', 'Э' => 'je','Ю' => 'ju','Я' => 'ja',
    'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e',
    'ж' => 'zh','з' => 'z', 'и' => 'i', 'й' => 'j', 'к' => 'k', 'л' => 'l',
    'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
    'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
    'ш' => 'sh','щ' => 'sch','ъ' => '-','ы' => 'y', 'ь' => '-', 'э' => 'je',
    'ю' => 'ju','я' => 'ja','ё' => 'jo','є' => 'e', 'і' => 'i', 'ї' => 'i',
    'Ґ' => 'g', 'ґ' => 'g', 'א' => 'a', 'ב' => 'b', 'ג' => 'g', 'ד' => 'd',
    'ה' => 'h', 'ו' => 'v', 'ז' => 'z', 'ח' => 'h', 'ט' => 't', 'י' => 'i',
    'ך' => 'k', 'כ' => 'k', 'ל' => 'l', 'ם' => 'm', 'מ' => 'm', 'ן' => 'n',
    'נ' => 'n', 'ס' => 's', 'ע' => 'e', 'ף' => 'p', 'פ' => 'p', 'ץ' => 'C',
    'צ' => 'c', 'ק' => 'q', 'ר' => 'r', 'ש' => 'w', 'ת' => 't', '™' => 'tm',
);

From magento, im using it for basically everything

Solution 11 - Php

I've searched and your idea for accent striping is quite awesome and cost-effective but your regex is wrongly done and misses 2 extra params. Long story short the regex must be:

$patterns[0] = '/[áâàåä]/ui';
$patterns[1] = '/[ðéêèë]/ui';
$patterns[2] = '/[íîìï]/ui';
$patterns[3] = '/[óôòøõö]/ui';
$patterns[4] = '/[úûùü]/ui';
$patterns[5] = '/æ/ui';
$patterns[6] = '/ç/ui';
$patterns[7] = '/ß/ui';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';

As you can see is quite similar but the most important thing is the paramas after the second slash of the regular expression. When a regualr expression is like this /[someCoolRegex]/ui the u specifies that it must use unicode and the i specifies that is case insensitive, I've tested my own and with the ansewer in this forum I must say is more cost efective than using strtr.

Hope someone reads this answer.

Solution 12 - Php

> Disclaimer: I'm not supporting this answer anymore (I was blind at that time). But thanks for the up-votes =P

You can take this as basis. From WordPress, used to generate pretty urls (the entry point is the slugify() function):

/**
 * Converts all accent characters to ASCII characters.
 *
 * If there are no accent characters, then the string given is just returned.
 *
 * @param string $string Text that might have accent characters
 * @return string Filtered string with replaced "nice" characters.
 */

function remove_accents($string) {
 if (!preg_match('/[\x80-\xff]/', $string))
  return $string;
 if (seems_utf8($string)) {
  $chars = array(
  // Decompositions for Latin-1 Supplement
  chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
  chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
  chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
  chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
  chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
  chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
  chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
  chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
  chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
  chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
  chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
  chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
  chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
  chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
  chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
  chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
  chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
  chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
  chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
  chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
  chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
  chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
  chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
  chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
  chr(195).chr(182) => 'o', chr(195).chr(185) => 'u',
  chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
  chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
  chr(195).chr(191) => 'y',
  // Decompositions for Latin Extended-A
  chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
  chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
  chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
  chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
  chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
  chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
  chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
  chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
  chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
  chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
  chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
  chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
  chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
  chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
  chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
  chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
  chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
  chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
  chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
  chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
  chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
  chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
  chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
  chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
  chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
  chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
  chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
  chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
  chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
  chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
  chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
  chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
  chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
  chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
  chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
  chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
  chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
  chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
  chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
  chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
  chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
  chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
  chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
  chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
  chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
  chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
  chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
  chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
  chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
  chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
  chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
  chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
  chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
  chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
  chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
  chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
  chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
  chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
  chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
  chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
  chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
  chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
  chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
  chr(197).chr(190) => 'z', chr(197).chr(191) => 's',
  // Euro Sign
  chr(226).chr(130).chr(172) => 'E',
  // GBP (Pound) Sign
  chr(194).chr(163) => '');
  $string = strtr($string, $chars);
 } else {
  // Assume ISO-8859-1 if not UTF-8
  $chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158)
   .chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194)
   .chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202)
   .chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210)
   .chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218)
   .chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227)
   .chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235)
   .chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243)
   .chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251)
   .chr(252).chr(253).chr(255);
  $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";
  $string = strtr($string, $chars['in'], $chars['out']);
  $double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254));
  $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
  $string = str_replace($double_chars['in'], $double_chars['out'], $string);
 }
 return $string;
}

/**
 * Checks to see if a string is utf8 encoded.
 *
 * @author bmorel at ssi dot fr
 *
 * @param string $Str The string to be checked
 * @return bool True if $Str fits a UTF-8 model, false otherwise.
 */
function seems_utf8($Str) { # by bmorel at ssi dot fr
 $length = strlen($Str);
 for ($i = 0; $i < $length; $i++) {
  if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb
  elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n = 1; # 110bbbbb
  elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n = 2; # 1110bbbb
  elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n = 3; # 11110bbb
  elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n = 4; # 111110bb
  elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n = 5; # 1111110b
  else return false; # Does not match any model
  for ($j = 0; $j < $n; $j++) { # n bytes matching 10bbbbbb follow ?
   if ((++$i == $length) || ((ord($Str[$i]) & 0xC0) != 0x80))
   return false;
  }
 }
 return true;
}

function utf8_uri_encode($utf8_string, $length = 0) {
 $unicode = '';
 $values = array();
 $num_octets = 1;
 $unicode_length = 0;
 $string_length = strlen($utf8_string);
 for ($i = 0; $i < $string_length; $i++) {
  $value = ord($utf8_string[$i]);
  if ($value < 128) {
   if ($length && ($unicode_length >= $length))
    break;
   $unicode .= chr($value);
   $unicode_length++;
  } else {
   if (count($values) == 0) $num_octets = ($value < 224) ? 2 : 3;
   $values[] = $value;
   if ($length && ($unicode_length + ($num_octets * 3)) > $length)
    break;
   if (count( $values ) == $num_octets) {
    if ($num_octets == 3) {
     $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
     $unicode_length += 9;
    } else {
     $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
     $unicode_length += 6;
    }
    $values = array();
    $num_octets = 1;
   }
  }
 }
 return $unicode;
}

/**
 * Sanitizes title, replacing whitespace with dashes.
 *
 * Limits the output to alphanumeric characters, underscore (_) and dash (-).
 * Whitespace becomes a dash.
 *
 * @param string $title The title to be sanitized.
 * @return string The sanitized title.
 */
function slugify($title) {
 $title = strip_tags($title);
 // Preserve escaped octets.
 $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
 // Remove percent signs that are not part of an octet.
 $title = str_replace('%', '', $title);
 // Restore octets.
 $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
 $title = remove_accents($title);
 if (seems_utf8($title)) {
  if (function_exists('mb_strtolower')) {
   $title = mb_strtolower($title, 'UTF-8');
  }
  $title = utf8_uri_encode($title, 200);
 }
 $title = strtolower($title);
 $title = preg_replace('/&.+?;/', '', $title); // kill entities
 $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
 $title = preg_replace('/\s+/', '-', $title);
 $title = preg_replace('|-+|', '-', $title);
 $title = trim($title, '-');
 return $title;
}

Solution 13 - Php

strtolower only works on iso-8859-1 encoded strings. You could try with mb_strtolower.

Or, if you have to mangle with multibyte-extensions, you might as well use iconv's transliteration support:

iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text);

Edit:

It seems I was a bit fast. You appear to use iso-8859-1, so your current strategy will work. You just need to write the regexp's properly. Eg.:

'/(ð|é|ê|è|ë)/'

not:

'/[ð|é|ê|è|ë]/'

Solution 14 - Php

You can use PHP strtr() function to get rid of accented characters :

$string = "Éric Cantona";
$accented_array = array('Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E','Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U','Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c','è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o','ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );

$required_str = strtr( $string, $accented_array );

Solution 15 - Php

I know, that question has been asked a long long time ago...

I was looking for a short and elegant solution, but couldn't find satisfaction for two reasons:

First, most of the existing solutions replace a list of characters by a list of other characters. Unfortunately, it require to use a specific encoding for the php script file itself which might be unwanted.

Second, using iconv seems to be a good way, but it's not enough as the result of a converted character could be one or two characters, or a Fatal Exception.

So I wrote that small function which does the job :

function replaceAccent($string, $replacement = '_')
{
    $alnumPattern = '/^[a-zA-Z0-9 ]+$/';

    if (preg_match($alnumPattern, $string)) {
        return $string;
    }

    $ret = array_map(
        function ($chr) use ($alnumPattern, $replacement) {
            if (preg_match($alnumPattern, $chr)) {
                return $chr;
            } else {
                $chr = @iconv('ISO-8859-1', 'ASCII//TRANSLIT', $chr);
                if (strlen($chr) == 1) {
                    return $chr;
                } elseif (strlen($chr) > 1) {
                    $ret = '';
                    foreach (str_split($chr) as $char2) {
                        if (preg_match($alnumPattern, $char2)) {
                            $ret .= $char2;
                        }
                    }
                    return $ret;
                } else {
                    // replace whatever iconv fail to convert by something else
                    return $replacement;
                }
            }
        },
        str_split($string)
    );

    return implode($ret);
}

Solution 16 - Php

As an alternative (a bit more complex in nature through), have a look at how wordpress does accent removal. Made some changes below to make it run independently without referencing wordpress functions.

     function mbstring_binary_safe_encoding($reset = false)
{
    static $encodings  = array();
    static $overloaded = null;

    if (is_null($overloaded)) {
        $overloaded = function_exists('mb_internal_encoding') && (ini_get('mbstring.func_overload') & 2);
    }

    if (false === $overloaded) {
        return;
    }

    if (!$reset) {
        $encoding = mb_internal_encoding();
        array_push($encodings, $encoding);
        mb_internal_encoding('ISO-8859-1');
    }

    if ($reset && $encodings) {
        $encoding = array_pop($encodings);
        mb_internal_encoding($encoding);
    }
}

function seems_utf8($str)
{
    mbstring_binary_safe_encoding();
    $length = strlen($str);
    mbstring_binary_safe_encoding(true);
    for ($i = 0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) {
            $n = 0;
        }
        // 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) {
            $n = 1;
        }
        // 110bbbbb
        elseif (($c & 0xF0) == 0xE0) {
            $n = 2;
        }
        // 1110bbbb
        elseif (($c & 0xF8) == 0xF0) {
            $n = 3;
        }
        // 11110bbb
        elseif (($c & 0xFC) == 0xF8) {
            $n = 4;
        }
        // 111110bb
        elseif (($c & 0xFE) == 0xFC) {
            $n = 5;
        }
        // 1111110b
        else {
                return false;
            }
            // Does not match any model
            for ($j = 0; $j < $n; $j++) {
                // n bytes matching 10bbbbbb follow ?
                if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80)) {
                    return false;
                }

            }
        }
        return true;
    }

    function remove_accents($string)
{
        if (!preg_match('/[\x80-\xff]/', $string)) {
            return $string;
        }

        if (seems_utf8($string)) {
            $chars = array(
                // Decompositions for Latin-1 Supplement
                'ª' => 'a', 'º'  => 'o',
                'À' => 'A', 'Á'  => 'A',
                'Â' => 'A', 'Ã'  => 'A',
                'Ä' => 'A', 'Å'  => 'A',
                'Æ' => 'AE', 'Ç' => 'C',
                'È' => 'E', 'É'  => 'E',
                'Ê' => 'E', 'Ë'  => 'E',
                'Ì' => 'I', 'Í'  => 'I',
                'Î' => 'I', 'Ï'  => 'I',
                'Ð' => 'D', 'Ñ'  => 'N',
                'Ò' => 'O', 'Ó'  => 'O',
                'Ô' => 'O', 'Õ'  => 'O',
                'Ö' => 'O', 'Ù'  => 'U',
                'Ú' => 'U', 'Û'  => 'U',
                'Ü' => 'U', 'Ý'  => 'Y',
                'Þ' => 'TH', 'ß' => 's',
                'à' => 'a', 'á'  => 'a',
                'â' => 'a', 'ã'  => 'a',
                'ä' => 'a', 'å'  => 'a',
                'æ' => 'ae', 'ç' => 'c',
                'è' => 'e', 'é'  => 'e',
                'ê' => 'e', 'ë'  => 'e',
                'ì' => 'i', 'í'  => 'i',
                'î' => 'i', 'ï'  => 'i',
                'ð' => 'd', 'ñ'  => 'n',
                'ò' => 'o', 'ó'  => 'o',
                'ô' => 'o', 'õ'  => 'o',
                'ö' => 'o', 'ø'  => 'o',
                'ù' => 'u', 'ú'  => 'u',
                'û' => 'u', 'ü'  => 'u',
                'ý' => 'y', 'þ'  => 'th',
                'ÿ' => 'y', 'Ø'  => 'O',
                // Decompositions for Latin Extended-A
                'Ā' => 'A', 'ā'  => 'a',
                'Ă' => 'A', 'ă'  => 'a',
                'Ą' => 'A', 'ą'  => 'a',
                'Ć' => 'C', 'ć'  => 'c',
                'Ĉ' => 'C', 'ĉ'  => 'c',
                'Ċ' => 'C', 'ċ'  => 'c',
                'Č' => 'C', 'č'  => 'c',
                'Ď' => 'D', 'ď'  => 'd',
                'Đ' => 'D', 'đ'  => 'd',
                'Ē' => 'E', 'ē'  => 'e',
                'Ĕ' => 'E', 'ĕ'  => 'e',
                'Ė' => 'E', 'ė'  => 'e',
                'Ę' => 'E', 'ę'  => 'e',
                'Ě' => 'E', 'ě'  => 'e',
                'Ĝ' => 'G', 'ĝ'  => 'g',
                'Ğ' => 'G', 'ğ'  => 'g',
                'Ġ' => 'G', 'ġ'  => 'g',
                'Ģ' => 'G', 'ģ'  => 'g',
                'Ĥ' => 'H', 'ĥ'  => 'h',
                'Ħ' => 'H', 'ħ'  => 'h',
                'Ĩ' => 'I', 'ĩ'  => 'i',
                'Ī' => 'I', 'ī'  => 'i',
                'Ĭ' => 'I', 'ĭ'  => 'i',
                'Į' => 'I', 'į'  => 'i',
                'İ' => 'I', 'ı'  => 'i',
                'IJ' => 'IJ', 'ij' => 'ij',
                'Ĵ' => 'J', 'ĵ'  => 'j',
                'Ķ' => 'K', 'ķ'  => 'k',
                'ĸ' => 'k', 'Ĺ'  => 'L',
                'ĺ' => 'l', 'Ļ'  => 'L',
                'ļ' => 'l', 'Ľ'  => 'L',
                'ľ' => 'l', 'Ŀ'  => 'L',
                'ŀ' => 'l', 'Ł'  => 'L',
                'ł' => 'l', 'Ń'  => 'N',
                'ń' => 'n', 'Ņ'  => 'N',
                'ņ' => 'n', 'Ň'  => 'N',
                'ň' => 'n', 'ʼn'  => 'n',
                'Ŋ' => 'N', 'ŋ'  => 'n',
                'Ō' => 'O', 'ō'  => 'o',
                'Ŏ' => 'O', 'ŏ'  => 'o',
                'Ő' => 'O', 'ő'  => 'o',
                'Œ' => 'OE', 'œ' => 'oe',
                'Ŕ' => 'R', 'ŕ'  => 'r',
                'Ŗ' => 'R', 'ŗ'  => 'r',
                'Ř' => 'R', 'ř'  => 'r',
                'Ś' => 'S', 'ś'  => 's',
                'Ŝ' => 'S', 'ŝ'  => 's',
                'Ş' => 'S', 'ş'  => 's',
                'Š' => 'S', 'š'  => 's',
                'Ţ' => 'T', 'ţ'  => 't',
                'Ť' => 'T', 'ť'  => 't',
                'Ŧ' => 'T', 'ŧ'  => 't',
                'Ũ' => 'U', 'ũ'  => 'u',
                'Ū' => 'U', 'ū'  => 'u',
                'Ŭ' => 'U', 'ŭ'  => 'u',
                'Ů' => 'U', 'ů'  => 'u',
                'Ű' => 'U', 'ű'  => 'u',
                'Ų' => 'U', 'ų'  => 'u',
                'Ŵ' => 'W', 'ŵ'  => 'w',
                'Ŷ' => 'Y', 'ŷ'  => 'y',
                'Ÿ' => 'Y', 'Ź'  => 'Z',
                'ź' => 'z', 'Ż'  => 'Z',
                'ż' => 'z', 'Ž'  => 'Z',
                'ž' => 'z', 'ſ'  => 's',
                // Decompositions for Latin Extended-B
                'Ș' => 'S', 'ș'  => 's',
                'Ț' => 'T', 'ț'  => 't',
                // Euro Sign
                '€' => 'E',
                // GBP (Pound) Sign
                '£' => '',
                // Vowels with diacritic (Vietnamese)
                // unmarked
                'Ơ' => 'O', 'ơ'  => 'o',
                'Ư' => 'U', 'ư'  => 'u',
                // grave accent
                'Ầ' => 'A', 'ầ'  => 'a',
                'Ằ' => 'A', 'ằ'  => 'a',
                'Ề' => 'E', 'ề'  => 'e',
                'Ồ' => 'O', 'ồ'  => 'o',
                'Ờ' => 'O', 'ờ'  => 'o',
                'Ừ' => 'U', 'ừ'  => 'u',
                'Ỳ' => 'Y', 'ỳ'  => 'y',
                // hook
                'Ả' => 'A', 'ả'  => 'a',
                'Ẩ' => 'A', 'ẩ'  => 'a',
                'Ẳ' => 'A', 'ẳ'  => 'a',
                'Ẻ' => 'E', 'ẻ'  => 'e',
                'Ể' => 'E', 'ể'  => 'e',
                'Ỉ' => 'I', 'ỉ'  => 'i',
                'Ỏ' => 'O', 'ỏ'  => 'o',
                'Ổ' => 'O', 'ổ'  => 'o',
                'Ở' => 'O', 'ở'  => 'o',
                'Ủ' => 'U', 'ủ'  => 'u',
                'Ử' => 'U', 'ử'  => 'u',
                'Ỷ' => 'Y', 'ỷ'  => 'y',
                // tilde
                'Ẫ' => 'A', 'ẫ'  => 'a',
                'Ẵ' => 'A', 'ẵ'  => 'a',
                'Ẽ' => 'E', 'ẽ'  => 'e',
                'Ễ' => 'E', 'ễ'  => 'e',
                'Ỗ' => 'O', 'ỗ'  => 'o',
                'Ỡ' => 'O', 'ỡ'  => 'o',
                'Ữ' => 'U', 'ữ'  => 'u',
                'Ỹ' => 'Y', 'ỹ'  => 'y',
                // acute accent
                'Ấ' => 'A', 'ấ'  => 'a',
                'Ắ' => 'A', 'ắ'  => 'a',
                'Ế' => 'E', 'ế'  => 'e',
                'Ố' => 'O', 'ố'  => 'o',
                'Ớ' => 'O', 'ớ'  => 'o',
                'Ứ' => 'U', 'ứ'  => 'u',
                // dot below
                'Ạ' => 'A', 'ạ'  => 'a',
                'Ậ' => 'A', 'ậ'  => 'a',
                'Ặ' => 'A', 'ặ'  => 'a',
                'Ẹ' => 'E', 'ẹ'  => 'e',
                'Ệ' => 'E', 'ệ'  => 'e',
                'Ị' => 'I', 'ị'  => 'i',
                'Ọ' => 'O', 'ọ'  => 'o',
                'Ộ' => 'O', 'ộ'  => 'o',
                'Ợ' => 'O', 'ợ'  => 'o',
                'Ụ' => 'U', 'ụ'  => 'u',
                'Ự' => 'U', 'ự'  => 'u',
                'Ỵ' => 'Y', 'ỵ'  => 'y',
                // Vowels with diacritic (Chinese, Hanyu Pinyin)
                'ɑ' => 'a',
                // macron
                'Ǖ' => 'U', 'ǖ'  => 'u',
                // acute accent
                'Ǘ' => 'U', 'ǘ'  => 'u',
                // caron
                'Ǎ' => 'A', 'ǎ'  => 'a',
                'Ǐ' => 'I', 'ǐ'  => 'i',
                'Ǒ' => 'O', 'ǒ'  => 'o',
                'Ǔ' => 'U', 'ǔ'  => 'u',
                'Ǚ' => 'U', 'ǚ'  => 'u',
                // grave accent
                'Ǜ' => 'U', 'ǜ'  => 'u',
            );

            $string = strtr($string, $chars);
        } else {
            $chars = array();
            // Assume ISO-8859-1 if not UTF-8
            $chars['in'] = "\x80\x83\x8a\x8e\x9a\x9e"
                . "\x9f\xa2\xa5\xb5\xc0\xc1\xc2"
                . "\xc3\xc4\xc5\xc7\xc8\xc9\xca"
                . "\xcb\xcc\xcd\xce\xcf\xd1\xd2"
                . "\xd3\xd4\xd5\xd6\xd8\xd9\xda"
                . "\xdb\xdc\xdd\xe0\xe1\xe2\xe3"
                . "\xe4\xe5\xe7\xe8\xe9\xea\xeb"
                . "\xec\xed\xee\xef\xf1\xf2\xf3"
                . "\xf4\xf5\xf6\xf8\xf9\xfa\xfb"
                . "\xfc\xfd\xff";

            $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";

            $string              = strtr($string, $chars['in'], $chars['out']);
            $double_chars        = array();
            $double_chars['in']  = array("\x8c", "\x9c", "\xc6", "\xd0", "\xde", "\xdf", "\xe6", "\xf0", "\xfe");
            $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
            $string              = str_replace($double_chars['in'], $double_chars['out'], $string);
        }

        return $string;
    }

Solution 17 - Php

Vietnamese characters for those who need them

'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
                            'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
                            'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
                            'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
                            'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );

Solution 18 - Php

You can try this one

class Diacritic
{
    public function replaceDiacritic($input)
    {
        $input = iconv('UTF-8','ASCII//TRANSLIT',$input);
        $input = preg_replace("/['|^|`|~|]/","",$input);
        $input = preg_replace('/["]/','',$input);
        return preg_replace('/[" "]/','_',$input);
    }
}

Solution 19 - Php

Adding a little bit to what Lizard said, it worked to display correctly on web page, but added some other codes to complete what I was looking for replacing my tags to search correctly into my database with special characters. Thanks in advance.

$unwanted_array = array(    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
							'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
							'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
							'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
							'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 
							'&#225;'=>'a', '&#233;'=>'e', '&#237;'=>'i', '&#243;'=>'o', '&#250;'=>'u',  
							'&#193;'=>'A', '&#201;'=>'E', '&#205;'=>'I', '&#211;'=>'O', '&#218;'=>'U',
							'&#209;'=>'N', '&#241;'=>'n' );
$newtag = strtr( $newtag, $unwanted_array );

Solution 20 - Php

For All who wants to transform this umlauts to germany they can use this method:

public function handleGermanUmlauts(string $name) : string
{
// we need this line for preg_replace can work
	$name = htmlentities($name, ENT_COMPAT, 'UTF-8');
// this line is adding `e` character instead of suffix, except for `ee`
	$name = preg_replace('/&([a-df-zA-DF-Z])(uml|acute|grave|circ|tilde|ring);/', '$1e', $name);
// this line will make next line working for using iconv method
	$name = html_entity_decode($name);
// with iconv we are transferring all other characters like EUR and etc.
	$name = str_replace(array("\"", "'", "`", "^", "~"), "", iconv("utf-8", "ASCII//TRANSLIT", $name));

	return $name;
}

Solution 21 - Php

It's worked like magically, i have used only array, this pattern is worked for me. check this pattern

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionLizardView Question on Stackoverflow
Solution 1 - PhpLizardView Answer on Stackoverflow
Solution 2 - PhpmvdsView Answer on Stackoverflow
Solution 3 - PhpBurninLeoView Answer on Stackoverflow
Solution 4 - PhpItalyPaleAleView Answer on Stackoverflow
Solution 5 - PhpKwaadpepperView Answer on Stackoverflow
Solution 6 - PhpKasey ThomasView Answer on Stackoverflow
Solution 7 - PhpJazzpathsView Answer on Stackoverflow
Solution 8 - PhpStergios Zg.View Answer on Stackoverflow
Solution 9 - PhpgaboView Answer on Stackoverflow
Solution 10 - Phpuser3682119View Answer on Stackoverflow
Solution 11 - PhpColormanView Answer on Stackoverflow
Solution 12 - PhpKeyne VianaView Answer on Stackoverflow
Solution 13 - PhptroelsknView Answer on Stackoverflow
Solution 14 - PhpRahul GuptaView Answer on Stackoverflow
Solution 15 - PhpfrenusView Answer on Stackoverflow
Solution 16 - PhpSelayView Answer on Stackoverflow
Solution 17 - PhpPhuong LeView Answer on Stackoverflow
Solution 18 - PhpPokeView Answer on Stackoverflow
Solution 19 - PhpLuis H CabrejoView Answer on Stackoverflow
Solution 20 - PhpFarid shahidiView Answer on Stackoverflow
Solution 21 - PhpSIT DevelopersView Answer on Stackoverflow