String slugification in Python

Python Problem Overview

I am looking for the best way to "slugify" a string, that is, to turn it into a "slug". My current solution is based on this recipe.

I have changed it a little bit to:

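import re
import unicodedata

s = 'String to slugify'

# Decompose accented characters, then drop anything that is not ASCII.
slug = unicodedata.normalize('NFKD', s)
slug = slug.encode('ascii', 'ignore').decode('ascii').lower()
# Replace each run of non-alphanumerics with one hyphen and trim the ends.
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
slug = re.sub(r'[-]+', '-', slug)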

Anyone see any problems with this code? It is working fine, but maybe I am missing something or you know a better way?

Python Solutions

Solution 1 - Python

There is a Python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

Works like this:

from slugify import slugify

txt = "This is a test ---"
r = slugify(txt)
assert r == "this-is-a-test"

txt = "This -- is a ## test ---"
r = slugify(txt)
assert r == "this-is-a-test"

txt = 'C\'est déjà l\'été.'
r = slugify(txt)
assert r == "cest-deja-lete"

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
assert r == "nin-hao-wo-shi-zhong-guo-ren"

txt = 'Компьютер'
r = slugify(txt)
assert r == "kompiuter"

txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
assert r == "jaja-lol-mememeoo-a"

See more examples.

This package does a bit more than what you posted (take a look at the source, it's just one file). The project is still active: it had been updated two days before I originally answered, and over nine years later (last checked 2022-03-30) it still gets updated.

Careful: there is a second package around, named slugify. If you have both of them installed, you might run into problems, as they use the same name for import. The one just named slugify didn't handle everything in my quick check: "Ich heiße" became "ich-heie" (it should be "ich-heisse"), so be sure to pick the right one when using pip or easy_install.

Solution 2 - Python

Install unidecode for Unicode support:

pip install unidecode

# -*- coding: utf-8 -*-
import re
import unidecode

def slugify(text):
    text = unidecode.unidecode(text).lower()
    return re.sub(r'[\W_]+', '-', text)

text = "My custom хелло ворлд"
print(slugify(text))

# prints: my-custom-khello-vorld
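One thing to watch for with the pattern above: punctuation at the start or end of the input turns into a leading or trailing hyphen. A minimal sketch using only the standard library (plain ASCII input, so the unidecode step is left out; the function names are illustrative):

```python
import re


def collapse(text):
    # Same substitution as in the answer, minus the unidecode step.
    return re.sub(r'[\W_]+', '-', text.lower())


def collapse_and_trim(text):
    # Stripping hyphens afterwards removes the leftovers at both ends.
    return collapse(text).strip('-')


print(collapse("Hello, World!"))           # hello-world-
print(collapse_and_trim("Hello, World!"))  # hello-world
```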

Solution 3 - Python

There is a Python package named awesome-slugify:

pip install awesome-slugify

Works like this:

from slugify import slugify

slugify('one kožušček')  # one-kozuscek

awesome-slugify GitHub page

Solution 4 - Python

It works well in Django, so I don't see why it wouldn't be a good general purpose slugify function.

Are you having any problems with it?

Solution 5 - Python

The problem is with the normalization line:

slug = unicodedata.normalize('NFKD', s)

This is Unicode normalization, and it does not decompose many characters into ASCII equivalents. Any character without such a decomposition is then dropped by the following encode('ascii', 'ignore') step. For example, the non-ASCII characters are stripped from these strings:

Mørdag -> mrdag
Æther -> ther
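
The behaviour can be reproduced with the standard library alone. This is a small sketch; ascii_fold is an illustrative name, not part of any library:

```python
import unicodedata


def ascii_fold(text):
    # NFKD splits "é" into "e" plus a combining accent; the accent is then
    # dropped by the ASCII encode. Characters with no decomposition vanish.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii').lower()


print(ascii_fold('déjà'))    # deja
print(ascii_fold('Mørdag'))  # mrdag
print(ascii_fold('Æther'))   # ther
```

Accented letters survive because they decompose into a base letter plus a combining mark, while ø and Æ have no decomposition and are lost entirely.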

A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:

import unidecode
slug = unidecode.unidecode(s)

You get better results for the above strings and for many Greek and Russian characters too:

Mørdag -> mordag
Æther -> aether

Solution 6 - Python

def slugify(value):
    """
    Converts to lowercase, removes non-word characters (alphanumerics and
    underscores) and converts spaces to hyphens. Also strips leading and
    trailing whitespace.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return mark_safe(re.sub(r'[-\s]+', '-', value))
slugify = allow_lazy(slugify, six.text_type)

This is the slugify function from django.utils.text. It should be sufficient for your requirements.
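
For use outside Django, the same steps can be sketched without mark_safe, allow_lazy, or six. This is a simplified adaptation for Python 3, not Django's current implementation:

```python
import re
import unicodedata


def slugify(value):
    # Fold to ASCII, dropping characters that have no decomposition.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    # Remove everything except word characters, whitespace, and hyphens.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


print(slugify("C'est déjà l'été."))    # cest-deja-lete
print(slugify("  Hello,   World!  "))  # hello-world
```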

Solution 7 - Python

Unidecode is good; however, be careful: unidecode is GPL. If this license doesn't fit then use this one

Solution 8 - Python

A couple of options on GitHub:

Each supports slightly different parameters for its API, so you'll need to look through to figure out what you prefer.

In particular, pay attention to the different options they provide for dealing with non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the unicode handling differences in these slugify'ing libraries: http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html This blog post is slightly outdated because Mozilla's unicode-slugify is no longer Django-specific.

Also note that currently awesome-slugify is GPLv3, though there's an open issue where the author says they'd prefer to release as MIT/BSD, just not sure of the legality: https://github.com/dimka665/awesome-slugify/issues/24

Solution 9 - Python

You might consider changing the last line to

slug = re.sub(r'--+', r'-', slug)

since the pattern [-]+ is no different than -+, and you don't really care about matching just one hyphen, only two or more.

But, of course, this is quite minor.
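
A quick check that the two patterns give the same result (the sample strings are made up for illustration):

```python
import re

samples = ['a-b', 'a--b', 'a---b--c', '--a-']

for text in samples:
    # Both patterns collapse runs of hyphens; they differ only in whether
    # a single hyphen is (pointlessly) replaced by itself.
    assert re.sub(r'[-]+', '-', text) == re.sub(r'--+', '-', text)

print(re.sub(r'--+', '-', 'a---b--c'))  # a-b-c
```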

Solution 10 - Python

Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.

Solution 11 - Python

Going by your example, a quick way to do this could be:

s = 'String to slugify'

slug = s.replace(" ", "-").lower()
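
This is fine for simple input like the example, but punctuation and repeated spaces are left as they are. A quick sketch (naive_slug is just an illustrative name):

```python
def naive_slug(s):
    # Only spaces are replaced; everything else passes through unchanged.
    return s.replace(" ", "-").lower()


print(naive_slug('String to slugify'))  # string-to-slugify
print(naive_slug('Hello,  World!'))     # hello,--world!
```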

Content Type	Original Author	Original Content on Stackoverflow
Question	Zygimantas	View Question on Stackoverflow
Solution 1 - Python	kratenko	View Answer on Stackoverflow
Solution 2 - Python	Normunds	View Answer on Stackoverflow
Solution 3 - Python	voronin	View Answer on Stackoverflow
Solution 4 - Python	Nick Presta	View Answer on Stackoverflow
Solution 5 - Python	Björn Lindqvist	View Answer on Stackoverflow
Solution 6 - Python	Animesh Sharma	View Answer on Stackoverflow
Solution 7 - Python	Mikhail Korobov	View Answer on Stackoverflow
Solution 8 - Python	Jeff Widman	View Answer on Stackoverflow
Solution 9 - Python	unutbu	View Answer on Stackoverflow
Solution 10 - Python	ostrokach	View Answer on Stackoverflow
Solution 11 - Python	claudius	View Answer on Stackoverflow