Remove all special characters and case from string in bash
Regex Problem Overview
I am writing a bash script that needs to parse filenames.
It will need to remove all special characters (including space): "!?.-_ and change all uppercase letters to lowercase. Something like this:
Some_randoM data1-A
More Data0
to:
somerandomdata1a
moredata0
I have seen lots of questions to do this in many different programming languages, but not in bash. Is there a good way to do this?
Regex Solutions
Solution 1 - Regex
cat yourfile.txt | tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'
The first tr deletes special characters: d means delete and c means complement (invert the character set), so -dc means delete all characters except those specified. The \n and \r are included to preserve Linux- or Windows-style newlines, which I assume you want.
The second one translates uppercase characters to lowercase.
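As a minimal sketch, the same two-step pipeline can be wrapped in a function that reads standard input, which avoids the extra cat. The function name normalize_lines is hypothetical, and \r is left out of the kept set on the assumption that the input uses Unix line endings.

```shell
# normalize_lines: hypothetical helper; reads stdin, writes stdout.
normalize_lines() {
    # Step 1: delete everything except letters, digits, and newlines.
    # Step 2: translate uppercase letters to lowercase.
    tr -dc '[:alnum:]\n' | tr '[:upper:]' '[:lower:]'
}

# Sample input taken from the question.
printf '%s\n' 'Some_randoM data1-A' 'More Data0' | normalize_lines
# prints:
# somerandomdata1a
# moredata0
```

To process a file instead, redirect it into the function: normalize_lines < yourfile.txt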
Solution 2 - Regex
Pure BASH 4+ solution:
$ filename='Some_randoM data1-A'
$ f=${filename//[^[:alnum:]]/}
$ echo "$f"
SomerandoMdata1A
$ echo "${f,,}"
somerandomdata1a
A function for this:
clean() {
local a=${1//[^[:alnum:]]/}
echo "${a,,}"
}
Try it:
$ clean "More Data0"
moredata0
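To apply the function to several names at once, a read loop is one option. This is only a sketch, assuming Bash 4 or later and one name per line; the here-document stands in for whatever list of filenames the script actually has, and printf replaces echo so that names starting with a dash are printed unchanged.

```shell
#!/usr/bin/env bash
# Requires Bash 4+ for the ${var,,} lowercase expansion.
clean() {
    # Remove every character that is not a letter or digit.
    local a=${1//[^[:alnum:]]/}
    # Lowercase the result and print it.
    printf '%s\n' "${a,,}"
}

# IFS= and -r keep leading spaces and backslashes intact while reading.
while IFS= read -r name; do
    clean "$name"
done <<'EOF'
Some_randoM data1-A
More Data0
EOF
# prints:
# somerandomdata1a
# moredata0
```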
Solution 3 - Regex
If you are using mkelement0's and Dan Bliss's approaches, you can also look into sed with a POSIX regular expression.
cat yourfile.txt | sed 's/[^a-zA-Z0-9]//g'
sed matches every character that is not a letter or digit (the ^ inside the brackets negates the set) and removes it. Note that this command does not change the case of the remaining letters.
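Since the question also asks for lowercase output, one possible extension is to add sed's y command, which maps each character in the first list to the character at the same position in the second. This is a sketch rather than part of the original answer, and the function name strip_and_lower is hypothetical.

```shell
# strip_and_lower: hypothetical helper; reads stdin, writes stdout.
strip_and_lower() {
    # First expression: delete anything that is not an ASCII letter or digit.
    # Second expression: map A-Z to a-z (both lists must have equal length).
    sed -e 's/[^a-zA-Z0-9]//g' \
        -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
}

printf '%s\n' 'Some_randoM data1-A' 'More Data0' | strip_and_lower
# prints:
# somerandomdata1a
# moredata0
```

Because sed works line by line, newlines are preserved without listing them explicitly.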
Solution 4 - Regex
I've used tr to remove any characters that are not part of the [:print:] class:
cat file.txt | tr -dc '[:print:]'
or
echo "..." | tr -dc '[:print:]'
Additionally, you might want to pipe (|) the output to od -c to confirm the result:
cat file.txt | tr -dc '[:print:]' | od -c
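Note that [:print:] covers spaces and punctuation as well as letters and digits, so this approach removes only non-printable characters such as tabs and control bytes. The sketch below illustrates that; the function name strip_nonprintable is hypothetical, and \n is added to the kept set (a deviation from the command above, which also deletes newlines) so that line breaks survive.

```shell
# strip_nonprintable: hypothetical helper; reads stdin, writes stdout.
strip_nonprintable() {
    # Keep printable characters (including space and punctuation) and newlines.
    tr -dc '[:print:]\n'
}

# Input contains a tab (\t) and a control byte (\001); both are removed,
# while the space, underscore-free punctuation, and case are left alone.
printf 'Some randoM\tdata1-A\001\n' | strip_nonprintable
# prints:
# Some randoMdata1-A
```

Piping the result through od -c, as suggested above, shows each remaining byte and makes it easy to check that no control characters are left.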