How can I remove punctuation from input text in Java?

Tags: Java, Regex, String, Formatting

Java Problem Overview


I am trying to read a sentence from the user in Java, and I need to make it lowercase and remove all punctuation. Here is my code:

    String[] words = instring.split("\\s+");
    for (int i = 0; i < words.length; i++) {
        words[i] = words[i].toLowerCase();
    }
    String[] wordsout = new String[50];
    Arrays.fill(wordsout, "");
    int e = 0;
    for (int i = 0; i < words.length; i++) {
        // compare string contents with isEmpty(), not references with !=
        if (!words[i].isEmpty()) {
            wordsout[e] = words[i];
            wordsout[e] = wordsout[e].replaceAll(" ", "");
            e++;
        }
    }
    return wordsout;

I can't seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.

Java Solutions


Solution 1 - Java

This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:

    String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");

Spaces are initially left in the input so the split will still work.

By removing the rubbish characters before splitting, you avoid having to loop through the elements.
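A minimal runnable sketch of the one-liner above, wrapped in a hypothetical SolutionOneDemo.toWords helper. Two small changes are assumptions of this sketch, not part of the answer: trim() before splitting, because a leading space would otherwise produce an empty first element, and toLowerCase(Locale.ROOT), so the result does not depend on the default locale.

```java
import java.util.Arrays;
import java.util.Locale;

// Sketch of Solution 1 in a hypothetical helper class.
class SolutionOneDemo {

    static String[] toWords(String instring) {
        String cleaned = instring
                .replaceAll("[^a-zA-Z ]", "")   // keep ASCII letters and plain spaces only
                .toLowerCase(Locale.ROOT)       // locale-independent lowercasing (added)
                .trim();                        // avoid a leading empty element (added)
        // "".split(...) would return [""], so map an empty result to an empty array
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("  Hello, World! It's 2024.")));
    }
}
```

Running main prints [hello, world, its]. Because only the space character is whitelisted, tabs and line breaks are stripped as well, so words separated by a tab end up glued together, and accented letters such as é are removed.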

Solution 2 - Java

You can use the following regular expression construct:

> Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

    // replaceAll returns a new String; it does not modify inputString in place
    inputString = inputString.replaceAll("\\p{Punct}", "");
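A runnable sketch of \p{Punct} in a hypothetical PunctDemo class, plus a toWords helper that adds the lowercasing and splitting the question asks for. In Java's default (non-Unicode) mode \p{Punct} is a POSIX class limited to US-ASCII, so digits, whitespace and non-ASCII characters such as curly quotes are left in place.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper class built around Solution 2's \p{Punct}.
class PunctDemo {

    // In the default mode \p{Punct} is US-ASCII only: the 32 characters
    // !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ are removed; letters, digits,
    // whitespace and non-ASCII characters are kept.
    static String stripPunct(String input) {
        return input.replaceAll("\\p{Punct}", "");
    }

    // Adds the lowercasing and splitting the question asks for.
    static String[] toWords(String input) {
        String cleaned = stripPunct(input).toLowerCase(Locale.ROOT).trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(stripPunct("Wait... what?! (Yes) 3.14"));
        System.out.println(Arrays.toString(toWords("Don't STOP, believing!")));
    }
}
```

main prints "Wait what Yes 314" and then [dont, stop, believing]; note that the decimal point in 3.14 is removed along with the other punctuation.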

Solution 3 - Java

You may try this:

    import java.util.Scanner;

    Scanner scan = new Scanner(System.in);
    System.out.println("Type a sentence and press enter.");
    String input = scan.nextLine();
    // \W matches any non-word character, i.e. anything except [a-zA-Z_0-9]
    String strippedInput = input.replaceAll("\\W", "");
    System.out.println("Your string: " + strippedInput);

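\W is shorthand for [^\w]: it matches any non-word character, i.e. anything other than a-z, A-Z, 0-9 and _, so the regular expression above removes all of them. Spaces are non-word characters too, so the words of the sentence end up joined together.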
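A small sketch contrasting \W with the character class [^\w\s], which also keeps whitespace so the sentence can still be split into words. The class and method names are hypothetical, and the whitespace-preserving variant is a swap-in, not part of the answer. Both versions keep the underscore, because \w includes _.

```java
// Hypothetical comparison of Solution 3's \W with a whitespace-preserving variant.
class NonWordDemo {

    // \W matches anything outside [a-zA-Z_0-9], including spaces.
    static String stripNonWord(String input) {
        return input.replaceAll("\\W", "");
    }

    // Variant (assumption): remove characters that are neither word characters nor whitespace.
    static String stripNonWordKeepSpaces(String input) {
        return input.replaceAll("[^\\w\\s]", "");
    }

    public static void main(String[] args) {
        String sample = "Hello, big_world!";
        System.out.println(stripNonWord(sample));           // Hellobig_world
        System.out.println(stripNonWordKeepSpaces(sample)); // Hello big_world
    }
}
```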

Solution 4 - Java

If you don't want to use a regex (which seems unnecessary for this problem), you could try something like this:

    public String modified(final String input) {
        final StringBuilder builder = new StringBuilder();
        for (final char c : input.toCharArray()) {
            // keep letters and digits only; spaces and symbols are dropped
            if (Character.isLetterOrDigit(c)) {
                // toLowerCase leaves already-lowercase chars and digits unchanged
                builder.append(Character.toLowerCase(c));
            }
        }
        return builder.toString();
    }

It loops over the characters of the String (toCharArray() returns a copy of them) and appends the lowercase version of each char only if it is a letter or digit. This filters out all symbols, which I am assuming is what you are trying to accomplish; note that spaces are filtered out as well.
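Because the loop above drops spaces along with symbols, its output cannot be split back into words. A hedged sketch of one way to adapt it to the question, in a hypothetical LoopDemo class: any whitespace character is normalized to a plain space, then the result is trimmed and split. Character.isLetterOrDigit also accepts non-ASCII letters such as é, unlike the [a-zA-Z] class used in Solution 1.

```java
import java.util.Arrays;

// Hypothetical variant of Solution 4's loop that keeps word boundaries so the
// cleaned sentence can still be split into words.
class LoopDemo {

    static String[] toWords(String input) {
        StringBuilder builder = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                builder.append(Character.toLowerCase(c));
            } else if (Character.isWhitespace(c)) {
                builder.append(' '); // normalize any whitespace to a plain space
            }
            // everything else (punctuation, symbols) is skipped
        }
        String cleaned = builder.toString().trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("Hello, World! 2 Teas?")));
    }
}
```

main prints [hello, world, 2, teas].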

Solution 5 - Java

I don't like to use regex, so here is another simple solution.

    public String removePunctuations(String s) {
        String res = "";
        for (char c : s.toCharArray()) {     // primitive char avoids boxing each element
            if (Character.isLetterOrDigit(c)) {
                res += c;                    // fine for short input; StringBuilder scales better
            }
        }
        return res;
    }

Note: this keeps both letters and digits. Spaces are removed too, and the result is not lowercased.
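Both loop-based answers inspect one char at a time, so a letter outside the Basic Multilingual Plane (stored as two surrogate chars) is discarded, since Character.isLetterOrDigit(char) is false for each surrogate half. A sketch of a code-point-based alternative using String.codePoints() (Java 8+), in a hypothetical CodePointDemo class; the char-based method mirrors Solution 5's logic for comparison, using a StringBuilder instead of repeated concatenation.

```java
// Hypothetical comparison class: a code-point filter (Java 8+) versus the
// char-by-char filter used in Solution 5.
class CodePointDemo {

    // Works on whole code points, so a supplementary letter such as
    // U+1D400 (MATHEMATICAL BOLD CAPITAL A) survives the filter.
    static String removePunctuation(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
                .filter(Character::isLetterOrDigit)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Same logic as Solution 5 (with a StringBuilder): each surrogate half
    // fails isLetterOrDigit(char), so U+1D400 is dropped.
    static String removePunctuationByChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For ordinary ASCII text both methods agree; for example, each returns "Hiyou2" for "Hi, you 2!".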

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Original authors:

Question: TheDoctor
Solution 1 - Java: Bohemian
Solution 2 - Java: ravthiru
Solution 3 - Java: Rahul Tripathi
Solution 4 - Java: Josh M
Solution 5 - Java: Nerzid