What's the difference between () and [] in regular expression patterns?


Regex Problem Overview


What is the difference between encasing part of a regular expression in () (parentheses) and doing it in [] (square brackets)?

How does this:

[a-z0-9]

differ from this:

(a-z0-9)

?

Regex Solutions


Solution 1 - Regex

[] denotes a character class. () denotes a capturing group.

[a-z0-9] -- One character that is in the range of a-z OR 0-9

(a-z0-9) -- Matches and captures the literal text a-z0-9. No ranges.

a -- Can be matched by [a-z0-9].

a-z0-9 -- Can be matched and captured by (a-z0-9). The capture can then be referenced in a replacement and/or later in the expression.
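A quick way to check these four claims is a sketch in Python's re module. The engine choice and the sample string "range: a-z0-9" are assumptions for illustration; the original answer is not tied to any engine.

```python
import re

text = "range: a-z0-9"

# A character class matches exactly one character from its set.
one_char = re.fullmatch(r"[a-z0-9]", "a")
whole_string = re.fullmatch(r"[a-z0-9]", "a-z0-9")

# Parentheses group and capture; inside them, a-z0-9 is literal text, not ranges.
captured = re.search(r"(a-z0-9)", text).group(1)

# The captured text can be reused in a replacement string as \1.
bracketed = re.sub(r"(a-z0-9)", r"<\1>", text)

print(one_char is not None)      # True
print(whole_string is not None)  # False: the class matches only one character
print(captured)                  # a-z0-9
print(bracketed)                 # range: <a-z0-9>
```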

Solution 2 - Regex

(…) is a group that groups the contents like in math; (a-z0-9) is the grouped sequence of a-z0-9. Groups are particularly used with quantifiers that allow the preceding expression to be repeated as a whole: a*b* matches any number of a’s followed by any number of b’s, e.g. a, aaab, bbbbb, etc.; in contrast to that, (ab)* matches any number of ab’s, e.g. ab, abababab, etc.
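The a*b* versus (ab)* contrast can be tried directly. This is a minimal sketch using Python's re module (an assumed engine); the sample strings are made up, and the empty string is included because both patterns accept zero repetitions.

```python
import re

samples = ["", "a", "aaab", "bbbbb", "ab", "abababab", "aba"]

# Without a group, each * applies only to the single character before it.
star_each = [s for s in samples if re.fullmatch(r"a*b*", s)]
# With a group, * repeats the whole sequence "ab".
star_group = [s for s in samples if re.fullmatch(r"(ab)*", s)]

print(star_each)   # ['', 'a', 'aaab', 'bbbbb', 'ab']
print(star_group)  # ['', 'ab', 'abababab']
```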

[…] is a character class that describes the options for one single character; [a-z0-9] describes one single character that can be in the range a–z or 0–9.

Solution 3 - Regex

The [] construct in a regex is essentially shorthand for an | across each of its characters. For example, [abc] matches a, b, or c. Additionally, the - character has special meaning inside []: it provides a range construct. The regex [a-z] will match any lowercase letter from a through z.
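The "shorthand for |" reading holds for single characters. The sketch below, which assumes Python's re module and made-up inputs, compares [abc] with the non-capturing alternation (?:a|b|c) (the (?:...) syntax is an addition, not from the answer) and also shows that - is only a range operator when it sits between two characters.

```python
import re

words = ["a", "b", "c", "d", "-", "ab"]

# [abc] and the alternation (?:a|b|c) accept the same single characters.
by_class = [w for w in words if re.fullmatch(r"[abc]", w)]
by_alternation = [w for w in words if re.fullmatch(r"(?:a|b|c)", w)]

# Between two characters, "-" builds a range: [a-z] is any lowercase ASCII letter.
in_range = [w for w in words if re.fullmatch(r"[a-z]", w)]

# At the end of the class, "-" is just a literal hyphen.
hyphen_or_a = [w for w in words if re.fullmatch(r"[a-]", w)]

print(by_class)        # ['a', 'b', 'c']
print(by_alternation)  # ['a', 'b', 'c']
print(in_range)        # ['a', 'b', 'c', 'd']
print(hyphen_or_a)     # ['a', '-']
```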

The () construct is a grouping construct that establishes precedence order (it also lets you access the matched substring afterwards, but that's a more advanced topic). The regex (abc) will match the string "abc".
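To make "precedence order" concrete, here is a small sketch, assuming Python's re module and invented sample strings. It shows how parentheses change what a quantifier and an alternation apply to.

```python
import re

samples = ["abccc", "abcabc"]

# Without parentheses, + applies only to the "c" right before it.
plus_on_char = [s for s in samples if re.fullmatch(r"abc+", s)]
# With parentheses, + repeats the whole group "abc".
plus_on_group = [s for s in samples if re.fullmatch(r"(abc)+", s)]

words = ["ab", "cd", "abd", "acd"]
# | has the lowest precedence: ab|cd means "ab" or "cd".
alt_whole = [w for w in words if re.fullmatch(r"ab|cd", w)]
# A group limits the alternation: a(b|c)d means "a", then "b" or "c", then "d".
alt_inside = [w for w in words if re.fullmatch(r"a(b|c)d", w)]

print(plus_on_char)   # ['abccc']
print(plus_on_group)  # ['abcabc']
print(alt_whole)      # ['ab', 'cd']
print(alt_inside)     # ['abd', 'acd']
```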

Solution 4 - Regex

[a-z0-9] will match any single lowercase letter or digit. (a-z0-9) will match the exact string "a-z0-9" and allows two additional things: you can apply quantifiers like *, ? and + to the whole group, and you can reference the match afterwards with $1 or \1. Neither is very useful with your example, though.
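Both extras can be sketched in Python's re module (an assumed engine; the inputs and the x...y wrapper are hypothetical). Note that Python writes the replacement reference as \1 or \g<1>, while some other engines, such as JavaScript's replace, use $1.

```python
import re

# A quantifier after the group applies to the whole group: (a-z0-9)? makes it optional.
optional = re.compile(r"x(a-z0-9)?y")
with_text = optional.fullmatch("xa-z0-9y")
without_text = optional.fullmatch("xy")

# \1 inside the pattern must match the same text the group already matched.
doubled = re.search(r"([a-z0-9])\1", "abccd")

# In a Python replacement string, \1 and \2 refer to the captured groups.
swapped = re.sub(r"([a-z]+)-([0-9]+)", r"\2-\1", "abc-123")

print(with_text.group(1))     # a-z0-9
print(without_text.group(1))  # None
print(doubled.group(0))       # cc
print(swapped)                # 123-abc
```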

Solution 5 - Regex

Try ([a-z0-9]+) to capture a mixed string of lowercase letters and numbers, and to make it available for back references (or extraction). Without the +, ([a-z0-9]) captures only a single character.
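The effect of the + on what gets captured is easy to see with findall. This sketch assumes Python's re module; the "id=..., code=..." input and the code= prefix used for extraction are made up for illustration.

```python
import re

# Without +, the group captures one character per match.
single = re.findall(r"([a-z0-9])", "ab12")
# With +, the group captures a whole run of lowercase letters and digits.
runs = re.findall(r"([a-z0-9]+)", "id=ab12, code=x9z")
# Extraction: pull out just the value that follows "code=".
code = re.search(r"code=([a-z0-9]+)", "id=ab12, code=x9z").group(1)

print(single)  # ['a', 'b', '1', '2']
print(runs)    # ['id', 'ab12', 'code', 'x9z']
print(code)    # x9z
```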

Solution 6 - Regex

[a-z0-9] will match one of abcdefghijklmnopqrstuvwxyz0123456789. In other words, square brackets match exactly one character.

([a-z][0-9]) will match two characters: the first is one of abcdefghijklmnopqrstuvwxyz, the second is one of 0123456789, just as if the parentheses weren't there. The () lets you read exactly which characters were matched. Parentheses are also useful for OR'ing two expressions with the bar | character. For example, ([a-z]|[0-9]) will match one character: any lowercase letter or digit. (Without the square brackets, (a-z0-9) matches only the literal text "a-z0-9", and (a-z|0-9) matches the literal text "a-z" or "0-9".)
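The two-character and one-character behavior described in this answer belongs to the bracketed patterns ([a-z][0-9]) and ([a-z]|[0-9]). The sketch below checks both, and the bracket-free literal form, using Python's re module (an assumed engine) and made-up inputs.

```python
import re

# ([a-z][0-9]) matches two characters: a lowercase letter, then a digit.
pair = re.fullmatch(r"([a-z][0-9])", "b7")
# Dropping the parentheses matches the same text; only the capture is lost.
no_parens = re.fullmatch(r"[a-z][0-9]", "b7")

# ([a-z]|[0-9]) matches a single character, just like [a-z0-9].
candidates = ["q", "5", "Q", "q5"]
alternation = [c for c in candidates if re.fullmatch(r"([a-z]|[0-9])", c)]
char_class = [c for c in candidates if re.fullmatch(r"[a-z0-9]", c)]

# Without square brackets, the same shape is literal text.
literal = re.fullmatch(r"(a-z|0-9)", "a-z")

print(pair.group(1))       # b7
print(no_parens.group(0))  # b7
print(alternation)         # ['q', '5']
print(char_class)          # ['q', '5']
print(literal.group(1))    # a-z
```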

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content type: original author (original content on Stack Overflow)

Question: KatieK
Solution 1 - Regex: Jeff Rupert
Solution 2 - Regex: Gumbo
Solution 3 - Regex: JaredPar
Solution 4 - Regex: Matt K
Solution 5 - Regex: burkestar
Solution 6 - Regex: levis501