Scala capture group using regex

Regex, String, Scala, Capturing Group

Regex Problem Overview


Let's say I have this code:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).foreach(println)

I expected findAllIn to only return 483, but instead, it returned two483three. I know I could use unapply to extract only that part, but I'd have to have a pattern for the entire string, something like:

 val pattern = """one.*two(\d+)three""".r
 val pattern(aMatch) = string
 println(aMatch) // prints 483

Is there another way of achieving this, without using the classes from java.util directly, and without using unapply?

Regex Solutions


Solution 1 - Regex

Here's an example of how you can access group(1) of each match:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).matchData foreach {
   m => println(m.group(1))
}

This prints "483" (as seen on ideone.com).
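If you would rather collect the captured values than print them, one option is findAllMatchIn, which yields Regex.Match objects directly. The following is only a sketch: the input string with two matches and the names input and captured are made up for illustration.

```scala
val input = "two483three and two12three"
val pattern = """two(\d+)three""".r

// findAllMatchIn returns an Iterator[Regex.Match];
// group(1) is the first capture group of each match
val captured: List[String] =
  pattern.findAllMatchIn(input).map(_.group(1)).toList

println(captured) // List(483, 12)
```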


The lookaround option

Depending on the complexity of the pattern, you can also use lookarounds to only match the portion you want. It'll look something like this:

val string = "one493two483three"
val pattern = """(?<=two)\d+(?=three)""".r
pattern.findAllIn(string).foreach(println)

The above also prints "483" (as seen on ideone.com).

Solution 2 - Regex

val string = "one493two483three"
val pattern = """.*two(\d+)three.*""".r

string match {
  case pattern(a483) => println(a483) //matched group(1) assigned to variable a483
  case _ => // no match
}
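If padding the pattern with .* on both sides feels clumsy, a possible alternative is Regex's unanchored method, which makes the extractor succeed when the pattern is found anywhere in the input. A minimal sketch, assuming the same input as in the question (the names digits and result are illustrative):

```scala
val string = "one493two483three"

// unanchored makes the extractor search within the input
// instead of requiring the whole string to match
val pattern = """two(\d+)three""".r.unanchored

val result = string match {
  case pattern(digits) => digits      // first capture group
  case _               => "no match"
}

println(result) // 483
```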

Solution 3 - Regex

Starting with Scala 2.13, as an alternative to regex solutions, it's also possible to pattern match on a String by unapplying a string interpolator:

"one493two483three" match { case s"${x}two${y}three" => y }
// String = "483"

Or even:

val s"${x}two${y}three" = "one493two483three"
// x: String = one493
// y: String = 483

If you expect non-matching input, you can add a default case:

"one493deux483three" match {
  case s"${x}two${y}three" => y
  case _                   => "no match"
}
// String = "no match"
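Note that the interpolator binds plain Strings and, unlike \d+, does not check that the captured text is numeric. If you need a number, one way is to convert the capture with toIntOption (available from Scala 2.13). This is a sketch under that assumption; the names prefix, digits and parsed are illustrative.

```scala
// Requires Scala 2.13+ (s-interpolator patterns and toIntOption)
val parsed: Option[Int] = "one493two483three" match {
  // digits is a String; toIntOption yields None if it is not a valid Int
  case s"${prefix}two${digits}three" => digits.toIntOption
  case _                             => None
}

println(parsed) // Some(483)
```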

Solution 4 - Regex

You want to look at group(1); you're currently looking at group(0), which is "the entire matched string".

See this regex tutorial: http://daily-scala.blogspot.com/2010/01/regular-expression-1-basics-and.html
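To make the difference between the two groups concrete, here is a small sketch using findFirstMatchIn on the string from the question (the name groups is illustrative):

```scala
val pattern = """two(\d+)three""".r

// findFirstMatchIn returns Option[Regex.Match]
val groups = pattern.findFirstMatchIn("one493two483three").map { m =>
  (m.group(0), m.group(1)) // (entire matched string, first capture group)
}

println(groups) // Some((two483three,483))
```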

Solution 5 - Regex

def extractFileNameFromHttpFilePathExpression(expr: String): String = {
  // Capture the file name at the end of the path; the dot before the
  // extension is escaped so that it only matches a literal "."
  val regex = "http4.*\\/(\\w+\\.(xlsx|xls|zip))$".r
  // findFirstMatchIn returns Option[Match] (findAllMatchIn returns Iterator[Match]);
  // Match has methods to access capture groups.
  regex.findFirstMatchIn(expr) match {
    case Some(m) => m.group(1)
    case None    => "regex_error"
  }
}
extractFileNameFromHttpFilePathExpression(
    "http4://testing.bbmkl.com/document/sth1234.zip")
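Returning a sentinel string such as "regex_error" makes a failed match hard to tell apart from a real file name. A possible variant, sketched here with the hypothetical name extractFileName and a made-up second URL, returns Option[String] instead:

```scala
// Hypothetical variant of the function above: None signals "no match"
def extractFileName(expr: String): Option[String] = {
  // (?:...) groups the extensions without creating a second capture group
  val regex = """http4.*/(\w+\.(?:xlsx|xls|zip))$""".r
  regex.findFirstMatchIn(expr).map(_.group(1))
}

val found   = extractFileName("http4://testing.bbmkl.com/document/sth1234.zip")
val missing = extractFileName("http4://testing.bbmkl.com/document/readme.txt")

println(found)   // Some(sth1234.zip)
println(missing) // None
```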

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Geo | View Question on Stackoverflow
Solution 1 - Regex | polygenelubricants | View Answer on Stackoverflow
Solution 2 - Regex | caiiiycuk | View Answer on Stackoverflow
Solution 3 - Regex | Xavier Guihot | View Answer on Stackoverflow
Solution 4 - Regex | Stephen | View Answer on Stackoverflow
Solution 5 - Regex | Gaurav Khare | View Answer on Stackoverflow