Case insensitive XPath contains() possible?


Javascript Problem Overview


I'm iterating over all text nodes in my DOM and checking whether the nodeValue contains a certain string.

/html/body//text()[contains(.,'test')]

This is case sensitive. However, I also want to catch Test, TEST or TesT. Is that possible with XPath (in JavaScript)?

Javascript Solutions


Solution 1 - Javascript

This is for XPath 1.0. If your environment supports XPath 2.0, see the XPath 2.0 solutions in Solution 3.


Yes. Possible, but not beautiful.

/html/body//text()[
  contains(
    translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),
    'test'
  )
]

This would work for search strings where the alphabet is known beforehand. Add any accented characters you expect to see.
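As a rough sketch of how the expression above might be run from JavaScript: the helper names buildContainsXPath and findTextNodes are made up for illustration, and the code assumes a DOM Document that implements document.evaluate and a search string without single quotes.

```javascript
// Letters folded by translate(); extend both strings in step for other alphabets.
const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const LOWER = "abcdefghijklmnopqrstuvwxyz";

// Build the XPath 1.0 expression (hypothetical helper).
// Assumption: `needle` contains no single quote.
function buildContainsXPath(needle) {
  return "/html/body//text()[contains(translate(., '" + UPPER + "', '" + LOWER + "'), '" +
    needle.toLowerCase() + "')]";
}

// Collect the matching text nodes from `doc`, a DOM Document (hypothetical helper).
function findTextNodes(doc, needle) {
  // Same value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in browsers.
  const ORDERED_NODE_SNAPSHOT_TYPE = 7;
  const result = doc.evaluate(
    buildContainsXPath(needle), doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null
  );
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In a browser, findTextNodes(document, "Test") should then return the text nodes containing test, Test, TEST and so on.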


If you can, mark the text that interests you with some other means, like enclosing it in a <span> that has a certain class while building the HTML. Such things are much easier to locate with XPath than substrings in the element text.

If that's not an option, you can let JavaScript (or any other host language that you are using to execute XPath) help you with building a dynamic XPath expression:

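// Fill the placeholders: $u = upper-cased search string, $l and $s = lower-cased.
// Replacer functions keep "$" sequences in the search string from being interpreted.
function xpathPrepare(xpath, searchString) {
  const upper = searchString.toUpperCase();
  const lower = searchString.toLowerCase();
  return xpath.replace("$u", () => upper)
              .replace("$l", () => lower)
              .replace("$s", () => lower);
}

const xp = xpathPrepare("//text()[contains(translate(., '$u', '$l'), '$s')]", "Test");
// -> "//text()[contains(translate(., 'TEST', 'test'), 'test')]"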

(Hat tip to @KirillPolishchuk's answer - of course you only need to translate those characters you're actually searching for.)

This approach would work for any search string whatsoever, without requiring prior knowledge of the alphabet, which is a big plus.

Both of the methods above fail when search strings can contain single quotes, in which case things get more complicated.
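One common way to cope with quotes, sketched here under assumptions rather than taken from the answer: XPath 1.0 has no escape character inside string literals, so a string containing both quote types has to be assembled with concat(). The helper names xpathLiteral and containsIgnoreCase are hypothetical, and the sketch assumes the upper- and lower-cased search strings have the same length.

```javascript
// Turn an arbitrary string into an XPath 1.0 string literal (hypothetical helper).
function xpathLiteral(value) {
  if (!value.includes("'")) return "'" + value + "'";
  if (!value.includes('"')) return '"' + value + '"';
  // Both quote types present: split on single quotes and glue the
  // pieces back together with a double-quoted "'" via concat().
  return "concat(" +
    value.split("'").map((part) => "'" + part + "'").join(', "\'", ') +
    ")";
}

// Build the case-insensitive contains() test with safely quoted literals.
// Assumption: toUpperCase() and toLowerCase() yield strings of equal length.
function containsIgnoreCase(searchString) {
  const upper = xpathLiteral(searchString.toUpperCase());
  const lower = xpathLiteral(searchString.toLowerCase());
  return "//text()[contains(translate(., " + upper + ", " + lower + "), " + lower + ")]";
}

console.log(containsIgnoreCase("It's"));
// -> //text()[contains(translate(., "IT'S", "it's"), "it's")]
```

The quote characters that end up in the translate() arguments map to themselves, so they do not disturb the case folding.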

Solution 2 - Javascript

Case-insensitive contains

/html/body//text()[contains(translate(., 'EST', 'est'), 'test')]

Solution 3 - Javascript

XPath 2.0 Solutions

  1. Use lower-case():

    /html/body//text()[contains(lower-case(.),'test')]

  2. Use matches() regex matching with its case-insensitive flag:

    /html/body//text()[matches(.,'test', 'i')]

Solution 4 - Javascript

Yes. You can use translate to convert the text you want to match to lower case as follows:

/html/body//text()[contains(translate(.,
                                      'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                                      'abcdefghijklmnopqrstuvwxyz'),
                            'test')]

Solution 5 - Javascript

The way I have always done this is with the "translate" function in XPath. I won't say it's very pretty, but it works correctly.

/html/body//text()[contains(translate(., 'abcdefghijklmnopqrstuvwxyz',
                                         'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
                            'TEST')]


Solution 6 - Javascript

If you're using XPath 2.0 then you can specify a collation as the third argument to contains(). However, collation URIs are not standardized so the details depend on the product that you are using.

Note that the solutions given earlier using translate() all assume that you are only using the 26-letter English alphabet.
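If the host language is JavaScript, one possible workaround for that limitation is to derive the translate() alphabets from the search string itself using the host's case mapping. This is only a sketch: caseAlphabets and foldedContains are made-up names, characters without a one-to-one case mapping (such as ß) are left untouched, and the search string is assumed to contain no single quote.

```javascript
// Return the upper/lower alphabets needed to case-fold `needle` with translate().
function caseAlphabets(needle) {
  const seen = new Set();
  let upper = "";
  let lower = "";
  for (const ch of needle) {            // iterates by code point
    const u = ch.toUpperCase();
    const l = ch.toLowerCase();
    // Keep only cased characters whose mapping is one code point to one code point.
    const oneToOne = [...u].length === 1 && [...l].length === 1;
    if (u !== l && oneToOne && !seen.has(u)) {
      seen.add(u);
      upper += u;
      lower += l;
    }
  }
  return { upper, lower };
}

// Build the expression, folding the needle with the same mapping as the text.
// Assumption: `needle` contains no single quote.
function foldedContains(needle) {
  const { upper, lower } = caseAlphabets(needle);
  const up = [...upper];
  const lo = [...lower];
  const folded = [...needle]
    .map((ch) => (up.indexOf(ch) === -1 ? ch : lo[up.indexOf(ch)]))
    .join("");
  return "//text()[contains(translate(., '" + upper + "', '" + lower + "'), '" + folded + "')]";
}

console.log(foldedContains("Äpfel"));
// -> //text()[contains(translate(., 'ÄPFEL', 'äpfel'), 'äpfel')]
```

This only folds the letters that actually occur in the search string, so it is not a general substitute for a case-insensitive collation.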

UPDATE: XPath 3.1 defines a standard collation URI for case-blind matching of ASCII letters: http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Aron Woost | View Question on Stackoverflow
Solution 1 - Javascript | Tomalak | View Answer on Stackoverflow
Solution 2 - Javascript | Kirill Polishchuk | View Answer on Stackoverflow
Solution 3 - Javascript | kjhughes | View Answer on Stackoverflow
Solution 4 - Javascript | Andy | View Answer on Stackoverflow
Solution 5 - Javascript | Marvin Smit | View Answer on Stackoverflow
Solution 6 - Javascript | Michael Kay | View Answer on Stackoverflow