xpath expression to remove whitespace

Xpath Problem Overview


I have this HTML:

 <tr class="even  expanded first">
   <td class="score-time status">
     <a href="/matches/2012/08/02/europe/uefa-cup/">
            
            16 : 00
            
     </a>
    </td>        
  </tr>

I want to extract the (16 : 00) string without the extra whitespace. Is this possible?

Xpath Solutions


Solution 1 - Xpath

I. Use this single XPath expression:

translate(normalize-space(/tr/td/a), ' ', '')

Explanation:

  1. normalize-space() produces a new string from its argument, in which any leading or trailing white-space (space, tab, NL or CR characters) is deleted and any intermediary white-space is replaced by a single space character.

  2. translate() takes the result produced by normalize-space() and produces a new string in which each of the remaining intermediary spaces is replaced by the empty string.
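
The two steps above can be sketched in plain Python to make the intermediate result visible. This is only an emulation of the XPath 1.0 semantics described in the explanation, not a call into an XPath engine; the helper name normalize_space and the sample string raw are assumptions made for illustration.

```python
import re

def normalize_space(s: str) -> str:
    """Emulate XPath 1.0 normalize-space() (hypothetical helper).

    Runs of space, tab, LF and CR collapse to one space, and
    leading/trailing whitespace is removed.
    """
    return re.sub(r"[ \t\n\r]+", " ", s).strip(" ")

# Assumed string value of the <a> element from the question.
raw = "\n            \n            16 : 00\n            \n     "

normalized = normalize_space(raw)      # step 1: "16 : 00"
compact = normalized.replace(" ", "")  # step 2: what translate(..., ' ', '') does

print(repr(normalized))
print(repr(compact))
```

If the single spaces around the colon should be kept, step 1 alone is enough.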


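II. Alternatively, when the expression is embedded in an XML document (for example an XSLT stylesheet), so that the character references are resolved by the XML parser:

translate(/tr/td/a, ' &#9;&#10;&#13;', '')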
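
For readers unfamiliar with translate(), here is a rough Python emulation of its XPath 1.0 behaviour: each character of the second argument is mapped to the character at the same position in the third, and characters with no counterpart are deleted. The function name xpath_translate and the sample input are assumptions for illustration only.

```python
def xpath_translate(s: str, src: str, dst: str) -> str:
    """Emulate XPath 1.0 translate(s, src, dst) (hypothetical helper)."""
    mapping = {}
    for i, ch in enumerate(src):
        if ch in mapping:
            continue  # the first occurrence in src determines the mapping
        # No counterpart in dst means the character is removed.
        mapping[ch] = dst[i] if i < len(dst) else ""
    return "".join(mapping.get(ch, ch) for ch in s)

# Deleting space, tab, LF and CR in one pass, as in the expression above.
raw = "\n\t 16 : 00 \r\n"
print(xpath_translate(raw, " \t\n\r", ""))
```

Because the third argument is empty, every listed whitespace character is removed, so no prior normalize-space() call is needed.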

Solution 2 - Xpath

Try the following XPath expression, which selects the <a> element whose whitespace-normalized text equals '16 : 00':

//td[@class='score-time status']/a[normalize-space() = '16 : 00']

Solution 3 - Xpath

You can use XPath's normalize-space() in a predicate, as in //a[normalize-space()="16 : 00"]
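
Note that a predicate like this selects the matching <a> element; it does not return the cleaned string. Python's standard-library ElementTree does not support normalize-space(), so the sketch below applies the same test on the Python side instead. The helper normalize_space and the well-formed copy of the question's markup are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Well-formed copy of the question's markup (assumed for this sketch).
HTML = """<tr class="even  expanded first">
  <td class="score-time status">
    <a href="/matches/2012/08/02/europe/uefa-cup/">

            16 : 00

    </a>
  </td>
</tr>"""

def normalize_space(s: str) -> str:
    """Emulate XPath 1.0 normalize-space() (hypothetical helper)."""
    return re.sub(r"[ \t\n\r]+", " ", s).strip(" ")

root = ET.fromstring(HTML)

# Keep only <a> elements whose normalized string value is "16 : 00",
# mirroring //a[normalize-space()="16 : 00"].
matches = [
    a for a in root.iter("a")
    if normalize_space("".join(a.itertext())) == "16 : 00"
]

for a in matches:
    print(a.get("href"), repr(normalize_space("".join(a.itertext()))))
```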

Solution 4 - Xpath

I came across this thread while dealing with a similar issue of my own.

HTML

<div class="d-flex">
<h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
  <a href="/nsomar/OAStackView/releases/tag/1.0.1">
      
    1.0.1
  </a>

XPath start command

tree.xpath('//div[@class="d-flex"]/h4/a/text()')

However, this also picked up the surrounding whitespace and gave me this output:

['\n          ', '\n        1.0.1\n      ']

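Using normalize-space() as a predicate removed the whitespace-only text node (its normalized value is the empty string, which counts as false in a predicate) and left me with just the node I wanted: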

tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')

['\n        1.0.1\n      ']

I could then take the first element of the list and use Python's strip() to remove the remaining whitespace:

XPath final command

tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')[0].strip()

Which left me with exactly what I required:

1.0.1

Solution 5 - Xpath

  • You can check whether text() nodes are empty.

    /path/text()[not(.='')]

This can be useful with axes such as following-sibling:: when the text is not wrapped in a container element, or with child::. Note that this test only rejects empty strings; to skip whitespace-only nodes as well, use /path/text()[normalize-space()].

  • You can use string(), or the regular-expression functions of XPath 2.0 (matches(), replace() and tokenize()).

NOTE: some comments say that XPath cannot do string manipulation. Even though it is not really designed for that, you can do basic things: contains() and starts-with() in XPath 1.0, plus replace() in XPath 2.0.

Checking for whitespace nodes is much harder, because you will generally get a node list as the result set, and most XPath functions, such as matches() or replace(), operate on only one node.

  • You can separate node selection from string manipulation.

So you may use XPath to retrieve a container, or a list of text nodes, and then process the result in another language (Java, PHP, Python or Perl, for instance).
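
As a minimal sketch of that last approach, assume XPath has already returned a list of text nodes like the one shown in Solution 4; the cleanup can then be done entirely in Python. The variable names are illustrative.

```python
# Text nodes as returned by the first query in Solution 4.
text_nodes = ['\n          ', '\n        1.0.1\n      ']

# Drop whitespace-only nodes, then trim what remains.
cleaned = [t.strip() for t in text_nodes if t.strip()]

print(cleaned)
```

This keeps the XPath expression simple and leaves the definition of "whitespace" to the host language. Python's strip() also removes characters such as form feeds, which XPath's normalize-space() does not treat as whitespace.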

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author      Original Content on Stackoverflow
Question             adellam              View Question on Stackoverflow
Solution 1 - Xpath   Dimitre Novatchev    View Answer on Stackoverflow
Solution 2 - Xpath   Eby                  View Answer on Stackoverflow
Solution 3 - Xpath   Udhav Sarvaiya       View Answer on Stackoverflow
Solution 4 - Xpath   jerrythebum          View Answer on Stackoverflow
Solution 5 - Xpath   N4553R               View Answer on Stackoverflow