Using XPath, how do I select a node based on its text content and the value of an attribute?

Xml, Xpath, Xquery

Xml Problem Overview


Given this XML:

<DocText>
<WithQuads>
	<Page pageNumber="3">
		<Word>
			July
			<Quad>
				<P1 X="84" Y="711.25" />
				<P2 X="102.062" Y="711.25" />
				<P3 X="102.062" Y="723.658" />
				<P4 X="84.0" Y="723.658" />
			</Quad>
		</Word>
		<Word>
		</Word>
		<Word>
			30,
			<Quad>
				<P1 X="104.812" Y="711.25" />
				<P2 X="118.562" Y="711.25" />
				<P3 X="118.562" Y="723.658" />
				<P4 X="104.812" Y="723.658" />
			</Quad>
		</Word>
	</Page>
</WithQuads>
</DocText>

I'd like to find the Word nodes that have the text 'July' and a Quad/P1/@X attribute greater than 90. In this case, that should return no matches. However, whether I use greater than (>) or less than (<), I get a match on the first Word element. If I use equals (=), I get no match.

So:

//Word[text()='July' and //P1[@X < 90]]

returns a match, as does

//Word[text()='July' and //P1[@X > 90]]

How do I constrain this properly on the P1/@X attribute?

In addition, imagine I have multiple Page elements with different page numbers. How would I further constrain the search above to find nodes with text()='July', P1/@X < 90, and Page/@pageNumber = 3?

Xml Solutions


Solution 1 - Xml

Generally, I consider an unprefixed // a bad smell in an XPath expression.

Try this:

/DocText/WithQuads/Page/Word[text()='July' and Quad/P1/@X > 90]

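Your problem is that the predicate //P1[@X < 90] starts again at the root of the document and hunts for any P1 element anywhere in it, regardless of which Word is being tested. It is therefore true for every Word as long as at least one P1 in the document has an X below 90. Similarly, //P1[@X > 90] is true as long as any P1 in the document has an X above 90. The relative path Quad/P1/@X only looks at the P1 inside the Word currently being tested.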
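The difference between the absolute and the relative predicate can be reproduced outside XPath. What follows is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module. Its XPath subset has no text() test and no numeric comparisons, so both predicates are emulated by hand rather than evaluated as XPath. The cut-down XML and the names absolute_predicate, relative_predicate and select are illustrative only and are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Reduced version of the question's document (two Word elements, P1 only).
XML = """<DocText><WithQuads><Page pageNumber="3">
<Word>July<Quad><P1 X="84" Y="711.25" /></Quad></Word>
<Word>30,<Quad><P1 X="104.812" Y="711.25" /></Quad></Word>
</Page></WithQuads></DocText>"""

root = ET.fromstring(XML)


def is_july(word):
    # Compare the text before the first child element, whitespace trimmed.
    return (word.text or "").strip() == "July"


def absolute_predicate(word, test):
    # Emulates //P1[...]: searches the WHOLE document and ignores `word`.
    return any(test(float(p1.get("X"))) for p1 in root.iter("P1"))


def relative_predicate(word, test):
    # Emulates Quad/P1/@X ...: looks only at P1 under this Word's own Quad.
    return any(test(float(p1.get("X"))) for p1 in word.findall("Quad/P1"))


def select(predicate, test):
    # Keep the Word elements that are 'July' and satisfy the given predicate.
    return [w for w in root.iter("Word") if is_july(w) and predicate(w, test)]


def less_than_90(x):
    return x < 90


def greater_than_90(x):
    return x > 90


absolute_lt = select(absolute_predicate, less_than_90)
absolute_gt = select(absolute_predicate, greater_than_90)
relative_lt = select(relative_predicate, less_than_90)
relative_gt = select(relative_predicate, greater_than_90)

print(len(absolute_lt), len(absolute_gt), len(relative_lt), len(relative_gt))
# prints: 1 1 1 0
```

With the absolute form, 'July' is selected for both comparisons because some P1 in the document satisfies each test. With the relative form, only the 'July' word's own P1 (X="84") is examined, so the greater-than test finds nothing.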

Solution 2 - Xml

Apart from the "//" issue, this XML is a very odd use of mixed content. The predicate text()='July' matches the element only if one of its child text nodes is exactly equal to 'July', which isn't the case in your example because of the surrounding whitespace. Depending on the exact definition of the source XML, I would go for:

[text()[normalize-space(.)='July'] and Quad/P1/@X > 90]
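To see why the whitespace matters, here is a small sketch, again assuming Python's standard-library xml.etree.ElementTree. ElementTree exposes the text before the first child element as .text, which stands in for the first child text node here. The helper normalize_space is a hypothetical approximation of XPath's normalize-space(), and find_words is likewise an illustrative name. The second Page (pageNumber="4", X="95") is made-up sample data added to show the page-number constraint from the question. In XPath itself that constraint would be a predicate on the Page step, such as Page[@pageNumber='3'].

```python
import xml.etree.ElementTree as ET

# Cut-down sample; the second Page is invented for illustration.
XML = """<DocText>
<WithQuads>
  <Page pageNumber="3">
    <Word>
      July
      <Quad><P1 X="84" Y="711.25" /></Quad>
    </Word>
  </Page>
  <Page pageNumber="4">
    <Word>
      July
      <Quad><P1 X="95" Y="711.25" /></Quad>
    </Word>
  </Page>
</WithQuads>
</DocText>"""

root = ET.fromstring(XML)


def normalize_space(s):
    # Approximates XPath normalize-space(): trim both ends and
    # collapse internal runs of whitespace into single spaces.
    return " ".join((s or "").split())


first_word = root.find("WithQuads/Page/Word")
exact_match = first_word.text == "July"                        # raw text includes newlines
normalized_match = normalize_space(first_word.text) == "July"  # whitespace removed first


def find_words(page_number, text, min_x):
    # ElementTree's XPath subset does support [@attr='value'],
    # so the page filter can live in the path itself.
    words = root.findall(f"WithQuads/Page[@pageNumber='{page_number}']/Word")
    return [
        w for w in words
        if normalize_space(w.text) == text
        and any(float(p1.get("X")) > min_x for p1 in w.findall("Quad/P1"))
    ]


page3 = find_words(3, "July", 90)
page4 = find_words(4, "July", 90)
print(exact_match, normalized_match, len(page3), len(page4))
# prints: False True 0 1
```

The exact comparison fails because the raw text carries the indentation and line breaks around 'July'. Once normalized, the word matches, and the page and X constraints then decide whether it is returned.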

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | marc esher | View Question on Stackoverflow
Solution 1 - Xml | AnthonyWJones | View Answer on Stackoverflow
Solution 2 - Xml | Michael Kay | View Answer on Stackoverflow