Python: Using xpath locally / on a specific element

Python, XPath, lxml

Python Problem Overview


I'm trying to get the links from a page with xpath. The problem is that I only want the links inside a table, but if I apply the xpath expression on the whole page I'll capture links which I don't want.

For example:

import lxml.html

tree = lxml.html.parse(some_response)
links = tree.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")

The problem is that this applies the expression to the whole document. I located the element I want, for example:

tree = lxml.html.parse(some_response)
root = tree.getroot()
table = root[1][5]  # for example
links = table.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")

But that seems to run the query against the whole document as well, since I am still capturing links outside of the table. This page (http://codespeak.net/lxml/xpathxslt.html) says: "When xpath() is used on an Element, the XPath expression is evaluated against the element (if relative) or against the root tree (if absolute)". So what I am using is an absolute expression, and I need to make it relative? Is that it?

Basically, how can I go about filtering only elements that exist inside of this table?

Python Solutions


Solution 1 - Python

Your XPath starts with a slash (/) and is therefore absolute. Add a dot (.) in front to make it relative to the current element, i.e.:

links = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")
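
To see the difference between the two forms, here is a minimal sketch. The HTML snippet and the variable names (PAGE, query, absolute_hrefs, relative_hrefs) are made up for illustration and are not from the question; it assumes lxml is installed.

```python
import lxml.html

# Hypothetical page: one matching link outside the table, one inside it.
PAGE = """
<html><body>
  <a href="http://www.example.com/filter/outside">outside</a>
  <table id="my_table_id">
    <tr><td><a href="http://www.example.com/filter/inside">inside</a></td></tr>
  </table>
</body></html>
"""

root = lxml.html.fromstring(PAGE)
table = root.xpath("//table")[0]

query = "a[contains(@href, 'http://www.example.com/filter/')]"

# Absolute: "//" starts at the document root, even when called on `table`.
absolute_hrefs = [a.get("href") for a in table.xpath("//" + query)]

# Relative: ".//" starts at `table` and only searches its descendants.
relative_hrefs = [a.get("href") for a in table.xpath(".//" + query)]

print(absolute_hrefs)
print(relative_hrefs)
```

With this sample page, the absolute query should return both links, while the relative one should return only the link inside the table.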

Solution 2 - Python

Another option would be to ask directly for elements inside your table. For instance:

tree = lxml.html.parse(some_response)
links = tree.xpath("//table[**criteria**]//a[contains(@href, 'http://www.example.com/filter/')]")

The **criteria** predicate is needed if the page contains several tables. Possible criteria include filtering on the table's id or class. For instance:

links = tree.xpath("//table[@id='my_table_id']//a[contains(@href, 'http://www.example.com/filter/')]")
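
As a rough illustration of the id-based filter, the sketch below runs it against a made-up page with two tables. The HTML, the id other_table, and the names PAGE and hrefs are assumptions for the example only; it assumes lxml is installed.

```python
import lxml.html

# Hypothetical page with two tables; only the second one is of interest.
PAGE = """
<html><body>
  <table id="other_table">
    <tr><td><a href="http://www.example.com/filter/a">a</a></td></tr>
  </table>
  <table id="my_table_id">
    <tr>
      <td><a href="http://www.example.com/filter/b">b</a></td>
      <td><a href="http://www.example.com/about">about</a></td>
    </tr>
  </table>
</body></html>
"""

root = lxml.html.fromstring(PAGE)

# Select the table by id first, then look for matching links below it.
links = root.xpath(
    "//table[@id='my_table_id']"
    "//a[contains(@href, 'http://www.example.com/filter/')]"
)
hrefs = [a.get("href") for a in links]
print(hrefs)
```

Here the expression stays absolute, but the table predicate narrows the search, so the link in other_table and the non-matching link are both excluded.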

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | pvt pns | View Question on Stackoverflow
Solution 1 - Python | phihag | View Answer on Stackoverflow
Solution 2 - Python | Pablo Guerrero | View Answer on Stackoverflow