Python: Using xpath locally / on a specific element
Python Problem Overview
I'm trying to get the links from a page with xpath. The problem is that I only want the links inside a table, but if I apply the xpath expression on the whole page I'll capture links which I don't want.
For example:
import lxml.html
tree = lxml.html.parse(some_response)
links = tree.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")
The problem is that this applies the expression to the whole document. I located the element I want, for example:
tree = lxml.html.parse(some_response)
root = tree.getroot()
table = root[1][5] #for example
links = table.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")
But that seems to be performing the query in the whole document as well, as I am still capturing the links outside of the table. This page (http://codespeak.net/lxml/xpathxslt.html) says that "When xpath() is used on an Element, the XPath expression is evaluated against the element (if relative) or against the root tree (if absolute)". So what I am using is an absolute expression and I need to make it relative? Is that it?
Basically, how can I go about filtering only elements that exist inside of this table?
Python Solutions
Solution 1 - Python
Your xpath starts with a slash (/) and is therefore absolute. Add a dot (.) in front to make it relative to the current element, i.e.
links = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")
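To see the difference, here is a minimal sketch that runs the same query with and without the leading dot. The HTML snippet and the variable names are made up for illustration; only the filter URL comes from the question.

```python
import lxml.html

# Hypothetical page: one matching link outside the table, one inside it.
HTML = """
<html><body>
  <a href="http://www.example.com/filter/outside">outside</a>
  <table id="my_table_id">
    <tr><td><a href="http://www.example.com/filter/inside">inside</a></td></tr>
  </table>
</body></html>
"""

root = lxml.html.fromstring(HTML)
table = root.xpath("//table")[0]

query = "//a[contains(@href, 'http://www.example.com/filter/')]"

# Absolute expression: evaluated from the document root,
# even though xpath() is called on the table element.
absolute_links = [a.get("href") for a in table.xpath(query)]

# Relative expression: the leading dot anchors the search at the table.
relative_links = [a.get("href") for a in table.xpath("." + query)]

print(absolute_links)  # both links
print(relative_links)  # only the link inside the table
```

The absolute form returns both links, while the relative form returns only the one inside the table.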
Solution 2 - Python
Another option would be to ask directly for elements inside your table. For instance:
tree = lxml.html.parse(some_response)
links = tree.xpath("//table[**criteria**]//a[contains(@href, 'http://www.example.com/filter/')]")
Where **criteria** is necessary if there are many tables in the page. Some possible criteria would be to filter based on the table id or class. For instance:
links = tree.xpath("//table[@id='my_table_id']//a[contains(@href, 'http://www.example.com/filter/')]")
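As a rough, self-contained check of this approach, the sketch below uses a made-up page with two tables. The table id my_table_id is taken from the example above; the second table, the extra links and the variable names are assumptions for illustration.

```python
import lxml.html

# Hypothetical page with two tables; only the second one is of interest.
HTML = """
<html><body>
  <table id="other_table">
    <tr><td><a href="http://www.example.com/filter/other">other</a></td></tr>
  </table>
  <table id="my_table_id">
    <tr><td><a href="http://www.example.com/filter/wanted">wanted</a></td></tr>
    <tr><td><a href="http://www.example.com/about">about</a></td></tr>
  </table>
</body></html>
"""

root = lxml.html.fromstring(HTML)

# Select the table by id first, then look for matching links below it.
links = root.xpath(
    "//table[@id='my_table_id']"
    "//a[contains(@href, 'http://www.example.com/filter/')]"
)
hrefs = [a.get("href") for a in links]

print(hrefs)
```

Here the expression is absolute, but the table predicate restricts the match, so links from the other table and the non-filter link are skipped.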