How to find tags with only certain attributes - BeautifulSoup


Python Problem Overview


How would I, using BeautifulSoup, search for tags containing ONLY the attributes I search for?

For example, I want to find all <td valign="top"> tags.

The following code: raw_card_data = soup.fetch('td', {'valign':re.compile('top')})

gets all of the data I want, but also grabs any <td> tag that has the attribute valign:top

I also tried: raw_card_data = soup.findAll(re.compile('<td valign="top">')) and this returns nothing (probably because of bad regex)

I was wondering if there was a way in BeautifulSoup to say "Find <td> tags whose only attribute is valign:top"

UPDATE: For example, if an HTML document contained the following <td> tags:

<td valign="top">.....</td><br />
<td width="580" valign="top">.......</td><br />
<td>.....</td><br />

I would want only the first <td> tag (<td valign="top">) to be returned.

Python Solutions


Solution 1 - Python

As explained in the BeautifulSoup documentation, you can use this:

soup = BeautifulSoup(html)
results = soup.findAll("td", {"valign" : "top"})

EDIT:

To return tags that have only the valign="top" attribute, you can check the length of the tag's attrs property:

from bs4 import BeautifulSoup

html = '<td valign="top">.....</td>\
        <td width="580" valign="top">.......</td>\
        <td>.....</td>'

soup = BeautifulSoup(html, "html.parser")
results = soup.findAll("td", {"valign": "top"})

# Keep only the tags whose sole attribute is valign
for result in results:
    if len(result.attrs) == 1:
        print(result)

That prints:

<td valign="top">.....</td>
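
The length check can also be written as a direct comparison against the tag's attribute dictionary. Here is a minimal sketch, assuming BeautifulSoup 4 (the bs4 package, where tag.attrs is a dict) and the built-in html.parser. The helper name find_exact_attrs is hypothetical, not part of the library.

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">first</td>'
    '<td width="580" valign="top">second</td>'
    '<td>third</td>'
)

def find_exact_attrs(soup, name, attrs):
    """Return tags called `name` whose attributes are exactly `attrs`.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    # find_all narrows by tag name and attribute values first; the dict
    # comparison then rejects tags that carry any extra attribute.
    return [tag for tag in soup.find_all(name, attrs) if tag.attrs == attrs]

soup = BeautifulSoup(html, "html.parser")
matches = find_exact_attrs(soup, "td", {"valign": "top"})
print(matches)  # [<td valign="top">first</td>]
```

Comparing the whole dict works for single-valued attributes such as valign. For multi-valued attributes like class, bs4 stores a list, so the expected dict would need a list value as well.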

Solution 2 - Python

You can pass a function (for example a lambda) to findAll, as explained in the documentation. In your case, to find td tags whose only attribute is valign="top", use the following:

td_tag_list = soup.findAll(
    lambda tag: tag.name == "td"
    and len(tag.attrs) == 1
    and tag.get("valign") == "top"  # .get avoids a KeyError when the single attribute is not valign
)
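
For reference, the same filter can be written as a named function, which is easier to test on its own. This is a sketch assuming BeautifulSoup 4 with the built-in html.parser; the function name only_valign_top and the sample HTML are made up for illustration. It uses tag.get rather than tag[...] so that a td whose single attribute is something other than valign is skipped instead of raising a KeyError.

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">first</td>'
    '<td width="580" valign="top">second</td>'
    '<td width="580">third</td>'
)

def only_valign_top(tag):
    # tag.get returns None when the attribute is missing, so the third
    # <td> (one attribute, but not valign) is rejected quietly.
    return (
        tag.name == "td"
        and len(tag.attrs) == 1
        and tag.get("valign") == "top"
    )

soup = BeautifulSoup(html, "html.parser")
td_tag_list = soup.find_all(only_valign_top)
print(td_tag_list)  # [<td valign="top">first</td>]
```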

Solution 3 - Python

If you only want to match on the attribute name, whatever its value:

from bs4 import BeautifulSoup
import re

soup = BeautifulSoup(html.text, 'lxml')  # html is assumed to be a response object exposing .text (e.g. from requests)
results = soup.findAll("td", {"valign" : re.compile(r".*")})

As Steve Lorimer points out, it is better to pass True instead of a regex:

results = soup.findAll("td", {"valign" : True})
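
To see that the two forms select the same tags, here is a small self-contained check. It assumes BeautifulSoup 4 with the built-in html.parser, and the sample HTML is invented for the example.

```python
import re
from bs4 import BeautifulSoup

html = (
    '<td valign="top">first</td>'
    '<td width="580" valign="bottom">second</td>'
    '<td>third</td>'
)
soup = BeautifulSoup(html, "html.parser")

# Regex that matches any value of valign
by_regex = soup.find_all("td", {"valign": re.compile(r".*")})
# True means "the attribute is present, whatever its value"
by_true = soup.find_all("td", {"valign": True})

print([td.get_text() for td in by_true])  # ['first', 'second']
print(by_regex == by_true)
```

Both calls skip the third td, which has no valign attribute at all. Passing True states the intent directly and avoids compiling a pattern.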

Solution 4 - Python

The easiest way to do this is with the new CSS-style select method:

soup = BeautifulSoup(html)
results = soup.select('td[valign="top"]')
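
Keep in mind that the attribute selector matches every td with valign="top", including tags that carry other attributes too. If the goal is tags whose only attribute is valign, one option is to filter the selection afterwards. A minimal sketch, assuming BeautifulSoup 4 (whose select relies on the soupsieve package) and the built-in html.parser:

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">first</td>'
    '<td width="580" valign="top">second</td>'
    '<td>third</td>'
)
soup = BeautifulSoup(html, "html.parser")

# The CSS selector alone matches both tags that have valign="top"
selected = soup.select('td[valign="top"]')
print([td.get_text() for td in selected])  # ['first', 'second']

# Keep only the tags that have no attribute besides valign
exact = [td for td in selected if len(td.attrs) == 1]
print([td.get_text() for td in exact])  # ['first']
```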

Solution 5 - Python

Just pass it as an argument of findAll:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("""
... <html>
... <head><title>My Title!</title></head>
... <body><table>
... <tr><td>First!</td>
... <td valign="top">Second!</td></tr>
... </table></body></html>
... """)
>>>
>>> soup.findAll('td')
[<td>First!</td>, <td valign="top">Second!</td>]
>>>
>>> soup.findAll('td', valign='top')
[<td valign="top">Second!</td>]

Solution 6 - Python

To find tags of any name by attribute, pass an attrs dictionary:

# Matches: <th class="team" data-sort="team">Team</th>
soup.find_all(attrs={"class": "team"})

# Matches: <th data-sort="team">Team</th>
soup.find_all(attrs={"data-sort": "team"})
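
The attrs dictionary is particularly handy for names such as data-sort, which cannot be written as a Python keyword argument because of the hyphen. Here is a runnable sketch, assuming BeautifulSoup 4 with the built-in html.parser; the table markup is made up for the example.

```python
from bs4 import BeautifulSoup

html = (
    '<table><tr>'
    '<th class="team" data-sort="team">Team</th>'
    '<th data-sort="wins">Wins</th>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# No tag name is given, so any tag with a matching attribute is returned
by_class = soup.find_all(attrs={"class": "team"})
by_data = soup.find_all(attrs={"data-sort": "team"})

print([th.get_text() for th in by_class])  # ['Team']
print([th.get_text() for th in by_data])   # ['Team']
```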
 


Solution 7 - Python

Combining Chris Redford's and Amr's answers, you can also use select to search for an attribute name with any value:

from bs4 import BeautifulSoup as Soup
html = '<td valign="top">.....</td>\
    <td width="580" valign="top">.......</td>\
    <td>.....</td>'
soup = Soup(html, 'lxml')
results = soup.select('td[valign]')
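
A quick way to check what the bare attribute selector picks up is to print each match with its valign value. This sketch assumes BeautifulSoup 4 and uses the built-in html.parser so that lxml is not required; the sample values are invented.

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">first</td>'
    '<td width="580" valign="bottom">second</td>'
    '<td>third</td>'
)
soup = BeautifulSoup(html, "html.parser")

# td[valign] matches any <td> that has a valign attribute, whatever its value
results = soup.select("td[valign]")
pairs = [(td.get_text(), td["valign"]) for td in results]
print(pairs)  # [('first', 'top'), ('second', 'bottom')]
```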

Solution 8 - Python

If you are looking to pull all tags where a particular attribute is present at all, you can use the same code as the accepted answer, but instead of specifying a value for the tag, just put True.

soup = BeautifulSoup(html)
results = soup.findAll("td", {"valign" : True})

This will return all td tags that have a valign attribute. This is useful if your project involves pulling information from a tag like div that is used all over the page, but where the elements you want can be picked out by a very specific attribute.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Snaxib | View Question on Stackoverflow
Solution 1 - Python | Loïc G. | View Answer on Stackoverflow
Solution 2 - Python | Yogesh | View Answer on Stackoverflow
Solution 3 - Python | Amr | View Answer on Stackoverflow
Solution 4 - Python | Chris Redford | View Answer on Stackoverflow
Solution 5 - Python | juliomalegria | View Answer on Stackoverflow
Solution 6 - Python | Shah Vipul | View Answer on Stackoverflow
Solution 7 - Python | GrazingScientist | View Answer on Stackoverflow
Solution 8 - Python | Michael | View Answer on Stackoverflow