How to find children of nodes using BeautifulSoup

Tags: Python, HTML, BeautifulSoup

Python Problem Overview


I want to get all the <a> tags which are children of <li>:

<div>
<li class="test">
    <a>link1</a>
    <ul> 
       <li>  
          <a>link2</a> 
       </li>
    </ul>
</li>
</div>

I know how to find an element with a particular class like this:

soup.find("li", { "class" : "test" }) 

But I don't know how to find only the <a> tags that are direct children of <li class="test">, and not the ones nested deeper.

That is, I want to select only:

<a>link1</a>

Python Solutions


Solution 1 - Python

Try this:

li = soup.find('li', {'class': 'test'})
children = li.findChildren("a" , recursive=False)
for child in children:
    print(child)
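
One thing to watch for with this pattern: find returns None when no element matches, so calling a method on its result raises AttributeError. Below is a minimal sketch of a guarded version, assuming the question's HTML and Python's built-in "html.parser". The helper name direct_links is hypothetical, and it uses find_all with recursive=False rather than findChildren.

```python
from bs4 import BeautifulSoup

# The HTML from the question.
HTML = """
<div>
<li class="test">
    <a>link1</a>
    <ul>
       <li>
          <a>link2</a>
       </li>
    </ul>
</li>
</div>
"""


def direct_links(soup, css_class):
    """Return <a> tags that are direct children of the first <li> with css_class."""
    li = soup.find("li", class_=css_class)
    if li is None:  # find() returns None when nothing matches
        return []
    return li.find_all("a", recursive=False)


soup = BeautifulSoup(HTML, "html.parser")
print(direct_links(soup, "test"))     # the one direct <a> child
print(direct_links(soup, "missing"))  # no such <li>, so an empty list
```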

Solution 2 - Python

The docs have a short section showing how to restrict find and find_all to direct children:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-recursive-argument

In your case, since you want link1, which is the first direct child:

# for only first direct child
soup.find("li", { "class" : "test" }).find("a", recursive=False)

If you want all direct children:

# for all direct children
soup.find("li", { "class" : "test" }).find_all("a", recursive=False)

Solution 3 - Python

Perhaps you want to do

soup.find("li", { "class" : "test" }).find('a')

Solution 4 - Python

Try this:

li = soup.find("li", { "class" : "test" })
children = li.find_all("a") # returns a list of all <a> descendants of li, not only direct children

Other reminders:

The find method returns only the first matching element. The find_all method returns all matching elements in a list. Both search all descendants by default; pass recursive=False to restrict the search to direct children.
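
To make the difference concrete, here is a small sketch that runs find, find_all, and find_all with recursive=False against the question's HTML. It assumes the built-in "html.parser"; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

HTML = """
<div>
<li class="test">
    <a>link1</a>
    <ul>
       <li>
          <a>link2</a>
       </li>
    </ul>
</li>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
li = soup.find("li", {"class": "test"})

first = li.find("a")                        # first matching descendant
descendants = li.find_all("a")              # every matching descendant
direct = li.find_all("a", recursive=False)  # direct children only

print(first.get_text())                      # link1
print([a.get_text() for a in descendants])   # ['link1', 'link2']
print([a.get_text() for a in direct])        # ['link1']
```

The default search includes link2 because it is nested inside the outer <li>, so only the recursive=False call answers the original question.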

Solution 5 - Python

"How to find all a which are children of <li class=test> but not any others?"

Given the HTML below (I added another <a> to show the difference between select and select_one):

<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>

The solution is to use the child combinator (>), placed between two CSS selectors:

>>> soup.select('li.test > a')
[<a>link1</a>, <a>link3</a>]

In case you want to find only the first child:

>>> soup.select_one('li.test > a')
<a>link1</a>
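
For comparison, here is a self-contained sketch of the same selectors next to the descendant combinator (a plain space), which also matches the nested link. It assumes the HTML from this answer and the built-in "html.parser"; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

HTML = """
<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

children = soup.select("li.test > a")        # child combinator: direct children
descendants = soup.select("li.test a")       # descendant combinator: any depth
first_child = soup.select_one("li.test > a") # first match only

print([a.get_text() for a in children])      # ['link1', 'link3']
print([a.get_text() for a in descendants])   # ['link1', 'link2', 'link3']
print(first_child.get_text())                # link1
```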

Solution 6 - Python

Yet another method - create a filter function that returns True for all desired tags:

def my_filter(tag):
    return (tag.name == 'a' and
        tag.parent.name == 'li' and
        'test' in tag.parent.get('class', []))  # .get avoids KeyError when the parent has no class

Then call find_all with the function as its argument:

for a in soup(my_filter): # or soup.find_all(my_filter)
    print(a)
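
A runnable sketch of the filter-function approach against the question's HTML follows, assuming the built-in "html.parser". The function name is_direct_test_link is hypothetical. It reads the parent's class with .get so that a parent without a class attribute (such as the inner <li>) does not raise KeyError.

```python
from bs4 import BeautifulSoup

HTML = """
<div>
<li class="test">
    <a>link1</a>
    <ul>
       <li>
          <a>link2</a>
       </li>
    </ul>
</li>
</div>
"""


def is_direct_test_link(tag):
    """True for an <a> whose immediate parent is an <li> with class "test"."""
    parent = tag.parent
    return (
        tag.name == "a"
        and parent is not None
        and parent.name == "li"
        and "test" in parent.get("class", [])  # class is parsed as a list of values
    )


soup = BeautifulSoup(HTML, "html.parser")
matches = soup.find_all(is_direct_test_link)
print(matches)  # [<a>link1</a>]
```

Unlike the find-based solutions, this matches direct <a> children of every <li class="test"> in the document, not just the first one.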

Solution 7 - Python

I came across this answer and checked the documentation, which lists findChildren as deprecated (BS 4.9). You can use the .children attribute instead, which iterates over an element's direct children only, not its descendants. Note that .children yields every direct child, including text nodes, not just <a> tags.

li = soup.find('li', {'class': 'test'})
for child in li.children:
    print(child)

Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children
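
Because .children yields whitespace text nodes and other tags as well, getting only the <a> children needs a filter. Here is a minimal sketch, assuming the question's HTML and the built-in "html.parser"; the variable names are illustrative.

```python
from bs4 import BeautifulSoup, Tag

HTML = """
<div>
<li class="test">
    <a>link1</a>
    <ul>
       <li>
          <a>link2</a>
       </li>
    </ul>
</li>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
li = soup.find("li", {"class": "test"})

# Keep only direct children that are Tag objects named "a";
# this skips text nodes and the sibling <ul>.
direct_links = [
    child for child in li.children
    if isinstance(child, Tag) and child.name == "a"
]
print(direct_links)  # [<a>link1</a>]
```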

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | tej.tan | View Question on Stackoverflow
Solution 1 - Python | cerberos | View Answer on Stackoverflow
Solution 2 - Python | strider | View Answer on Stackoverflow
Solution 3 - Python | Bemmu | View Answer on Stackoverflow
Solution 4 - Python | kiiru | View Answer on Stackoverflow
Solution 5 - Python | radzak | View Answer on Stackoverflow
Solution 6 - Python | Dedek Mraz | View Answer on Stackoverflow
Solution 7 - Python | jerry_ | View Answer on Stackoverflow