jQuery-like HTML parsing in Python?

Tags: python, jquery, css-selectors, html-parsing

Python Problem Overview

Is there a Python library that lets me parse an HTML document the way jQuery does?

That is, I'd like to use CSS selector syntax to grab an arbitrary set of nodes from the document, read their content and attributes, and so on.

The only Python HTML parsing lib I've used before was BeautifulSoup, and even though it's fine I keep thinking it would be faster to do my parsing if I had jQuery syntax available. :D

Python Solutions

Solution 1 - Python

If you are fluent with BeautifulSoup, you could just add soupselect to your libs.
Soupselect is a CSS selector extension for BeautifulSoup.

Usage:

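from urllib.request import urlopen  # Python 3 location of urlopen
from bs4 import BeautifulSoup as Soup
from soupselect import select

# Fetch the page and parse it with the built-in HTML parser
soup = Soup(urlopen('http://slashdot.org/'), 'html.parser')
# Return every <h3> nested inside a <div class="title">
select(soup, 'div.title h3')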

    [<h3><span><a href='//science.slashdot.org/'>Science</a>:</span></h3>,
     <h3><a href='//slashdot.org/articles/07/02/28/0120220.shtml'>Star Trek</h3>,
    ..]

Solution 2 - Python

Consider PyQuery:

http://packages.python.org/pyquery/

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> from urllib.request import urlopen
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url='http://google.com/')
>>> d = pq(url='http://google.com/', opener=lambda url, **kw: urlopen(url).read())
>>> d = pq(filename=path_to_html_file)
>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> p.html()
'Hello world !'
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> p.html()
'you know <a href="http://python.org/">Python</a> rocks'
>>> p.text()
'you know Python rocks'

Solution 3 - Python

The lxml library (http://lxml.de/) supports CSS selectors (http://lxml.de/cssselect.html).

Solution 4 - Python

BeautifulSoup now supports CSS selectors through its select() and select_one() methods.

import requests
from bs4 import BeautifulSoup as Soup

# Download the question page and parse it with the built-in HTML parser
html = requests.get('https://stackoverflow.com/questions/3051295').content
soup = Soup(html, 'html.parser')

Title of this question:

soup.select('h1.grid--cell :first-child')[0].text

Number of question upvotes:

# select_one() returns only the first match
soup.select_one('[itemprop="upvoteCount"]').text

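The page is fetched with the Python Requests library. The selectors above depend on Stack Overflow's markup at the time of the answer, so they may need adjusting.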
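For trying the same methods without hitting the network, a small offline sketch may help. It assumes a recent `bs4` release (one that ships `select()` and `select_one()`); the HTML snippet, class names, and vote count are invented purely for illustration.

```python
# Requires the third-party package: pip install beautifulsoup4
from bs4 import BeautifulSoup

# Hypothetical markup, loosely shaped like a question page.
HTML = """
<div id="question">
  <h1 class="headline"><a href="/q/1">jQuery-like HTML parsing in Python?</a></h1>
  <span itemprop="upvoteCount">42</span>
  <ul class="tags"><li>python</li><li>jquery</li></ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select_one() returns the first matching element (or None if nothing matches).
link = soup.select_one("h1.headline a")
title = link.get_text()
href = link["href"]

# Attribute selectors work as they do in CSS.
votes = int(soup.select_one('[itemprop="upvoteCount"]').get_text())

# select() returns a list of every match; ">" restricts to direct children.
tags = [li.get_text() for li in soup.select("ul.tags > li")]

print(title, href, votes, tags)
```

Since `select_one()` gives back `None` when there is no match, real scraping code would normally check the result before reading `.text` or an attribute.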

Content Type	Original Author	Original Content on Stackoverflow
Question	Roy Tang	View Question on Stackoverflow
Solution 1 - Python	systempuntoout	View Answer on Stackoverflow
Solution 2 - Python	Luke Stanley	View Answer on Stackoverflow
Solution 3 - Python	Ignacio Vazquez-Abrams	View Answer on Stackoverflow
Solution 4 - Python	iambr	View Answer on Stackoverflow
