What should I use to open a url instead of urlopen in urllib3

Python, Web Scraping, Beautifulsoup, Urllib3

Python Problem Overview


I wanted to write a piece of code like the following:

from bs4 import BeautifulSoup
import urllib2

url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url)
soup = BeautifulSoup(html)

But I found that I now have to install the urllib3 package.

I also couldn't find any tutorial or example showing how to rewrite the code above. For example, urllib3 does not have urlopen.

Could someone give an explanation or an example?

P.S. I'm using Python 3.4.

Python Solutions


Solution 1 - Python

urllib3 is a different library from urllib and urllib2. It offers many features beyond those of the urllib modules in the standard library, such as connection re-use, if you need them. The documentation is here: https://urllib3.readthedocs.org/

If you'd like to use urllib3, you'll need to pip install urllib3. A basic example looks like this:

from bs4 import BeautifulSoup
import urllib3

http = urllib3.PoolManager()

url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data)

Solution 2 - Python

You do not have to install urllib3. You can choose any HTTP library that fits your needs and feed the response to BeautifulSoup. The usual choice is requests, because of its rich feature set and convenient API. You can install it by running pip install requests on the command line. Here is a basic example:

from bs4 import BeautifulSoup
import requests

url = "url"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")

Solution 3 - Python

The new urllib3 library has nice documentation here.
To get your desired result, you should do the following:

import urllib3
from bs4 import BeautifulSoup

url = 'http://www.thefamouspeople.com/singers.php'

http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'))

The "decode utf-8" part is optional. It worked without it when I tried, but I included the option anyway.
Source: User Guide
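
The decode step is optional because BeautifulSoup accepts either bytes or str, and when given bytes it tries to detect the encoding itself. A hard-coded decode('utf-8') can also fail on pages served in another encoding. If you do want to decode yourself, a minimal sketch is to take the charset from the Content-Type header and fall back to UTF-8. The helper name decode_body and its parameters are hypothetical, not part of urllib3 or of this answer:

```python
from email.message import Message


def decode_body(data, content_type=None, default_charset="utf-8"):
    """Decode response bytes using the charset named in a Content-Type value.

    Hypothetical helper for illustration; falls back to `default_charset`
    when the header is missing or names an unknown codec.
    """
    charset = default_charset
    if content_type:
        # Let the standard library parse "text/html; charset=..." for us.
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset() or default_charset
    try:
        return data.decode(charset, errors="replace")
    except LookupError:
        # The server named a codec Python does not know.
        return data.decode(default_charset, errors="replace")


# Usage with the urllib3 response from the example above:
# html = decode_body(response.data, response.headers.get('Content-Type'))
# soup = BeautifulSoup(html, 'html.parser')
```

Passing response.data straight to BeautifulSoup remains the simpler choice when its own encoding detection is good enough.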

Solution 4 - Python

With gazpacho you could pipeline the page straight into a parse-able soup object:

from gazpacho import Soup
url = "http://www.thefamouspeople.com/singers.php"
soup = Soup.get(url)

And run finds on top of it:

soup.find("div")

Solution 5 - Python

urllib3 has no module-level urlopen function. Try requests instead:

import requests

url = 'http://www.thefamouspeople.com/singers.php'  # URL from the question
html = requests.get(url)

Solution 6 - Python

You should use urllib.request, not urllib3.

import urllib.request   # not urllib - important!
urllib.request.urlopen('https://...')
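
To connect this back to the code in the question, here is a minimal sketch of a standard-library fetch that returns text ready to hand to BeautifulSoup. The function name fetch_text, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration:

```python
import urllib.request


def fetch_text(url, timeout=10, default_charset="utf-8"):
    """Fetch `url` with urllib.request and return the body as str.

    `fetch_text`, `timeout`, and `default_charset` are illustrative names,
    not part of the original answer.
    """
    # The response is a context manager, so it is closed once the body
    # has been read.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes
        # Prefer the charset from the Content-Type header when one is sent.
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset, errors="replace")


# Usage (needs network access), with the URL from the question:
# html = fetch_text('http://www.thefamouspeople.com/singers.php')
# soup = BeautifulSoup(html, 'html.parser')
```

This keeps the original two-step shape (open the URL, then build the soup) without any third-party HTTP library.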

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | niloofar | View Question on Stackoverflow
Solution 1 - Python | shazow | View Answer on Stackoverflow
Solution 2 - Python | alecxe | View Answer on Stackoverflow
Solution 3 - Python | Lan Vukušič | View Answer on Stackoverflow
Solution 4 - Python | emehex | View Answer on Stackoverflow
Solution 5 - Python | Heba Allah. Hashim | View Answer on Stackoverflow
Solution 6 - Python | mirek | View Answer on Stackoverflow