Why doesn't requests.get() return? What is the default timeout that requests.get() uses?


Python Problem Overview


In my script, requests.get never returns:

import requests

print("requesting..")

# This call never returns!
r = requests.get(
    "http://www.some-site.com",
    proxies = {'http': '222.255.169.74:8080'},
)

print(r.ok)

What could be the possible reason(s)? Any remedy? What is the default timeout that get uses?

Python Solutions


Solution 1 - Python

> What is the default timeout that get uses?

The default timeout is None, which means it'll wait (hang) until the connection is closed.

Just specify a timeout value, like this:

r = requests.get(
    'http://www.justdial.com',
    proxies={'http': '222.255.169.74:8080'},
    timeout=5
)

Solution 2 - Python

From the requests documentation:

> You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:
>
> >>> requests.get('http://github.com', timeout=0.001)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
>
> Note:
>
> timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).

It happens a lot to me that requests.get() takes a very long time to return even if the timeout is 1 second. There are a few ways to overcome this problem:

1. Use the TimeoutSauce internal class

From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896

> import requests
> from requests.adapters import TimeoutSauce
>
> class MyTimeout(TimeoutSauce):
>     def __init__(self, *args, **kwargs):
>         if kwargs['connect'] is None:
>             kwargs['connect'] = 5
>         if kwargs['read'] is None:
>             kwargs['read'] = 5
>         super(MyTimeout, self).__init__(*args, **kwargs)
>
> requests.adapters.TimeoutSauce = MyTimeout
>
> This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)

2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout

From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst

> If you specify a single value for the timeout, like this:
>
> r = requests.get('https://github.com', timeout=5)
>
> The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
>
> r = requests.get('https://github.com', timeout=(3.05, 27))

NOTE: The change has since been merged into the main Requests project.

3. Use eventlet or signal, as already covered in this similar question: https://stackoverflow.com/questions/21965484/timeout-for-python-requests-get-entire-response (a minimal signal-based sketch follows below).
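
For reference, here is a minimal sketch of the signal-based variant (my addition, not from the original answer): it puts a hard wall-clock limit around the whole call, so a hung request is interrupted even if requests' own timeout never fires. It assumes a Unix platform, the main thread, and an illustrative 10-second budget.

import signal
import requests


class RequestTimedOut(Exception):
    pass


def _raise_timeout(signum, frame):
    raise RequestTimedOut("request exceeded the wall-clock limit")


# Unix-only: SIGALRM fires after 10 seconds, whatever the request is doing
signal.signal(signal.SIGALRM, _raise_timeout)
signal.alarm(10)
try:
    r = requests.get("http://www.some-site.com", timeout=5)
    print(r.ok)
except RequestTimedOut:
    print("request did not finish within 10 seconds")
finally:
    signal.alarm(0)  # cancel the pending alarm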

Solution 3 - Python

I wanted a default timeout easily added to a bunch of code (assuming that a timeout solves your problem).

This is the solution I picked up from a ticket submitted to the repository for Requests.

credit: https://github.com/kennethreitz/requests/issues/2011#issuecomment-477784399

The solution is the last couple of lines here, but I show more code for better context. I like to use a session for retry behaviour.

import requests
import functools
from requests.adapters import HTTPAdapter, Retry


def requests_retry_session(
        retries=10,
        backoff_factor=2,
        status_forcelist=(500, 502, 503, 504),
        session=None,
        ) -> requests.Session:
    session = session or requests.Session()
    retry = Retry(
            total=retries,
            read=retries,
            connect=retries,
            backoff_factor=backoff_factor,
            status_forcelist=status_forcelist,
            )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    # set default timeout
    for method in ('get', 'options', 'head', 'post', 'put', 'patch', 'delete'):
        setattr(session, method, functools.partial(getattr(session, method), timeout=30))
    return session

Then you can do something like this:

requests_session = requests_retry_session()
r = requests_session.get(url=url,...
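
One note on this (my addition, not part of the original answer): because the 30-second default is bound via functools.partial as a keyword argument, an explicit timeout passed at the call site still overrides it. With an illustrative URL:

requests_session = requests_retry_session()

# uses the 30-second default set inside requests_retry_session()
r = requests_session.get("http://www.some-site.com")

# an explicit keyword argument overrides the partial's default
r = requests_session.get("http://www.some-site.com", timeout=120)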

Solution 4 - Python

I reviewed all the answers and came to the conclusion that the problem still exists. On some sites, requests may hang indefinitely, and using multiprocessing seems to be overkill. Here's my approach (Python 3.5+):

import asyncio

import aiohttp


async def get_http(url):
    async with aiohttp.ClientSession(conn_timeout=1, read_timeout=3) as client:
        try:
            async with client.get(url) as response:
                content = await response.text()
                return content, response.status
        except Exception:
            pass


loop = asyncio.get_event_loop()
task = loop.create_task(get_http('http://example.com'))
loop.run_until_complete(task)
result = task.result()
if result is not None:
    content, status = task.result()
    if status == 200:
        print(content)

UPDATE

If you receive a deprecation warning about using conn_timeout and read_timeout, check near the bottom of the aiohttp client reference for how to use the ClientTimeout data structure. One simple way to apply it to the original code above would be:

async def get_http(url):
    timeout = aiohttp.ClientTimeout(total=60)
    async with aiohttp.ClientSession(timeout=timeout) as client:
        try:
            etc.
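
Putting the update together, here is a minimal sketch of the completed function (my assembly of the two snippets above, untested): the event-loop driver code from the original example stays the same, and ClientTimeout also accepts finer-grained limits such as connect= and sock_read= if you want separate connect/read budgets.

import asyncio

import aiohttp


async def get_http(url):
    timeout = aiohttp.ClientTimeout(total=60)
    async with aiohttp.ClientSession(timeout=timeout) as client:
        try:
            async with client.get(url) as response:
                content = await response.text()
                return content, response.status
        except Exception:
            # keep the original behaviour: swallow errors and return None
            return None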

Solution 5 - Python

In my case, the reason "requests.get never returns" is that requests.get() first attempts to connect to the host's resolved IPv6 address. If that IPv6 connection goes wrong and gets stuck, it only falls back to the IPv4 address if I explicitly set timeout=<N seconds> and the timeout is hit.

My solution is monkey-patching Python's socket module to ignore IPv6 (or IPv4, if IPv4 is the one that isn't working); either this answer or this answer works for me.

You might be wondering why the curl command works: curl connects over IPv4 without waiting for IPv6 to complete. You can trace the socket syscalls with strace -ff -e network -s 10000 -- curl -vLk '<your url>'. For Python, use strace -ff -e network -s 10000 -- python3 <your python script>.
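
The monkey-patch from those answers boils down to something like the sketch below, which forces name resolution to return IPv4 results only (use socket.AF_INET6 instead if IPv4 is the broken path for you); treat it as illustrative rather than a drop-in.

import socket

_orig_getaddrinfo = socket.getaddrinfo


def _getaddrinfo_ipv4_only(host, port, family=0, type=0, proto=0, flags=0):
    # pin the address family so requests never waits on a stuck IPv6 route
    return _orig_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)


socket.getaddrinfo = _getaddrinfo_ipv4_only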

Solution 6 - Python

Patching the documented "send" function will fix this for all requests, even in many dependent libraries and SDKs. When patching libraries, be sure to patch supported/documented functions rather than internals like TimeoutSauce; otherwise you may wind up silently losing the effect of your patch.

import requests

DEFAULT_TIMEOUT = 180

old_send = requests.Session.send

def new_send(*args, **kwargs):
    if kwargs.get("timeout", None) is None:
        kwargs["timeout"] = DEFAULT_TIMEOUT
    return old_send(*args, **kwargs)

requests.Session.send = new_send
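
For illustration (not part of the original answer): once the patch is in place, the module-level helpers pick up the default too, because requests.get() ultimately goes through Session.send(), while an explicitly passed timeout still takes precedence.

r = requests.get("http://www.some-site.com")             # sent with timeout=180
r = requests.get("http://www.some-site.com", timeout=5)  # explicit value wins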

The effects of not having any timeout are quite severe, and the use of a default timeout can almost never break anything - because TCP itself has default timeouts as well.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: Nawaz (View Question on Stackoverflow)
Solution 1 - Python: ron rothman (View Answer on Stackoverflow)
Solution 2 - Python: Hieu (View Answer on Stackoverflow)
Solution 3 - Python: Tim Richardson (View Answer on Stackoverflow)
Solution 4 - Python: Alex Polekha (View Answer on Stackoverflow)
Solution 5 - Python: 林果皞 (View Answer on Stackoverflow)
Solution 6 - Python: Erik Aronesty (View Answer on Stackoverflow)