Queue.Queue vs. collections.deque
PythonQueueThread SafetyPython MultithreadingDequePython Problem Overview
I need a queue which multiple threads can put stuff into, and multiple threads may read from.
Python has at least two queue classes, Queue.Queue
and collections.deque
, with the former seemingly using the latter internally. Both claim to be thread-safe in the documentation.
However, the Queue docs also state:
> collections.deque is an alternative > implementation of unbounded queues > with fast atomic append() and > popleft() operations that do not > require locking.
Which I guess I don't quite unterstand: Does this mean deque isn't fully thread-safe after all?
If it is, I may not fully understand the difference between the two classes. I can see that Queue adds blocking functionality. On the other hand, it loses some deque features like support for the in-operator.
Accessing the internal deque object directly, is
> x in Queue().deque
thread-safe?
Also, why does Queue employ a mutex for it's operations when deque is thread-safe already?
Python Solutions
Solution 1 - Python
Queue.Queue
and collections.deque
serve different purposes. Queue.Queue is intended for allowing different threads to communicate using queued messages/data, whereas collections.deque
is simply intended as a datastructure. That's why Queue.Queue
has methods like put_nowait()
, get_nowait()
, and join()
, whereas collections.deque
doesn't. Queue.Queue
isn't intended to be used as a collection, which is why it lacks the likes of the in
operator.
It boils down to this: if you have multiple threads and you want them to be able to communicate without the need for locks, you're looking for Queue.Queue
; if you just want a queue or a double-ended queue as a datastructure, use collections.deque
.
Finally, accessing and manipulating the internal deque of a Queue.Queue
is playing with fire - you really don't want to be doing that.
Solution 2 - Python
If all you're looking for is a thread-safe way to transfer objects between threads, then both would work (both for FIFO and LIFO). For FIFO:
Note:
- Other operations on
deque
might not be thread safe, I'm not sure. deque
does not block onpop()
orpopleft()
so you can't base your consumer thread flow on blocking till a new item arrives.
However, it seems that deque has a significant efficiency advantage. Here are some benchmark results in seconds using CPython 2.7.3 for inserting and removing 100k items
deque 0.0747888759791
Queue 1.60079066852
Here's the benchmark code:
import time
import Queue
import collections
q = collections.deque()
t0 = time.clock()
for i in xrange(100000):
q.append(1)
for i in xrange(100000):
q.popleft()
print 'deque', time.clock() - t0
q = Queue.Queue(200000)
t0 = time.clock()
for i in xrange(100000):
q.put(1)
for i in xrange(100000):
q.get()
print 'Queue', time.clock() - t0
Solution 3 - Python
For information there is a Python ticket referenced for deque thread-safety (https://bugs.python.org/issue15329). Title "clarify which deque methods are thread-safe"
Bottom line here: https://bugs.python.org/issue15329#msg199368
> The deque's append(), appendleft(), pop(), popleft(), and len(d) > operations are thread-safe in CPython. The append methods have a > DECREF at the end (for cases where maxlen has been set), but this > happens after all of the structure updates have been made and the > invariants have been restored, so it is okay to treat these operations > as atomic.
Anyway, if you are not 100% sure and you prefer reliability over performance, just put a like Lock ;)
Solution 4 - Python
All single-element methods on deque
are atomic and thread-safe. All other methods are thread-safe too. Things like len(dq)
, dq[4]
yield momentary correct values. But think e.g. about dq.extend(mylist)
: you don't get a guarantee that all elements in mylist
are filed in a row when other threads also append elements on the same side - but thats usually not a requirement in inter-thread communication and for the questioned task.
So a deque
is ~20x faster than Queue
(which uses a deque
under the hood) and unless you don't need the "comfortable" synchronization API (blocking / timeout), the strict maxsize
obeyance or the "Override these methods (_put, _get, ..) to implement other queue organizations" sub-classing behavior, or when you take care of such things yourself, then a bare deque
is a good and efficient deal for high-speed inter-thread communication.
In fact the heavy usage of an extra mutex and extra method ._get()
etc. method calls in Queue.py
is due to backwards compatibility constraints, past over-design and lack of care for providing an efficient solution for this important speed bottleneck issue in inter-thread communication. A list was used in older Python versions - but even list.append()/.pop(0) was & is atomic and threadsafe ...
Solution 5 - Python
Adding notify_all()
to each deque
append
and popleft
results in far worse results for deque
than the 20x improvement achieved by default deque
behavior:
deque + notify_all: 0.469802
Queue: 0.667279
@Jonathan modify his code a little and I get the benchmark using cPython 3.6.2 and add condition in deque loop to simulate the behaviour Queue do.
import time
from queue import Queue
import threading
import collections
mutex = threading.Lock()
condition = threading.Condition(mutex)
q = collections.deque()
t0 = time.clock()
for i in range(100000):
with condition:
q.append(1)
condition.notify_all()
for _ in range(100000):
with condition:
q.popleft()
condition.notify_all()
print('deque', time.clock() - t0)
q = Queue(200000)
t0 = time.clock()
for _ in range(100000):
q.put(1)
for _ in range(100000):
q.get()
print('Queue', time.clock() - t0)
And it seems the performance limited by
this function condition.notify_all()
> collections.deque is an alternative implementation of unbounded queues with fast atomic append() and popleft() operations that do not require locking.
docs Queue
Solution 6 - Python
deque
is thread-safe. "operations that do not require locking" means that you don't have to do the locking yourself, the deque
takes care of it.
Taking a look at the Queue
source, the internal deque is called self.queue
and uses a mutex for accessors and mutations, so Queue().queue
is not thread-safe to use.
If you're looking for an "in" operator, then a deque or queue is possibly not the most appropriate data structure for your problem.
Solution 7 - Python
(seems I don't have reputation to comment...) You need to be careful which methods of the deque you use from different threads.
deque.get() appears to be threadsafe, but I have found that doing
for item in a_deque:
process(item)
can fail if another thread is adding items at the same time. I got an RuntimeException that complained "deque mutated during iteration".
Check collectionsmodule.c to see which operations are affected by this