How to run Flask with Gunicorn in multithreaded mode

PythonFlaskMachine LearningGunicorn

Python Problem Overview


I have web application written in Flask. As suggested by everyone, I can't use Flask in production. So I thought of Gunicorn with Flask.

In Flask application I am loading some Machine Learning models. These are of size 8GB collectively. Concurrency of my web application can go upto 1000 requests. And the RAM of machine is 15GB.
So what is the best way to run this application?

Python Solutions


Solution 1 - Python

You can start your app with multiple workers or async workers with Gunicorn.

Flask server.py

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()

Gunicorn with gevent async worker

gunicorn server:app -k gevent --worker-connections 1000

Gunicorn 1 worker 12 threads:

gunicorn server:app -w 1 --threads 12

Gunicorn with 4 workers (multiprocessing):

gunicorn server:app -w 4

More information on Flask concurrency in this post: How many concurrent requests does a single Flask process receive?.

Solution 2 - Python

The best thing to do is to use pre-fork mode (preload_app=True). This will initialize your code in a "master" process and then simply fork off worker processes to handle requests. If you are running on linux and assuming your model is read-only, the OS is smart enough to reuse the physical memory amongst all the processes.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionneelView Question on Stackoverflow
Solution 1 - PythonmolivierView Answer on Stackoverflow
Solution 2 - PythonslushiView Answer on Stackoverflow