Unicorn exit timeout on Heroku after trapping TERM and sending QUIT

HerokuUnicorn

Heroku Problem Overview


I am receiving R12 Exit Timeout errors for a Heroku app running unicorn and sidekiq. These errors occur 1-2 times a day and whenever I deploy. I understand that I need to convert the shutdown signals from Heroku for unicorn to respond correctly, but thought that I had done so in the below unicorn config:

worker_processes 3
timeout 30
preload_app true

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn master intercepting TERM and sending myself QUIT instead. My PID is #{Process.pid}"
    Process.kill 'QUIT', Process.pid
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
    Rails.logger.info('Disconnected from ActiveRecord')
  end
end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is #{Process.pid}"
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
    Rails.logger.info('Connected to ActiveRecord')
  end

  Sidekiq.configure_client do |config|
    config.redis = { :size => 1 }
  end
end

My logs surrounding the error look like this:

Stopping all processes with SIGTERM
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 7
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 11
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 15
Unicorn master intercepting TERM and sending myself QUIT instead. My PID is 2
Started GET "/manage"
reaped #<Process::Status: pid 11 exit 0> worker=1
reaped #<Process::Status: pid 7 exit 0> worker=0
reaped #<Process::Status: pid 15 exit 0> worker=2
master complete
Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
Stopping remaining processes with SIGKILL
Process exited with status 137

It appears that all of the child processes were successfully reaped before the timeout. Is it possible master is still alive? Also, should the router still be sending web requests to the dyno during shut down, as shown in the logs?

FWIW, I'm using Heroku's zero downtime deployment plugin (https://devcenter.heroku.com/articles/labs-preboot/).

Heroku Solutions


Solution 1 - Heroku

I think your custom signal handling is what's causing the timeouts here.

EDIT: I'm getting downvoted for disagreeing with Heroku's documentation and I'd like to address this.

Configuring your Unicorn application to catch and swallow the TERM signal is the most likely cause of your application hanging and not shutting down correctly.

Heroku seems to argue that catching and transforming a TERM signal into a QUIT signal is the right behavior to turn a hard shutdown into a graceful shutdown.

However, doing this seems to introduce the risk of no shutdown at all in some cases - the root of this bug. Users experiencing hanging dynos running Unicorn should consider the evidence and make their own decision based on first principles, not just documentation.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionmiddkiddView Question on Stackoverflow
Solution 1 - HerokuWinfieldView Answer on Stackoverflow