Phusion white papers Phusion overview

Phusion Blog

The right way to deal with frozen processes on Unix

By Hongli Lai on September 21st, 2012

Those who administer production Unix systems have undoubtedly encountered the problem of frozen processes before. They just sit around, consuming CPU and/or memory indefinitely until you forcefully shut them down.

Phusion Passenger 3 – our high-performance and advanced web application server – was not completely immune to this problem either. That is until today, because we have just implemented a fix which will be part of Phusion Passenger 4.0 (4.0 hasn’t been released yet, but a first beta will appear soon). It’s going into the open source version, not just Phusion Passenger Enterprise, because we believe this is an important change that should benefit the entire community.

Behind our solution lies an array of knowledge about operating systems, process management and Unix. Today, I’d like to take this opportunity to share some of our knowledge with our readers. In this article we’re going to dive into the operating system-level details of frozen processes. Some of the questions answered by this article include:

  • What causes a process to become frozen and how can you debug them?
  • What facilities does Unix provide to control and to shut down processes?
  • How can a process manager (e.g. Phusion Passenger) automatically cleanup frozen processes?
  • Why was Phusion Passenger not immune to the problem of frozen processes?
  • How is Phusion Passenger’s frozen process killing fix implemented, and under what constraints does it work?

This article attempts to be generic, but will also provide Ruby-specific tips.

Not all frozen processes are made equal

Let me first clear some confusion. Frozen processes are sometimes also called zombie processes, but this is not formally correct. Formally, a zombie process as defined by Unix operating systems is a process that has already exited, but its parent process has not waited for its exit yet. Zombie processes show up in ps and on Linux they have “<defunct>” appended to their names. Zombie processes do not consume memory or CPU. Killing them – even with SIGKILL – has no effect. The only way to get rid of them is to either make the parent process call waitpid() on them, or by terminating the parent process. The latter causes the zombie process to be “adopted” by the init process (PID 1), which immediately calls waitpid() on any adopted processes. In any case, zombie processes are harmless.

In this article we’re only covering frozen processes, not zombie processes. A process is often considered frozen if it has stopped responding normally. They appear stuck inside something and are not throwing errors. Some of the general causes are:

  1. The process is stuck in an infinite loop. In practice we rarely see this kind of freeze.
  2. The process is very slow, even during shutdown, causing it to appear frozen. Some of the causes include:
    2.1. It is using too much memory, causing it to hit the swap. In this case you should notice the entire system becoming slow.
    2.2. It is performing an unoptimized operation that takes a long time. For example you may have code that iterates through all database table rows to perform a calculation. In development you only had a handful of rows so you never noticed this, but you forgot that in production you have millions of rows.
  3. The process is stuck in an I/O operation. It’s waiting for an external process or a server, be it the database, an HTTP API server, or whatever.

Debugging frozen processes

Obviously fixing a frozen process involves more than figuring out how to automatically kill it. Killing it just fixes the symptoms, and should be considered a final defense in the war against frozen processes. It’s better to figure out the root cause. We have a number of tools in our arsenal to find out what a frozen process is doing.


crash-watch is a tool that we’ve written to easily obtain process backtraces. Crash-watch can be instructed to dump backtraces when a given process crashes, or dump their current backtraces.

Crash-watch is actually a convenience wrapper around gdb, so you must install gdb first. It dumps C-level backtraces, not language-specific backtraces. If you run crash-watch on a Ruby program it will dump the Ruby interpreter’s C backtraces and not the Ruby code’s backtraces. It also dumps the backtraces of all threads, not just the active thread.

Invoke crash-watch as follows:

crash-watch --dump <PID>

Here is a sample output. This output is the result of invoking crash-watch on a simple “hello world” Rack program that simulates being frozen.

Crash-watch is especially useful for analyzing C-level problems. As you can see in the sample output, the program is frozen in a freeze_process call.

Crash-watch can also assist in analyzing problems caused by Ruby C extensions. Ruby’s mysql gem is quite notorious in this regard because it blocks all Ruby threads while it’s doing its work. If the MySQL server doesn’t respond (e.g. because of network problems, because the server is dead, or because the query is too heavy) then the mysql gem will freeze the entire process, making it even unable to respond to signals. With crash-watch you are able to clearly see that a process is frozen in a mysql gem call.

Phusion Passenger’s SIGQUIT handler

Ruby processes managed by Phusion Passenger respond to SIGQUIT by printing their backtraces to stderr. On Phusion Passenger, stderr is always redirected to the global web server error log (e.g. /var/log/apache2/error.log or /var/log/nginx/error.log), not the vhost-specific error log. If your Ruby interpreter supports it, Phusion Passenger will even print the backtraces of all Ruby threads, not just the active one. This latter feature requires either Ruby Enterprise Edition or Ruby 1.9.

Note that for this to work, the Ruby interpreter must be responding to signals. If the Ruby interpreter is frozen inside a C extension call (such as is the case in the sample program) then nothing will happen. In that case you should use crash-watch or the rb_backtrace() trick below.


If you want to debug a Ruby program that’s not managed by Phusion Passenger then there’s another trick to obtain the Ruby backtrace. The Ruby interpreter has a nice function called rb_backtrace() which causes it to print its current Ruby-level backtrace to stdout. You can use gdb to force a process to call that function. This works even when the Ruby interpreter is stuck inside a C call. This method has two downsides:

  1. Its reliability depends on the state of the Ruby interpreter. You are forcing a call from arbitrary places in the code, so you risk corrupting the process state. Use with caution.
  2. It only prints the backtrace of the active Ruby thread. It’s not possible to print the backtraces of any other Ruby threads.

First, start gdb:

$ gdb

Then attach gdb to the process you want:

attach <PID>

This will probably print a whole bunch of messages. Ignore them; if gdb prints a lot of library names and then asks you whether to continue, answer Yes.

Now we get to the cream. Use the following command to force a call to rb_backtrace():

p (void) rb_backtrace()

You should now see a backtrace appearing in the process’s stdout:

from `freeze_process'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/rack/thread_handler_extension.rb:67:in `call'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/rack/thread_handler_extension.rb:67:in `process_request'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler/thread_handler.rb:126:in `accept_and_process_next_request'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler/thread_handler.rb:100:in `main_loop'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/utils/robust_interruption.rb:82:in `disable_interruptions'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler/thread_handler.rb:98:in `main_loop'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler.rb:432:in `start_threads'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler.rb:426:in `initialize'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler.rb:426

strace and dtruss

The strace tool (Linux) and the dtruss tool (OS X) can be used to see which system calls a process is calling. This is specially useful for detecting problems belonging to categories (1) and (2.2).

Invoke strace and dtruss as follows:

sudo strace -p <PID>
sudo dtruss -p <PID>

Phusion Passenger’s role

Phusion Passenger

Phusion Passenger was traditionally architected to trust application processes. That is, it assumed that if we tell an application process to start, it will start, and if we tell it to stop it will stop. In practice this is not always true. Applications and libraries contain bugs that can cause freezes, or maybe interaction with an external buggy component causes freezes (network problems, database server problems, etc). Our point of view was that the developer and system administrator should be responsible for these kind of problems. If the developer/administrator does not manually intervene, the system may remain unusable.

Starting from Phusion Passenger 3, we began turning away from this philosophy. Our core philosophy has always been that software should Just Work™ with the least amount of hassle, and that software should strive to be zero-maintenance. As a result, Phusion Passenger 3 introduced the Watchdog, Phusion Passenger Enterprise introduced request time limiting, and Phusion Passenger 4 will introduce application spawning time limiting. These things attack different aspects of the freezing process problem:

  1. Application spawning time limiting solves the problem of application processes not starting up quickly enough, or application processes freezing during startup. This feature will be included in the open source version of Phusion Passenger.
  2. Request time limiting solves the problem of application processes freezing during a web request. This feature is not available in the open source version of Phusion Passenger, and only in Phusion Passenger Enterprise.
  3. The Watchdog traditionally only solves the problem of Phusion Passenger helper processes (namely the HelperAgent and LoggingAgent) freezing during web server shutdown. Now, it also solves the problem of application processes freezing during web server shutdown. These features will too be included in the open source version of Phusion Passenger.

The shutdown procedure and the fix

When you shut down your web server, the Watchdog will be the one to notice this event. It will send a message to the LoggingAgent and the HelperAgent to tell them to gracefully shut down. In turn, the HelperAgent will tell all application processes to gracefully shut down. The HelperAgent does not wait until they’ve actually shut down. Instead it will just assume that they will eventually shut down. It is for this reason that even if you shut down your web server, application processes may stay behind.

The Watchdog was already designed to assume that agent processes could freeze during shutdown, so it gives agent processes a maximum amount of time during which they shut down gracefully. If they don’t, then the Watchdog will forcefully kill them with SIGKILL. It wouldn’t just kill the agent processes, but also all application processes.

The fix was therefore pretty straightforward. Always have the watchdog kill applications processes, even if the HelperAgent terminates normally and in time. The final fix was effectively 3 lines of code.

Utilizing Unix process groups

The most straightforward method to shutdown application processes would be to maintain a list of their PIDs and then killing them one-by-one. The Watchdog however uses a more powerful and little-used Unix mechanism, namely process groups. Let me first explain them.

A system can have multiple process groups. A process belongs to exactly one process group. Each process group has exactly 1 “process group leader”. The process group ID is equal to the PID of the group leader.

The process group that a process belongs to is inherited from its parent process upon forking. However a process can be a member of any process group, no matter what group the parent belongs to.

Same-colored processes denote processes belonging to the same process group. As you can see in the process tree on the right, process group membership is not constrained by parent-child relationships.

You can simulate the process tree on the right using the following Ruby code.

top = $$
puts "Top process PID: #{$$}"
pid = fork do
  pid = $$
  pid2 = fork do
    pid2 = $$
    pid3 = fork do
      pid3 = $$
      sleep 0.1
      puts "#{top} belongs to process group: #{Process.getpgid(top)}"
      puts "#{pid} belongs to process group: #{Process.getpgid(pid)}"
      puts "#{pid2} belongs to process group: #{Process.getpgid(pid2)}"
      puts "#{pid3} belongs to process group: #{Process.getpgid(pid3)}"

    # We change process group ID of pid3 after the fact!
    Process.setpgid(pid3, top)


As you can see, you can change the process group membership of any process at any time, provided you have the permissions to do so.

The kill() system call provides a neat little feature: it allows you to send a signal to an entire process group! You can already guess where this is going.

Whenever the Watchdog spawns an agent process, it creates a new process group for it. The HelperAgent and the LoggingAgent are both process group leaders of their own process group. Since the HelperAgent is responsible for spawning application processes, all application processes automatically belong to the HelperAgent’s process group, unless the application processes themselves change this.

Upon shutdown, the Watchdog sends SIGKILL to HelperAgent’s and the LoggingAgent’s process groups. This ensures that everything is killed, including all application processes and even whatever subprocesses they spawn.


In this article we’ve considered several reasons why processes may freeze and how to debug them. We’ve seen which mechanisms Phusion Passenger provides to combat frozen processes. We’ve introduced the reader to Unix process groups and how they interact with Unix signal handling.

A lot of engineering has gone into Phusion Passenger to ensure that everything works correctly even in the face of failing software components. As you can see, Phusion Passenger uses proven and rock-solid Unix technologies and concepts to implement its advanced features.

Support us by buying Phusion Passenger Enterprise

It has been a little over a month now since we’ve released Phusion Passenger Enterprise, which comes with a wide array of additional features not found in the open source Phusion Passenger. However as we’ve stated before, we remain committed to maintaining and developing the open source version. This is made possible by the support of our customers; your support. If you like Phusion Passenger or if you want to enjoy the premium Enterprise features, please consider buying one or more Phusion Passenger Enterprise licenses. Development of the open source Phusion Passenger is directly funded by Phusion Passenger Enterprise sales.

Let me know your thoughts about this article in the comments section and thank you for your support!

Mail in 2012 from an admin’s perspective

By Luuk Hendriks on September 10th, 2012

There has been a lot of discussion about the current state of e-mail and how we use it. Many blog posts popped up, describing what has changed in the past several decades and how we could adapt to the new way e-mail is used. Different approaches and possible solutions are presented, but they all have one thing in common: they try to solve problems on the side of the end-user, the one reading his or her e-mail. But over those same several decades, a lot has changed for the system administrators too. And while most of the e-mail users don’t know (or even want to know) what happens behind the scenes, admins have sleepless nights to make sure you get your e-mails.

This blog post comes from the fact that I had an unnecessary hard time setting up SRS in Postfix. If you want to know how to do just that, please find the instructions in the latter part of this post. I’ll start with a short intro on what SPF and SRS are, why we need them, and why you might need them too most likely. Lastly a short introduction to DKIM is given, which is another nice way of improving the e-mail system. All of the described here are techniques to improve your and others’ mailing experience by verifying the sender: this helps identifying and reducing spam, and also prevents legitimate servers being wrongfully accused for spamming or other unwanted behaviour. The techniques are not new, in fact they have proven themselves in the last couple of years. Many e-mail service providers check SPF and DKIM, and Google Gmail’s spam filtering mechanisms take the validity into account. They even enforce DKIM validity on eBay and Paypal messages as these domains are obviously interesting for phising and abuse alike.

Rocket Science

Is setting up an e-mail server that hard? In the basics, no. Setting up an MTA to send a message to another MTA is not anywhere near rocket science. And while e-mail is nothing more than exchanging aforementioned messages, we are already halfway, right? But the challenges administrators have to deal with are spam (end-users know that one of course) and all the problems that come with it. Ever got an e-mail from Or got a complaint that sent an e-mail, but that someone doesn’t even exist? Forging of e-mail headers (especially the ‘From’ field) is easy, but as e-mail is a vital part of almost anyone’s daily routine, it has become problematic. So what can we do from an administrators point of view, to keep e-mail usable in 2012?

Identify and verify the mailing servers

As the forging of headers happens outside the control of the admin, methods of fighting it are limited by nature. Imagine a scenario where you control server A, but server B sends a forged e-mail to server C making it look like it came from server A (by forging the ‘From’ header). The message does not even pass your server A, so what can you do?
Only one thing in the forged e-mail is under your control: the domain name. And that is where the Sender Policy Framework (SPF) relies on. By adding special SPF records pointing to your mail servers to the DNS system, a receiving party can check whether the sending server is allowed to send mail from that domain name. Continuing on the previous example: hopefully, server C uses SPF to check the authenticity of the e-mail it got from B. As an admin, I published DNS records for server A containing the IP of server A (so A is allowed to send mail for, but B’s IP is not in that record. Server C checks the SPF record from and notices B is not allowed to use that domain name. A forged e-mail! Of course it is up to C to decide which way to handle the e-mail, be it dropping it, or bouncing it, or whatever they see fit.
Hopefully it has become clear that SPF is something that is not fully under your control. The only thing you can do, is provide the correct records in your DNS system, and hope that other mail servers to their SPF checks. To reduce spam towards your servers, you should of course do the SPF checking as well. I won’t go into detail about setting the records, but for the content and format check out this clear overview.

So far so good, right? Using SPF you can reduce the incoming spam, and you help other administrators to find out whether e-mails really came from your domain. But there is one problem: forwarding.
At Phusion, some of the guys like to have e-mail forwarded to their Gmail account. The forwarding itself is a breeze: Postfix has a map of virtual addresses pointing to the right Gmail address. But in an SPF-scenario, the following happens: sends an e-mail to, which is forwarded to and Google checks SPF records, but the check will fail! The mail originates from, so this domain is queried for an SPF record. But, the mail is received from the servers at It is a shortcoming of the SPF paradigm, with a solution called the Sender Rewriting Scheme (SRS). As the name suggests, SRS rewrites the sender (on the envelope), so the next server/MTA in the chain can check the SPF records of the right domain, instead of the original domain where the e-mail originated from. So in our described scenario, will become on the envelope, so when it is received by and Google’s server can check the SPF records for (as opposed to the records of Now, the check will pass.

As only the ‘Envelope’ is rewritten, the receiving end-user will see the normal ‘From’ headers and therefore the normal name of the sender. Almost completely invisible for the end-users, but some clients can show whether the e-mail comes from a valid source, like Gmail adds ‘via’ to the sender’s address.
Ever saw that one? That is SPF, maybe complemented by SRS. Or maybe you’re familiar with the Gmail popup shown above, telling you the SPF check passed (Mailed-by) but also the DKIM-signature (more on that later) was valid (Signed-by).

What is the hassle then? Maybe you already knew all of this, and you might think it is not that big of a deal. Maybe your MTA does this out of the box (please tell me then), but at Phusion we use Postfix and it costed me some sweat to get SRS up and running. To me, it seemed like almost nobody in the world uses SRS. If I’m new to a technology or tool, the first thing I do is engage the Google-fu to find out what other people use. What is hot on Github, what do people on StackOverflow suggest, which tools and software is still actively supported, that kind of things. The most recent stuff on SRS was from 2003 or 2004, and of course, mail is something well established but was there really no one interested in SRS for the last 8 years? Or is Postfix the problem?
Then we found pfix-tools (Github). It seems people are interested in SRS after all, or at least the developers behind this tool suite. As their readme states: “pfixtools is a suite of tools aiming at complementing postfix, to make it even more customizable, while keeping really high performance levels.” It contains pfix-srsd, a daemon to do the SRS rewritings for Postfix. Sweet! For those of you using Postfix, I will try to describe briefly but clearly how to set-up pfix-srsd.

Installation details: SRS for Postfix on Debian

FYI, this was done on a Debian 6 system.

Download and compile pfix-srsd

tar -xzvf pfixtools-0.8
aptitude install libev3 libev-dev libsrs2-0 libsrs2-dev libpcre3-dev
git clone
cd pfixtools
git submodule init
git submodule update

cd common
cd ../pfix-srsd

Finally, move the resulting binary pfix-srsd to e.g. /usr/local/bin/ (the following steps will use this path, so be careful if you change it)

Create config and secrets files


In /etc/postfix/pfix-srs.secrets, enter lots of random stuff, preferably from some kind of random generator

Fix the permissions:

chown postfix:postfix /etc/postfix/pfix-srs.secrets
chmod 400 /etc/postfix/pfix-srs.secrets

In /etc/postfix/ you can put addresses that should not be SRS’ed.
Always compile this file after you’ve changed it with postmap /etc/postfix/


NB: We use daemontools, but an example sysinit script is available here.



if [ -f $PFIXSRSD_CONFIG ]; then
echo "Error reading config file, aborting.." 
exit 1

exec setuidgid postfix $DAEMON -f $OPTIONS $DOMAIN $SECRETS >> /var/log/pfix-srsd.log 2>&1

Don’t forget to make it executable by chmod +x run

Modify Postfix config

add to /etc/postfix/

# SRS Remapping
recipient_canonical_maps = hash:/etc/postfix/, tcp:
recipient_canonical_classes = envelope_recipient
sender_canonical_maps = hash:/etc/postfix/, tcp:
sender_canonical_classes = envelope_sender

Reload and test

Afterwards, reload the postfix service (and maybe your firewall):

invoke-rc.d postfix reload

To test, you can send a test message to a Gmail account and check the headers. It should contains something alike the following:

Received-SPF: pass ( domain of designates as permitted sender);

Or use some of the available online testing tools, like this one. That might come in handy when you also get ur DKIM going, which you should!

Thanks and credits to Christoph Fischer who wrote a large part of the pfix-tools installation and configuration in German on his website.

What else can we do: an intro to DKIM

Another way to validate messages and their senders is DKIM, short for DomainKeys Identified Mail. It is based on digital signatures based on public-key cryptography. The public key is again available via DNS, the private key is used by the sender to create a signature and adding it in the DKIM-Signature: header. This header furthermore contains a specification of which fields are used to generate the signature, the signing algorithm, and of course the domain name. If you have a grasp of how public key cryptography works, you probably see the point already. But lets see how things are in the real world.

Google uses DKIM on their Gmail servers, not very surprisingly. Search for an e-mail from in your mail, and look for the headers. You want something like this:

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;; s=20120113;

What do we see? The (mostly used) rsa-sha256 algorithm (a), the domain is (d), the headers used to construct the signature, the body hash (bh), and the signature itself (b). But very important is the selector (s), which is used in the DNS query. The following command –make sure dig is available on your system– shows the DKIM public key that we need to verify whether the signature is valid:

luuk@polecat> dig +short TXT
"k=rsa\; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA1Kd87/UeJjenpabgbFwh+
ZB5DekAo5wMmk4wimDO+U8QzI3SD0" "7y2+07wlNWwIt8svnxgdxGkVbbhzY8i+RQ9DpSVpPbF7

Notice that this is a nice way to check if your DNS settings are as you expect them to be. To fetch the record, the selector (s) is used in conjunction with the domain (d): both were in the DKIM-signature header from the received e-mail. The ._domainkey. in between is defined by the standard (RFC6376) thus always should be there.

DKIM in practice

Not to hard in theory, is it? As an administrator you should take care of only two things: publish the TXT record in your DNS, and add the signatures to outgoing mails (and of course check the signatures on incoming mails, but that is taken care of by the same daemon most likely).

If you are interested in an exact and detailed way of setting up DKIM signing (again, for Debian/Postfix), please check out this guide at In a nutshell, you need the dkim-filter package that provides, amongst others, dkim-genkey and the dkim-filter daemon listening at a socket on your system. In the Postfix config a milter is specified, so all mail is proxied through the DKIM signing daemon. This way, signatures on incoming mails are verified, and signatures are added before outgoing messages leave your system. Rest is up to the receiving (or intermediate) parties to check your signatures, using your published DNS records. Easy as pie, effective like a horse.

Useful links: