
Phusion Blog

Traveling Ruby 20150210: smaller, supports Ruby 2.2, Windows

By Hongli Lai on February 9th, 2015

Traveling Ruby allows you to create self-contained, “portable” Ruby binaries which can run on any Windows machine, any Linux distribution and any OS X machine. This allows Ruby app developers to distribute a single package to end users, without needing end users to first install Ruby or gems.

We’ve released version 20150210, which marks a major milestone. It introduces Windows support, and support for Ruby 2.2.

Backstory

There’s a little bit of a backstory behind this release.

Last week I went to Amsterdam.rb’s MRI Implementors Panel meetup featuring Koichi Sasada, Terence Lee and Zachary Scott. Distribution of Ruby apps came up at some point. Terence and Koichi talked about the fact that MRuby — an alternative Ruby implementation that Yukihiro Matsumoto is working on — is able to precompile Ruby apps into a single self-contained binary with no dependencies. They asserted that the ability to produce such a binary is the reason why so many people are moving to Go: it makes deployment and distribution much easier. But I was unconvinced about the value of MRuby, because MRuby is a subset of Ruby. In my opinion, Ruby’s main power lies in its standard library and its rich ecosystem.

So I talked with Terence and with Eloy Durán (MacRuby/RubyMotion) about Traveling Ruby. Traveling Ruby is different in that it’s just a normal MRI Ruby interpreter, and not a subset of the language with limitations. You can even use many native extensions. However, Terence and Eloy were equally unconvinced, asserting that Traveling Ruby doesn’t support Windows, and can’t produce a single binary. Traveling Ruby currently generates a self-contained directory with no dependencies. This directory contains a single wrapper script plus a lib subdirectory that contains the bulk of the app, so it looks almost like a single self-contained binary. But that’s not good enough for them.

Then we contemplated how sad it is that so many parties are moving off Ruby towards Go.

So I implemented Windows support last weekend. This is just a minimum viable product: at present, there are lots of caveats and limitations in the Windows support. In a future release, I plan on introducing the ability to produce a single self-contained binary.

Although some people within Phusion are big Go fans, I am personally a big Ruby fan. I am in love with Ruby’s simplicity, elegance and productivity. There are high-quality libraries for almost every task imaginable. The only things Ruby isn’t good at are tasks which require very low memory consumption or very high performance, but most of the problems I’m solving do not require either. So with the continued development of Traveling Ruby, I am hoping that I can prevent more people from switching to Go due to distribution issues, or even switch some people back.

Changes

Ruby 2.2.0

Previous Traveling Ruby versions used Ruby 2.1.5. We now support Ruby 2.2.0 in addition to 2.1.5. In the future we may drop support for 2.1.5, but for now both versions are supported at the same time. You may be interested in Ruby 2.2 because of the better garbage collector and performance characteristics.

Windows support

We now support creating Windows packages. But there are currently a number of caveats:

  • Traveling Ruby supports creating packages for Windows, but it does not yet support creating packages on Windows. That is, the Traveling Ruby tutorials and the documentation do not work when you are a Ruby developer on Windows. To create Windows packages, you must use OS X or Linux.

    This is because our documentation makes heavy use of standard Unix tools, which are not available on Windows. In the future we may replace the use of such tools with Ruby tools so that the documentation works on Windows too.

  • Only Ruby 2.1.5 is supported for Windows, not 2.2.0. This is because the RubyInstaller project hasn’t released Ruby 2.2.0 binaries yet.

Gem upgrades and other changes

  • Fixed a problem with the ‘rugged’ native extension on Linux. Closes GH-33.
  • Fixed a problem with the ‘charlock_holmes’ native extension on Linux. Closes GH-34.
  • Header files are no longer packaged. This saves 256 KB.
  • RDoc and various unnecessary Bundler files have been removed. This saves about 1.2 MB.
  • Upgraded Bundler to 1.7.12.

Introducing Traveling Ruby

By Hongli Lai on December 8th, 2014

Ruby is one of our favorite programming languages. Most people use it for web development, but Ruby is so much more. We at Phusion have been using Ruby for years for writing sysadmin automation scripts, developer command line tools and more. Heroku’s Toolbelt and Chef have also demonstrated that Ruby is an excellent language for these sorts of things.

However, distributing Ruby apps to non-Ruby-programmer end users on Linux and OS X is problematic. If you require users to install Ruby or to use RubyGems, they can get into trouble or become frustrated.

Creating platform-specific packages for each Linux distro and each OS requires a lot of work. Because building such packages requires a fleet of VMs, building packages takes a lot of time.

Our solution to this problem is Traveling Ruby, which lets you create self-contained Ruby app packages for Linux and OS X. It is a project which supplies self-contained, “portable” Ruby binaries: Ruby binaries that can run on any Linux distribution and any OS X machine. This allows Ruby app developers to bundle these binaries with their Ruby app, so that they can distribute a single package to end users, without needing end users to first install Ruby or gems.

Learn more about the motivation behind Traveling Ruby.

Key benefits

  • Self-contained: apps packaged with Traveling Ruby are completely self-contained and don’t require the user to install any further dependencies or runtimes.
  • Simple & easy: Traveling Ruby is very simple to use and very easy to learn. No complicated tooling to learn. You can grasp the basics in just 5 minutes.
  • Fast & lightweight: produce packages for multiple OS targets, regardless of which OS you are developing on. This is achieved without the need for heavyweight tools like VMs.

Learn more


You can learn more about Traveling Ruby here:

Farewell, Jim.

By Ninh Bui on February 20th, 2014

Today, the sad news has reached us that Jim Weirich has passed away. We’re incredibly sad about this
as Jim was one of the nicest people we’ve got to know in the Ruby/Rails community when we first started Phusion.
In keeping his memory alive, I’d like to reflect on a particular anecdote that made Jim especially awesome to us
and most likely to you as well. I’m sure many of you who were fortunate enough to get to know him can relate to his
kindness.

Back in 2008 when Hongli, Tinco and I set out to go to RailsConf to give our very first talk abroad, we
met Jim in the lobby of the conference space. We had just attended a talk of his where he had gone through
a myriad of valuable do’s and don’ts one should be aware of when giving a talk. These tips proved to be
incredibly valuable to us in years to come, and we hope Jim knows how grateful we are for this.

Our talk was scheduled to be held the day after, and seeing Jim’s do’s and don’ts, we were suddenly confronted
with how many embarrassing “don’ts” we had in our slides. As Jim told the audience that it’s generally a good idea to avoid
cliches such as having bulletpoint hell, stock images of “the world” and “business people shaking hands”, we felt
more and more uncomfortable. Not only did we have a lot of bulletpoints, we even had an image of “business people
shaking hands”… in front of “the world”. We basically tripped over every possible cliche in the book!

But hey, we still had 24 hours, surely we’d be able to fix this right? Luckily, Jim had the demeanor of a big
kind cuddly bear, so we felt compelled to walk up to him after his talk to ask for some help with our slides.
Instead of brushing us off, Jim graciously sat down with us for about 2 hours, pointing out the things that
could use improvement in the delivery of our talk. And he understandably laughed out loud at our slide with the business people
shaking hands in front of the world. 😉

The next day, after giving our talk, we had people walking up to us saying that we killed it. In reality, it was
Jim’s tips and kindness in sharing these tips that “killed it”.

We will miss you buddy.

Your friends,
Tinco, Hongli and Ninh.

Ruby Rogues 143: Phusion Passenger Enterprise with Hongli Lai and Tinco Andringa

By Hongli Lai on February 12th, 2014

We’ve been invited by Ruby Rogues to participate in a podcast about Phusion Passenger Enterprise. This podcast covers the following topics:

  • Hongli Lai and Tinco Andringa Introductions
  • Phusion Passenger introduction
  • Rack
  • Node.js, MeteorJS, Python Support
  • Processes and Threads
    • Ruby Rogues Episode #58 – Book Club: Working with Unix Processes with Jesse Storimer
    • Ruby Enterprise Edition
    • Smart Spawning
  • Advantages of Phusion Passenger Enterprise
    • Rolling Restarts
    • Mass Deployment
  • Passenger vs Unicorn
  • Error Resistant Deploys
  • Hosting
    • DreamHost
  • Apache, Nginx support
  • Stability Issues
  • Documentation and Support

Listen to the podcast at the Ruby Rogues website

Thanks Ruby Rogues for hosting us!

Phusion Passenger 4.0.21 released, supports OS X Mavericks, JRuby 1.7.6

By Hongli Lai on October 23rd, 2013


Phusion Passenger is a fast and robust web server and application server for Ruby, Python and Node.js. It works by integrating into Apache and Nginx and turning them into a fully-featured application server. It has high-profile users such as New York Times, AirBnB, Juniper, Motorola, etc, and comes with many features that make your life easier and your application perform better.

Phusion Passenger is under constant maintenance and development. Version 4.0.21 is a bugfix release.

Phusion Passenger also has an Enterprise version which comes with a wide array of additional features. By buying Phusion Passenger Enterprise you will directly sponsor the development of the open source version.

Recent changes

  • Preliminary OS X Mavericks support. Note: if you’re having trouble installing, try reinstalling the Developer Tools.
  • Supports JRuby 1.7.6.
  • Open sourced Node.js support.
  • [Nginx] Upgraded the preferred Nginx version to 1.4.3.
  • Work around an Apache packaging bug in CentOS 5.
  • Various user friendliness improvements in the documentation and the installers.
  • Fixed a bug in the always_restart.txt support. Phusion Passenger was looking for it in the wrong directory.
  • Many Solaris and Sun Studio compatibility fixes. Special thanks to "mark" for his extensive assistance.
  • [Standalone] The --temp-dir command line option has been introduced.

Installing or upgrading to 4.0.21

OS X, Debian, Ubuntu, Heroku, Ruby gem, Tarball

Final

Phusion Passenger’s core is open source. Please fork or watch us on Github. 🙂

If you would like to stay up to date with Phusion news, please fill in your name and email address below and sign up for our newsletter. We won’t spam you, we promise.



Phusion Passenger DOES support Ruby 2.0

By Hongli Lai on June 6th, 2013

We’ve lately encountered some confusion among users about whether Phusion Passenger supports Ruby 2.0. In the announcement for Release Candidate 2 we mentioned that we had problems with Ruby 2.0, but unfortunately we didn’t word that accurately enough, which caused confusion among users. We hope that this blog post will clear things up.

  • Phusion Passenger does support Ruby 2.0, since version 4.0.0.
  • In the RC 2 announcement we merely claimed that we encountered a lot of bugs (for example this one) in Ruby 2.0.0 itself, and that we therefore would not recommend using Ruby 2.0.0 yet. These bugs have got nothing to do with Phusion Passenger. Phusion Passenger supports Ruby 2.0.0 regardless of what bugs Ruby 2.0.0 has.
  • A lot of Ruby 2.0.0 bugs have been fixed in 2.0.0-p195, but not all. We still get segmentation faults on OS X when using Ruby 2.0.0, with and without Phusion Passenger. These bugs are not incompatibility problems with Phusion Passenger, but are bugs in Ruby 2.0.0. We therefore cannot recommend Ruby 2.0.0 yet, but it’s possible that it works fine for you. You are free to use Ruby 2.0.0 with Phusion Passenger if you so choose.

Update: it looks like manually installing from the tarball makes the crash go away. This makes it likely that the crash was caused by RVM’s compilation process. Thanks a lot to @nalsh for the help.

The new Rack socket hijacking API

By Hongli Lai on January 23rd, 2013


Yesterday saw the release of Rack 1.5.0, which adds a new feature to the Rack specification dubbed socket hijacking. This feature allows applications to take over the client socket and perform arbitrary operations on it, e.g. implementing WebSockets, streaming data to the client, etc.

Did Rack not support streaming before? It did: you can stream by returning a body object that outputs body chunks in its #each method, as explained in our past article Why Rails 4 Live Streaming is a Big Deal. But this API is a bit clunky. The socket hijacking API instead gives you an object with a Ruby IO-like API.
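As a reminder of what that older, #each-based streaming style looks like, here is a minimal sketch. The names LineStream and StreamingApp are made up for this illustration; in a real config.ru you would pass the lambda to `run`.

```ruby
# A response body only has to respond to #each and yield strings.
# The application server writes each yielded chunk to the client.
class LineStream
  def initialize(count)
    @count = count
  end

  def each
    @count.times do |i|
      yield "Line #{i + 1}\n"
    end
  end
end

# Hypothetical Rack app that streams three lines through the body object.
StreamingApp = lambda do |env|
  [200, { 'Content-Type' => 'text/plain' }, LineStream.new(3)]
end
```

The application never sees the socket in this style: it can only hand chunks to the server, which is what makes it clunky for anything beyond simple streaming.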

Support for socket hijacking was added to Phusion Passenger 4 yesterday. The upcoming Phusion Passenger 4 has been covered here, here and here. Phusion Passenger Enterprise customers can already test and enjoy a preview of this feature by downloading the “3.9.2 beta preview (4.0.0 beta 2)” file from the Customer Area.

The socket hijacking API was surprisingly easy to implement, but is unfortunately poorly documented at this time. The application-level API is not immediately obvious, and the Rack specification documentation has not yet been updated to cover the hijacking API. In this article we’ll introduce the API and provide an example program.

What the socket hijacking API is not

Some of you may have heard of efforts to develop a “Rack 2.0” specification which properly covers things such as streaming and evented servers. According to the hijacking API developer, this API is not an attempt towards Rack 2.0. It is a “good enough” solution that works within the confines of the Rack 1.x specification. Things may change in Rack 2.0, though at this time it’s unclear what the progress towards Rack 2.0 is.

It is also unclear whether the API is supposed to be final or not. While implementing this API and writing this article we’ve discovered some room for improvement. The suggestions (which you can find later in this article) have been submitted to the developers.

Overview of the API

The hijacking API provides two modes:

  1. A full hijacking API, which gives the application complete control over what goes over the socket. In this mode, the application server doesn’t send anything over the socket, and lets the application take care of it. This mode is useful if you want to implement arbitrary (even non-HTTP) protocols over the socket. This is subject to limitations: if your application is behind a web server or an HTTP load balancer then those components dictate which protocols you can implement.
  2. A partial hijacking API, which gives the application control over the socket after the application server has already sent out headers. This mode is mostly useful for streaming.

The hijacking API is accessible through the Rack env hash. You can check whether the application server supports the hijacking API by checking env['rack.hijack?'], which returns a boolean value.
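A minimal sketch of that capability check follows. The helper name hijack_supported? and the constant HijackProbeApp are invented for this example; only the env['rack.hijack?'] key comes from the API itself.

```ruby
# Returns true only when the server explicitly advertises hijacking.
# A missing key (nil) is treated as "not supported".
def hijack_supported?(env)
  env['rack.hijack?'] == true
end

# Hypothetical Rack app that simply reports whether hijacking is available.
# In a config.ru you would finish with: run HijackProbeApp
HijackProbeApp = lambda do |env|
  message = hijack_supported?(env) ? "hijacking available" : "hijacking not available"
  [200, { 'Content-Type' => 'text/plain' }, ["#{message}\n"]]
end
```

Checking before hijacking lets an application fall back to a regular Rack response on servers that do not implement the API.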

Full hijacking

You can perform a full hijack by calling env['rack.hijack'].call. You can access the hijacked socket object through env['rack.hijack_io']. Phusion Passenger’s implementation of env['rack.hijack'] returns the socket object, but it is unclear whether this is supposed to be standard behavior.

You are responsible for:

  • Outputting any HTTP headers, if applicable.
  • Closing the IO object when you no longer need it.

You should output the “Connection: close” header unless you plan on implementing HTTP keep-alive yourself.

Here’s an example of the full hijacking API in action:

# encoding: utf-8
require 'thread'

# Streams the response "Line 1" .. "Line 10", with
# 1 second sleep time between each line.
# 
# Non-Phusion Passenger users may have to turn off their
# web servers' buffering options for streaming to work.
# Phusion Passenger 4 users don't have to do anything, it
# works out-of-the-box thanks to our real-time response
# buffering feature.
app = lambda do |env|
  # Fully hijack the client socket.
  env['rack.hijack'].call
  io = env['rack.hijack_io']
  begin
    io.write("Status: 200\r\n")
    io.write("Connection: close\r\n")
    io.write("Content-Type: text/plain\r\n")
    io.write("\r\n")
    10.times do |i|
      io.write("Line #{i + 1}!\n")
      io.flush
      sleep 1
    end
  ensure
    io.close
  end
end

run app

Partial hijacking

You can perform a partial hijack by assigning a lambda to the rack.hijack response header. This lambda will be called after the application server has sent out headers. The application server will ignore the body part of the Rack response, and will call the ‘rack.hijack’ lambda, passing it the client socket. You are responsible for closing the socket when it’s no longer needed.

It is unclear what the value of the Rack response body should be. Phusion Passenger’s implementation doesn’t care: you can return a two-element array response, or a three-element array response where the body can be anything. If the ‘rack.hijack’ response header is set, the body will be completely ignored.

Example:

# encoding: utf-8
require 'thread'

# Streams the response "Line 1" .. "Line 10", with
# 1 second sleep time between each line.
# 
# Non-Phusion Passenger users may have to turn off their
# web servers' buffering options for streaming to work.
# Phusion Passenger 4 users don't have to do anything, it
# works out-of-the-box thanks to our real-time response
# buffering feature.
app = lambda do |env|
  response_headers = {}
  response_headers["Content-Type"] = "text/plain"
  response_headers["rack.hijack"] = lambda do |io|
    # This lambda will be called after the app server has outputted
    # headers. Here we can output body data at will.
    begin
      10.times do |i|
        io.write("Line #{i + 1}!\n")
        io.flush
        sleep 1
      end
    ensure
      io.close
    end
  end
  [200, response_headers, nil]
end

run app

Issues with the hijacking API

Here’s how we think the hijacking API can be improved.

  • env['rack.hijack?'] appears to be unnecessary. You can already check for hijacking support by checking env['rack.hijack'].
  • The partial hijacking API should not involve assigning a lambda to the response headers. As far as we can see, you can just return the lambda as the body. That would be a much more elegant solution.
  • The return value for env['rack.hijack'] should be well-defined.

Conclusion

The Rack hijacking API, while having some quirks in our opinion, gets the job done. We hope that the usage of the hijacking API has become more clear after reading this article. If you have any comments, questions, suggestions or corrections, please let us know.

We at Phusion are working feverishly at the upcoming Phusion Passenger 4 (covered here, here and here). Implementing the hijacking API so quickly is our way of showing you how dedicated we are. Together with Phusion Passenger Enterprise, we aim to deliver the most stable, performant and feature rich polyglot application server out there. If you’re interested in future updates, please subscribe to our newsletter. Until next time!



Phusion Passenger 4.0 supports JRuby, Rubinius

By Hongli Lai on October 30th, 2012

JRuby

Rubinius

Phusion Passenger is an Apache and Nginx module for deploying Ruby and Python web applications. It has a strong focus on ease of use, stability and performance. Phusion Passenger is built on top of tried-and-true, battle-hardened Unix technologies, yet at the same time introduces innovations not found in most traditional Unix servers. Since mid-2012, it aims to be the ultimate polyglot application server.

In the announcement for Phusion Passenger 4.0.0 beta 1, we introduced a myriad of changes such as support for multiple Ruby versions, Python WSGI support, multithreading (Enterprise only), improved zero-copy architecture, better error diagnostics and more. And as we promised, the story would not end there. A commit has just landed in our Github repository for JRuby (1.7.0 required) and Rubinius support!

JRuby: the past and the current state of affairs

JRuby is an excellent Ruby implementation for the JVM, and in the past few years they have been doing a great job with regard to compatibility and performance. But for a long time, application server support for JRuby had been limited:

  • While Mongrel and Thin had limited JRuby support, these setups have not been very popular. Since so few people use these setups, their caveats are not very well known.
  • Unicorn does not support JRuby at all because it was designed to take advantage of Unix features, which JRuby does not (and cannot) always support well.
  • Phusion Passenger was in the same position: we used too many Unix features and were not able to support JRuby well.
  • Goliath does not seem to have official support for JRuby because of the unknown status of EventMachine’s Java support.

So the only options left were J2EE app servers such as JBoss, Tomcat, GlassFish and TorqueBox; as well as the recently developed Puma, which is almost pure Ruby.

Thanks to the new ApplicationPool and Spawner architecture in Phusion Passenger 4, we’re now able to support JRuby with ease. Because a lot of code has been moved into C++, we no longer need the Ruby implementation to support Unix features. We only needed an hour to add support for JRuby.

Phusion Passenger vs J2EE

With Phusion Passenger’s support for JRuby, you don’t need to learn about J2EE deployment. Using JRuby on Phusion Passenger is very straightforward: set PassengerRuby to your JRuby command, point the virtual host’s document root to your application’s ‘public’ directory, and you’re done.

With Phusion Passenger Enterprise, JRuby users get to enjoy all the enterprise features such as multithreading, rolling restarts, deployment error resistance, time and memory usage limiting, and more.

Rubinius is impressive as well

We remember that back in the day, Rubinius was quite slow during startup and did not support MRI native extensions. Fast forward to 2012, and what we find is a very impressive Ruby implementation. They have 1.9 support and MRI extension support. The Ruby interpreter starts quickly. They support Unix features. Adding Rubinius support was pretty straightforward. The Rubinius team has done an excellent job!

Why you should use JRuby or Rubinius

JRuby and Rubinius support real multi-core concurrency. JRuby and Rubinius threads map to real OS threads, and neither Ruby implementation has a global interpreter lock. In contrast, MRI Ruby 1.8 uses userspace threading and so cannot take advantage of multiple cores using a single process. MRI Ruby 1.9 has real OS threads, but also has a global interpreter lock and so still cannot take advantage of multiple cores using a single process.
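You can observe this difference yourself with a small CPU-bound benchmark. The sketch below is only an illustration (the method names and workload sizes are arbitrary choices for this example): it runs the same work serially and in threads and prints both timings. On an implementation with a global interpreter lock the two timings should be roughly equal; on one without, the threaded run can be faster on a multi-core machine.

```ruby
require 'benchmark'

# Pure-Ruby CPU-bound work: sum of squares below `iterations`.
def burn_cpu(iterations)
  sum = 0
  iterations.times { |i| sum += i * i }
  sum
end

# Runs the jobs one after another in the calling thread.
def run_serial(jobs, iterations)
  Array.new(jobs) { burn_cpu(iterations) }
end

# Runs each job in its own Ruby thread and collects the results.
def run_threaded(jobs, iterations)
  Array.new(jobs) { Thread.new { burn_cpu(iterations) } }.map(&:value)
end

if __FILE__ == $PROGRAM_NAME
  jobs = 4
  iterations = 2_000_000
  serial = Benchmark.realtime { run_serial(jobs, iterations) }
  threaded = Benchmark.realtime { run_threaded(jobs, iterations) }
  puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
end
```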

Granted, the multi-core issue isn’t that big. Phusion Passenger spawns multiple processes in order to take advantage of multi-core. But if you’re in a position in which you can only use 1 process, for whatever reason, then JRuby and Rubinius are what you need. With Phusion Passenger Enterprise’s multithreading support, you can have hybrid multi-processed and multi-threaded applications – the best of both worlds.

JRuby and Rubinius also often have superior performance. Both implementations support JIT compilation, which MRI Ruby does not.

That said, MRI Ruby still has the best compatibility in the Ruby ecosystem, so JRuby and Rubinius are not silver bullets. You should use the right tool for the job. With Phusion Passenger 4’s support for multiple Ruby versions, this should be a breeze.

Where to get Phusion Passenger with JRuby/Rubinius support

JRuby/Rubinius support will become part of the upcoming 4.0.0 beta 2. Please stay tuned!

These Phusion Passenger 4 updates are just the beginning. We have more exciting changes planned for the near future! Curious? Enter your email address and name below and we’ll keep you up to date.



SHA-3 extensions for Ruby and Node.js

By Hongli Lai on October 6th, 2012

A few days ago, NIST announced the winner of the SHA-3 competition: Keccak (pronounced [kɛtʃak], ketchak). The researchers who authored Keccak released a reference implementation in C.

We’ve created Ruby and Node.js extensions for Keccak. Our extensions utilize the code from the official reference implementation and come with an extensive suite of unit tests. Note, however, that I do not claim to be a security expert. Feel free to review the code for any flaws.

Install with:

gem install digest-sha3
npm install sha3

We’ve strived to emulate the languages’ standard hash libraries’ interfaces, so using these extensions is straightforward:

require 'digest/sha3'
Digest::SHA3.hexdigest("foo")

and

var SHA3 = require('sha3');
var hash = new SHA3.SHA3Hash();
hash.update('foo');
hash.digest();

Both libraries are MIT licensed. Enjoy!

Why does the world need SHA-3?

If you’re not a security researcher then you’ve undoubtedly asked the same questions as I did. What’s wrong with SHA-1, SHA-256 and SHA-512? Why does the world need SHA-3? Why was Keccak the winner?

According to well-known security researcher Bruce Schneier, there’s nothing wrong with the SHA-2 family of hashing functions. However he likes SHA-3 because it’s completely different. SHA-1 and SHA-2 are both based on the Merkle–Damgård construction. It may be feasible to find a flaw in SHA-1 that also affects SHA-2. In contrast, Keccak is based on the sponge construction so any attacks on SHA-1 and SHA-2 will probably not affect Keccak. Indeed, it appears that NIST chose Keccak because it was looking for some kind of insurance in case SHA-2 would be broken. Many people commented that they expected Skein to win.

Do not hash your passwords

In any case, the following cannot be repeated enough. Do not use SHA-3 (or SHA-256, SHA-512, RIPEMD-160 or whatever fast hash) to hash passwords! Do not even use SHA-3 + salt to hash passwords. Instead, use a slow hash like bcrypt. As Coda Hale explained in his article, you’ll want the hash to be slow so you can defend against attackers effectively.
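To get a feel for the speed difference, the sketch below times a single fast hash against a deliberately slow key derivation. Note the assumptions: bcrypt is not part of Ruby’s standard library, so this example uses PBKDF2 from the bundled OpenSSL bindings purely as a stand-in for “a slow hash”, and the iteration count of 100,000 is an arbitrary value picked for illustration. It is not a recommendation to use PBKDF2 over bcrypt.

```ruby
require 'benchmark'
require 'digest'
require 'openssl'

# One round of a fast hash: cheap for you, but just as cheap for an attacker.
def fast_hash(password, salt)
  Digest::SHA256.hexdigest(salt + password)
end

# A deliberately slow derivation. PBKDF2 stands in for bcrypt here only
# because it ships with Ruby; the iteration count is an illustrative value.
def slow_hash(password, salt, iterations = 100_000)
  digest = OpenSSL::Digest.new('SHA256')
  OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 32, digest).unpack('H*').first
end

if __FILE__ == $PROGRAM_NAME
  fast = Benchmark.realtime { fast_hash('secret', 'salt') }
  slow = Benchmark.realtime { slow_hash('secret', 'salt') }
  puts format("fast hash: %.6fs per guess, slow hash: %.6fs per guess", fast, slow)
end
```

The cost per guess is the point: every extra millisecond is multiplied by the number of candidate passwords an attacker has to try.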

The right way to deal with frozen processes on Unix

By Hongli Lai on September 21st, 2012


Those who administer production Unix systems have undoubtedly encountered the problem of frozen processes before. They just sit around, consuming CPU and/or memory indefinitely until you forcefully shut them down.

Phusion Passenger 3 – our high-performance and advanced web application server – was not completely immune to this problem either. That is until today, because we have just implemented a fix which will be part of Phusion Passenger 4.0 (4.0 hasn’t been released yet, but a first beta will appear soon). It’s going into the open source version, not just Phusion Passenger Enterprise, because we believe this is an important change that should benefit the entire community.

Behind our solution lies an array of knowledge about operating systems, process management and Unix. Today, I’d like to take this opportunity to share some of our knowledge with our readers. In this article we’re going to dive into the operating system-level details of frozen processes. Some of the questions answered by this article include:

  • What causes a process to become frozen and how can you debug it?
  • What facilities does Unix provide to control and to shut down processes?
  • How can a process manager (e.g. Phusion Passenger) automatically clean up frozen processes?
  • Why was Phusion Passenger not immune to the problem of frozen processes?
  • How is Phusion Passenger’s frozen process killing fix implemented, and under what constraints does it work?

This article attempts to be generic, but will also provide Ruby-specific tips.

Not all frozen processes are made equal

Let me first clear up some confusion. Frozen processes are sometimes also called zombie processes, but this is not formally correct. Formally, a zombie process as defined by Unix operating systems is a process that has already exited, but whose parent process has not yet waited for its exit. Zombie processes show up in ps and on Linux they have “<defunct>” appended to their names. Zombie processes do not consume memory or CPU. Killing them – even with SIGKILL – has no effect. The only way to get rid of them is to either make the parent process call waitpid() on them, or terminate the parent process. The latter causes the zombie process to be “adopted” by the init process (PID 1), which immediately calls waitpid() on any adopted processes. In any case, zombie processes are harmless.
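The zombie life cycle is easy to reproduce from Ruby on a Unix system. The following sketch (the method names are invented for this example) forks a child that exits immediately, shows that its process table entry survives a SIGKILL, and then reaps it with Process.wait2, Ruby’s wrapper around waitpid().

```ruby
# Signal 0 performs no action; it only checks that the PID still exists.
def pid_exists?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
end

def zombie_demo
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    exit!(0) # exit immediately; the parent has not waited yet
  end
  writer.close
  reader.read # returns at EOF, i.e. once the child has closed its pipe end
  reader.close
  sleep 0.1   # small grace period so the child has finished exiting

  result = {}
  result[:exists_before_kill] = pid_exists?(pid) # zombie entry is present
  Process.kill('KILL', pid)                      # has no effect on a zombie
  result[:exists_after_kill] = pid_exists?(pid)  # still present
  _, status = Process.wait2(pid)                 # reap it (waitpid)
  result[:exit_status] = status.exitstatus
  result[:exists_after_wait] = pid_exists?(pid)  # entry is gone now
  result
end

p zombie_demo if __FILE__ == $PROGRAM_NAME
```

While the script is between the fork and the wait2 call, running ps in another terminal should show the child as defunct.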

In this article we’re only covering frozen processes, not zombie processes. A process is often considered frozen if it has stopped responding normally. Such a process appears stuck inside something and is not throwing errors. Some of the general causes are:

  1. The process is stuck in an infinite loop. In practice we rarely see this kind of freeze.
  2. The process is very slow, even during shutdown, causing it to appear frozen. Some of the causes include:
    2.1. It is using too much memory, causing it to hit the swap. In this case you should notice the entire system becoming slow.
    2.2. It is performing an unoptimized operation that takes a long time. For example you may have code that iterates through all database table rows to perform a calculation. In development you only had a handful of rows so you never noticed this, but you forgot that in production you have millions of rows.
  3. The process is stuck in an I/O operation. It’s waiting for an external process or a server, be it the database, an HTTP API server, or whatever.

Debugging frozen processes

Obviously fixing a frozen process involves more than figuring out how to automatically kill it. Killing it just fixes the symptoms, and should be considered a final defense in the war against frozen processes. It’s better to figure out the root cause. We have a number of tools in our arsenal to find out what a frozen process is doing.

crash-watch

crash-watch is a tool that we’ve written to easily obtain process backtraces. Crash-watch can be instructed to dump backtraces when a given process crashes, or to dump the process’s current backtraces immediately.

Crash-watch is actually a convenience wrapper around gdb, so you must install gdb first. It dumps C-level backtraces, not language-specific backtraces. If you run crash-watch on a Ruby program it will dump the Ruby interpreter’s C backtraces and not the Ruby code’s backtraces. It also dumps the backtraces of all threads, not just the active thread.

Invoke crash-watch as follows:

crash-watch --dump <PID>

Here is a sample output. This output is the result of invoking crash-watch on a simple “hello world” Rack program that simulates being frozen.

Crash-watch is especially useful for analyzing C-level problems. As you can see in the sample output, the program is frozen in a freeze_process call.

Crash-watch can also assist in analyzing problems caused by Ruby C extensions. Ruby’s mysql gem is quite notorious in this regard because it blocks all Ruby threads while it’s doing its work. If the MySQL server doesn’t respond (e.g. because of network problems, because the server is dead, or because the query is too heavy) then the mysql gem will freeze the entire process, making it even unable to respond to signals. With crash-watch you are able to clearly see that a process is frozen in a mysql gem call.

Phusion Passenger’s SIGQUIT handler

Ruby processes managed by Phusion Passenger respond to SIGQUIT by printing their backtraces to stderr. On Phusion Passenger, stderr is always redirected to the global web server error log (e.g. /var/log/apache2/error.log or /var/log/nginx/error.log), not the vhost-specific error log. If your Ruby interpreter supports it, Phusion Passenger will even print the backtraces of all Ruby threads, not just the active one. This latter feature requires either Ruby Enterprise Edition or Ruby 1.9.

Note that for this to work, the Ruby interpreter must be responding to signals. If the Ruby interpreter is frozen inside a C extension call (such as is the case in the sample program) then nothing will happen. In that case you should use crash-watch or the rb_backtrace() trick below.
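To make the idea concrete, here is a minimal sketch of a SIGQUIT handler that prints the backtraces of all Ruby threads. It is not Phusion Passenger’s actual implementation; the helper name dump_all_thread_backtraces is made up for illustration, and it assumes a Ruby version that provides Thread#backtrace (1.9 or later).

```ruby
# Writes the backtrace of every live Ruby thread to +io+.
# Hypothetical helper; not part of Phusion Passenger's API.
def dump_all_thread_backtraces(io = $stderr)
  Thread.list.each do |thread|
    marker = thread == Thread.current ? " (current)" : ""
    io.puts "Thread #{thread.inspect}#{marker}:"
    # Thread#backtrace returns nil for dead threads, hence the fallback.
    (thread.backtrace || ["(no backtrace available)"]).each do |line|
      io.puts "    from #{line}"
    end
    io.puts
  end
end

# Install the handler. Only plain IO writes happen inside the trap block.
Signal.trap("QUIT") { dump_all_thread_backtraces($stderr) }
```

With this handler installed, running kill -QUIT <PID> should make the process print its thread backtraces to stderr, as long as the interpreter is still able to handle signals.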

rb_backtrace()

If you want to debug a Ruby program that’s not managed by Phusion Passenger then there’s another trick to obtain the Ruby backtrace. The Ruby interpreter has a nice function called rb_backtrace() which causes it to print its current Ruby-level backtrace to stdout. You can use gdb to force a process to call that function. This works even when the Ruby interpreter is stuck inside a C call. This method has two downsides:

  1. Its reliability depends on the state of the Ruby interpreter. You are forcing a call from arbitrary places in the code, so you risk corrupting the process state. Use with caution.
  2. It only prints the backtrace of the active Ruby thread. It’s not possible to print the backtraces of any other Ruby threads.

First, start gdb:

$ gdb

Then attach gdb to the process you want:

attach <PID>

This will probably print a whole bunch of messages. Ignore them; if gdb prints a lot of library names and then asks you whether to continue, answer Yes.

Now we get to the cream. Use the following command to force a call to rb_backtrace():

p (void) rb_backtrace()

You should now see a backtrace appearing in the process’s stdout:

from config.ru:5:in `freeze_process'
from config.ru:5
from /Users/hongli/Projects/passenger/lib/phusion_passenger/rack/thread_handler_extension.rb:67:in `call'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/rack/thread_handler_extension.rb:67:in `process_request'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler/thread_handler.rb:126:in `accept_and_process_next_request'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler/thread_handler.rb:100:in `main_loop'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/utils/robust_interruption.rb:82:in `disable_interruptions'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler/thread_handler.rb:98:in `main_loop'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler.rb:432:in `start_threads'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler.rb:426:in `initialize'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler.rb:426
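
The interactive steps above can probably be collapsed into a single non-interactive gdb invocation. The sketch below only builds and runs that command line; the helper names rb_backtrace_command and force_ruby_backtrace are made up for illustration. It assumes gdb is installed and that you have permission to attach to the target process. The same caveats about forcing a call into a running interpreter apply.

```ruby
# Builds a non-interactive gdb invocation that attaches to +pid+,
# forces a call to rb_backtrace() and then exits (which detaches gdb).
# Hypothetical helper name, used for illustration only.
def rb_backtrace_command(pid)
  [
    "gdb",
    "-p", Integer(pid).to_s,            # attach to the running process
    "-batch",                            # exit after running the commands
    "-ex", "p (void) rb_backtrace()"     # same command as typed by hand
  ]
end

# Runs the command. The backtrace appears in the *target* process's
# output, not in gdb's. Returns true if gdb exited successfully.
def force_ruby_backtrace(pid)
  system(*rb_backtrace_command(pid))
end
```

For example, force_ruby_backtrace(12345) would attach to PID 12345, trigger the backtrace and detach again.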

strace and dtruss

The strace tool (Linux) and the dtruss tool (OS X) can be used to see which system calls a process is making. This is especially useful for detecting problems belonging to categories (1) and (2.2).

Invoke strace and dtruss as follows:

sudo strace -p <PID>
sudo dtruss -p <PID>

Phusion Passenger’s role

Phusion Passenger was traditionally architected to trust application processes. That is, it assumed that if we tell an application process to start, it will start, and if we tell it to stop, it will stop. In practice this is not always true. Applications and libraries contain bugs that can cause freezes, or interaction with a buggy external component can cause freezes (network problems, database server problems, etc). Our point of view was that the developer and system administrator should be responsible for these kinds of problems. If the developer/administrator does not manually intervene, the system may remain unusable.

Starting from Phusion Passenger 3, we began turning away from this philosophy. Our core philosophy has always been that software should Just Work™ with the least amount of hassle, and that software should strive to be zero-maintenance. As a result, Phusion Passenger 3 introduced the Watchdog, Phusion Passenger Enterprise introduced request time limiting, and Phusion Passenger 4 will introduce application spawning time limiting. These things attack different aspects of the freezing process problem:

  1. Application spawning time limiting solves the problem of application processes not starting up quickly enough, or application processes freezing during startup. This feature will be included in the open source version of Phusion Passenger.
  2. Request time limiting solves the problem of application processes freezing during a web request. This feature is available only in Phusion Passenger Enterprise, not in the open source version of Phusion Passenger.
  3. The Watchdog traditionally only solved the problem of Phusion Passenger helper processes (namely the HelperAgent and LoggingAgent) freezing during web server shutdown. Now, it also solves the problem of application processes freezing during web server shutdown. These features will also be included in the open source version of Phusion Passenger.
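
The general idea behind spawning time limiting can be sketched in a few lines of Ruby. This is not how Phusion Passenger implements it; it is a simplified illustration in which a forked child must report readiness over a pipe within a time limit, and is killed otherwise. The method name spawn_with_time_limit and the "ready" message are assumptions made for this example.

```ruby
# Forks a child that runs the given block as its "startup" phase and must
# report readiness within +timeout+ seconds. Returns the child's PID on
# success, or nil after killing a child that was not ready in time.
# Hypothetical helper; not Phusion Passenger's implementation.
def spawn_with_time_limit(timeout)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    yield                  # simulated application startup
    writer.write("ready")  # tell the parent that startup finished
    writer.close
    sleep                  # simulated request loop
  end
  writer.close

  # IO.select returns nil when nothing becomes readable before the timeout.
  if IO.select([reader], nil, nil, timeout) && reader.read(5) == "ready"
    pid
  else
    Process.kill("KILL", pid)  # frozen or failed during startup
    Process.waitpid(pid)       # reap it so no zombie is left behind
    nil
  end
ensure
  reader.close if reader && !reader.closed?
end
```

A caller would use it as spawn_with_time_limit(60) { load_application }, where load_application stands for whatever the application does at startup. The caller remains responsible for the returned process.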

The shutdown procedure and the fix

When you shut down your web server, the Watchdog will be the one to notice this event. It will send a message to the LoggingAgent and the HelperAgent to tell them to gracefully shut down. In turn, the HelperAgent will tell all application processes to gracefully shut down. The HelperAgent does not wait until they’ve actually shut down. Instead it will just assume that they will eventually shut down. It is for this reason that even if you shut down your web server, application processes may stay behind.

The Watchdog was already designed to assume that agent processes could freeze during shutdown, so it gives agent processes a maximum amount of time in which to shut down gracefully. If they don’t, the Watchdog forcefully kills them with SIGKILL. In that case it kills not just the agent processes, but also all application processes.

The fix was therefore pretty straightforward: always have the Watchdog kill application processes, even if the HelperAgent terminates normally and in time. The final fix was effectively 3 lines of code.
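
The "graceful first, forceful after a deadline" pattern described above looks roughly like this for a single child process. This is a sketch under simplifying assumptions, not the Watchdog’s actual code: it uses SIGTERM as the graceful shutdown request (the Watchdog sends a message instead), and the method name shut_down is made up.

```ruby
# Asks +pid+ (a child of the current process) to shut down with SIGTERM and
# escalates to SIGKILL if it has not exited after +grace_period+ seconds.
# Returns :graceful or :killed. Hypothetical helper for illustration.
def shut_down(pid, grace_period)
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace_period
  until Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    # WNOHANG makes waitpid return nil instead of blocking while the
    # child is still running.
    return :graceful if Process.waitpid(pid, Process::WNOHANG)
    sleep 0.05
  end
  Process.kill("KILL", pid)  # SIGKILL cannot be caught or ignored
  Process.waitpid(pid)
  :killed
end
```

A process that ignores or never gets around to handling the graceful request is still guaranteed to go away once the grace period has passed.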

Utilizing Unix process groups

The most straightforward method to shut down application processes would be to maintain a list of their PIDs and then kill them one by one. The Watchdog, however, uses a more powerful and little-used Unix mechanism, namely process groups. Let me first explain them.

A system can have multiple process groups. A process belongs to exactly one process group. Each process group has exactly 1 “process group leader”. The process group ID is equal to the PID of the group leader.

The process group that a process belongs to is inherited from its parent process upon forking. However a process can be a member of any process group, no matter what group the parent belongs to.


Same-colored processes denote processes belonging to the same process group. As you can see in the process tree on the right, process group membership is not constrained by parent-child relationships.

You can simulate the process tree on the right using the following Ruby code.

top = $$
puts "Top process PID: #{$$}"
# Make the top process the leader of its own process group.
Process.setpgrp
pid = fork do
  # fork returns nil in the child, so record the child's own PID here.
  pid = $$
  pid2 = fork do
    pid2 = $$
    # pid2 leaves the inherited group and becomes leader of a new one.
    Process.setpgrp
    pid3 = fork do
      pid3 = $$
      # Give the parent time to move us into another process group.
      sleep 0.1
      puts "#{top} belongs to process group: #{Process.getpgid(top)}"
      puts "#{pid} belongs to process group: #{Process.getpgid(pid)}"
      puts "#{pid2} belongs to process group: #{Process.getpgid(pid2)}"
      puts "#{pid3} belongs to process group: #{Process.getpgid(pid3)}"
    end

    # We change process group ID of pid3 after the fact!
    Process.setpgid(pid3, top)

    Process.waitpid(pid3)
  end
  Process.waitpid(pid2)
end
Process.waitpid(pid)

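As you can see, you can change the process group membership of a process after the fact, provided you have the permissions to do so. (POSIX only lets a process change its own group or that of a child that has not yet called exec, and only within the same session.)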

The kill() system call provides a neat little feature: it allows you to send a signal to an entire process group! You can already guess where this is going.

Whenever the Watchdog spawns an agent process, it creates a new process group for it. The HelperAgent and the LoggingAgent are both process group leaders of their own process group. Since the HelperAgent is responsible for spawning application processes, all application processes automatically belong to the HelperAgent’s process group, unless the application processes themselves change this.

Upon shutdown, the Watchdog sends SIGKILL to the HelperAgent’s and the LoggingAgent’s process groups. This ensures that everything is killed, including all application processes and even whatever subprocesses they spawn.
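
Here is a small Ruby sketch of that mechanism. It is an illustration, not the Watchdog’s code: a stand-in "agent" is forked into its own process group, it forks a few stand-in "application" processes that inherit the group, and a single kill call then takes down the whole group. The method names spawn_agent_group and kill_process_group are made up. The extra "alive" pipe exists only so the caller can verify that every group member is gone: it reaches end-of-file once all of them have exited.

```ruby
# Spawns a stand-in "agent" in its own process group. The agent forks
# +app_count+ "application" processes, which inherit the agent's group.
# Returns [agent_pid, alive_pipe]; alive_pipe reaches EOF only once every
# process in the group has exited. Hypothetical helper for illustration.
def spawn_agent_group(app_count)
  alive_r, alive_w = IO.pipe
  ready_r, ready_w = IO.pipe
  agent_pid = fork do
    alive_r.close
    ready_r.close
    Process.setpgrp          # become the leader of a new process group
    app_count.times do
      fork do                # apps inherit the agent's process group
        ready_w.close
        sleep
      end
    end
    ready_w.write("ready")
    ready_w.close
    sleep
  end
  alive_w.close
  ready_w.close
  ready_r.read(5)            # wait until the group has been set up
  ready_r.close
  [agent_pid, alive_r]
end

# Sends SIGKILL to every process in the group led by +pgid+. A signal name
# prefixed with a minus sign makes Process.kill target a process group.
def kill_process_group(pgid)
  Process.kill("-KILL", pgid)
end
```

Because the agent is the group leader, its PID doubles as the process group ID, so kill_process_group(agent_pid) is all the caller needs. No list of application PIDs has to be maintained.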

Conclusion

In this article we’ve considered several reasons why processes may freeze and how to debug them. We’ve seen which mechanisms Phusion Passenger provides to combat frozen processes. We’ve introduced the reader to Unix process groups and how they interact with Unix signal handling.

A lot of engineering has gone into Phusion Passenger to ensure that everything works correctly even in the face of failing software components. As you can see, Phusion Passenger uses proven and rock-solid Unix technologies and concepts to implement its advanced features.

Support us by buying Phusion Passenger Enterprise

It has been a little over a month now since we released Phusion Passenger Enterprise, which comes with a wide array of additional features not found in the open source Phusion Passenger. However, as we’ve stated before, we remain committed to maintaining and developing the open source version. This is made possible by the support of our customers: your support. If you like Phusion Passenger or if you want to enjoy the premium Enterprise features, please consider buying one or more Phusion Passenger Enterprise licenses. Development of the open source Phusion Passenger is directly funded by Phusion Passenger Enterprise sales.

Let me know your thoughts about this article in the comments section and thank you for your support!