Phusion Blog

A sneak preview of Phusion Passenger 3.2, part 2

By Hongli Lai on April 25th, 2012

Phusion Passenger is an Apache and Nginx module for deploying Ruby web applications. It has a strong focus on ease of use, stability and performance. Phusion Passenger is built on top of tried-and-true, battle-hardened Unix technologies, yet at the same time introduces innovations not found in most traditional Unix servers.

In our last Phusion Passenger sneak preview article we described a number of large but exciting changes in the upcoming 3.2 series. Since then, much development progress has been made thanks to much-appreciated user feedback. A ton of bugs have been fixed and we’re almost ready to dogfood it (running it in production on our own servers). We’re already using the 3.2 pre-release code on our development workstations while developing web apps.

In this article I shall explain more changes that the 3.2 series brings.

More zero-copy I/O

In the Phusion Passenger 3.0 technology preview articles we first described the introduction of a zero-copy I/O architecture. In 3.2, this architecture has evolved a lot further. In most performance-critical places we now avoid copying data whenever we can, and we use scatter-gather I/O calls all over the place instead of traditional I/O calls.

What is scatter-gather I/O? Normally, when you have strings at multiple memory addresses and you want to write them to a file descriptor, you have two choices:

  1. Concatenate all strings into one big string, and send the big string to the kernel. This requires more memory and involves copying data, but only involves one call to the kernel. A kernel call tends to be much more expensive than a concatenation operation unless you’re working with a lot of data.
  2. Send each string individually to the kernel. You don’t need as much memory but you need a lot of expensive kernel calls.

[Figure: Normal I/O]

In a similar fashion, if you want to read some data from a file descriptor but you want different parts of the data to end up in different memory buffers, then you either have to read() the data into a big buffer and copy each part to its individual buffer, or you have to read() each part individually into its own buffer.

With scatter-gather I/O you can pass an array of memory buffers to the kernel. This way you can tell the kernel to write multiple buffers to a file descriptor, as if they formed a single contiguous buffer, but with only one kernel call. Similarly, you can tell the kernel to put different parts of the read data into different buffers. On Unix systems this is done through the readv() and writev() system calls. In Phusion Passenger 3.2 we use the latter system call extensively.
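To make the idea concrete, here is a minimal sketch in Python, whose os.writev() and os.readv() functions wrap the system calls of the same name. This is only an illustration and not Phusion Passenger's own code (which is written in C++); the function name scatter_gather_demo and the sample HTTP strings are made up for this example.

```python
import os

def scatter_gather_demo():
    """Write three separate buffers with one call, then read them
    back into two separate buffers with one call (hypothetical example)."""
    status = b"HTTP/1.1 200 OK\r\n"
    headers = b"Content-Length: 5\r\n\r\n"
    body = b"hello"

    read_fd, write_fd = os.pipe()
    try:
        # Gather: one kernel call, no concatenation in user space.
        written = os.writev(write_fd, [status, headers, body])
        os.close(write_fd)
        write_fd = -1

        # Scatter: the kernel fills `first` completely, then `second`.
        first = bytearray(len(status))
        second = bytearray(64)
        received = os.readv(read_fd, [first, second])
        return written, bytes(first), bytes(second[:received - len(first)])
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)
```

The three byte strings never get joined into one big string by the program; the kernel walks the list of buffers itself.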

Unfortunately writev() has many quirks. Typical implementations cannot handle more than IOV_MAX buffers per call, where IOV_MAX is a constant whose value differs from system to system. On some implementations the call will fail if the limit is surpassed, but on Linux/glibc it will quietly concatenate everything into a big buffer for you! Neither behavior is desirable in Phusion Passenger, but we at Phusion care about stability so we have written extensive code to take care of this issue.
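One way to work around the limit is to send the buffers in batches that never exceed it, and to resume correctly when the kernel writes only part of a batch. The following Python sketch shows that approach for a blocking file descriptor. It is an assumption about how such handling could look, not the actual Phusion Passenger code; the names iov_limit and writev_all are hypothetical, and the fallback of 16 is the minimum value that POSIX allows for IOV_MAX.

```python
import os

def iov_limit(default=16):
    """Return the system's limit on buffers per writev() call,
    falling back to `default` if it cannot be determined."""
    try:
        limit = os.sysconf("SC_IOV_MAX")
    except (ValueError, OSError, AttributeError):
        return default
    return limit if limit > 0 else default

def writev_all(fd, buffers, max_iov=None):
    """Write every buffer to `fd`, using at most `max_iov` buffers per
    writev() call and continuing after partial writes."""
    if max_iov is None:
        max_iov = iov_limit()
    # memoryview slices let us skip already-written bytes without copying.
    pending = [memoryview(b) for b in buffers if len(b) > 0]
    total = 0
    while pending:
        batch = pending[:max_iov]
        written = os.writev(fd, batch)
        total += written
        # Drop the buffers that were written completely.
        consumed = 0
        while consumed < len(batch) and written >= len(batch[consumed]):
            written -= len(batch[consumed])
            consumed += 1
        pending = pending[consumed:]
        # Trim the buffer that was written only partially, if any.
        if written:
            pending[0] = pending[0][written:]
    return total
```

Calling writev_all(fd, parts) then behaves like a single large write from the caller's point of view, regardless of how many parts there are.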

Environment variable passing

Properly passing environment variables in Phusion Passenger 3 for Apache is quite a pain. People expect SetEnv to just work, but in practice it doesn’t because of various implementation details in Phusion Passenger. To pass things like LD_LIBRARY_PATH you had to write a wrapper script. Passing PATH works but only if you’re not expecting it to affect the search path for the Ruby interpreter itself.

In 3.2 we fully support passing environment variables with SetEnv, and it works as expected. This is actually a side effect of supporting multiple Ruby versions, but it works out rather nicely.

Less memory usage

Phusion Passenger has always been lightweight when it comes to memory usage, but in 3.2 we’ve reduced it even further. More parts have been moved from Ruby into C++. The Ruby component now only loads code that’s absolutely necessary. The result is massive memory savings:

  • When using the smart spawn method, the Preloader process (formerly called the ApplicationSpawner process) now uses 300 KB less memory.
  • When using the direct spawn method (the new name for the conservative spawn method), the request handler now uses 500 KB less memory per application worker process.
  • The depth of the Ruby stack at the point where the Rack application object is first called has been reduced from about 10 levels to only 2 levels. This shrinks the stack by at least 8 KB. If your application is multithreaded and you’re still on Ruby 1.8 then you should see faster thread context switching performance.

Memory measurements were taken on OS X Lion. Your mileage may vary.

Release date

So people have been asking us when 3.2 will be released. We want to make sure that the 3.2 release is a rock-solid one, as we always do with our products. But we need your help. Please download the 3.2 pre-release code from GitHub, play with it, and report any bugs you find. Or better yet: send us a patch. 🙂 It’s getting more and more stable by the day but you can make the process even faster.