Phusion Passenger 4.0 beta 1 is here
Phusion Passenger is an Apache and Nginx module for deploying Ruby and Python web applications. It has a strong focus on ease of use, stability and performance. Phusion Passenger is built on top of tried-and-true, battle-hardened Unix technologies, yet at the same time introduces innovations not found in most traditional Unix servers. Since mid-2012, it has also aimed to become the ultimate polyglot application server.
The 3.0 series brought forth many architectural and feature improvements and has lasted us for a long time, but it’s finally time to move on to the next big thing. We are proud to introduce the first beta of the Phusion Passenger 4.0 series, which introduces exciting new features, greatly improves scalability and concurrency, lifts many old limitations and dramatically improves the internal architecture. We blogged extensively about 4.0 in the past (see part 1 and part 2, as well as these blog posts).
At the same time, we’ve also released Phusion Passenger Enterprise 4.0 beta 1, which contains all these changes. Enterprise customers can download it from the Customer Area.
What’s new?
For your convenience, we’ve compiled the full list of changes in 4.0. It should be noted that these changes are only the beginning. The new architecture opens the door for many exciting future improvements, which we will blog about in the near future.
Multiple Ruby versions
You can now run multiple Ruby versions at the same time in the same Phusion Passenger instance. This allows you to run some apps in Ruby 1.8 and some apps in Ruby 1.9, for example.
In Phusion Passenger tradition, using this feature is a breeze. We’ve made the PassengerRuby/passenger_ruby option per-virtual host instead of global, so you can set a different value per application. Phusion Passenger takes care of the rest.
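For example, with the Apache module you could give each virtual host its own interpreter. The hostnames and paths below are purely illustrative; point PassengerRuby at wherever your Ruby installations actually live:

    <VirtualHost *:80>
        ServerName legacy.example.com
        DocumentRoot /webapps/legacy/public
        # This application still runs on Ruby 1.8.
        PassengerRuby /usr/bin/ruby1.8
    </VirtualHost>

    <VirtualHost *:80>
        ServerName modern.example.com
        DocumentRoot /webapps/modern/public
        # This application runs on Ruby 1.9.
        PassengerRuby /opt/ruby-1.9.3/bin/ruby
    </VirtualHost>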
Evented I/O internally
Phusion Passenger’s I/O handler has been completely rewritten and is now evented, just like Nginx and Node.js. Evented I/O brings forth many scalability benefits. Phusion Passenger can now handle a virtually unlimited number of connections (high I/O concurrency support), limited only by system resources such as memory, CPU and OS file descriptor limits.
Why is being evented and being able to support a high I/O concurrency such a big deal? There are several reasons.
- Lifting request queuing limits.
Previously, Phusion Passenger’s internal I/O was multithreaded. On Apache, Phusion Passenger had as many I/O threads as there were Apache processes/threads. On Nginx, Phusion Passenger had 4 * passenger_max_pool_size I/O threads.
Suppose that an application is handling a long-running request while temporarily receiving a lot of traffic. Phusion Passenger then queues all those requests and waits for the application to become available again. It only takes a small number of queued requests before Phusion Passenger runs out of I/O concurrency, meaning that it cannot handle requests for other applications even when they are available.
The problem is shown schematically in the following figure.
4 clients (denoted A) are sending long-running requests to the server. The kernel dispatches these requests over the 4 I/O threads, which forward the requests to the application process foo.com. The server only has 4 I/O threads, so it has now run out of I/O concurrency. In the meantime, another client (denoted B) sends a request meant for bar.com. The application process bar.com is obviously available for work, but it cannot be reached because the I/O threads are still busy.
Evented I/O lifts this limit completely. There are no I/O threads and each request can be handled immediately as soon as the corresponding application process becomes available.
- Lower virtual memory usage.
Each thread uses a minimum amount of virtual memory for its stack; we set a custom thread stack size of 256 KB to keep this low. With evented I/O there is only 1 I/O thread, so the overhead is even lower.
- Real-time response buffering support.
The real-time response buffering feature, which this article describes later, depends on the ability to support high I/O concurrency.
- Support for applications that block a lot on external I/O.
If the application blocks on a lot of external I/O, e.g. if it performs a lot of HTTP API calls, then the I/O core must either be evented or heavily multithreaded. The “Multithreading within Ruby apps” subsection explains this in detail.
- Non-HTTP protocol support.
Evented I/O allows us to support non-HTTP protocols in the future, e.g. WebSockets.
It should be noted that to fully enjoy the benefits of evented I/O, your web server must also be able to support high I/O concurrency. Nginx and Phusion Passenger Standalone already support this by default. On Apache you may have to increase the number of processes or worker threads. If you really need a lot of I/O concurrency, we recommend using the worker MPM.
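For example, raising Apache’s concurrency with the worker MPM might look roughly like this. The numbers are only a starting point and depend entirely on your hardware and traffic:

    <IfModule mpm_worker_module>
        ServerLimit          16
        StartServers          4
        ThreadsPerChild      25
        MaxClients          400
        MinSpareThreads      25
        MaxSpareThreads      75
    </IfModule>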
It should also be noted that only Phusion Passenger itself has become evented; applications are still hosted as multiple single-threaded processes. However, Phusion Passenger Enterprise 4 can also host applications as multithreaded processes; read on for more!
Real-time response buffering
Many web applications depend on the web server to buffer their output in order to protect themselves from slow clients. In multi-process architectures you really don’t want to block the application while output is being sent to the client, because the client can take an arbitrarily long time to receive the response. Unicorn, for example, is designed entirely around letting the web server take care of slow clients.
However, this setup has some limitations. Web servers traditionally buffer the entire response before sending it to the client, which means that partial response data cannot easily be flushed to the client immediately. Rails 3.2 streaming depends on this ability, which is why Unicorn + Nginx + Rails 3.2 streaming can be problematic.
This becomes even more problematic if you have a long-running request and want to send progress data to the client periodically. For example, consider the following use case where we fetch 1 million database records, compress them into an archive file, and email it to the user. While fetching the records and creating the archive, we want to tell the user what the progress is.
zip = ZipFile.new("archive.zip")
total = DatabaseRecord.count
counter = 0
# find_each loads the records in batches instead of all at once.
DatabaseRecord.find_each do |record|
  zip.append(record.name, record.data)
  counter += 1
  if counter % 100 == 0
    # Report the progress every 100 records.
    response.stream.write("Progress: #{counter}/#{total}\n")
  end
end
zip.close
response.stream.write("Done!\n")
# Close the stream when we're done with it.
response.stream.close
The above example also depends on the ability to immediately flush data to the client. If the web server buffers the entire response first then the user will not see a smooth progress report.
Phusion Passenger 4 introduces real-time response buffering. Unlike traditional response buffering, Phusion Passenger sends data to the client immediately while still shielding the application from slow clients. It works by reading the response from the application as quickly as possible, while concurrently sending it to the client as quickly as possible.
This works even with very large responses: it buffers a limited amount of data in memory, and if there’s more data it buffers to disk instead. You can now send multi-megabyte responses without worrying about introducing latency because of buffering, and you can safely use the Rails send_file method without worrying about it being too slow.
You don’t need to turn real-time response buffering on; it’s enabled by default. Application developers never need to worry about response buffering anymore: it Just Works™ and does the right thing.
Zero-copy architecture
Phusion Passenger 4 has a much more advanced zero-copy architecture than Phusion Passenger 3. In most performance-critical places we now avoid copying data whenever we can. The zero-copy architecture is implemented by using scatter-gather I/O calls instead of traditional I/O calls.
What is scatter-gather I/O? Normally, when you have strings at multiple memory addresses and you want to write them to a file descriptor, you have two choices:
- Concatenate all strings into one big string, and send the big string to the kernel. This requires more memory and involves copying data, but only involves one call to the kernel. A kernel call tends to be much more expensive than a concatenation operation unless you’re working with a lot of data.
- Send each string individually to the kernel. You don’t need as much memory but you need a lot of expensive kernel calls.
In a similar fashion, if you want to read data from a file descriptor but want different parts of that data to end up in different memory buffers, then you either have to read() the data into one big buffer and copy each part to its own buffer, or you have to read() each part individually into its own buffer.
With scatter-gather I/O you can pass an array of memory buffers to the kernel. This way you can tell the kernel to write multiple buffers to a file descriptor, as if they form a single contiguous buffer, but with only one kernel call. Similarly you can tell the kernel to put different parts of the read data into different buffers. On Unix systems this is done through the readv() and writev() system calls. In Phusion Passenger 4 we use the latter system call extensively.
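To illustrate, this is roughly what a gathered write looks like in C. This is a minimal sketch, not Phusion Passenger’s actual code, and the helper name send_response is made up; the response parts are assumed to be NUL-terminated strings for simplicity:

    #include <string.h>
    #include <sys/uio.h>

    /* Write an HTTP status line, header block and body with a single
     * writev() call, without concatenating them in userspace first. */
    static ssize_t send_response(int fd, const char *status,
                                 const char *headers, const char *body)
    {
        struct iovec parts[3];
        parts[0].iov_base = (void *) status;  parts[0].iov_len = strlen(status);
        parts[1].iov_base = (void *) headers; parts[1].iov_len = strlen(headers);
        parts[2].iov_base = (void *) body;    parts[2].iov_len = strlen(body);
        /* One kernel call; the kernel treats the three buffers as if they
         * were a single contiguous buffer. */
        return writev(fd, parts, 3);
    }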
Unfortunately, writev() has many quirks. Typical implementations cannot handle more than IOV_MAX buffers per call, where IOV_MAX is an implementation-defined constant. On some implementations the call fails if the limit is exceeded, but on Linux/glibc it will quietly concatenate everything into one big buffer for you! Neither is a desirable property for Phusion Passenger, but we at Phusion care about stability, so we have written extensive code to deal with this issue.
Rewritten ApplicationPool and process spawning subsystem
One of the central subsystems in Phusion Passenger is the ApplicationPool, which spawns Ruby application processes when necessary and keeps track of them. It scales the number of processes according to the current traffic and ensures that the number of processes does not exceed your configured resource limits. It’s one of the most complex parts of Phusion Passenger and consists of a lot of carefully written code.
The other large subsystem is the process spawning subsystem (SpawnManager and friends), which takes care of the details of process spawning. This subsystem implements smart spawning (similar to preload_app true, if you’re familiar with Unicorn).
The old ApplicationPool and process spawning subsystems lasted a long time, but they were not without issues. The ApplicationPool had a large lock that was held while the first process for an application was being spawned; only after that first process had been spawned could subsequent processes be spawned in the background. While the lock was held, Phusion Passenger was unable to handle any requests.
The ApplicationPool and process spawning subsystems have been entirely rewritten in Phusion Passenger 4. They no longer have a large lock that must be held for a long time. The design is completely asynchronous. The subsystems are now:
- Faster. Critical code paths are carefully optimized in C++ for performance, and require less thread context switching. Critical parts are now zero-copy.
- More stable. The new spawning subsystem has a lot of error checking code.
- More maintainable. The old subsystems were hard to read and hard to extend. The new subsystems are much more modular and are very well-tested.
- More DRY. Part of the old process spawning subsystem was implemented in Ruby, which meant that a lot of code had to be duplicated between the Ruby and Python support code. Now that the majority has been moved into C++, the Ruby and Python launchers are extremely lightweight.
- Less memory hungry, thanks to the DRYness.
- When using the smart spawn method, the Preloader process (formerly called the ApplicationSpawner process) now uses 300 KB less memory.
- When using the direct spawn method (the new name for the conservative spawn method), the request handler now uses 500 KB less memory per application worker process.
- The Ruby call stack, as seen from the Rack application object’s entry point, has been reduced from about 10 frames to only 2. This saves at least 8 KB of stack space. If your application is multithreaded and you’re still on Ruby 1.8, you should see faster thread context switching as a result.
Memory measurements are done on OS X Lion. Your mileage may vary.
Multithreading within Ruby apps (Phusion Passenger Enterprise only)
In Why Rails 4 Live Streaming is a big deal, we explained that single-threaded pure multiprocessing is not a good I/O model for supporting high concurrency I/O uses, such as apps that make a lot of HTTP API calls.
We used to be big proponents of single-threaded pure multiprocessing in the context of web apps. Multiprocessing solved a lot of threading-related problems and had wide support in the Ruby ecosystem. Memory savings could be achieved through copy-on-write, as provided by Ruby Enterprise Edition (our branch of Ruby 1.8 with a copy-on-write friendly garbage collector). It was well suited to the typical web app I/O patterns of the time, provided some fault tolerance (a crashing or freezing process would not take the entire web app down) and allowed utilizing multiple CPU cores.
However, times and requirements have changed. We believe multiprocessing alone is no longer sufficient for many of today’s applications. Together with the end-of-life of Ruby Enterprise Edition, we believe that a hybrid of multiprocessing, multithreading and evented I/O is the way forward. The Ruby ecosystem these days has excellent support for multithreading. A limited number of processes combined with a larger number of threads, or just a limited number of evented processes, also saves a lot more memory than copy-on-write multiprocessing did.
Phusion Passenger Enterprise 4 is the first step towards this hybrid multiprocessed, multithreaded and evented I/O model. It supports multithreading within Ruby apps. For optimal backward compatibility, the default is still multiprocessing. Enabling multithreading support is a breeze, requiring only 2 configuration options:
PassengerConcurrencyModel thread
PassengerThreadCount 32
Python WSGI support lifted to “beta” status
It is a little known fact that we have supported Python WSGI since mid-2008. However, WSGI support remained at “proof of concept” quality. We are now lifting WSGI support to “beta” status, which means that we make an effort to make it work and that documentation is available. This is the first step towards our goal of becoming a polyglot application server.
WSGI support in Phusion Passenger 3 and earlier required Ruby, but in version 4 this requirement has been removed because a lot of functionality has been migrated from Ruby to C++. Phusion Passenger now only requires Ruby for a limited number of things:
- The build system.
- Ruby application support.
- The PassengerPreStart feature.
More protection against stuck processes
Phusion Passenger 4 offers more protection against stuck application processes. The following cases are covered:
- During web server shutdown. Previously, when the web server was being shut down, Phusion Passenger would only gracefully ask application processes to shut down. This didn’t work for stuck processes. Phusion Passenger 4 forcefully cleans up all processes when the web server is shut down. This makes everything much more reliable in case there are problems.
- During spawning of new application processes.
Additionally, Phusion Passenger Enterprise also offers protection against processes that are stuck during a request.
Automatically picks up environment variables from your bashrc
Setting environment variables has traditionally been a huge usability problem with Phusion Passenger. Many users expect environment variable settings in their bashrc to affect Phusion Passenger. They do not, because the web server tends to be started in a completely different environment, not invoked from bash. Setting environment variables required special instructions that were not obvious to people who are inexperienced with Unix.
Although this usability problem is not our fault, we consider it to be our problem, and we feel that addressing it is in line with the Phusion Passenger philosophy. Phusion Passenger 4 therefore automatically picks up environment variables, umasks, ulimits and other settings from your bashrc! This is implemented by starting all application processes through “bash -li”, but only if the user that the application runs as has bash set as their shell.
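For example, assuming the application runs as the user deploy, a line like the following in /home/deploy/.bashrc is now picked up automatically (the variable and path are just an illustration):

    export ORACLE_HOME=/opt/oracle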
Setting environment variables directly in Apache
You can now set environment variables directly in Apache using PassEnv and SetEnv. Previously, this would not always work as expected, because application processes were forked off from a spawn manager process that was written in Ruby. Spawning application processes did not involve an exec(), so setting environment variables like LD_LIBRARY_PATH had no effect.
In Phusion Passenger 4, application processes are started in a different way, so PassEnv and SetEnv now work as expected.
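For example (these directives require Apache’s mod_env; the variable names and path are purely illustrative):

    SetEnv LD_LIBRARY_PATH /opt/oracle/instantclient
    PassEnv ORACLE_HOME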
Better error messages and error diagnostics
With Phusion Passenger we’ve always tried to make our error logs as clear and useful as possible. Phusion Passenger 4 marks a new milestone in this effort. Now whenever Phusion Passenger’s C++ code crashes for whatever reason, it will print a more detailed crash report to the error log of the web server, including a backtrace and a bunch of information about the environment that is relevant to the crash.
This feature makes sure that whenever Phusion Passenger crashes, be it due to an operating system error, or a problem with Passenger itself, you are equipped with the knowledge you need to deal with the issue quickly and effectively. The crash handler behavior is configurable.
The error page, which is shown in your browser when your application fails to start, has been much improved. It now contains detailed information about your environment, such as the Ruby version, environment variables, user and group ID, ulimits, etc. Deployment problems now become easier to debug than ever.
Automatic asset pipeline support in Standalone
The Rails guides recommend that files generated by the asset pipeline be served with far-future expiration dates and aggressive caching headers. But configuring these options in a fully correct way can be quite tedious. Phusion Passenger Standalone alleviates this problem by automatically configuring Nginx with the right settings.
Deleting restart.txt no longer triggers a restart
Previously, when restart.txt no longer existed, Phusion Passenger would consider this a sign to restart the application. This interfered with some Capistrano deployments, so deleting restart.txt no longer triggers a restart.
Installation & documentation
You can install the open source version of 4.0 beta 1 with the following commands:
gem install passenger --pre
passenger-install-apache2-module
passenger-install-nginx-module
You can also download the tarball at Google Code.
Phusion Passenger Enterprise users can download the Enterprise version of 4.0 beta 1 from the Customer Area.
Documentation is available under the doc/ directory in the source tree. You can also view it online here:
Note that these are temporary links for the beta documentation, and they will eventually be removed. Please don’t link to them.
And because this is a beta, not everything may work correctly. Please submit bug reports to our bug tracker.
Final
The open source Phusion Passenger is provided to the community for free. We also provide an Enterprise version which comes with a wide array of additional features. Development of the open source version is directly sponsored by Enterprise sales. If you like Phusion Passenger, or if you would like to use the Enterprise features, please consider buying one or more licenses. Thank you!