Phusion Blog

The Road to Passenger 3: Technology Preview 1 – Performance

By Hongli Lai on June 10th, 2010

It has already been two years since we first released Phusion Passenger. Time sure flies, and we’ve come a long way since then. We were the first to implement a working Ruby web app deployment solution that integrates seamlessly into the web server, and all the features that we’ve developed over time – smart spawning and memory reduction, upload buffering, Nginx support, etc. – have served us well for a long time. Nevertheless, it is time to say goodbye to the old Phusion Passenger 2.2 codebase. In the past we had focused primarily on three things:

  • Ease of use.
  • Stability.
  • Robustness.

Notice that “performance” is not on the above list. We strove to make Phusion Passenger “fast enough”, i.e. not ridiculously slower than the alternatives. Lately it would appear that competitors are once again focusing on performance, and we of course cannot afford to fall behind. We’ve been working on Phusion Passenger 3 for a while now. Today we will begin unveiling the technology behind this new major Phusion Passenger version. This blog post is the first of several technology previews to come.

The performance test

It’s not very useful to benchmark Phusion Passenger performance using a Rails application because most of the time is spent in Rails and the application itself. Therefore we’ll benchmark with a simple Rack application. Consider the following hello world Rack application:

app = proc do |env|
  [200, { "Content-Type" => "text/html" }, ["hello world\n"]]
end

run app
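
Because a Rack application is just an object that responds to call, the proc above can also be exercised without any web server. The following is a minimal sketch: the empty environment hash is a simplification (a real server passes a populated one), and run is left out because it only exists inside a config.ru file.

```ruby
# The hello world application from above, as a plain proc.
app = proc do |env|
  [200, { "Content-Type" => "text/html" }, ["hello world\n"]]
end

# Call it the way a Rack server would, here with an empty environment hash
# (an assumption made to keep the sketch short).
status, headers, body = app.call({})

puts status                       # HTTP status code
puts headers["Content-Type"]      # response header
body.each { |chunk| print chunk } # the body is an enumerable of strings
```

Everything else measured in a benchmark of this application is overhead added by the web server and the deployment layer, which is why it is a useful baseline.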

How fast does this run on Phusion Passenger 2.2?

  • Operating system: OS X Snow Leopard
  • ab -c 25 -n 10000 http://rack.test/, pool size 25
  • Apache: 1628 req/sec
  • Nginx: 1843 req/sec

Now let’s look at Phusion Passenger 3:

  • Operating system: OS X Snow Leopard
  • Apache: 2225 req/sec; 37% faster
  • Nginx: 2864 req/sec; 55% faster

That’s right, the Nginx version is over 50% faster than 2.2 on OS X!
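
The percentages follow directly from the request rates listed above. As a small sketch, the calculation can be reproduced like this; the method name speedup_percent and the RESULTS table are only illustrative, while the numbers are the ones reported in this post.

```ruby
# Requests per second as reported in this post (2.2 versus 3).
RESULTS = {
  "Apache" => { v2: 1628, v3: 2225 },
  "Nginx"  => { v2: 1843, v3: 2864 }
}

# Relative improvement of `after` over `before`, in percent, to one decimal.
def speedup_percent(before, after)
  ((after.to_f / before - 1) * 100).round(1)
end

RESULTS.each do |server, rates|
  puts "#{server}: #{speedup_percent(rates[:v2], rates[:v3])}% faster"
end
```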

A graph is worth more than a thousand words

Suffice it to say, even though Phusion Passenger was already pretty fast, we believe we’ve made some significant performance improvements, and it will be interesting to see how the final version of Phusion Passenger 3 stacks up against the competition. Needless to say, we’ve already run our own benchmarks and have concluded that “the self-proclaimed fastest deployment solution” really isn’t the fastest deployment solution compared to Phusion Passenger 3. 😉 That said, benchmarks are lies, lies and damn lies of course, and your mileage may definitely vary, so we encourage you to run any kind of benchmark you’d like when we release 3. For us, the most important issue is still how much time you have to spend actually maintaining your setup, but as the graphs indicate, we’ve made some pretty monstrous improvements to performance as well.

How did we do it?

When it comes to optimizing software, there’s the saying that 20% of the code is responsible for 80% of the time. Not so with Phusion Passenger: we’ve found that there were no obvious performance bottlenecks. Even profilers turned out to be totally useless because all the times are so small and so close to each other. Phusion Passenger was already pretty fast.

Instead, we optimized the hard way: with lots and lots of micro-optimizations. 2% here, 3% there, etc etc. In other words, blood, sweat, tears and lots of sleepless nights. The optimizations can be summed up as follows:

Reducing system calls
System calls are pretty expensive compared to userspace computation. They require a context switch to the kernel. For example, all I/O operations (read(), write()) are system calls. We’ve performed an extensive code inspection and removed and coalesced a lot of redundant system calls.
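
To illustrate what coalescing means, here is a minimal Ruby sketch. It is not Phusion Passenger code, and the helper names are made up for this example. It relies on IO#syswrite, which bypasses Ruby's own buffering so that each call maps to one write() system call.

```ruby
# Hypothetical helpers for illustration; not taken from Phusion Passenger.

# Sends every part with its own unbuffered write: one write() system call per part.
# Returns the number of system calls issued.
def write_separately(io, parts)
  parts.each { |part| io.syswrite(part) }
  parts.size
end

# Joins the parts first and sends them with a single write() system call.
# (syswrite may write only part of a very large string; these are tiny.)
def write_coalesced(io, parts)
  io.syswrite(parts.join)
  1
end

reader, writer = IO.pipe
response = ["HTTP/1.1 200 OK\r\n", "Content-Type: text/html\r\n", "\r\n", "hello world\n"]

puts "separate:  #{write_separately(writer, response)} system calls"
puts "coalesced: #{write_coalesced(writer, response)} system call"
writer.close
puts "bytes received: #{reader.read.bytesize}"
```

Joining the parts costs one extra copy in userspace, so this trades a memory copy for fewer kernel round trips. A gather write such as writev(2) avoids both, at the price of a more involved interface.
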
The beginning of a zero-copy I/O architecture
The CPU is very fast nowadays. In fact it is so fast that RAM speed cannot keep up with the CPU. This makes memory access very expensive. In case of I/O intensive applications such as web servers, one would benefit from copying I/O data as little as possible. In order to optimize memory access, we’ve implemented the beginning of a zero-copy I/O architecture. This architecture covers both the C++ and the Ruby parts of Phusion Passenger.
Less Ruby garbage production
The garbage collector in Ruby can be a significant bottleneck. We’ve heavily optimized the Ruby part of the request handler and reduced creation of Ruby objects to a minimum. This made the request handler significantly faster in our tests.
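
As an illustration of the general idea (not of the actual request handler), the sketch below converts an HTTP header name to its CGI-style form in two ways: by building a new string on every call, and by caching the frozen result. All names are hypothetical, and the allocation counter assumes MRI 2.2 or later, where GC.stat(:total_allocated_objects) is available.

```ruby
# Hypothetical helpers for illustration; not taken from Phusion Passenger.

# "Accept-Encoding" -> "HTTP_ACCEPT_ENCODING".
# Every call allocates several temporary String objects.
def cgi_name_uncached(name)
  "HTTP_" + name.upcase.tr("-", "_")
end

# Remembers each converted name so later calls reuse the same frozen String.
CGI_NAMES = Hash.new do |cache, name|
  cache[name] = cgi_name_uncached(name).freeze
end

def cgi_name_cached(name)
  CGI_NAMES[name]
end

# Number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

HEADER = "Accept-Encoding".freeze
uncached = allocations { 10_000.times { cgi_name_uncached(HEADER) } }
cached   = allocations { 10_000.times { cgi_name_cached(HEADER) } }
puts "uncached: #{uncached} objects allocated, cached: #{cached}"
```

Fewer allocations mean the garbage collector has to run less often, which is where the time is saved.
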
Optimizing algorithms and optimizing Ruby code in C
Some algorithms have been optimized, e.g. some O(N) algorithms have been replaced by O(log N) or O(1) algorithms. Some key Ruby code has been replaced by C code. The former didn’t gain us much performance because none of the O(N) algorithms were doing a lot of work in the first place, but the latter gave us a much more noticeable boost.
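
A typical example of such a replacement, sketched here with made-up data rather than real Phusion Passenger structures, is swapping a linear scan for a hash lookup:

```ruby
# Hypothetical process records, for illustration only.
Worker = Struct.new(:pid, :app_root)

WORKERS = (1..1000).map { |i| Worker.new(10_000 + i, "/apps/site#{i}") }

# O(N): scans the array until a worker with a matching PID is found.
def find_by_scan(pid)
  WORKERS.find { |worker| worker.pid == pid }
end

# O(1) on average: a hash built once maps each PID straight to its record.
WORKERS_BY_PID = WORKERS.each_with_object({}) do |worker, index|
  index[worker.pid] = worker
end

def find_by_index(pid)
  WORKERS_BY_PID[pid]
end

puts find_by_scan(10_999).app_root
puts find_by_index(10_999).app_root
```

With only a handful of entries the two approaches perform about the same, which matches the observation above that this kind of change brought little.
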
Reducing context switches
Phusion Passenger is heavily multithreaded and also consists of multiple service processes. However, some communication between threads and processes required round trips, which caused more context switches than necessary. We’ve optimized our internal protocols and reduced context switching to a minimum.
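
The effect of round trips can be sketched with two Ruby threads and a pair of queues. This is only an analogy for the internal protocols, which are not described in this post; the queue names, the service thread and both fetch methods are invented for the example.

```ruby
require "thread" # Queue; already built in on recent Ruby versions

REQUESTS = Queue.new
REPLIES  = Queue.new

# A stand-in "service" thread: answers each request message with one reply message.
SERVICE = Thread.new do
  loop do
    keys = REQUESTS.pop
    REPLIES.push(keys.map { |key| key.to_s.upcase })
  end
end

# One round trip per key: the caller blocks and is woken up keys.size times.
def fetch_one_by_one(keys)
  keys.map do |key|
    REQUESTS.push([key])
    REPLIES.pop.first
  end
end

# One round trip in total: all keys travel in a single message.
def fetch_batched(keys)
  REQUESTS.push(keys)
  REPLIES.pop
end

p fetch_one_by_one([:a, :b, :c])
p fetch_batched([:a, :b, :c])
```

Both calls return the same result, but the batched version hands control back and forth between the two threads only once.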

The future

As stated in this blog post, this is just a glimpse of what we’ve got in store for you, and as you’ve come to expect from us, we want to make sure that our findings hold up in real-life scenarios as well. After close to two years of field testing with Phusion Passenger 2 in some of our clients’ most demanding web hosting environments, we’ve spent the last few months forging that experience back into Phusion Passenger 3. Through beta testing in these high-demand Rails environments, we hope to ensure that it will give you the best experience, both in an enterprise environment and for your personal use.

Performance has been touched upon in this blog post, and in the period leading up to the release of Phusion Passenger 3, we’ll unveil bit by bit what we’ve been tinkering with for the last few months. In particular, we’re looking forward to seeing how the zero-copy I/O architecture and the other optimizations we’ve made will hold up in real-life scenarios. We’re not done optimizing yet, but we will likely hit a ceiling at some point where optimizations get harder and harder, especially if you want to retain the features, such as ease of use, that define Phusion Passenger. One thing is for sure: we want this release to be nothing less than stunning, so we encourage you to submit your wish list to us as well. We’ve likely implemented many of your wishes already, but we just want to make sure that we’re not missing anything.

  • Wow! Way to go guys. I’m looking forward to hearing more about Passenger 3.

  • It’s perfect! My production servers are already powered by Passenger, and I see no reason to change that!

  • Amazing! This is just too much joy to handle haha. With Ruby on Rails 3 giving a refreshing development environment, focussed on making things simpler, modular and more memory efficient/performant.. Now you guys come with Phusion Passenger 3 to improve performance to THAT extent, yet keeping it simple and maintainable.

    All this really makes me so glad that I joined this community rather than stick with the previous ones. This is pure satisfaction, and apparently, it never stops!

    Thanks so much for putting all your effort into this, can’t wait to read any updates!

  • Awesome! Keep up the great work!

  • Brilliant news, I seriously cannot wait for this to be released.

  • Great job, great news

  • Thanks so much for Passenger. It brings a lot of value to the ruby world.
    The ease of use is greatly appreciated. Can’t wait to see Passenger 3!

  • sabat

    Competitors? You have competitors? I’m not sure if I was ignorant, or whether there was no point in paying attention anyway. Nice going, guys.

  • It’s really great. Rails 3 + Passenger 3 looks like a nice and big step towards better performance.

  • will

    Fantastic work, very much looking forward to the new release

  • This is great! Thanks a lot!

  • You guys are relentless, which is great for the whole community. I hope more people pay attention to you guys because Passenger + REE already rocked, and now this new Passenger 3 is even more amazing. Thank you all for the hard work.

  • I’d like to see some real details here.

    What MPM model was apache running? How was it tuned? What plugins?

    How does it compare to unicorn?

  • Number 1 on my wish list for Passenger 3 is a final fix for issue 435

  • Steve: you may want to check your load balancer or routers on the way. We haven’t found anything in Phusion Passenger so far that could cause the EPIPE thing; on the other hand the cases that we were able to solve have all been related to broken load balancers/routers/proxies and that kind of stuff.

  • Can’t wait for this, for those of us that have spent countless hours optimizing our applications this will put the icing on the cake 🙂

  • @Hongli – still suffering the problem that’s in 378 (which was merged into 435) – see the output from Jan 27. My setup is Ubuntu + latest Nginx/Passenger + Ruby 1.9 + Rails 2.3. No interesting load balancers, proxies etc. In short, once that second spawner appears, my app locks up. Happens about once a day. Since I saw a comment that says the spawner has been rewritten in Passenger 3, I’m hoping it will just go away.

  • @hongli – Phusion Passenger 2 is already pretty feature complete, and PP3 sounds great. Any further optimizations in REE, and Rails 3’s huge leaps in performance, will make PP3 an even better solution for Rails app deployment than it already is. Keep up the good work.

    One thing that would be nice to see in PP3 though, is a “PassengerMaxInstances” setting for individual vhosts, which controls the number of instances each site can spawn. PP2 only has a global one, which means I might have two sites, one needing only two workers and one needing 10, but I can’t tell Passenger that.

  • seydar

    Nice article. Would you be able to publish more statistics next time, such as the standard deviation, for instance?

  • Erik Dahlstrand

    Exciting news! Keep up the good work!

  • Good news!!! Will wait for release 🙂

  • I’d be pleased to learn more about the zero-copy I/O feature.

    Anyway, you guys at Phusion rock.

  • Are you guys hiring? 😉

    This is pretty darn awesome. Looking forward @ your next post.

  • Hugo

    What about REE for Ruby 1.9? Is it possible?

  • Awesome,

    That’s why I love passenger.

  • So that’s how much increase in a rack app I would see, but is the difference actually noticeable with a sizeable rails app, at all, or immediately eclipsed by rails’ sizeable overhead?

  • That depends on the app. You should always do your performance benchmarks. Don’t rely on others’.

  • amay82

    Really cool work, thank you. Unfortunately, I can’t use it because of issue #563 which is a blocker for all users who use page caching