Phusion Blog

How to fix the Ruby 1.9 HTTPS/Bundler segmentation fault on OS X Lion

By Hongli Lai on May 9th, 2012

If you’ve installed a gem bundle on OS X Lion in the past few weeks then you may have seen the dreaded “[BUG] Segmentation fault” error, where Ruby seems to crash in the connect C function in http.rb. Upgrading to the latest Ruby 1.9.3 version (p194) doesn’t seem to help. Luckily someone has found a solution for this problem.

It turns out the segmentation fault is caused by an incompatibility between MacPorts’ OpenSSL and RVM. MacPorts installs everything to /opt/local but RVM does not look for OpenSSL in /opt/local. We solved the problem by reinstalling Ruby 1.9.3 with the MacPorts OpenSSL, as follows.

First edit $HOME/.rvmrc and add:

export CFLAGS="-O2 -arch x86_64"
export LDFLAGS="-L/opt/local/lib"
export CPPFLAGS="-I/opt/local/include"

Then run:

sudo port install libyaml
rvm reinstall ruby-1.9.3 --with-openssl-dir=/opt/local --with-opt-dir=/opt/local
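
After the reinstall, one quick sanity check is to ask Ruby which OpenSSL library it was compiled against. This is only a sketch of such a check and not part of the original fix; the variable name is made up for illustration.

```ruby
require 'openssl'

# The version string of the OpenSSL library this Ruby was compiled against.
openssl_version = OpenSSL::OPENSSL_VERSION
puts "Ruby #{RUBY_VERSION} is using #{openssl_version}"
```

If the version printed is the one MacPorts installed, the rebuilt Ruby picked up the intended library.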

Bundler and public applications

By Hongli Lai on January 19th, 2012

I think Bundler is a great tool. Its strength lies not in its ability to install all the gems that you’ve specified, but in automatically figuring out a correct dependency graph so that nothing conflicts with each other, and in the fact that it gives you rock-solid guarantees that whatever gems you’re using in development are exactly what you get in production. No more weird gem version conflict errors.

This is awesome for most Ruby web apps that are meant to be used internally, e.g. things like Twitter, Basecamp, Union Station. Unfortunately, this strength also turns into a kind of weakness when it comes to public apps like Redmine and Juvia. These apps typically allow the user to choose their database driver through config/database.yml. However the driver must also be specified inside Gemfile, otherwise the app cannot load it. The result is that the user has to edit both database.yml and Gemfile, which introduces the following problems:

  • The user may not necessarily be a Ruby programmer. The Gemfile will confuse him.
  • The user is not able to use the Gemfile.lock that the developer has provided. This makes installing in deployment mode with the developer-provided Gemfile.lock impossible.

This can be worked around in a very messy form with groups. For example:

group :driver_sqlite do
  gem 'sqlite3'
end

group :driver_mysql do
  gem 'mysql'
end

group :driver_postgresql do
  gem 'pg'
end

And then, if the user chose to use MySQL:

bundle install --without='driver_postgresql driver_sqlite'

This is messy because you have to exclude all the things you don’t want. If the app supports 10 database drivers then the user has to put 9 drivers on the exclusion list.
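
To make the bookkeeping concrete, here is a small sketch of what an install script would have to do to build that exclusion list. The helper name `without_argument` is hypothetical; the group names are the ones from the example above.

```ruby
# Hypothetical helper: given every driver group in the Gemfile and the one
# the user picked, build the value for Bundler's --without option.
def without_argument(all_groups, chosen_group)
  excluded = all_groups - [chosen_group]
  excluded.join(' ')
end

groups = %w[driver_sqlite driver_mysql driver_postgresql]
puts "bundle install --without='#{without_argument(groups, 'driver_mysql')}'"
```

Every new driver the app supports makes this list one entry longer for every user.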

How can we make this better? I propose supporting conditionals in the Gemfile language. For example:

condition :driver => 'sqlite' do
  gem 'sqlite3'
end

condition :driver => 'mysql' do
  gem 'mysql'
end

condition :driver => 'postgresql' do
  gem 'pg'
end

condition :driver => ['mysql', 'sqlite'] do
  gem 'foobar'
end

The following command would install the mysql and the foobar gems:

bundle install --condition driver=mysql

Bundler should enforce that the driver condition is set: if it’s not set then it should raise an error. To allow for the driver condition to not be set, the developer must explicitly define that the condition may be nil:

condition :driver => nil do
  gem 'null-database-driver'
end

Here, bundle install will install null-database-driver.
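
The intended semantics can be sketched in plain Ruby. Bundler has no such feature; the class `ConditionalGemfile`, its `evaluate` method and the `MissingCondition` error are all made up here purely to illustrate how the proposed `condition` blocks would be resolved.

```ruby
# Hypothetical sketch of the proposed semantics; this is not a Bundler API.
class ConditionalGemfile
  class MissingCondition < StandardError; end

  # settings: values given on the command line, e.g. { :driver => 'mysql' }
  def initialize(settings = {})
    @settings = settings
    @gems = []
    @declared = {}     # every condition name mentioned in the Gemfile
    @nil_allowed = {}  # condition names that have an explicit nil branch
  end

  def gem(name)
    @gems << name
  end

  def condition(requirements)
    requirements.each do |name, wanted|
      @declared[name] = true
      @nil_allowed[name] = true if wanted.nil?
    end
    matches = requirements.all? do |name, wanted|
      value = @settings[name]
      wanted.nil? ? value.nil? : Array(wanted).include?(value)
    end
    yield if matches
  end

  # Runs the Gemfile block and returns the gems that would be installed.
  def evaluate(&gemfile)
    instance_eval(&gemfile)
    @declared.each_key do |name|
      next if @settings.key?(name) || @nil_allowed[name]
      raise MissingCondition, "condition '#{name}' must be set"
    end
    @gems
  end
end

gemfile = proc do
  condition :driver => 'sqlite' do
    gem 'sqlite3'
  end
  condition :driver => 'mysql' do
    gem 'mysql'
  end
  condition :driver => ['mysql', 'sqlite'] do
    gem 'foobar'
  end
end

selected = ConditionalGemfile.new({ :driver => 'mysql' }).evaluate(&gemfile)
puts selected.inspect
```

Evaluating the same block with no settings raises `MissingCondition`, which mirrors the rule that an unset condition is an error unless a nil branch exists.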

With this proposal, user installation instructions can be reduced to these steps:

  1. Edit database.yml and specify a driver.
  2. Run bundle install --condition driver=(driver name)

I’ve opened a ticket for this proposal. What do you think?

Making Ruby threadable: properly handling context switching in native extensions

By Hongli Lai on June 10th, 2010

In the previous article Does Rails Performance Need an Overhaul? we discussed the fact that proper Ruby threading is hindered by various broken native extensions. Writing a native extension for Ruby is pretty easy; writing it right, however, can not only be difficult, but can also be an obscure practice that requires l33t sk1llz because of the lack of documentation in this area. We’ve written several native extensions so far and in the process of figuring out how to make threading-friendly native extensions we had to wade through tons of Ruby source code. In this article I want to teach some best practices in the writing of threading-friendly native extensions.

Threading basics

As discussed in the previous article, Ruby 1.8 implements userspace threads, meaning that no matter how many Ruby threads you have, only one can run at a time, and only on a single CPU core. The threads are scheduled by Ruby itself, not by the operating system.

Ruby 1.9 implements native operating system threads. However it has a global interpreter lock which must be locked when the thread is running Ruby code. This effectively makes Ruby 1.9 single threaded most of the time.

With both Ruby 1.8 and 1.9 threads, system calls such as I/O operations can block the thread and prevent Ruby from context switching to another. Thus, system calls require special attention. Expensive calculations that do not involve system calls can also block the thread, but something can be done about those as well, as you will read later on.

Handling I/O

Suppose that you have a file descriptor on which you want to perform some potentially blocking I/O. The naive approach is to perform the I/O command anyway and risk blocking the entire Ruby process. This is exactly what makes the mysql extension thread-unfriendly: while waiting on MySQL no other threads can run, grinding your multi-threaded Rails web app to a halt.

However there are a number of functions in your arsenal that you can use to combat this problem. And as a general rule, you should set file descriptors to non-blocking mode.

rb_thread_wait_fd(fd)

Just before performing a blocking read, you should call rb_thread_wait_fd() on the file descriptor that you’re reading from. On 1.8, this function marks the current thread as waiting for readable data on this file descriptor and then invokes the scheduler. The scheduler uses the select() system call to check which file descriptors are readable and then selects a thread which may continue. If the file descriptor that you were waiting on is not readable, then your thread will be suspended until the next time the scheduler is invoked and selects your thread. But even if the file descriptor is immediately readable, the scheduler does not guarantee that your thread will be selected immediately.

On 1.9, rb_thread_wait_fd() simply unlocks the global interpreter lock, calls the select() system call on the given file descriptor, and re-acquires the global interpreter lock when select() returns. While select() is blocking, other threads can run.

As an optimization, if only the main thread exists then this function does nothing. This applies to both 1.8 and 1.9.
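
The effect of this cooperation can be observed from plain Ruby, where the built-in I/O methods already yield to other threads. The sketch below is illustrative only and does not call rb_thread_wait_fd() directly; the pipe and the variable names are assumptions made for the example.

```ruby
reader, writer = IO.pipe

# This thread blocks in read until five bytes arrive on the pipe.
blocked = Thread.new { reader.read(5) }

# Meanwhile the main thread keeps running.
ticks = 0
5.times do
  ticks += 1
  sleep 0.01
end

writer.write('hello')
received = blocked.value   # joins the thread and returns its result
puts "main thread ticked #{ticks} times; reader got #{received.inspect}"
```

A native extension that waits on the file descriptor the same way gives other threads the same opportunity to run.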

rb_thread_fd_writable(fd)

This works the same as rb_thread_wait_fd(), but waits until the given file descriptor becomes writable. The single-thread optimization applies here too. You should call rb_thread_fd_writable() just before you perform a write I/O operation.

rb_thread_select()

To wait on multiple file descriptors, use this function instead of select() or poll(). Unlike the native system calls, this function will take care of invoking the scheduler or unlocking the global interpreter lock. Unlike rb_thread_wait_fd() and rb_thread_fd_writable(), there is no do-nothing-when-there’s-only-one-thread optimization here so it will always invoke the scheduler and call select().
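
Ruby code gets the same multiplexing through IO.select, which waits on several descriptors at once while letting other threads run. A minimal sketch, with two pipes invented for the example:

```ruby
first_r, first_w = IO.pipe
second_r, second_w = IO.pipe

# Only the second pipe has data waiting.
second_w.write('ping')

# Wait up to one second for either pipe to become readable.
ready, = IO.select([first_r, second_r], nil, nil, 1)
messages = ready.map { |io| io.read_nonblock(16) }
puts "readable descriptors: #{ready.size}, data: #{messages.inspect}"
```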

rb_io_wait_readable()

I/O system calls can return a variety of error codes that indicate that you should restart the system call, such as EINTR (system call interrupted by signal) and EAGAIN (the file descriptor is set to non-blocking mode and the data is not yet available). You should therefore always call I/O system calls in a loop until they return success or a different error code. You must however not forget to call rb_thread_wait_fd() or rb_thread_select() before you restart the system call, or you will risk blocking the thread again.

Ruby provides a function rb_io_wait_readable() to aid you in writing restart code. This function should be called right after your I/O reading system call has returned. It checks whether the system call should be restarted (returning Qtrue) or whether you should report an error (returning Qfalse). Here’s a code example:

int done = 0;
int ret;

/* Have the Ruby scheduler suspend this thread until the file descriptor becomes
 * readable; or if this is the only thread in the system, rb_thread_wait_fd() does
 * nothing and we immediately continue to the 'do' loop.
 */
rb_thread_wait_fd(fd);
do {
    /* Actually you should surround your system call with some more code, but
     * we'll get to this later. This example code is only partial. */
    ret = ...your read system call here...
    if (ret == -1) {
        if (rb_io_wait_readable(fd) == Qfalse) {
            ...throw an exception here...
        } /* else restart loop */
    } else {
        done = 1;
    }
} while (!done);

rb_io_wait_readable() checks whether errno equals EINTR or ERESTART, in which case it will call rb_thread_wait_fd() on the file descriptor and return Qtrue. If errno is EAGAIN or EWOULDBLOCK then it calls rb_thread_select() on the file descriptor and returns Qtrue. Otherwise it returns Qfalse.

The difference between calling rb_thread_wait_fd() and rb_thread_select() here is subtle, but important. The former only blocks (calls select() on the file descriptor) when there are multiple Ruby threads in the Ruby process, while the latter always blocks no matter what. This behavior is important because EAGAIN and EWOULDBLOCK occur when a non-blocking file descriptor is not yet readable; if we don’t block here on a select() then the code will enter a 100% CPU busy loop.
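
The same retry discipline exists at the Ruby level, which may make the C version easier to picture. The sketch below uses read_nonblock, IO::WaitReadable and IO.select from Ruby's standard IO API; the method name `patient_read` is made up for the example.

```ruby
# Read up to max_bytes from io without ever spinning on EAGAIN.
def patient_read(io, max_bytes)
  begin
    io.read_nonblock(max_bytes)
  rescue IO::WaitReadable
    # Not readable yet (EAGAIN/EWOULDBLOCK): block in select, then retry.
    IO.select([io])
    retry
  rescue Errno::EINTR
    # Interrupted by a signal: simply restart the call.
    retry
  end
end

reader, writer = IO.pipe
producer = Thread.new do
  sleep 0.05
  writer.write('data')
end
chunk = patient_read(reader, 16)
producer.join
puts chunk
```

Without the IO.select call in the first rescue branch, the loop would retry immediately and burn CPU until the data arrived.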

rb_io_wait_writable()

Works the same way as rb_io_wait_readable(). Use this for I/O write operations instead.

Sleeping

Use rb_thread_wait_for() instead of sleep() or usleep(). On 1.8 rb_thread_wait_for() marks the current thread as sleeping for a period of time and then invokes the scheduler, which does not select this thread until the period of time has expired. On 1.9 Ruby unlocks the global interpreter lock, calls some sleeping function, and then re-locks it after that function returns.

Other non-I/O blocking system calls

Sometimes you will want to wait on a blocking system call that isn’t related to I/O, such as waitpid(). There are several ways to deal with these kinds of system calls.

Blocking outside the global interpreter lock

This method only works on Ruby 1.9. Unlock the global interpreter lock, do your thing, then re-lock it. Dealing with the global interpreter lock will be discussed later.

Non-blocking polling

Some system calls have non-blocking equivalents which return immediately instead of blocking. For example waitpid() blocks by default, but it can be set to non-blocking by passing the WNOHANG flag, which causes it to return 0 immediately if the child process has not exited yet. You must call the non-blocking version in a loop. Upon detecting that the call would have blocked, you must call rb_thread_polling(). On 1.8 this function lets the scheduler put the current thread to sleep for 60 msec, on 1.9 for 100 msec.

For example, Ruby’s Process#waitpid function does not block other threads. On 1.9 it simply unlocks the global interpreter lock while blocking on waitpid(). On 1.8 it is implemented as follows (simplified version):

int result;

retry:
result = waitpid(..., WNOHANG);
if (result < 0) {
    if (errno == EINTR) {
        /* Interrupted by a signal. Let other threads run, then restart the call. */
        rb_thread_polling();
        goto retry;
    } else {
        ...throw exception...
    }
} else if (result == 0) {
    /* Process isn't ready yet. Tell the scheduler and then restart the call. */
    rb_thread_polling();
    goto retry;
}

The actual code is more optimized than this. For example if there's only a single thread in the system then it calls waitpid() without WNOHANG and just lets it block.
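
The polling pattern can be tried out from Ruby itself. This is a sketch and not how Ruby implements it internally; the child command, the exit status and the 50 msec pause are arbitrary choices for the example.

```ruby
require 'rbconfig'

# Start a child Ruby process that exits with status 3 after a short delay.
pid = Process.spawn(RbConfig.ruby, '-e', 'sleep 0.2; exit 3')

polls = 0
status = nil
loop do
  # With WNOHANG, wait2 returns nil instead of blocking while the child runs.
  result = Process.wait2(pid, Process::WNOHANG)
  if result
    status = result[1]
    break
  end
  polls += 1
  sleep 0.05   # let other threads run before polling again
end

puts "child exited with #{status.exitstatus} after #{polls} polls"
```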

Calling the system call in a native OS thread and use I/O to report results

This is probably the most complex way but on 1.8 sometimes you don't have any choice. On 1.9 you should always prefer unlocking the global interpreter lock over this method.

Create a pipe, then spawn a native OS thread which calls the system call. When the system call is done, have your native thread report the result back via the pipe. On the Ruby side, use rb_thread_wait_fd() and friends to block on the pipe and then receive the results. Be sure to join the thread after you've read the result because rb_thread_wait_fd() does not necessarily block until there is data, so when rb_thread_wait_fd() returns it is not guaranteed that the thread has returned yet.
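
The shape of this hand-off is easier to see in a few lines of Ruby. In the sketch below a Ruby thread merely stands in for the native OS thread, so it demonstrates only the pipe protocol and the order of operations, not an actual escape from the Ruby scheduler.

```ruby
reader, writer = IO.pipe

# Stand-in for the native OS thread: does the slow work, then reports
# the result through the pipe.
worker = Thread.new do
  sleep 0.05                  # placeholder for the blocking system call
  writer.write("42\n")
  writer.close
end

IO.select([reader])           # wait until the pipe is readable
result = reader.gets.to_i
worker.join                   # join only after the result has been read
puts "worker reported #{result}"
```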

Another thing to watch out for is that your thread must not refer to data that's on the Ruby thread's stack. This is because Ruby overwrites the main OS thread's C stack upon context switching to another Ruby thread. For example code like this is not OK:

static void thread_main(int *value) {
    /* 'value' here refers to the 'value' variable on foobar's stack, but
     * that data is overwritten when Ruby context switches, so we
     * really can't use 'value' here!
     */
}

/* Native extension Ruby method. */
static void foobar() {
    int value = 1234;
    
    thread_t thread = create_a_thread(thread_main, &value);
    ...do something which can cause a Ruby thread context switch...
    join_thread(thread);
}

To pass data to the thread, you should put the data on the heap instead of the stack. This is OK:

typedef struct {
    ...
} Data;

static void thread_main(Data *data) {
    /* 'data' is safe to access. */
}

/* Native extension Ruby method. */
static void foobar() {
    Data *data = malloc(sizeof(Data));
    thread_t thread = create_a_thread(thread_main, data);
    ...do something which can cause a Ruby thread context switch...
    join_thread(thread);
    free(data);
}

Heavy CPU computations

Blocking system calls are not the only thing that can block other threads; CPU-heavy computation code can do that as well. While executing non-Ruby-API C code, context switching to other threads is not possible. Calls to Ruby APIs may sometimes cause context switching. However there are several ways to make context switching possible while running CPU-heavy computations.

Unlocking the global interpreter lock

This only works on 1.9. Unlock the global interpreter lock and then call the computation code, and relock when done. Consider BCrypt-Ruby as an example. BCrypt is a very heavy hashing algorithm used for securely hashing passwords; depending on the configured cost it could need several minutes to calculate a hash. We've recently patched BCrypt-Ruby to unlock the global interpreter lock while running the BCrypt algorithm, so that when you run BCrypt-Ruby in multiple threads the algorithms can be spread across multiple CPU cores.

However, be aware of the fact that unlocking and relocking the global interpreter lock comes with some overhead as well. Unlocking and relocking the global interpreter lock is only worth it if you know that the computation is going to take a while (say, longer than 50 msec). If the computation time is short then you will actually make your code slower because of all the locking overhead. Therefore BCrypt-Ruby only unlocks the global interpreter lock if the BCrypt cost is set to 9 or higher.

Explicit yielding

You can call rb_thread_schedule() once in a while to force context switching to another thread. However this approach does not allow your code to make use of multiple cores even if you're on 1.9.
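
The Ruby-level counterpart of this idea is Thread.pass, which hints to the scheduler that another thread may run. A minimal sketch, where the method name and the stride of 10,000 iterations are arbitrary choices for illustration:

```ruby
# Sum the squares of 1..n, offering the scheduler a chance to switch
# threads every `stride` iterations.
def sum_of_squares(n, stride = 10_000)
  total = 0
  1.upto(n) do |i|
    total += i * i
    Thread.pass if (i % stride).zero?
  end
  total
end

puts sum_of_squares(100_000)
```

Yielding on every iteration would add needless overhead, which is why the sketch only yields once per stride.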

Running the C code in a native OS thread

This is pretty much the same approach as described by "Calling the system call in a native OS thread and use I/O to report results". In my opinion, unless your computation takes a very long time, implementing this is almost never worth the trouble. For BCrypt-Ruby we didn't bother: if you want multi-core support in BCrypt-Ruby you need to be on 1.9.

TRAP_BEG/TRAP_END and the global interpreter lock

TRAP_BEG and TRAP_END

On 1.8, you should surround system calls with calls to TRAP_BEG and TRAP_END. TRAP_BEG performs some preparation work. TRAP_END performs a variety of things:

  1. It checks whether there are any pending signals, e.g. whether the user pressed Ctrl-C. If so it will raise an appropriate SignalException.
  2. It also calls the scheduler if a certain amount of time has been spent on the current thread.

On 1.9 TRAP_BEG and TRAP_END are macros that unlock and lock the global interpreter lock. However these macros are deprecated and are likely to disappear in the future so you should not use them on 1.9. Instead, you should use rb_thread_blocking_region().

On 1.9 TRAP_BEG and TRAP_END are defined in ruby/backward/rubysig.h.

rb_thread_blocking_region()

This is a 1.9-specific function which allows you to call a function outside the global interpreter lock. Its declaration is as follows:

VALUE rb_thread_blocking_region(rb_blocking_function_t *func, void *data1,
                                rb_unblock_function_t *ubf, void *data2);

func is a pointer to a function that is to be called outside the global interpreter lock. This function must look similar to:

VALUE foobar(void *data)

The data passed via the data1 parameter is passed to the function.

ubf is either RUBY_UBF_IO (indicating that you're performing some kind of I/O operation) or RUBY_UBF_PROCESS (indicating that you're calling some kind of process management system call). However I'm not sure what this parameter exactly does. data2 is supposedly passed to ubf when it's called.

The return value of this function is the return value of func.

Global interpreter lock caveats

Do not call any Ruby API functions while the global interpreter lock is unlocked! No rb_yield(), rb_str_new(), or anything. The entirety of the Ruby API is only safe to call while the global interpreter lock is held.

Does Rails Performance Need an Overhaul?

By Hongli Lai on June 9th, 2010

Igvita.com has recently published the article Rails Performance Needs an Overhaul. Rails performance… no, Ruby performance… no Rails scalability… well something is being criticized here. From my experience, talking about scalability and performance can be a bit confusing because the terms can mean different things to different people and/or in different situations, yet the meanings are used interchangeably all the time. In this post I will take a closer look at Igvita’s article.

Performance vs scalability

Let us first define performance and scalability. I define performance as throughput: the number of requests per second. I define scalability as the number of users a system can concurrently handle. There is a correlation between performance and scalability. Higher performance means each request takes less time, and so is more scalable, right? Sometimes yes, but not necessarily. It is entirely possible for a system to be scalable yet have a lower throughput than a system that’s not as scalable, or for a system to be uber-fast yet not very scalable. Throughout this blog post I will show several examples that highlight the difference.

“Scalability” is an extremely loaded word and people often confuse it with “being able to handle tons and tons of traffic”. Let’s use a different term that better reflects what Igvita’s actually criticizing: concurrency. Igvita claims that concurrency in Ruby is pathetic while referring to database drivers, Ruby application servers, etc. Some practical examples that demonstrate what he means are as follows.

Limited concurrency at the app server level

Mongrel, Phusion Passenger and Unicorn all use a “traditional” multi-process model in which multiple Ruby processes are spawned, each process handling a single request at a time. Thus, concurrency is (assuming that the load balancer has infinite concurrency) limited by the number of Ruby processes: having 5 processes allows you to handle 5 users concurrently.

Threaded servers, where the server spawns multiple threads, each handling 1 connection concurrently, allow more concurrency because it’s possible to spawn a whole lot more threads than processes. In the context of Ruby, each Ruby process needs to load its own copy of the application code and other resources, so memory increases very quickly as you spawn additional processes. Phusion Passenger with Ruby Enterprise Edition solves this problem somewhat by using copy-on-write optimizations which save memory, so you can spawn a few more processes, but not significantly (as in 10x) more. In contrast, a multi-threaded app server does not need as much memory because all threads share application code with each other so you can comfortably spawn tens or hundreds of threads. At least, this is the theory. I will later explain why this does not necessarily hold for Ruby.

When it comes to performance however, there’s no difference between processes and threads. If you compare a well-written multi-threaded app server with 5 threads to a well-written multi-process app server with 5 processes, you won’t find either being more performant than the other. Context switch overhead for processes and for threads is roughly the same. Each process can use a different CPU core, as can each thread, so there’s no difference in multi-core utilization either. This reflects back on the difference between scalability/concurrency and performance.

Multi-process Rails app servers have a concurrency level that can be counted with a single hand, or if you have very beefy hardware, a concurrency level in the range of a couple of tens, thanks to the fact that Rails needs about 25 MB per process. Multi-threaded Rails app servers can in theory spawn a couple of hundred threads. After that it’s also game over: an operating system thread needs a couple MB of stack space, so after a couple hundred threads you’ll run out of virtual memory address space on 32-bit systems even if you don’t actually use that much memory.

There is another class of servers, the evented ones. These servers are actually single-threaded, but they use a reactor style I/O dispatch architecture for handling I/O concurrency. Examples include Node.js, Thin (built on EventMachine) and Tornado. These servers can easily have a concurrency level of a couple of thousand. But due to their single-threaded nature they cannot effectively utilize multiple CPU cores, so you need to run a couple of processes, one per CPU core, to fully utilize your CPU.

The limits of Ruby threads

Ruby 1.8 uses userspace threads, not operating system threads. This means that Ruby 1.8 can only utilize a single CPU core no matter how many Ruby threads you create. This is why one typically needs multiple Ruby processes to fully utilize one’s CPU cores. Ruby 1.9 finally uses operating system threads, but it has a global interpreter lock, which means that each time a Ruby 1.9 thread is running it will prevent other Ruby threads from running, effectively making it the same multicore-wise as 1.8. This is also explained in an earlier Igvita article, Concurrency is a Myth in Ruby.

On the bright side, not all is bad. Ruby 1.8 internally uses non-blocking I/O while Ruby 1.9 unlocks the global interpreter lock while doing I/O. So if one Ruby thread is blocked on I/O, another Ruby thread can continue execution. Likewise, Ruby is smart enough to cause things like sleep() and even waitpid() to preempt to other threads.

On the dark side however, Ruby internally uses the select() system call for multiplexing I/O. select() can only handle 1024 file descriptors on most systems so Ruby cannot handle more than this number of sockets per Ruby process, even if you are somehow able to spawn thousands of Ruby threads. EventMachine works around this problem by bypassing Ruby’s I/O code completely.

Naive native extensions and third party libraries

So just run a couple of multi-threaded Ruby processes, one process per core and multiple threads per process, and all is fine and we should be able to have a concurrency level of up to a couple hundred, right? Well not quite, there are a number of issues hindering this approach:

  • Some third party libraries and Rails plugins are not thread-safe. Some aren’t even reentrant. For example Rails < 2.2 suffered from this problem. The app itself might not be thread-safe.
  • Although Ruby is smart enough not to let I/O block all threads, the same cannot be said of all native extensions. The MySQL extension is the most infamous example: when executing queries, other threads cannot run.

Mongrel is actually multi-threaded but in practice everybody uses it in multi-process mode (mongrel_cluster) exactly because of these problems. It is also the reason why Phusion Passenger has gone the multi-process route.

And even though Thin is evented, a typical Ruby web application running on Thin cannot handle thousands of concurrent users. This is because evented servers typically require a special evented programming style, such as the one seen in Node.js and EventMachine. A Ruby web app that is written in an evented style running on Thin can definitely handle a large number of concurrent users.

When is limited application server concurrency actually a problem?

Igvita is clearly disappointed at all the issues that hinder Ruby web apps from achieving high concurrency. For many web applications I would however argue that limited concurrency is not a problem.

  • Web applications that are slow, as in CPU-heavy, max out CPU resources pretty quickly so increasing concurrency won’t help you.
  • Web applications that are fast are typically quick enough at handling the load so that even large number of users won’t notice the limited concurrency of the server.

Having a concurrency of 5 does not mean that the app server can only handle 5 requests per second; it’s not hard to serve hundreds of requests per second with only a couple of single-threaded processes.
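
A back-of-the-envelope calculation shows why. Assuming each worker handles one request at a time, the upper bound on throughput is the number of workers divided by the time per request. The 10 msec and 10 second response times below are illustrative figures, and the method name is made up.

```ruby
# Upper bound on throughput when each worker handles one request at a time.
def max_requests_per_second(workers, seconds_per_request)
  workers / seconds_per_request.to_f
end

fast = max_requests_per_second(5, 0.010)   # 10 msec per request
slow = max_requests_per_second(5, 10)      # 10 sec spent waiting on I/O

puts "fast app: #{fast.round} req/s, slow I/O-bound app: #{slow} req/s"
```

The same five processes go from hundreds of requests per second to less than one once each request spends most of its time waiting.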

The problem becomes most evident for web applications that have to wait a lot for I/O (besides its own HTTP request/response cycle). Examples include:

  1. Apps that have to spend a lot of time waiting on the database.
  2. Apps that perform a lot of external HTTP calls that respond slowly.
  3. Chat apps. These apps typically have thousands of users, most of them doing nothing most of the time, but they all require a connection (unless your app uses polling, but that’s a whole different discussion).

We at Phusion have developed a number of web applications for clients that fall in the second category, the most recent one being a Hyves gadget. Hyves is the most popular social network in the Netherlands and they get thousands of concurrent visitors during the day. The gadget that we’ve developed has to query external HTTP servers very often, and these servers can take 10 seconds to respond in extreme cases. The servers are running Phusion Passenger with maybe a couple tens of processes. If every request to our gadget also causes us to wait 10 seconds for the external HTTP call then we’d soon run out of concurrency.

But even suppose that our app and Phusion Passenger can have a concurrency level of a couple of thousand, all of those visitors will still have to wait 10 seconds for the external HTTP calls, which is obviously unacceptable. This is another example that illustrates the difference between scalability and performance. We solved this problem by aggressively caching the results of the HTTP calls, minimizing the number of external HTTP calls that are necessary. The result is that even though the application’s concurrency is fairly limited, it can still comfortably serve many concurrent users with a reasonable response time.

This anecdote should explain why I believe that web apps can get very far despite having a limited concurrency level. That said, as Internet usage continues to increase and websites get more and more users, we may at some time come to a point where a much larger concurrency level is required than most of our current Ruby tools allow (assuming server capacity doesn’t scale quickly enough).
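
As an illustration of the caching idea only, and not the code used in the Hyves gadget, here is a minimal time-based cache. The class name `ExpiringCache`, the 60 second lifetime and the cache key are all assumptions; the sketch is also not thread-safe.

```ruby
# Minimal time-based cache; the clock is injectable so it can be tested.
class ExpiringCache
  def initialize(ttl_seconds, clock = -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Returns the cached value for key, or runs the block (the slow call)
  # and stores its result when the entry is missing or expired.
  def fetch(key)
    value, stored_at = @entries[key]
    return value if stored_at && @clock.call - stored_at < @ttl
    fresh = yield
    @entries[key] = [fresh, @clock.call]
    fresh
  end
end

calls = 0
cache = ExpiringCache.new(60)
2.times do
  cache.fetch('profile/1') do
    calls += 1   # stands in for the slow external HTTP call
    'payload'
  end
end
puts "external calls made: #{calls}"
```

Only the first lookup pays for the slow call; later lookups within the lifetime are served from memory.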

What was Igvita.com criticizing?

Igvita.com does not appear to be criticizing Ruby or Rails for being slow. It doesn’t even appear to be criticizing the lack of Ruby tools for achieving high concurrency. It appears to be criticizing these things:

  • Rails and most Ruby web application servers don’t allow high concurrency by default.
  • Many database drivers and libraries hinder concurrency.
  • Although alternatives exist that allow concurrency, you have to go out of your way to find them.
  • There appears to be little motivation in the Ruby community for making the entire stack of web framework + web app server + database drivers etc scalable by default.

This is in contrast to Node.js where everything is scalable by default.

Do I understand Igvita’s frustration? Absolutely. Do I agree with it? Not entirely. The same thing that makes Node.js so scalable is also what makes it relatively hard to program for. Node.js enforces a callback style of programming and this can eventually make your code look a lot more complicated and harder to read than regular code that uses blocking calls. Furthermore, Node.js is relatively young – of course you won’t find any Node.js libraries that don’t scale! But if people ever use Node.js for things other than high-concurrency server apps, then non-scalable libraries will at some time pop up. And then you will have to look harder to avoid these libraries. There is no silver bullet.

That said, all would be well if at least the preferred default stack could handle high concurrency by default. This means e.g. fixing the MySQL extension and having the fix published by upstream. The mysqlplus extension fixes this but for some reason their changes aren’t accepted and published by the original author, and so people end up with a multi-thread-killing database driver by default.

Is Node.js innovative? Is Ruby lacking innovation?

A minor gripe that I have with the article is that Igvita calls Node.js innovative while seemingly implying that the Ruby stack isn’t innovating. Evented servers like Node.js have actually been around for years and the evented pattern was well-known long before Ruby or JavaScript became popular. Thin is also evented and predates Node.js. Thin and EventMachine also allow Node.js-style evented programming. The only innovation that Node.js brings, in my opinion, is the fact that it’s JavaScript. The other “innovation” is the lack of non-scalable libraries.

Conclusion

Igvita appears to be criticizing something other than Rails performance, contrary to what his article’s title would imply.

I don’t think the concurrency levels that the Rails stack provides by default are that bad in practice. But as a fellow programmer, it does intuitively bother me that our laptops, which are more powerful than supercomputers from two decades ago, cannot comfortably handle a couple of thousand concurrent users. We can definitely work towards something better, but in the mean time let’s not forget that the current stack is more than capable of Getting Work Done(tm).

Securely store passwords with bcrypt-ruby; now compatible with JRuby and Ruby 1.9

By Hongli Lai on August 13th, 2009

When writing web applications, or any application for that matter, any passwords should be stored securely. As a rule of thumb, one should never store passwords as clear text in the database for the following reasons:

  • If the database ever gets leaked out, then all accounts are compromised until every single user resets his password. Imagine that you’re an MMORPG developer; leaking out the database with clear text passwords allows the attacker to delete every player’s characters.
  • Many people use the same password for multiple sites. Imagine that the password stored in your database is also used for the user’s online banking account. Even if the database does not get leaked out, the password is still visible to the system administrator; this can be a privacy breach.

There are several “obvious” alternatives, which aren’t quite secure enough:

Storing passwords as MD5/SHA1/$FAVORITE_ALGORITHM hashes
These days MD5 can be brute-force cracked with relatively little effort. SHA1, SHA2 and other algorithms are harder to brute-force, but the attacker can still crack these hashes by using rainbow tables: precomputed tables of hashes with which the attacker can look up the input for a hash with relative ease. This rainbow table does not have to be very large: it just has to contain words from the dictionary, because many people use dictionary words as passwords.

Using plain hashes also makes it possible for an attacker to determine whether two users have the same password.

Encrypting the password
This is not a good idea because if the attacker was able to steal the database, then there’s a possibility that he’s able to steal the key file as well. Plus, the system administrator is able to read everybody’s passwords, unless he’s restricted access to either the key file or the database.

The solution is to store passwords as salted hashes. One calculates a salted hash as follows:

salted_hash = hashing_algorithm(salt + cleartext_password)

Here, salt is a random string. After calculating the salted hash, one should store the salted hash in the database, along with the (cleartext) salt. It is not necessary to keep the salt secret or to obfuscate it.

When a user logs in, one can verify his password by re-computing the salted hash and comparing it with the salted hash in the database:

salted_hash = hashing_algorithm(salt_from_database + user_provided_password)
if (salted_hash == salted_hash_from_database):
    user is logged in
else:
    password incorrect

The use of the salt forces the attacker to either brute-force the hash or use a ridiculously large rainbow table. In the latter case, the sheer size of the required rainbow table can make it impractical to generate. The larger the salt, the more difficult it becomes for the cracker to use rainbow tables.
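
The pseudocode above can be sketched in Ruby roughly as follows. This is only an illustration of the mechanics: SHA-256 stands in for hashing_algorithm, and the method names make_salted_hash and password_correct? are made up for this example.

```ruby
require 'digest'
require 'securerandom'

# Hypothetical helper: returns the salt and the salted hash.
# Both values are meant to be stored in the database.
def make_salted_hash(cleartext_password, salt = SecureRandom.hex(16))
  salted_hash = Digest::SHA256.hexdigest(salt + cleartext_password)
  [salt, salted_hash]
end

# Hypothetical helper: re-compute the salted hash with the stored salt
# and compare it with the stored hash.
def password_correct?(user_provided_password, salt_from_database, salted_hash_from_database)
  Digest::SHA256.hexdigest(salt_from_database + user_provided_password) == salted_hash_from_database
end

salt, stored_hash = make_salted_hash('secret')
puts password_correct?('secret', salt, stored_hash)  # the right password
puts password_correct?('guess', salt, stored_hash)   # a wrong password
```

Because every call generates a fresh random salt, two users with the same password end up with different stored hashes.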

However, even with salting, one should still not use SHA1, SHA2, Whirlpool or most other hashing algorithms, because these algorithms are designed to be fast. Although brute-forcing SHA2 and Whirlpool is hard, it’s still possible given sufficient resources. Instead, one should pick a hashing algorithm that’s designed to be slow, so that brute-forcing becomes infeasible. Bcrypt is such a slow hashing algorithm. A speed comparison on a MacBook Pro with a 2 GHz Intel Core 2 Duo:

  • SHA-1: 118600 hashes per second.
  • Bcrypt (with cost = 10): 7.7 hashes per second.

Theoretically it would take 4*10^35 years for a single MacBook Pro core to crack an SHA-1 hash, assuming that the attacker does not exploit any weaknesses in SHA-1. To crack a bcrypt hash one would need 6*10^39 years, or roughly 15,000 times as long. Therefore, we recommend the use of bcrypt to store passwords securely.
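
As a quick sanity check, the relationship between the two estimates follows directly from the benchmark numbers above. The constant names are made up for this sketch; the figures are the ones quoted in this post.

```ruby
# Benchmark figures quoted above (hashes per second on one core).
SHA1_HASHES_PER_SECOND   = 118_600.0
BCRYPT_HASHES_PER_SECOND = 7.7

# How many times slower bcrypt (cost = 10) is than SHA-1.
SLOWDOWN = SHA1_HASHES_PER_SECOND / BCRYPT_HASHES_PER_SECOND

# The SHA-1 estimate from the text, scaled by that slowdown.
SHA1_YEARS   = 4e35
BCRYPT_YEARS = SHA1_YEARS * SLOWDOWN

puts format('bcrypt is about %d times slower than SHA-1', SLOWDOWN.round)
puts format('estimated bcrypt cracking time: %.1e years', BCRYPT_YEARS)
```

The slowdown comes out at a little over 15,000, which is where the bcrypt figure on the order of 6*10^39 years comes from.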

There’s even a nice Ruby implementation of this algorithm: bcrypt-ruby! Up until recently, bcrypt-ruby was only available for MRI (“Matz Ruby Interpreter”, the C implementation that most people use). However, we’ve made it compatible with JRuby! The code can be found in our fork at GitHub. The current version also has issues with Ruby 1.9, which we’ve fixed as well. The author of bcrypt-ruby has already accepted our changes and will soon release a new version with JRuby and Ruby 1.9 support.

Further recommended reading

How to Safely Store a Password by Coda Hale.

Getting ready for Ruby 1.9.1

By Hongli Lai on February 2nd, 2009

We are excited about Ruby 1.9.1. Of course, with all the performance improvements, who wouldn’t be? Unfortunately a large number of Ruby libraries and extensions still don’t work on 1.9.1, so Ruby 1.9 cannot be considered production-ready yet. Ryan Bigg has done an excellent job of documenting most of the problems that one would encounter when trying to get a basic Rails app up and running on Ruby 1.9.1. Basically, the problems he encountered were:

  • Rails 2.2.2 isn’t compatible with 1.9.1. Use Rails 2.3.0 RC1 or Rails edge.
  • The mysql gem needs patching.
  • The hpricot gem needs patching.
  • The postgres gem needs patching.
  • Thin needs patching.
  • The fastthread gem needs patching.
  • Mongrel needs patching.

But what about Phusion Passenger? Good news:
Phusion Passenger is Ruby 1.9.1-compatible since this commit (today).

Here’s a screenshot of a Rails 2.3.0 app running in Phusion Passenger on Ruby 1.9.1:

passenger-ruby19

Do you see the changes? Me neither. That’s the point. 🙂

We’ve encountered the following issues upon trying to get a simple Rails 2.3 app up running with Phusion Passenger and Ruby 1.9.1:

Fastthread isn’t compatible with 1.9
Both Mongrel and Phusion Passenger depend on Fastthread, which is a threading library that fixes some threading implementation bugs in older versions of Ruby 1.8. Fastthread is only a required dependency when running on older versions of Ruby 1.8. Unfortunately there’s no way to tell RubyGems “we depend on fastthread, but only when running on older versions of Ruby 1.8, and not on JRuby”.

Fastthread doesn’t compile on Ruby 1.9 (or on JRuby or other Ruby implementations for that matter), so when you type “gem install passenger” or “gem install mongrel” on Ruby 1.9, the installation fails with a ton of compile errors.

We’ve patched fastthread so that it becomes a no-op on Ruby 1.9.1 and on JRuby (that is, fastthread will install correctly but it won’t do anything). These patches have been submitted to Mentalguy, the maintainer of fastthread.

The sqlite3-ruby gem doesn’t work on 1.9
Jeremy Kemper submitted a 1.9 compatibility patch in the past, which had been committed. Unfortunately even with this patch, sqlite3-ruby isn’t compatible with 1.9.1.

We’ve gone ahead and fixed 1.9.1 support. The patch can be found here: http://rubyforge.org/tracker/index.php?func=detail&aid=23792&group_id=254&atid=1045

Hongli Lai Ninh Bui

Passing environment variables to Ruby from Phusion Passenger

By Hongli Lai on December 16th, 2008

Update June 4 2013: This article is completely obsolete. In Phusion Passenger 4, using SetEnv and PassEnv in Apache and env in Nginx works as expected. Detailed information can be found in the Phusion Passenger manual, section “About environment variables”.

Phusion Passenger manages Ruby/Rails processes automatically. Sometimes it is necessary to set environment variables or to pass environment variables to the Ruby interpreter. This particular aspect of Phusion Passenger isn’t very well documented, so it’s time for a blog post.

Environment variables that may be set after Ruby is started

Some environment variables may be set before or after Ruby is started. These include:

PATH
The search path for binaries.
LD_LIBRARY_PATH
The search path for shared libraries.

It really doesn’t matter where these environment variables are set, as long as you set them before you use them. These variables may be set in environment.rb or, if you’re using Apache, using the SetEnv directive.

Setting PATH, LD_LIBRARY_PATH and similar variables

Suppose that your environment.rb runs the program “frobnicate”, and this program is located in /opt/frobnicator/bin, which is not in PATH by default. Furthermore, the “frobnicate” program requires shared libraries which are located in /opt/awesome_runtime/lib. Suppose your environment.rb currently looks like this:

...
Rails::Initializer.run do |config|
  ...
end
...
system("frobnicate")    # => ERROR: command not found

Set PATH just before the system() call so that it can find the program:

ENV['PATH'] = "#{ENV['PATH']}:/opt/frobnicator/bin"
system("frobnicate")    # => ERROR: cannot load libawesome_runtime.so

Now set LD_LIBRARY_PATH so that the program can find its libraries:

ENV['PATH'] = "#{ENV['PATH']}:/opt/frobnicator/bin"
ENV['LD_LIBRARY_PATH'] = "#{ENV['LD_LIBRARY_PATH']}:/opt/awesome_runtime/lib"
system("frobnicate")    # => success!
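
One caveat with plain string interpolation: if LD_LIBRARY_PATH is not set at all, the result is ":/opt/awesome_runtime/lib", and the dynamic linker treats the empty entry before the colon as the current directory. A minimal sketch of a helper that avoids this is shown below. The name append_to_env_path is hypothetical and not part of Ruby or Phusion Passenger.

```ruby
# Hypothetical helper: append a directory to a path-style environment
# variable without introducing empty or duplicate entries.
def append_to_env_path(name, dir)
  entries = ENV[name].to_s.split(File::PATH_SEPARATOR).reject(&:empty?)
  entries << dir unless entries.include?(dir)
  ENV[name] = entries.join(File::PATH_SEPARATOR)
end

append_to_env_path('PATH', '/opt/frobnicator/bin')
append_to_env_path('LD_LIBRARY_PATH', '/opt/awesome_runtime/lib')
```

Child processes started afterwards with system() inherit the updated values.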

On Apache you can also use the SetEnv directive instead of hardcoding such settings in your app:

# Outside any virtual host block:
SetEnv PATH /usr/bin:/usr/local/bin:/bin:/opt/frobnicator/bin
SetEnv LD_LIBRARY_PATH /opt/awesome_runtime/lib

Setting GEM_PATH, the RubyGems search path

If you’re on a shared host (e.g. Dreamhost) or on some other server for which you do not have root privileges, then you have no choice but to install gems to somewhere inside your home folder. You also need to tell RubyGems to look in there, and that’s what GEM_PATH is for.

Suppose that you’ve installed the gem “ruby-frobnicator” into /home/foobar/my_gems. In your environment.rb you must set GEM_PATH and call Gem.clear_paths just before requiring the gem, like this:

...
Rails::Initializer.run do |config|
  ...
end
...
ENV['GEM_PATH'] = "/home/foobar/my_gems:#{ENV['GEM_PATH']}"
Gem.clear_paths
require 'ruby-frobnicator'    # => it works!
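
The same steps can be wrapped in a small helper that also keeps the directories RubyGems was already searching. This is a sketch only: prepend_gem_dir is a hypothetical name, and it assumes that Gem.path and Gem.clear_paths behave as in current RubyGems versions.

```ruby
require 'rubygems'

# Hypothetical helper: put a private gem directory in front of the current
# RubyGems search path, then make RubyGems forget its cached paths so that
# the new GEM_PATH is picked up.
def prepend_gem_dir(dir)
  search_path = ([dir] + Gem.path).uniq
  ENV['GEM_PATH'] = search_path.join(File::PATH_SEPARATOR)
  Gem.clear_paths
  Gem.path
end

# Usage, with the example directory from above:
#   prepend_gem_dir('/home/foobar/my_gems')
#   require 'ruby-frobnicator'
```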

Environment variables that must be set before Ruby is started

Some environment variables must be set before Ruby is started because the Ruby interpreter itself uses them. The RailsBench GC settings environment variables, which are now supported by Ruby Enterprise Edition, are examples of such environment variables.

You can set these environment variables by writing a wrapper script. Recall that Phusion Passenger has a “PassengerRuby” configuration option which typically looks like this:

PassengerRuby /usr/bin/ruby

You can point this to a wrapper script:

PassengerRuby /usr/local/my_ruby_wrapper_script

/usr/local/my_ruby_wrapper_script can set the environment variables prior to executing the real Ruby interpreter:

#!/bin/sh
export RUBY_HEAP_MIN_SLOTS=10000
export RUBY_HEAP_SLOTS_INCREMENT=10000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1.8
export RUBY_GC_MALLOC_LIMIT=8000000
export RUBY_HEAP_FREE_MIN=4096
exec "/usr/bin/ruby" "$@"

A few notes for those who are not familiar with writing shell scripts:

  • Make sure you make /usr/local/my_ruby_wrapper_script executable with chmod +x.
  • Make sure that you prepend the “export” keyword to all environment variable setter statements.
  • The last line says “replace the current process with /usr/bin/ruby, and pass all command-line arguments that I’ve received to Ruby”. Make sure that $@ is wrapped inside double quotes, otherwise filenames with spaces in them won’t be passed correctly to the Ruby interpreter.

But wait, I’ve already set environment variables in my /etc/bashrc or /etc/profile. Why don’t they work?

If you’ve set environment variables in your /etc/bashrc or /etc/profile, then these environment variables are made available in your shell. However, on most operating systems, Apache is not started from the shell and does not load environment variables defined in bashrc/profile, which is why setting environment variables in /etc/bashrc and /etc/profile usually has no effect on Apache (and by extension, on Passenger and Rails processes).

Final words

This is just a quick blog post which I’ve written after seeing many people asking questions on this subject. This subject deserves proper official documentation, but I haven’t had the time to do it yet. If anybody wants to submit a documentation patch then please feel free to do so. In the long term it would probably be nice if one can pass environment variables to the Ruby interpreter via Apache configuration options, but it’s not a very high priority issue at this moment.

Phusion Passenger (mod_rails) version 1.0.2 released, and more

By Ninh Bui on April 29th, 2008

It has been two weeks since the release of Passenger version 1.0.1. More and more people are switching to Passenger, and most are very pleased with the quality of our initial release. 🙂 But in these past 2 weeks, we’ve continued to make improvements to Passenger. So today, we’re happy to announce the release of Passenger version 1.0.2. 😀

Hongli Lai Ninh Bui

Featured improvements and changes

100% support for MacOS X’s default Apache

Passenger has always supported MacOS X. This fact is demonstrated in our screencast, created by Ryan Bates on a Mac. However, there was an inconvenience: Passenger was incompatible with the default Apache installation, as provided by OS X. The installer warned about that. As a result, OS X users had to install Apache via MacPorts or by hand.

But no more. Thanks to the help of Weyert de Boer and the people at Fingertips, we’ve been able to track down the problem. Passenger now fully supports MacOS X’s default Apache! There is no need to install Apache via MacPorts anymore. 🙂

RubyGems-related fixes: Rails < 2.0 is now supported

The Passenger gem specifies Rails 2.0 as a dependency. This seemed to be a good idea at the time: Passenger is to be used in combination with Rails, and we figured that by specifying Rails as a dependency, the user will have one command less to type.

But it turned out that some RubyGems versions will load Rails 2.0 during Passenger’s startup, even though Passenger didn’t explicitly tell RubyGems to do that. As a result, some people were having trouble with using Passenger with Rails < version 2.0. This issue has been fixed.

Memory statistics tool

Some people have attempted to analyze Passenger’s memory usage. But standard tools such as ‘top’ and ‘ps’ don’t always report the correct memory usage.

We’ve provided a tool, passenger-memory-stats, which allows people to easily analyze Passenger’s and Apache’s real memory usage. For example:

$ sudo ./bin/passenger-memory-stats
------------- Apache processes --------------
PID    PPID  Threads  VMSize   Private  Name
---------------------------------------------
5947   1     9        90.6 MB  0.5 MB   /usr/sbin/apache2 -k start
5948   5947  1        18.9 MB  0.7 MB   /usr/sbin/fcgi-pm -k start
6029   5947  1        42.7 MB  0.5 MB   /usr/sbin/apache2 -k start
6030   5947  1        42.7 MB  0.5 MB   /usr/sbin/apache2 -k start
6031   5947  1        42.5 MB  0.3 MB   /usr/sbin/apache2 -k start
6033   5947  1        42.5 MB  0.4 MB   /usr/sbin/apache2 -k start
6034   5947  1        50.5 MB  0.4 MB   /usr/sbin/apache2 -k start
23482  5947  1        82.6 MB  0.4 MB   /usr/sbin/apache2 -k start
### Processes: 8
### Total private dirty RSS: 3.50 MB

--------- Passenger processes ---------
PID    Threads  VMSize   Private  Name
---------------------------------------
6026   1        10.9 MB  4.7 MB   Passenger spawn server
23481  1        26.7 MB  3.0 MB   Passenger FrameworkSpawner: 2.0.2
23791  1        26.8 MB  2.9 MB   Passenger ApplicationSpawner: /var/www/projects/app1-foobar
23793  1        26.9 MB  17.1 MB  Rails: /var/www/projects/app1-foobar
### Processes: 4
### Total private dirty RSS: 27.76 MB

The private dirty RSS field shows the *real* memory usage of processes. Here, we see that the Apache worker processes each take less than 1 MB of memory. This is a lot less than the roughly 50 MB shown in the “VMSize” column (which is what a lot of people think is the real memory usage, but actually is not).

Please note that this tool only works on Linux. Unfortunately other operating systems don’t provide facilities for determining processes’ private dirty RSS.
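
To give an idea of where such a number can come from: on Linux, the kernel lists a Private_Dirty value for every memory mapping of a process in /proc/PID/smaps. The sketch below sums those values. It is an illustration under the assumption that this file is the data source, and is not the actual implementation of passenger-memory-stats. The method name private_dirty_kb is made up for this example.

```ruby
# Sum all Private_Dirty entries of a Linux smaps listing, in kilobytes.
def private_dirty_kb(smaps_text)
  smaps_text.scan(/^Private_Dirty:\s+(\d+) kB/).flatten.map(&:to_i).sum
end

# Report the figure for the current Ruby process, if smaps is available.
smaps_file = '/proc/self/smaps'
if File.readable?(smaps_file)
  megabytes = private_dirty_kb(File.read(smaps_file)) / 1024.0
  puts format('Private dirty RSS of this process: %.1f MB', megabytes)
end
```

Reading the smaps file of a process owned by another user requires sufficient privileges.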

Improved stability

If the framework spawner server or application spawner crashes, then Passenger 1.0.1 will keep showing error messages until one restarts Apache. Passenger 1.0.2 will automatically restart spawner servers when they crash, thus lowering maintenance burden even more.

Setting ENV['RAILS_ENV'] in environment.rb now works

A bug caused ENV['RAILS_ENV'] in environment.rb to be ignored. This has now been fixed.

Support for custom page caching directories

Page caching was supported by Passenger, but setting a custom (non-standard) page caching directory did not work. This has now been fixed. But please note that Passenger won’t be able to accelerate page cache files in non-standard page caching directories.

Usability and documentation improvements

The community has provided a lot more insight into things that can go wrong. We’ve done our best to document all troubleshooting-related issues in our Users guide. We’ve also adapted some error messages so that users can solve the problem without reading the manual.

Thanks for all the feedback, people! 🙂

Fixed conflicts with system-provided Boost library

Passenger makes use of the Boost C++ library. Its sources are included into the Passenger sources. But if the system already has a different Boost version installed, then the two Boost libraries would conflict with each other, and Passenger would fail to install. We’ve made sure that this doesn’t happen: now, installation will succeed even if there’s already another Boost version installed.

Improved SSL compatibility

There was a problem with SSL hosts, which would only be triggered if “SSLOptions +ExportCertData” is set. This issue has now been fixed.

Improved support for graceful restarts

If you installed Passenger for the first time, then the first graceful Apache restart would not properly initialize Passenger. This issue has now been solved.

There are also a few small improvements and changes that aren’t worth mentioning.

How do I upgrade?

Just install it like you did the first time:

gem install passenger

and

passenger-install-apache2-module

Please don’t forget to copy & paste the Apache config snippet that the installer gives you.

Enterprise Licenses, donations and t-shirts

In many ways, Phusion Passenger (mod_rails) has been an overwhelming success for us, and we’re very grateful for the community support you guys have given us. Also, a lot of companies and individuals have been more than generous in purchasing an Enterprise License for Phusion Passenger (mod_rails). In particular, we’d like to thank all the people who have donated over a certain amount, and we thought it would only be fitting to send them something concrete as a reminder of this generous act. After giving it a lot of thought, we came up with something really shabby (or at least we ‘part-time fashion connoisseurs’ think so 😉 ).

To celebrate our first successful open source product launch here at Phusion, we’ve decided to silkscreen-print 100 limited edition Phusion t-shirts, each hand-numbered from 1 to 100. A few of these will go to our friends at Apple, Sun Microsystems and 37signals, and the remainder of the shirts will go to those who have donated over 200 USD in total (we’ll take care of the shipping fees). Needless to say, these shirts are going to be hot as heck at IT conferences such as RailsConf, as they have been silkscreen-printed by the same people who are responsible for printing the shirts for the uberhip brands Rockwell, Freshcotton and Top Notch. The shirts themselves are super premium t’s which weigh 205 g/m². To emphasize this even more, we’ve arranged for a photo shoot with a few professional lady models 😉 and just like you, we can’t wait to see the result of this. Hopefully, you’ll be able to see the result soon!

An artist’s impression of the Phusion t-shirt

People who haven’t donated yet, or who donated less than 200 USD but want a piece of the t-shirt action as well, will get the opportunity to “set this right” in the second (current) and third batch of enterprise licenses by donating the remaining amount to us under the same PayPal account. We’ll then try to sort this out as soon as possible. Needless to say, first come, first served will be maintained, so if you want a shirt, be sure to act fast, as supplies are bound not to last very long! You probably don’t want to figure out that you actually wanted a shirt like this when it’s too late, right? 😉 Also, we’ve only got a limited number in each size (sizes small and XXL in particular are likely to run out fast, not to mention the girlie-sized shirts for the ladies).

Lastly, we’re very grateful for all donations, and for this reason we’ll also occasionally pick a few people at random from the donation list who haven’t donated over 200 USD to receive a Phusion t-shirt as well ;-). So in short, whatever amount you decide to donate, be sure to include your shirt size from now on, as it might be your lucky day 😉

RailsConf

Not only community-wise but also commercially, Phusion Passenger has opened up a lot of doors for Phusion that would otherwise likely have remained closed. For starters, we’ll be talking at RailsConf in a little more than a month about Phusion Passenger and the highly anticipated Ruby Enterprise Edition. It seems that the latter has already generated a lot of buzz, and that this is for the greater part because of its name. We actually think this is a good thing, since we don’t believe that there is such a thing as bad publicity. 😉 Don’t worry too much about it though: RailsConf will provide us with the perfect opportunity to dive into this subject a little deeper, and hopefully you’ll agree with us that it makes a lot of sense to call it Ruby Enterprise Edition. We’ll also do something that is probably unprecedented with regard to talks, so be sure to check us out over there, even if it’s just for the meet and greet / casual chat. 😉

We’re on the RailsConf speakers list!

Side notes

We’re also still hard at work on writing a series of articles on both Phusion Passenger and Ruby Enterprise Edition, from which we’ll also distill a scientific paper to be published on eeprints at the University of Twente (rocking! ;-)). Needless to say, these articles will be published for your reading pleasure as well. 😉

As you may have already noticed, until recently Phusion consisted mainly of Hongli Lai and Ninh Bui. Even though the two of us make for one hell of a team, we both definitely started feeling the growing pains of a healthily growing startup company. A little while ago we posted some job openings in the hopes of increasing Phusion’s capacity, but unfortunately, most of the applications came from outside of the Netherlands.

Today, however, we’re pleased to announce that our good friend Tinco Andringa has decided to join the fray by joining Phusion. He’s not alone in this, since our other good friend Maurits Dijkstra has decided to do the same. And yes, the latter of the two IS related to the famous Edsger Dijkstra, whom you may already know from Dijkstra’s shortest path algorithm 😉 (but that wasn’t the main reason why we wanted him on board at Phusion per se 😉 ). Just like Hongli’s and mine, Maurits’ and Tinco’s computer science education has its origin at the Universiteit Twente, and both have built a nice career on the side as software engineers as well. With this configuration, we hope to be able to deliver even better on our services and products!

Well, that wraps it up for today! Stay tuned though, as we’ve only started to ‘bring it on’! 😉

With kind regards, your friends at Phusion,

Hongli Lai Ninh Bui

– Tinco Andringa
– Maurits Dijkstra