A common request we get from readers is to describe in more detail how our server infrastructure is set up. That question is so incredibly broad that it’s hard to answer in any kind of comprehensive way, so I’m not going to try to. Instead, I’m keeping the general desire for more technical details in mind as I work through day-to-day issues with our configuration, and I’ll try to occasionally write about things that I think might be of interest. The topic for today is HAproxy.

HAproxy is a load balancing proxy server that we use between our front-end web servers (Apache) and our back-end application servers (Mongrel). It allows us to spread the traffic for dynamic content among a number of application servers and deals with things like failover if any of the individual Mongrel instances happen to be down.
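To give a rough idea of what that looks like, here’s a minimal sketch of an HAproxy config for a setup like this. The addresses, ports, server names, and health-check path are made up for illustration, not our actual config:

```
defaults
    mode http                        # proxy HTTP between Apache and the Mongrels
    balance roundrobin               # spread requests evenly across the cluster
    timeout connect 5s
    timeout client  30s
    timeout server  30s

listen mongrels
    bind 127.0.0.1:8080              # Apache proxies dynamic requests here
    option httpchk GET /heartbeat    # periodic health check (hypothetical path)
    server mongrel_1 127.0.0.1:8000 check
    server mongrel_2 127.0.0.1:8001 check
    server mongrel_3 127.0.0.1:8002 check
```

The `check` keyword is what gives you the failover: HAproxy polls each Mongrel and quietly stops sending traffic to any instance that fails the health check, then brings it back into rotation when it recovers.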

There are a lot of software load balancers out there and we’ve tried a number of them. Some of the reasons we finally ended up going with HAproxy include:

  1. Speed. It’s really, really fast.
  2. It’s efficient. One of our HAproxy instances is handling around 700 requests per second while using less than 5% CPU and only around 40MB of RAM on the Xen instance it runs on.
  3. It allows us to make configuration changes gracefully, without breaking any existing connections.
  4. It allows us to queue up requests if all of our Mongrels are busy.

Of these, the last item is the killer feature for our purposes.
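What makes the queueing work is the per-server `maxconn` setting. A Mongrel handles one Rails request at a time, so capping each server at a single in-flight request means any overflow waits in HAproxy’s own queue instead of stacking up behind an already-busy Mongrel. Extending the hypothetical config above:

```
listen mongrels
    bind 127.0.0.1:8080
    timeout queue 30s                # give up on a request that waits too long in the queue
    server mongrel_1 127.0.0.1:8000 maxconn 1 check
    server mongrel_2 127.0.0.1:8001 maxconn 1 check
    server mongrel_3 127.0.0.1:8002 maxconn 1 check
```

As soon as a Mongrel finishes its current request, HAproxy hands it the next one from the queue. And because HAproxy can be reloaded by starting a new process with `-sf` pointing at the old one’s pid, changes like these can be rolled out without dropping connections that are already in flight, which is what item 3 on the list above is about.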

I’ll let you in on a secret—this post has been hanging around for a couple of months now and I’ve tried several times to come up with a good way to describe how the queueing works, but I’ve never been satisfied with the results. With that in mind, I decided to take a page from the 37signals Live playbook and do a brief screencast to illustrate the power of this feature.

I plan to write more about our infrastructure as time permits. If there are particular topics that you’d like to see covered in a future installment, please email [email protected].