Ruby on Rails scalability/performance?


Ruby on-Rails Problem Overview


I have used PHP for a while now and have used it well with CodeIgniter, which is a great framework. I am starting on a new personal project, and the last time I was considering what to use (PHP vs ROR) I chose PHP because of the scalability problems I heard ROR had, especially after reading what the Twitter devs had to say about it. Is scalability still an issue in ROR, or have there been improvements to it?

I would like to learn a new language, and ROR seems interesting. PHP gets the job done but as everyone knows its syntax and organization are fugly and it feels like one big hack.

Ruby on-Rails Solutions


Solution 1 - Ruby on-Rails

To expand on Ryan Doherty's answer a bit...

I work in a statically typed language for my day job (.NET/C#), as well as Ruby as a side thing. Prior to my current day job, I was the lead programmer for a Ruby development firm doing work for the New York Times Syndication service. Before that, I worked in PHP as well (though long, long ago).

I say that simply to say this: I've experienced Rails (and more generally Ruby) performance problems first hand, as well as a few other alternatives. As Ryan says, you aren't going to have it automatically scale for you. It takes work and immense amounts of patience to find your bottlenecks.

A large majority of the performance issues we saw from others and even ourselves were dealing with slow performing queries in our ORM layer. We went from Rails/ActiveRecord to Rails/DataMapper and finally to Merb/DM, each iteration getting more speed simply because of the underlying frameworks.

Caching does amazing wonders for performance. Unfortunately, we couldn't cache our data. Our cache would effectively be invalidated every five minutes at most. Nearly every single bit of our site was dynamic. So if/when you can't do that, perhaps you can learn from our experience.
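For readers who can cache, the five-minute invalidation window described above can be sketched as a small time-based cache. This is only an illustration under stated assumptions: `TtlCache`, its `ttl:` and `clock:` parameters, and the `:headlines` key are hypothetical names, not anything from the original application.

```ruby
# Minimal time-based cache: an entry is reused until it is older than `ttl`
# seconds. TtlCache and its injectable clock are hypothetical, for illustration.
class TtlCache
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @store = {}
  end

  # Return the cached value for `key`, or run the block and cache its result.
  def fetch(key)
    entry = @store[key]
    now = @clock.call
    return entry[:value] if entry && now - entry[:stored_at] < @ttl

    value = yield
    @store[key] = { value: value, stored_at: now }
    value
  end
end

# Usage: a fake clock stands in for the passage of time.
now = 0
cache = TtlCache.new(ttl: 300, clock: -> { now })
computations = 0

cache.fetch(:headlines) { computations += 1 }  # computed
cache.fetch(:headlines) { computations += 1 }  # served from the cache
now += 301                                     # a bit more than five minutes pass
cache.fetch(:headlines) { computations += 1 }  # computed again
puts computations                              # prints 2
```

The clock is injected only so the expiry can be demonstrated without sleeping; a real setup would more likely rely on an external cache such as Memcached with an expiry time.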

We had to end up seriously fine tuning our database indexes, making sure our queries weren't doing very stupid things, making sure we weren't executing more queries than was absolutely necessary, etc. When I say "very stupid things", I mean the 1 + N query problem...

# 1 query
Dog.find(:all).each do |dog|
   # N queries
   dog.owner.siblings.each do |sibling|
      # N queries per above N query!
      sibling.pets.each do |pet|
         # Do something here
      end
   end
end

DataMapper is an excellent way to handle the above problem (there are no 1 + N problems with it), but an even better way is to use your brain and stop doing queries like that. When you need raw performance, most of the ORM layers won't easily handle extremely custom queries, so you might as well hand write them.
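To make the cost of the 1 + N pattern concrete, here is a minimal sketch that counts queries against an in-memory stand-in for a database. `FakeDb`, `owner_names_n_plus_one`, `owner_names_batched`, and the sample data are all hypothetical; no real ORM is involved. ORMs expose the batched variant as eager loading (later versions of ActiveRecord, for example, offer `includes`), but the idea is the same either way: fetch all the children in one query instead of one query per parent.

```ruby
# FakeDb is a hypothetical in-memory stand-in for a database that counts queries.
class FakeDb
  attr_reader :query_count

  def initialize(dogs:, owners:)
    @dogs = dogs      # array of { id:, name:, owner_id: }
    @owners = owners  # array of { id:, name: }
    @query_count = 0
  end

  def all_dogs
    @query_count += 1
    @dogs
  end

  def owner_by_id(id)
    @query_count += 1
    @owners.find { |owner| owner[:id] == id }
  end

  # One query for many owners, like SQL's WHERE id IN (...).
  def owners_by_ids(ids)
    @query_count += 1
    @owners.select { |owner| ids.include?(owner[:id]) }
  end
end

# 1 query for the dogs, then 1 query per dog for its owner.
def owner_names_n_plus_one(db)
  db.all_dogs.map { |dog| db.owner_by_id(dog[:owner_id])[:name] }
end

# 2 queries in total, however many dogs there are.
def owner_names_batched(db)
  dogs = db.all_dogs
  owners = db.owners_by_ids(dogs.map { |dog| dog[:owner_id] }.uniq)
  by_id = owners.to_h { |owner| [owner[:id], owner] }
  dogs.map { |dog| by_id[dog[:owner_id]][:name] }
end

def sample_db
  FakeDb.new(
    dogs: [
      { id: 1, name: "Rex", owner_id: 10 },
      { id: 2, name: "Fido", owner_id: 11 },
      { id: 3, name: "Spot", owner_id: 10 }
    ],
    owners: [{ id: 10, name: "Ann" }, { id: 11, name: "Bob" }]
  )
end

slow = sample_db
owner_names_n_plus_one(slow)
puts "1 + N approach: #{slow.query_count} queries"   # 4 queries for 3 dogs

fast = sample_db
owner_names_batched(fast)
puts "batched approach: #{fast.query_count} queries" # 2 queries
```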

We also did common sense things. We bought a beefy server for our growing database, and moved it off onto its own dedicated box. We also had to do TONS of processing and data importing constantly. We moved our processing off onto its own box as well. We also stopped loading our entire freaking stack just for our data import utilities. We tastefully loaded only what we absolutely needed (thus reducing memory overhead!).

If you can't tell already... generally, when it comes to Ruby/Rails/Merb, you have to scale out, throwing hardware at the problem. But in the end, hardware is cheap; though that's no excuse for shoddy code!

And even with these difficulties, I personally would never start projects in another framework if I can help it. I'm in love with the language, and continually learn more about it every day. That's something that I don't get from C#, though C# is faster.

I also enjoy the open source tools, the low cost to start working in the language, the low cost to just get something out there and try to see if it's marketable, all the while working in a language that often times can be elegant and beautiful...

In the end, it's all about what you want to live, breathe, eat, and sleep in day in and day out when it comes to choosing your framework. If you like Microsoft's way of thinking, go .NET. If you want open source but still want structure, try Java. If you want to have a dynamic language and still have a bit more structure than Ruby, try Python. And if you want elegance, try Ruby (I kid, I kid... there are many other elegant languages that fit the bill. Not trying to start a flame war.)

Hell, try them all! I tend to agree with the answers above that worrying about optimizations early isn't the reason you should or shouldn't pick a framework, but I disagree that this is their only answer.

So in short, yes there are difficulties you have to overcome, but the elegance of the language, imho, far outweighs those shortcomings.

Sorry for the novel, but I've been there and back with performance issues. It can be overcome. So don't let that scare you off.

Solution 2 - Ruby on-Rails

RoR is being used by lots of huge websites, but as with any language or framework, it takes a good architecture (db scaling, caching, tuning, etc.) to scale to large numbers of users.

There have been a few minor changes to RoR to make it easier to scale, but don't expect it to scale magically for you. Every website has different scaling issues, so you'll have to put in some work to make it scale.

Solution 3 - Ruby on-Rails

Develop in the technology that is going to give your project the best chance of success - quick to develop in, easy debugging, easy deployment, good tools, you know it inside out (unless the point is to learn a new language), etc.

If you get tens of millions of uniques a month you can always hire in a couple of people and rewrite in a different technology if you need to as ...

... you'll be rake-ing in the cache (sorry - couldn't resist!!)

Solution 4 - Ruby on-Rails

First of all, it would perhaps make more sense to compare Rails to Symfony, CodeIgniter or CakePHP, since Ruby on Rails is a complete web application framework. Compared to PHP or PHP frameworks, Rails applications offer the advantages that they are small, clean, and readable. PHP is perfect for small, personal pages (originally it stood for "Personal Home Page"), while Rails is a full MVC framework which can be used to build large sites.

Ruby on Rails does not have a larger scalability issue than comparable PHP frameworks. Both Rails and PHP will scale well if you have only a moderate number of users (10,000-100,000) who operate on a similar number of objects. For a few thousand users a classic monolithic architecture will be sufficient. With a bit of M&M (Memcached and MySQL) you can also handle millions of objects. The M&M architecture uses a MySQL server to handle writes and Memcached to handle high read loads. The traditional storage pattern, a single SQL server using normalized relational tables (or at best a SQL Master/Multiple Read Slave setup), no longer works for very large sites.

If you have billions of users like Google, Twitter and Facebook, then a distributed architecture will probably be better. If you really want to scale your application without limit, use some kind of cheap commodity hardware as a foundation, divide your application into a set of services, keep each component or service scalable itself (design every component as a scalable service), and adapt the architecture to your application. Then you will need suitable scalable datastores like NoSQL databases and distributed hash tables (DHTs), you will need sophisticated map-reduce algorithms to work with them, and you will have to deal with SOA, external services, and messaging. Neither PHP nor Rails offers a magic bullet here.
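The read/write split of the M&M architecture can be sketched with plain hashes standing in for both servers. Everything named here is an assumption for illustration: `CachedStore` is hypothetical, `@db` plays the MySQL server, and `@cache` plays Memcached. A real setup would talk to both over the network.

```ruby
# Hypothetical stand-ins: @db plays the MySQL server, @cache plays Memcached.
class CachedStore
  attr_reader :db_reads

  def initialize
    @db = {}
    @cache = {}
    @db_reads = 0
  end

  # Writes go to the database; the cached copy is dropped so it cannot go stale.
  def write(key, value)
    @db[key] = value
    @cache.delete(key)
  end

  # Reads try the cache first and fall back to the database on a miss.
  def read(key)
    return @cache[key] if @cache.key?(key)

    @db_reads += 1
    value = @db[key]
    @cache[key] = value unless value.nil?
    value
  end
end

store = CachedStore.new
store.write("user:1", "alice")
3.times { store.read("user:1") }
puts store.db_reads          # prints 1: only the first read reached the database
store.write("user:1", "alicia")
puts store.read("user:1")    # prints alicia: the write invalidated the cached copy
```

Deleting the cached entry on write, rather than updating it in place, is one common choice because it keeps the database as the single source of truth.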
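As a rough illustration of spreading data across several datastores by key, the sketch below routes each key to one of a fixed list of nodes. Note that this is plain modulo sharding, not a real DHT; the node names and the `node_for` helper are hypothetical.

```ruby
require "zlib"

# Hypothetical node names; a real deployment would list actual datastore hosts.
NODES = %w[store-a store-b store-c].freeze

# Map a key to a node. CRC32 is used because it is stable across processes,
# unlike Ruby's String#hash, which is randomized per process.
def node_for(key, nodes = NODES)
  nodes[Zlib.crc32(key) % nodes.size]
end

%w[user:1 user:2 user:3].each do |key|
  puts "#{key} lives on #{node_for(key)}"
end
```

The weakness of this simple scheme is that changing the number of nodes remaps most keys, which is the problem that consistent hashing is designed to reduce.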

Solution 5 - Ruby on-Rails

What it breaks down to with RoR is that unless you're in Alexa's top 100, you will not have any scalability problems. You'll have more issues with stability on shared hosting unless you can squeeze Phusion Passenger or Mongrel out.

Solution 6 - Ruby on-Rails

Take a little while to look at the problems the Twitter people had to deal with, then ask yourself if your app is going to need to scale to that level.

Then build it in Rails anyway, because you know it makes sense. If you get to Twitter-level volumes then you'll be in the happy position of considering performance optimisation options. At least you'll be applying them in a nice language!

Solution 7 - Ruby on-Rails

You can't compare PHP and ROR directly: PHP is a scripting language like Ruby, and Rails is a framework like CakePHP.
That said, I strongly suggest Rails, because your application will be strictly organized in the MVC pattern, and that is a MUST for your scalability requirement. (With PHP you have to take care of the project organization on your own.)
As for scalability, Rails is not just MVC. For instance, you can start developing your application with one database and change it along the way without much effort (in most cases), so we can say that a Rails application is (almost) database independent thanks to its ORM (which lets you avoid writing database queries by hand). You can do a lot of other stuff, too.
(Take a look at this video: http://www.youtube.com/watch?v=p5EIrSM8dCA )

Solution 8 - Ruby on-Rails

Just wanted to add some more info to Keith Hanson's smart point about the 1 + N problem, where he states:

> DataMapper is an excellent way to handle the above problem (there are no 1 + N problems with it), but an even better way is to use your brain and stop doing queries like that. When you need raw performance, most of the ORM layers won't easily handle extremely custom queries, so you might as well hand write them.

Doctrine is one of the most popular ORMs for PHP. It addresses this 1 + N complexity problem intrinsic to ORMs by providing a language called Doctrine Query Language (DQL). This allows you to write SQL-like statements that use your existing model relationships, e.g.

// One query: ModelA rows joined with their related ModelB rows
$q = Doctrine_Query::create()
    ->select('m.*, b.*')
    ->from('ModelA m')
    ->leftJoin('m.ModelB b')
    ->execute();

Solution 9 - Ruby on-Rails

I'm getting the impression from this thread that the scalability issues of ROR come down primarily to the mess that ORMs are in with regard to loading child objects, i.e. the '1+N' problem mentioned above. In the above example that Keith gave with dogs and owners:

Dog.find(:all).each do |dog|
   #N queries
   dog.owner.siblings.each do |sibling|
      #N queries per above N query!!
      sibling.pets.each do |pet|
         #Do something here
      end
   end
end

You could actually write a single SQL statement to get all that data, and you could also 'stitch' that data up into the Dog.Owner.Siblings.Pets object hierarchy of your custom-written objects. But could someone write an ORM that did that automatically, so that the above example would incur a single round trip to the DB and a single SQL statement, instead of potentially hundreds? Totally. Just join those tables into one dataset, then do some logic to stitch it up. It's a bit tricky to make that logic generic so it can handle any set of objects, but it's not the end of the world. In the end, tables and objects only relate to each other in one of three categories (1:1, 1:many, many:many). It's just that no one ever built that ORM.

You need a syntax that tells the system up front which children you want to load for this particular query. You can sort of do this with the 'eager' loading of LinqToSql (C#), which is not a part of ROR, but even though that results in one round trip to the DB, it's still hundreds of separate SQL statements the way it has currently been set up. It's really more about the history of ORMs. They just got started down the wrong path with that and never really recovered, in my opinion. 'Lazy loading' is the default behavior of most ORMs, i.e. incurring another round trip for every mention of a child object, which is crazy. Then there is 'eager' loading, i.e. loading the children up front. In everything I am aware of outside of LinqToSql, that is set up statically (certain children always load with certain objects), as if you would always need the same children loaded when you loaded a collection of Dogs.

You need some kind of strongly-typed syntax saying that this time I want to load these children and grandchildren. I.e., something like:
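The "stitching" step can be sketched in a few lines of Ruby. This is a simplified illustration, not a generic ORM: `Row` and `stitch` are hypothetical names, each row mimics one line of a joined result set, and plain names are used as keys where a real implementation would use primary keys.

```ruby
# Each Row mimics one line of a four-table JOIN result (hypothetical shape).
Row = Struct.new(:dog, :owner, :sibling, :pet)

# Fold the flat rows into a nested structure:
#   { dog => { owner: owner, siblings: { sibling => [pets] } } }
def stitch(rows)
  rows.each_with_object({}) do |row, dogs|
    dog = dogs[row.dog] ||= { owner: row.owner, siblings: {} }
    next if row.sibling.nil?  # owner without siblings (a LEFT JOIN gap)

    pets = dog[:siblings][row.sibling] ||= []
    pets << row.pet unless row.pet.nil? || pets.include?(row.pet)
  end
end

rows = [
  Row.new("Rex", "Ann", "Bob", "Tom"),
  Row.new("Rex", "Ann", "Bob", "Jerry"),
  Row.new("Rex", "Ann", "Cat", nil),
  Row.new("Fido", "Dan", nil, nil)
]

tree = stitch(rows)
puts tree["Rex"][:siblings]["Bob"].inspect  # prints ["Tom", "Jerry"]
puts tree["Fido"][:siblings].inspect        # prints {}
```

The trade-off is that a wide join repeats the parent columns on every row, so the single round trip is paid for with a larger result set.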

Dog.Owners.Include()
Dog.Owners.Siblings.Include()
Dog.Owners.Siblings.Pets.Include()

then you could issue this command:

Dog.find(:all).each do |dog|

The ORM system would know what tables it needs to join, then stitch up the resulting data into the OM hierarchy. It's true that you can throw hardware at the current problem, which I'm generally in favor of, but that's no reason the ORM (e.g. Hibernate, Entity Framework, Ruby ActiveRecord) shouldn't just be better written. Hardware really doesn't bail you out of an 8 round-trip, 100-SQL statement query that should have been one round trip and one SQL statement.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | ryeguy | View Question on Stackoverflow
Solution 1 - Ruby on-Rails | Keith Hanson | View Answer on Stackoverflow
Solution 2 - Ruby on-Rails | Ryan Doherty | View Answer on Stackoverflow
Solution 3 - Ruby on-Rails | RichH | View Answer on Stackoverflow
Solution 4 - Ruby on-Rails | 0x4a6f4672 | View Answer on Stackoverflow
Solution 5 - Ruby on-Rails | phresus | View Answer on Stackoverflow
Solution 6 - Ruby on-Rails | Mike Woodhouse | View Answer on Stackoverflow
Solution 7 - Ruby on-Rails | Joe | View Answer on Stackoverflow
Solution 8 - Ruby on-Rails | Aree Cohen | View Answer on Stackoverflow
Solution 9 - Ruby on-Rails | Jeff | View Answer on Stackoverflow