Twitter is built with Ruby and Ruby-on-Rails, and it suffers from a serious design issue that can also bottleneck ColdFusion projects, especially those built on frameworks such as Reactor and Model-Glue.
What’s the problem? Here’s the excerpt, from an interview with Twitter developer Alex Payne, that started it all: “The common wisdom in the Rails community at this time is that scaling Rails is a matter of cost: just throw more CPUs at it. The problem is that more instances of Rails means more requests to your database. At this point in time there’s no facility in Rails to talk to more than one database at a time.”
Translated into ColdFusion terms, the framework out of the box expects a single DSN (data source name) that it uses for every object and table in the system. By definition, then, every request is routed to the same database server. In Twitter’s case, that server is being overloaded with requests.
Compare that to the common ColdFusion practice (or at least, my common practice) of always setting up multiple databases for each client, even when they all initially reside on a single server. I typically create distinct databases, and distinct DSNs, for cfclient, membership, orders, content, and logging. Sometimes I create even more.
While the separation may cause the occasional logistical problem, I find the peace of mind it gives my clients and me well worth the price. If a site “hits” and I need to scale, I can immediately move the logging or content databases onto their own servers with no impact on the existing code, simply by moving the data and repointing the corresponding DSN in the ColdFusion Administrator.
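The reason this works is that application code only ever refers to a DSN by name, while the Administrator decides which server that name points at. A minimal, language-neutral sketch of that separation follows (in Python); `DataSourceRegistry` is a hypothetical stand-in for the Administrator’s DSN list, and the server names `db1` and `db-logging` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str      # DSN name that application code refers to
    server: str    # database server the DSN points at (hypothetical hosts)
    database: str  # database on that server


class DataSourceRegistry:
    """Hypothetical stand-in for the DSN list in the ColdFusion Administrator."""

    def __init__(self) -> None:
        self._sources: dict[str, DataSource] = {}

    def define(self, name: str, server: str, database: str) -> None:
        # Defining an existing name again repoints it, like editing a DSN.
        self._sources[name] = DataSource(name, server, database)

    def resolve(self, name: str) -> DataSource:
        return self._sources[name]


def write_log_entry(registry: DataSourceRegistry, message: str) -> str:
    # Application code knows only the DSN name "logging", never the server.
    ds = registry.resolve("logging")
    return f"{ds.server}/{ds.database}: {message}"


registry = DataSourceRegistry()
# Day one: one DSN per data area, all on the same server.
for area in ("cfclient", "membership", "orders", "content", "logging"):
    registry.define(area, server="db1", database=area)
print(write_log_entry(registry, "site launched"))  # db1/logging: site launched

# The site takes off: move the logging data to its own server and repoint
# the DSN. write_log_entry() itself does not change.
registry.define("logging", server="db-logging", database="logging")
print(write_log_entry(registry, "traffic spike"))  # db-logging/logging: traffic spike
```

Only the registry entry changes when the data moves; every caller of `write_log_entry` keeps working untouched.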
So what’s the connection to ColdFusion?
Unfortunately, some CRUD (create, read, update, delete) systems, such as Reactor and Transfer, assume that there’s only one DSN in the configuration file, or one datasource file. As a result, in a typical installation the objects they create can talk to only a single database, which puts us in the same situation as Twitter and its Ruby-on-Rails stack.
One solution with Reactor is to create multiple Reactor factories. In one case you’d call “application.contentFactory” to get an object, and in another you’d call “application.loggingFactory”. While feasible, this solution is less than optimal from my perspective, since your code now has to know which data resides where, and which factory to call to get it. Move a table, and you need to dig through your code changing factory references.
One company I consulted for was especially database-happy. To use Reactor there, I would have needed to create and manage 47 different factories connecting to 47 different SQL Server databases, which almost demands a Reactor factory-factory. Messy.
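A minimal sketch of what such a factory-factory might look like, under stated assumptions: callers name only the table, and a single table-to-DSN map decides which factory handles it, so moving a table means changing one map entry instead of hunting through code. It is written in Python for neutrality; `Factory`, `FactoryRouter`, `create_record`, `move_table`, and the table and DSN names are all hypothetical and are not Reactor’s API.

```python
from typing import Callable


class Factory:
    """Hypothetical stand-in for one ORM factory bound to a single DSN."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn

    def create_record(self, table: str) -> str:
        # A real factory would return an ORM object; a label is enough here.
        return f"{table}@{self.dsn}"


class FactoryRouter:
    """Routes each table to the factory for the DSN that currently holds it."""

    def __init__(self, table_to_dsn: dict[str, str],
                 make_factory: Callable[[str], Factory] = Factory) -> None:
        self._table_to_dsn = dict(table_to_dsn)
        self._make_factory = make_factory
        self._factories: dict[str, Factory] = {}  # one per DSN, built on demand

    def factory_for(self, table: str) -> Factory:
        dsn = self._table_to_dsn[table]
        if dsn not in self._factories:
            self._factories[dsn] = self._make_factory(dsn)
        return self._factories[dsn]

    def create_record(self, table: str) -> str:
        # Callers never choose a factory themselves.
        return self.factory_for(table).create_record(table)

    def move_table(self, table: str, new_dsn: str) -> None:
        # Relocating a table is a single mapping change.
        self._table_to_dsn[table] = new_dsn


router = FactoryRouter({"Article": "content", "PageView": "logging", "Order": "orders"})
print(router.create_record("PageView"))  # PageView@logging
router.move_table("PageView", "logging_archive")
print(router.create_record("PageView"))  # PageView@logging_archive
```

Because factories are created lazily, a 47-database shop would still only build the factories it actually touches, and application code sees one router rather than 47 factory references.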
Further, event-driven MVC frameworks like Model-Glue often use an ORM adaptor that returns a reference to the currently defined ORM factory, which again makes it difficult to use multiple instances of a factory (or multiple databases) in a single application.
What all of this comes down to is knowing the limitations of the tools you use and planning for the future. It means doing a proper design and logically splitting up your data before you and your site get crunched and everyone ends up under the gun.
It also means talking to your tool and framework developers, and pushing for changes where they’re needed. After all, you don’t want to be reading a year or so from now about how your site doesn’t scale.