ECMS: O/R vs. ZODB

Tres Seaver

2005-08-23 05:00

The week before last, Florent Guillaume asserted that "object-relational [is] the keystone of a proper enterprise-grade application server", which left me more than a little uncomfortable. I spent a number of years working with "enterprise-grade" OO applications built atop Oracle and Sybase; from that experience, I am convinced that Florent underestimates the penalties associated with requiring that any ECMS must be based on RDBMS technology.

Gary's thoughtful response laid out a lot of the issues very clearly. I would like to note a couple of additional observations:

I have built large-scale CMS applications (hundreds of thousands of content objects, etc.) using the ZODB, and am aware of the associated pitfalls. Some patterns for avoiding such problems (keeping catalogs in separate ZODB storages, for instance) may not be widely known in the Zope community, but they have proven effective. Multi-version concurrency control, new in ZODB 3.4 / Zope 2.8, should also mitigate some of the same problems.
Gary mentions the possible performance tradeoffs associated with RDBMS-centric designs. The more crucial one, at least in my experience, has been "development efficiency": introducing an RDBMS into the architecture almost guarantees that the "development velocity" of the project will drop (and remain lower), compared to an equivalent project which uses the ZODB. Not only is there another layer to maintain (two, actually: the RDBMS schema and the O/R mapping layer), but the "ownership" of the layers is typically different. Having to fight with the DBA in order to make schema changes needed by the application is itself slow, and also encourages nasty hackery to avoid the conflict itself.
One attack usually leveled against a ZODB-based CMS is that it is not ad-hoc searchable using the de-facto standard query language (SQL). While true, the assertion that RDMBS applications are more easily queryable is somewhat naive: once the scale of the application increases, the schema tunings which support high-volume data entry are usually in direct conflict with those which support ad-hoc queries. In such cases, the usual solution ("data warehousing") is to create a derived schema with different indexes and schema, customized to support ad hoc queries. A similar strategy could be used with a ZODB-based CMS, asynchronously indexing the content into a "query-centric" RDBMS.