
The Elephant in the Computer Room

I was sitting next to Jay Fry in Las Vegas (not at the tables, honest), listening to Tom Bittman’s keynote opening the Gartner Data Center Conference in December, when Tom said “[according to Gartner's analysis], if you fully utilize your own equipment, Amazon [EC2] will cost you twice as much.”  Jay kept on furiously taking notes, but I missed the next few minutes of Tom’s speech.  I was thinking “boy, that’s a big ‘if’.”

Analyses like Gartner’s have sparked many discussions about the true cost of IaaS versus the true cost of operating your own gear, or even outsourcing operations.  A recent example of the math appears in the fourth part of Bernard Golden’s six-part CIO series “The Case Against Cloud Computing” (concluded this week), which actually makes the case for cloud computing by examining critics’ arguments and then rebutting them.  Bernard relays a couple of calculations of the TCO of Amazon EC2 large and small instances, which sum to amounts that look large at first blush, and then advises cloud shoppers to “do the math correctly”.  I take that to mean: correctly account for all the costs of running your own gear or outsourcing.  The implication is that, if you are honest with your corporate self, you’ll see that Amazon’s EC2 servers are really not that expensive after all.

I think Bernard strikes a resonant note when, at the end of the article, among “the cloud cost advantages”, he lists things like “the pricing is transparent” and “the pricing is fixed”.  Business accounting sometimes seems deliberately designed to be opaque and complex.  Often IT-related costs are concealed in non-IT cost centers.  IT rarely gets the data center electric bill, for instance; it goes to Facilities instead, as does the cost of the computer room, HVAC, UPS, etc.  Sometimes it seems like accounting just punts on figuring out where costs should be charged and instead allocates them uniformly across cost centers, using financial accounting formulae that have nothing to do with usage.  For example, in budget-speak, every employee comes laden with “burden”: the allocated cost of the real estate occupied by their office and common areas, plus G&A “overhead” like HR, accounting, receptionists, security, and maintenance.  And to further complicate the maze, capital costs (the cost of the assets themselves) are subject to arcane depreciation and cost-of-capital machinations by the financial wizards.  The “true” IT cost of a given server therefore also depends on the choices a business makes about the tax treatment of its assets (e.g., depreciation schedule and method) and about how it pays for them.

By comparison, how refreshing to just get an invoice with a bottom-line number from your external cloud IaaS supplier, even if it might be larger than the cost of buying and running it yourself.

But Bernard’s sources are splitting hairs when they argue about the cost of servers, owned/operated by IT or rented from the cloud.  A more important question is, how much does the consumable application or end-user service provided by those servers cost?  That’s what the enterprise really cares about, and this higher-level view exposes a factor that overwhelms any of the cost-of-server factors: Utilization.   Remember that Tom Bittman’s statement was qualified “if you fully utilize…”  That “if” dominates the cost-of-service calculation and it applies to usage of external cloud IaaS sources like Amazon, as well as internal or private resources.  Utilization is the elephant in the computer room. 

Think about it.  No matter how we reasonably construct an equation for the internal cost of a server, the true cost of the end-user service or application depends on how efficiently IT converts that server capacity into useful work: the lower the efficiency, the higher the cost.  Utilization is a good measure of the efficiency of this conversion, and it has a huge effect on the cost of IT.  If you average 20% server utilization, for example, that’s a potential 5x multiplier on the sum of your server costs.  If you average 10% (or less; you know who you are), it’s a 10x multiplier.  So what if you get an additional 20% discount on new servers from Dell?  A drop in the bucket.  Doubled the number of servers each admin can manage?  Big deal.  Double your utilization and you can cut the server portion of your IT budget roughly in half without affecting service levels.
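To put rough numbers on that multiplier, here is a minimal Python sketch.  The function names and the $1,000 server budget are my own illustrative assumptions, not figures from Gartner or Bernard's sources; the only idea taken from the argument above is that the cost of useful work scales as one over utilization.

```python
def cost_multiplier(utilization: float) -> float:
    """Factor by which idle capacity inflates the cost of each unit of useful work."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


def effective_cost(server_cost: float, utilization: float) -> float:
    """Spend needed to get one fully used server's worth of work (hypothetical helper)."""
    return server_cost * cost_multiplier(utilization)


# Illustrative numbers only: a $1,000 server budget at 10% average utilization.
baseline = effective_cost(1000.0, 0.10)       # no changes
with_discount = effective_cost(800.0, 0.10)   # 20% off the hardware, same utilization
with_doubling = effective_cost(1000.0, 0.20)  # same hardware price, utilization doubled

print(f"baseline:            ${baseline:,.0f}")
print(f"20% server discount: ${with_discount:,.0f}")
print(f"doubled utilization: ${with_doubling:,.0f}")
```

Under these assumed numbers, the discount takes the effective cost from $10,000 to $8,000, while doubling utilization takes it to $5,000.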

So, does the cloud help your utilization?  Maybe, but not because of the cost or pricing structure. Computational IaaS is sold by the glass, not by the drink.  It doesn’t matter if you quaff it dry or just sip the suds at the top, you pay the same amount.  

For instance (regrettable pun intended), Amazon doesn’t charge for EC2 by the CPU cycle or by the number of actual instructions executed (this lack of granularity is the key difference between storage and computation economics).  Instead, they charge by the instance-hour, a wall-clock-timed allocation of peak capacity that it’s up to the user to employ efficiently or wastefully, just like a real server you buy, plug in, and run yourself.  Low utilization, high utilization, Amazon doesn’t care.  You get the same bill either way, but that bill might be 5 or 10 times larger than it needs to be if your utilization is still running at industry standard averages.  

(If Amazon wanted to solve the server utilization problem at a stroke, they could charge for actual processor time instead of instance-hours, the way time-sharing computers used to be billed.  If I’m only using 10% of a $0.10/hour instance, only charge me a penny an hour.  Not much chance of that happening, but it’s not the only solution.)
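The difference between the two billing models is easy to sketch in Python.  The instance-hour function mirrors the pricing described above; `cpu_time_bill` is purely hypothetical (it is the time-share-style billing I'm wishing for, not anything Amazon offers), and the 720-hour month is an assumption for illustration.

```python
def instance_hour_bill(rate_per_hour: float, hours: float) -> float:
    """Bill for wall-clock allocation: utilization never enters the formula."""
    return rate_per_hour * hours


def cpu_time_bill(rate_per_hour: float, hours: float, utilization: float) -> float:
    """Hypothetical time-share-style bill: pay only for the fraction actually used."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return rate_per_hour * hours * utilization


RATE = 0.10            # dollars per instance-hour, as in the example above
HOURS_PER_MONTH = 720  # assumed 30-day month, instance left running throughout

for utilization in (0.10, 0.20, 1.00):
    by_the_glass = instance_hour_bill(RATE, HOURS_PER_MONTH)
    by_the_drink = cpu_time_bill(RATE, HOURS_PER_MONTH, utilization)
    print(f"{utilization:>4.0%} busy: instance-hours ${by_the_glass:.2f}, "
          f"CPU time ${by_the_drink:.2f}")
```

The instance-hour column stays the same on every row, which is the whole point: the gap between the two columns is what low utilization costs you.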

The way that cloud computing can help utilization is by being an elastic resource, not by being a cheaper resource.  That elasticity, which comes from the ability to dynamically allocate shared computational resources to provide services in proportion to demand — and just as dynamically deallocate them again — is what fundamentally drives up utilization by reducing over-provisioning.  It cannot eliminate over-provisioning (even Amazon needs to over-provision EC2 to accommodate fluctuations in demand, and you can bet that cost is passed on to users), but it can dramatically reduce it by eliminating application silos and changing the capacity planning equation from “what is the peak this application will ever need?” to “what is the peak this pool of applications will ever need?”  By pooling applications (and pooling resources), non-coincident demand peaks are handled for free.  The more applications and resources you pool, the smaller the over-provisioning factor, and the higher the average utilization.
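The capacity planning shift from "peak of each application" to "peak of the pool" can be sketched in a few lines of Python.  The three demand profiles are made-up numbers chosen so the peaks don't coincide, and the function names are mine; real workloads will pool less neatly than this.

```python
from typing import Sequence


def siloed_capacity(demands: Sequence[Sequence[float]]) -> float:
    """Capacity needed when each application is provisioned for its own peak."""
    return sum(max(profile) for profile in demands)


def pooled_capacity(demands: Sequence[Sequence[float]]) -> float:
    """Capacity needed when the pool is provisioned for the peak of total demand.

    Assumes every profile covers the same time periods (equal length).
    """
    return max(sum(period) for period in zip(*demands))


def average_utilization(demands: Sequence[Sequence[float]], capacity: float) -> float:
    """Total work done divided by total capacity available over all periods."""
    total_demand = sum(sum(profile) for profile in demands)
    periods = len(demands[0])
    return total_demand / (capacity * periods)


# Hypothetical server demand per time period for three applications.
app_a = [8, 2, 2, 2]
app_b = [2, 8, 2, 2]
app_c = [2, 2, 8, 2]
apps = [app_a, app_b, app_c]

silo = siloed_capacity(apps)
pool = pooled_capacity(apps)
print(f"siloed: {silo} servers, {average_utilization(apps, silo):.1%} utilized")
print(f"pooled: {pool} servers, {average_utilization(apps, pool):.1%} utilized")
```

With these assumed profiles the silos need 24 servers and the pool needs 12, so the same work gets done at twice the average utilization.  If all three peaks landed in the same period, the two numbers would be equal and pooling would buy nothing.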

Elasticity is the key benefit of internal or private clouds as well.  Cassatt (my benevolent employer) makes internal cloud-enabling software that dynamically allocates servers from a shared resource pool to applications in proportion to service demand.  Just as with Amazon, over-provisioning cannot be eliminated, but it can be dramatically reduced, and utilization correspondingly increased, slashing the cost of IT.  Increased utilization makes all the other benefits of cloud computing, like business agility and infrastructure resiliency, basically free (as in “free beer”, not “free speech”).

2 Comments

  1. Cloudwatcher wrote:

    It’s my guess that improved “average server utilization rates” are not a measurement worth pursuing — ever! I know this is heresy in the cloud but you are dead right: it’s the service not the server that matters.

    What really matters is how fast an enterprise can add servers to, or release them from, a service/application. (Has anyone thought to popularize such a measurement?) This sort of dynamic provisioning will invariably cause server utilization rates to rise. More importantly the current availability threshold of the service will not only be protected but it should be vastly improved.

    I’m not sure how focusing on server utilization as an end goal will lead to anything but false positives while jeopardizing a service’s availability.

    Monday, March 30, 2009 at 10:35 am
  2. Steve O. wrote:

    Thanks, Cloudwatcher, for the comments. I can’t dismiss “average server utilization” as a valid metric, but do think we agree that mindless pursuit of increased utilization alone could foolishly sacrifice service levels, which are way more important as they directly support the operation of the enterprise, while increasing utilization just saves some money.

    I couldn’t agree more with you that dynamic scaling of services — “agility” — is what really matters. Increased agility increases business competitiveness and can help grow the top line — again, way more important than merely reducing the IT budget.

    Tuesday, March 31, 2009 at 6:14 am
