500 Words Or Less

Funny how seeing your start-up nearly do a convincing imitation of a smoking crater, having an enormous IT management software company swoop in at the 11th hour to make a surprising glove save (only spilled a little), and making the personal transition from a company of O(100) individuals or fewer at any time to one with over 13K employees makes time fly by almost unnoticed.  I am shocked and chagrined to see how long it’s been since I’ve posted.  Nothing to be done for it now, of course, but climb back in that saddle (I’m losing count; is that three or four metaphors in the mix?) and write.

Good news for me, though: Despite the elapsed time, cloud computing is still in its relative infancy, plenty yet to figure out, so at least I don’t have to change the theme of this blog.

So, writing again, but baby steps first:  Here’s a pointer to a post I guest-wrote for a new MITRE cloud blog, part of a monthly series where they ask a question (this month it’s “what do you perceive as the most significant concern for federal organizations who want to use cloud computing?”) and multiple correspondents from industry take a whack at an informative answer.  Fortunately for me, it’s multiple choice, though I was hoping for true/false to improve my odds of getting it right.  Who is MITRE?  From their web site: “As a public interest company, MITRE works in partnership with the government applying systems engineering and advanced technology to address issues of critical national importance.”  You can check them out in general at MITRE.org, or read their new “Ahead In The Clouds” blog (and my modest contribution — rules stipulate 500 words or less — for January) here.

(Did I mention I work for CA now as a “Distinguished Engineer” (“distinguished” means I’m old) working on cloud strategy and technology?  How embarrassing to have to read my own ancient posts to see where I left things…)

Everybody Lies

It’s no secret that Cassatt (my benevolent employer) has been struggling to make a go at selling internal cloud management software.  Our CEO, Bill Coleman, is on record stating that we “vastly underestimated the social and cultural challenge of Cassatt.”  Surprisingly, this “challenge” often first manifests itself not as a conceptual “value of cloud computing” sort of issue, but in the difficulty of assessing the starting point: the current condition and usage of the data center in question.  This lack of detailed self-knowledge can slow or completely stymie efforts to follow a successful PoC (Proof of Concept) technology demonstration with a fruitful deployment into production.

How does this happen?  How can an IT organization, having recognized it has a data center efficiency problem (low utilization, high costs, rigid/brittle architecture, out of power/cooling/space, etc.) and vetted and tested potential solutions (and I’m not just talking cloud infrastructure management technology) to its satisfaction, find itself unable to productively roll it out where it will do the most good?

Perhaps a clue can be found on the tube (you know, television… like Hulu, only with more commercials).  The favorite saying of fictional TV doctor/diagnostician extraordinaire Gregory House, MD, is “everybody lies”.  In his case(s), he’s talking about patients’ universal propensity for falsifying or withholding critical medical history or symptom information that would be invaluable for diagnosing their disease, their untruthfulness generally attributable to protecting self-interest or avoiding embarrassment.  

Sometimes, life imitates art (if one is willing to call television “art”, or, for that matter, IT infrastructure management “life”).  As in any “House” episode, improving data center efficiency involves a diagnosis phase, which requires collecting deployment and operational data about the organization and utilization of the resources and applications already running in the data center.  Whether the intended medicine is virtualization, automated provisioning, active power management, creation of a full-on internal cloud, or merely the discovery and surgical removal of orphan servers, the diagnosis can be a surprisingly difficult forensic exercise, and for reasons Dr. House would find familiar: you need to discover exactly what is running in the data center, what it’s running on, how the pieces are all connected, who’s using what, and how much.

Take a look at these real-world examples of the kinds of roadblocks we’ve encountered as IT tried to collect and analyze their applications and environments:

  • Misleading:  Customer A, considering active power management in their data center, polled server/app owners on what periods of time their servers might be unused and could be powered off.  Cross checking revealed contradictory answers for the exact same apps/servers that varied so wildly that all data had to be thrown out.
  • Uncooperative:  Even though Customer B had a leading monitoring tool already in use in the data center capable of collecting the desired utilization and deployment data, the efficiency task force that wanted the info didn’t “own” the tool and couldn’t get the monitoring tool admins to turn it on everywhere or collect/report on results.
  • Confidently wrong:  In Customer C’s dev/test data center, management claimed that their global outsourced development team made round-the-clock use of the systems, handing off servers from one shift to the next and leaving virtually no opportunity for savings.  A subsequent Active Profiling engagement revealed a large fraction of orphans, with the majority of all systems idle most of the time.
  • Dodging the question:  Customer D’s multi-data-center IT transformation effort, resourced by external professional services hired guns, planned a comprehensive database of servers/apps to guide server consolidation and deployment of virtualization, automated provisioning, and dynamic allocation.  Their multi-page survey elicited few responses, and those few collected were inconsistently and sparsely filled out.  End result: the entire transformation effort finally ground to a halt due to their inability to determine what to do where first.  

As you can see, rarely is an inability to glean facts the result of anyone’s direct intent to mislead.  Usually, it’s more like Obi-Wan Kenobi’s lame rationalization to an outraged Luke for obscuring the relationship between young Skywalker and the dark Lord Vader; people describe the world using statements that are “true, from a certain point of view”, and often can’t even see the need for the qualifier.  This is one reason science (the practice of finding facts, not truth) can be so difficult.  People aren’t machines (a good thing, but sometimes problematic) and aren’t calibrated.  As I’ve said before, our perceptions are generalized and heavily influenced by our filters, and when we make assertions based on those generalizations, it’s probably not even fair to expect them to represent the facts, and too judgmental to call them “lies” just because they are erroneous.

Lessons learned?  Data center efficiency projects need an accurate map and reliable profile of resources, apps, and utilization to guide implementation.  Don’t underestimate the difficulty of obtaining that detailed info, nor the social and organizational roadblocks that can defeat seemingly reasonable efforts at data collection.  If possible, find a passive way to capture information that doesn’t require the cooperation of app/resource owners (or maybe even entrenched IT operators and their existing tools) and doesn’t rely on human estimates.  They’re probably wrong, possibly lying (but then, isn’t everybody?).
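To make “passive” a little more concrete, here is a minimal sketch of the kind of check I have in mind: take utilization samples that were measured rather than self-reported, and flag the servers that sit idle most of the time.  Everything specific in it is an assumption for illustration only: the function names, the 5% idle threshold, the 90% cutoff, and the sample data are all hypothetical, and this is not how any particular profiling product works.

```python
def idle_fraction(cpu_samples, idle_threshold=0.05):
    """Fraction of measured samples at or below the (assumed) idle threshold."""
    if not cpu_samples:
        raise ValueError("need at least one sample")
    idle = sum(1 for sample in cpu_samples if sample <= idle_threshold)
    return idle / len(cpu_samples)


def flag_mostly_idle(samples_by_server, idle_threshold=0.05, min_idle_fraction=0.9):
    """Names of servers that were idle in at least min_idle_fraction of samples."""
    return sorted(
        name
        for name, samples in samples_by_server.items()
        if idle_fraction(samples, idle_threshold) >= min_idle_fraction
    )


# Hypothetical CPU utilization samples (0.0 to 1.0), one list per server.
measured = {
    "build-01": [0.01, 0.02, 0.00, 0.01, 0.03, 0.01, 0.02, 0.01, 0.00, 0.01],
    "web-07":   [0.40, 0.55, 0.35, 0.60, 0.20, 0.45, 0.50, 0.30, 0.25, 0.65],
    "test-12":  [0.00, 0.00, 0.70, 0.00, 0.01, 0.00, 0.02, 0.00, 0.00, 0.01],
}

print(flag_mostly_idle(measured))
```

The point of the sketch is only that the answer comes from measurements, so nobody has to be asked (or believed).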

The Cloud’s Green Lining

[My apologies for the April posting hiatus.  You may have heard that Cassatt, my benevolent employer, may be "nearing the end".  The best horror movie scripts consist of prolonged uncertainty and suspense punctuated by frequent protagonist near-death experiences and oft-revived monsters.  I don't know how it's going to turn out, but it's been a great movie so far, and we aren't dead, yet.]

University of Michigan and Carnegie Mellon University researchers presented a paper at ASPLOS ’09 (the annual Architectural Support for Programming Languages and Operating Systems conference) in March proposing “PowerNap”, a dynamic and very fine-grain approach to turning off servers for idle periods as short as a second or less, making use of “sleep” capabilities built into most server components derived from commodity desktops.  Unlike sleep on personal computers, the server “sleep” cycles they propose would be activated automatically between tasks.  Their hardware approach was guided by actual traces of utilization for general IT server applications (e.g., email) and “Web 2.0” applications.

I love out-of-the-box thinking (even when the thinking is about, well, boxes).  I think the paper is clever and like how the authors expose some ugly truths about power efficiency in servers, in particular, the routine over-design of power supplies to handle peak loads when fully populated with disks, memory, NICs, etc., and the resulting horrible efficiency in more common lightly-populated configurations (hmm, over-provisioning for peak loads results in low efficiency…  where have I heard that before?).  Better read the fine print (or at least the power supply efficiency curves) next time you think you’re buying a green server.

Skeptic that I am, I wonder about some of the data they collected to fuel their efficiency models (I suspect round-robin load balancer distribution of at least the Web 2.0 workload artificially made the traffic they were recording look more fine grain than may necessarily be the case) and I expect “PowerNapping” servers may take a while to materialize in your favorite commodity server provider catalog.  However, it is undeniable that wide-spread availability of commodity servers that would automatically “power nap” between transactions could save big on energy costs in even traditionally-managed data centers.  

Fortunately, there is no need to wait.  While not as fine grain as “PowerNap”, adopting cloud computing, internal or external, saves energy and reduces carbon footprint in a variety of ways, and is accessible today:

  • Pooling/sharing resources increases utilization:  Using an external cloud or creating an internal cloud from existing data center resources implicitly means multi-tenant sharing of a pool of virtual and physical resources.  Sharing is good, driving utilization of resources higher than if applications remained isolated in over-provisioned silos.  Higher utilization means fewer resources necessary to provide equivalent levels of services, saving energy and the carbon cost of the resources themselves.  Of course, unallocated resources in the pool can be kept powered off until required. 
  • Automation reduces IT staff:   IT departments don’t like to hear it and vendors don’t like to say it, but cloud computing requires fewer people than traditional IT methods, and the remaining people need to be able to think and act more like managers than admins.  Clouds are dynamic and allocation/metering of resources must be automated to be practical at anything beyond the smallest scale.  Automation increases the number of servers and scale of capacity that can be administered by a person.  This not only reduces the number of people needed to provide a given level of capacity, saving money and the relative carbon footprint of IT, but it makes the jobs of IT more interesting — and less dangerous (studies show that most service outages are due to human error, just like HAL said) as well.  Again, this is true of internal and external clouds, from the perspective of the data center operators.  If you’re using an external cloud, you benefit from the automation of your outsourced IT capacity (well, perhaps not you, particularly if you’re one of the people displaced during the automation of your data center). 
  • Cost transparency encourages thriftiness: Internal cloud granular reporting/metering/billing of dynamically-allocated resources illuminates the wasteful hiding places in traditionally-managed data centers.  Like corruption and mold, waste can’t stand the bright light of day.  Orphan servers and underutilized resources are assimilated into the pool and can be automatically deactivated if not in use.  Of course, external clouds expose costs even more directly, by eliminating capital costs and billing directly only for the resources as they are consumed.

The good news is that these cloud computing infrastructure benefits are compatible and synergistic with hardware efficiency efforts, even those as extreme as PowerNap.  Dynamic allocation and sharing of resources among applications and between tenants adds complementary operational and deployment efficiencies that optimize hardware use, regardless of its inherent energy efficiency.  By all means, buy the most efficient gear you can, then deploy it in a cloud.

Amazon Introduces Inelastic Cloud

Amazon announced a new pricing structure for EC2 last Thursday based on “reserved instances”, the ability for a customer to pay an up-front fee that will set aside a Linux/Unix AWS instance (Windows reserved instances not yet available) for 1 or 3 years.  In return for the one-time fee, reserved instances carry a per-hour price tag that is only 30% of the cost of the existing on-demand variety of EC2 instance.  On-demand instance availability and pricing remain the same.

Amazon says they created reserved instances in response to customer requests for lower prices in return for a long-term commitment, as well as customer calls for a way to guarantee instance availability, particularly for disaster recovery use cases.

If you run the numbers, you see that reserving and operating a reserved instance 24/7 for a year costs 67% of what an on-demand instance would cost for a year.  The higher up-front cost of a 3-year commitment is amortized over a longer period, so 24/7 operations for 3 years would cost only 49% if using reserved instances instead of on-demand instances.  Actual savings would be somewhat less, as Amazon’s separate charges for bandwidth, storage, and IP addresses are the same for reserved and on-demand instances.  
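For anyone who wants to run the numbers themselves, the arithmetic can be sketched as below.  The only inputs taken from the announcement as described above are the 30% hourly price and the 67%/49% ratios; the $0.10/hour on-demand rate is an assumed example, and the up-front fees are backed out from the ratios rather than quoted from a price list.  Function names are my own.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of 24/7 operation


def reserved_cost_ratio(upfront_fee, on_demand_rate, years, reserved_fraction=0.30):
    """Total reserved cost divided by total on-demand cost for 24/7 use."""
    hours = HOURS_PER_YEAR * years
    on_demand_total = on_demand_rate * hours
    reserved_total = upfront_fee + reserved_fraction * on_demand_rate * hours
    return reserved_total / on_demand_total


def implied_upfront_fee(ratio, on_demand_rate, years, reserved_fraction=0.30):
    """Back out the one-time fee implied by a quoted reserved/on-demand ratio."""
    hours = HOURS_PER_YEAR * years
    return (ratio - reserved_fraction) * on_demand_rate * hours


ASSUMED_ON_DEMAND_RATE = 0.10  # dollars per hour; an example rate, not a quote

fee_1yr = implied_upfront_fee(0.67, ASSUMED_ON_DEMAND_RATE, years=1)
fee_3yr = implied_upfront_fee(0.49, ASSUMED_ON_DEMAND_RATE, years=3)
print(f"implied 1-year fee: ${fee_1yr:.2f}")
print(f"implied 3-year fee: ${fee_3yr:.2f}")
```

Because the ratio depends only on the fee relative to the hourly rate, the same function works for any instance size where fee and rate scale together.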

It’s interesting to note that the cost savings ratios are exactly the same for all 5 instance sizes/prices (standard small/large/extra-large and high CPU medium/extra-large), perhaps providing hints of Amazon’s underlying cost structure.  By guaranteeing “there’s no chance of encountering any transient limitations in EC2 capacity” for reserved instances, Amazon is — at least implicitly — promising not to over-sell available reserved capacity, so the fixed-cost portion of the price should approximate Amazon’s margined TCO for a server (scaled by how they define a “Compute Unit” and the number of Compute Units provided by the instance type).  Hint or not, the announcement is providing more fodder for those arguing the nitty-gritty of the costs of external clouds vs. DIY (e.g., see Gartner’s Lydia Leong’s cautionary post).

Last week, I talked about the cost of cloud computing, arguing that the largest factor in the IaaS cost equation, for internal (private) or external (public) clouds (e.g., Amazon), is how efficiently they are employed, their utilization.  Chronic low utilization, the shame of traditionally-managed data centers world-wide, is pretty much the result of over-provisioning, reserving more capacity than an application or service needs at a given point in time, more capacity in the form of a larger-than-necessary server (for single-server apps), or in more servers than necessary (for distributed apps).  Reduce over-provisioning, and you probably increase utilization (let’s not quibble over corner cases) and do something good for data center TCO.

From the start, Amazon EC2 offered solutions to reduce both single-server and distributed over-provisioning, by providing varying-capacity individual servers as well as on-demand pay-as-you-go provisioning of servers from a shared resource pool.  However, reserved instances are something of a step back from both.  

Amazon suggests we should “think of the one-time fee as somewhat akin to acquiring hardware”, and it is, with the same kinds of limitations.  Reserved instances must be purchased/located in a particular “availability zone” (think “data center”) and region (though availability zones are US-only, so far).  Unlike on-demand instances, which can be launched in any zone, a customer must use the particular reserved resource in the original zone for which it was purchased, and there is currently no way to relocate reserved instances, so customers designing DR scenarios or trying to locate services near regional consumers should plan carefully.  Once the instance is purchased, it’s locked in place.  

Similarly, unlike on-demand instances, a reserved instance is what it is (i.e., standard small/large/extra-large or high CPU medium/extra-large).  Once purchased, the instance type cannot be changed.  If appetite for your application grows to require a larger server, or the server proves too large, reserved instances can’t be traded for a more appropriately sized resource the way on-demand instances can.  In this respect, Amazon’s resources fall further behind the arbitrary granularity of virtualization density that can be achieved with privately-operated servers.

While reserved instances and their hybrid up-front/by-the-hour cost model, used appropriately to host the most predictably-constant, high-utilization applications, can lower the cost of cloud computing (at least Amazon-based cloud computing), the inelastic aspects may be better for Amazon than for users.  Amazon gets more predictable capacity planning and revenue, and probably captures additional customers.  Users get a guarantee of instance availability (I understand the importance for a particular set of use cases, but it makes me wonder how often on-demand instance launch failures happen) and a lower price on any workload that consumes an instance more than 50-67% of the time, but only by sacrificing elasticity.  I would rather have seen a model that enabled more rather than less elasticity, like utilization-based pricing (e.g., if I’m only using 10% of the capacity of a $.10/hour instance, only charge me a penny/hour).

Werner Vogels, Amazon’s CTO, notes that reserved instances offer IT shops thinking about a move to cloud computing “a transition model that is closer to their current strategy”.  It’s possible that might not be a good thing.  “Current strategy” has produced the traditional IT management policies and practices that have filled data centers with under-utilized, over-provisioned application silos.  Should we be surprised if reserved instances entice some organizations to waste as much or more money in the cloud as they do today in their own data centers?

The Elephant in the Computer Room

I was sitting next to Jay Fry in Las Vegas (not at the tables, honest), listening to Tom Bittman’s keynote opening the Gartner Data Center Conference in December, when Tom said “[according to Gartner's analysis], if you fully utilize your own equipment, Amazon [EC2] will cost you twice as much.”  Jay kept on furiously taking notes, but I missed the next few minutes of Tom’s speech.  I was thinking “boy, that’s a big ‘if’.”

Analyses like Gartner’s have sparked many discussions on the true cost of IaaS vs. the true cost of operating your own gear, or even outsourcing operations.  A recent example of the math can be found in CIO’s Bernard Golden’s fourth part (of six, concluded this week) in the series “The Case Against Cloud Computing” (which is really about making the case for cloud computing by examining critics’ arguments then offering refuting remarks).  Bernard relays a couple of calculations of the TCO of Amazon EC2 large and small instances, summing to at-first-blush large amounts, and then advises cloud shoppers to “do the math correctly”, meaning (I take it), correctly account for all the costs of running your own gear or outsourcing, and implying that, if you are honest with your corporate self, you’ll see that Amazon’s EC2 servers are really not that expensive after all.

I think Bernard strikes a resonant note when, at the end of the article, among “the cloud cost advantages”, he lists things like “the pricing is transparent” and “the pricing is fixed”.  Business accounting sometimes seems deliberately designed to be opaque and complex.  Often IT-related costs are concealed in non-IT cost centers.  IT rarely gets the data center electric bill, for instance; it goes to Facilities instead, as does the cost of the computer room, HVAC, UPS, etc.  Sometimes it seems like accounting just punts on trying to figure out where costs should be charged, and instead allocates them uniformly across cost centers according to financial accounting formulae unrelated to usage.  For example, in budget-speak, every employee comes laden with “burden”: the allocated cost of the real estate occupied by their office and common areas, plus G&A “overhead” like HR/accounting/receptionists/security/maintenance, etc.  And to further complicate the maze, capital costs (the cost of the assets themselves) are subject to arcane depreciation and cost-of-capital machinations by the financial wizards, so the “true” IT cost of a given server also depends on choices a business makes about the tax treatment of assets (e.g., depreciation schedule and method) and how it pays for them.

By comparison, how refreshing to just get an invoice with a bottom-line number from your external cloud IaaS supplier, even if it might be larger than the cost of buying and running it yourself.

But Bernard’s sources are splitting hairs when they argue about the cost of servers, owned/operated by IT or rented from the cloud.  A more important question is, how much does the consumable application or end-user service provided by those servers cost?  That’s what the enterprise really cares about, and this higher-level view exposes a factor that overwhelms any of the cost-of-server factors: Utilization.   Remember that Tom Bittman’s statement was qualified “if you fully utilize…”  That “if” dominates the cost-of-service calculation and it applies to usage of external cloud IaaS sources like Amazon, as well as internal or private resources.  Utilization is the elephant in the computer room. 

Think about it.  No matter how we reasonably construct an equation for the internal cost of a server, the true cost of the end-user service or application is proportional to how efficiently IT converts that server capacity into useful work.  Utilization is a good measure of the efficiency of this conversion, and it has a huge effect on the cost of IT.  If you average 20% server utilization, for example, it’s a 5x potential multiplier on the sum of your server costs.  If you average 10% (or less — you know who you are), it’s a 10x multiplier.  So what if you get an additional 20% discount on new servers from Dell?  A drop in the bucket.  Doubled the number of servers each admin can manage?  Big deal.  Double your utilization and you can cut your IT budget in half without affecting service levels.
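The multiplier arithmetic is simple enough to sketch in a few lines.  This assumes the simplest possible model, in which the cost of useful work is just server cost divided by average utilization; the function names and the $3,000 annual server cost are hypothetical.

```python
def cost_multiplier(avg_utilization):
    """How many times more useful work costs than it would at 100% utilization."""
    if not 0 < avg_utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / avg_utilization


def cost_of_useful_capacity(annual_server_cost, avg_utilization):
    """Effective annual cost of one fully used server's worth of work."""
    return annual_server_cost * cost_multiplier(avg_utilization)


ASSUMED_SERVER_COST = 3000.0  # dollars per year; hypothetical

for utilization in (0.10, 0.20, 0.40):
    effective = cost_of_useful_capacity(ASSUMED_SERVER_COST, utilization)
    print(f"{utilization:.0%} utilization: {cost_multiplier(utilization):.1f}x, ${effective:,.0f}")
```

Going from 20% to 40% halves the effective cost, which is the “double your utilization” claim in miniature.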

So, does the cloud help your utilization?  Maybe, but not because of the cost or pricing structure. Computational IaaS is sold by the glass, not by the drink.  It doesn’t matter if you quaff it dry or just sip the suds at the top, you pay the same amount.  

For instance (regrettable pun intended), Amazon doesn’t charge for EC2 by the CPU cycle or by the number of actual instructions executed (this lack of granularity is the key difference between storage and computation economics).  Instead, they charge by the instance-hour, a wall-clock-timed allocation of peak capacity that it’s up to the user to employ efficiently or wastefully, just like a real server you buy, plug in, and run yourself.  Low utilization, high utilization, Amazon doesn’t care.  You get the same bill either way, but that bill might be 5 or 10 times larger than it needs to be if your utilization is still running at industry standard averages.  

(If Amazon wanted to solve the server utilization problem at a stroke, they could charge for actual processor time instead of instance hours, the way time-share computers used to be billed.  If I’m only using 10% of a $.10/hour instance, only charge me a penny/hour.  Not much chance of that happening, but it’s not the only solution.)
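Here is a small sketch of the two billing models side by side, using the penny-per-hour example above.  The function names are mine, and the processor-time model is purely the hypothetical one described in this post, not anything Amazon offers.

```python
def instance_hour_bill(hourly_rate, hours):
    """Wall-clock billing: you pay for allocated capacity, used or not."""
    return hourly_rate * hours


def processor_time_bill(hourly_rate, hours, avg_utilization):
    """Hypothetical usage billing: you pay only for the fraction actually used."""
    if not 0 <= avg_utilization <= 1:
        raise ValueError("utilization must be in [0, 1]")
    return hourly_rate * hours * avg_utilization


rate = 0.10        # dollars per instance-hour, as in the example above
utilization = 0.10  # using 10% of the instance

per_hour_now = instance_hour_bill(rate, hours=1)
per_hour_by_usage = processor_time_bill(rate, hours=1, avg_utilization=utilization)
print(f"instance-hour billing:  ${per_hour_now:.2f}/hour")
print(f"processor-time billing: ${per_hour_by_usage:.2f}/hour")
```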

The way that cloud computing can help utilization is by being an elastic resource, not by being a cheaper resource.  That elasticity, which comes from the ability to dynamically allocate shared computational resources to provide services in proportion to demand — and just as dynamically deallocate them again — is what fundamentally drives up utilization by reducing over-provisioning.  It cannot eliminate over-provisioning (even Amazon needs to over-provision EC2 to accommodate fluctuations in demand, and you can bet that cost is passed on to users), but it can dramatically reduce it by eliminating application silos and changing the capacity planning equation from “what is the peak this application will ever need?” to “what is the peak this pool of applications will ever need?”  By pooling applications (and pooling resources), non-coincident demand peaks are handled for free.  The more applications and resources you pool, the smaller the over-provisioning factor, and the higher the average utilization.
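The “peak of the pool” versus “sum of the peaks” point can be illustrated with a toy calculation.  The three applications and their demand figures below are invented so that their peaks don’t coincide; real traces would be messier, and the function names are my own.

```python
def siloed_capacity(demands):
    """Servers needed if every application is provisioned for its own peak."""
    return sum(max(series) for series in demands.values())


def pooled_capacity(demands):
    """Servers needed if applications share one pool sized for the combined peak."""
    return max(sum(step) for step in zip(*demands.values()))


def average_utilization(demands, capacity):
    """Total demand served divided by total capacity provided over the period."""
    total_demand = sum(sum(series) for series in demands.values())
    steps = len(next(iter(demands.values())))
    return total_demand / (capacity * steps)


# Hypothetical demand (in servers) for three apps over six time periods.
demands = {
    "payroll":   [2, 2, 8, 2, 2, 2],
    "web_store": [3, 9, 3, 3, 3, 3],
    "reporting": [1, 1, 1, 1, 7, 1],
}

silo = siloed_capacity(demands)
pool = pooled_capacity(demands)
print(f"siloed: {silo} servers, {average_utilization(demands, silo):.1%} utilized")
print(f"pooled: {pool} servers, {average_utilization(demands, pool):.1%} utilized")
```

In this made-up case the pool needs half the servers of the silos for the same work, so average utilization doubles.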

Elasticity is the key benefit from internal or private clouds as well.  Cassatt (my benevolent employer) makes internal cloud-enabling software that dynamically allocates servers from a shared resource pool to applications in proportion to service demand.  Just like Amazon, over-provisioning cannot be eliminated, but it can be dramatically reduced, and utilization correspondingly increased, slashing the cost of IT.  Increased utilization makes all the other benefits of cloud computing, like business agility and infrastructure resiliency, basically free (as in “free beer”, not “free speech”).

“Now, that’s a knife…”

Last week I focused (somewhat critically) on two cloud taxonomy/ontology proposals that had been kicking around, arguing that neither was really a taxonomy or an ontology, and that both fell short of being very useful tools for categorizing, hence understanding, the organization of and relationships among the diverse entities in the cloud domain.  Well, I’m on vacation this week, but can’t help drawing attention to a new cloud taxonomy effort initiated by Jean-Lou Dupont.  The 451’s Rachel Chalmers called it “a real beauty”, and I’m inclined to agree, not because I think it’s complete or necessarily accurate, but because it takes a unique and so-far-surprisingly-productive approach.  Jean-Lou has cleverly created the taxonomy diagram as a wikimap in Mindmeister, allowing anyone to weigh in directly just by editing.  I grabbed the diagram during a few minutes of wifi access at the airport and immediately found it useful, even in its primitive early state.  I predict it will provide a framework for many interesting (and likely elevated-temperature) discussions and debates over coming weeks.

It’s mainly taxonomy-by-enumeration, so far, with some branch labels more-or-less self-explanatory (SaaS, PaaS, IaaS, of course, and on sub-branches, SADIST-PIMP), while others have definitions and distinctions that are probably far from settled by consensus.

Jean-Lou’s map is growing rapidly (“probably tripled today”, he says in a comment to Rachel), and the wiki model makes it inclusive.  I have to believe there will come a time when this tree will sorely need pruning (criteria and candidates for cutting will be yet another hot topic, I’m sure, and perhaps there’s a point where moderation may be a good idea), but for now, I’d be content to let it grow organically as contributors add companies, products/technologies, and sub-categories, just as it’s good practice to be inclusive and accepting of (nearly any) proffered brainstorms in the early stage of any creative exercise.  There are obviously too many branches and leaves that are simply populated with company names under a single taxonomic label (I’m sure those companies will add distinguishing characteristics to differentiate themselves), and too few specific technologies as leaf cells, too little mention of distinguishing characteristics that might differentiate leaves that share a branch, but these are merely the visible signs of both the early stage of the map’s development and the rush to populate it.

The taxonomy is also quite heterogeneous, including a nascent (heck, it’s all nascent) “community” family tree.  I’ll have to think about whether this “side band” of communications/analysis constitutes a worthy branch of cloud taxonomy (ideas as a service?), but it is an interesting touch, and I expect more surprises as different viewpoints are incorporated.

It’s going to be another week before I get more than transient internet access, so further exploration and experimentation will have to wait, but I’m already excited to see how far it’s expanded and refined when next I can grab an update.  Nice work Jean-Lou!

Cloud Burst

Recently, ex-Cassatter (“Cassattian”? “Cassattite”?) turned Ciscoer (uh… Ciscoan?), always-blogger James Urquhart called attention to both the need and a couple of proposals for cloud computing taxonomies/ontologies.  “That would be nice”, I thought, because taxonomies and ontologies can bring a lot of clarity and precision to an otherwise murky, poorly-specified picture.  After taking a look at the proposals and discussion, I wondered if we were talking about the same things.

Taxonomies are frameworks for understanding the often-hierarchical organization of related “things”.  Probably the most famous and familiar taxonomy is the Linnaean taxonomy of life we all learned in school (you remember: kingdom, phylum, yadda, yadda, yadda, species, sub-species), but almost everything we have and do can be, and often is, organized into taxonomies.  With so many claimants touting cloud products and services, potential customers of those products and services (which include nearly everyone producing or consuming IT products and services) could probably use a comprehensive classification system to help put things in their proper context in the cloud ecosystem.  

Ontologies offer deeper context than taxonomies.  Where a taxonomy provides a classification system, a way of distinguishing and categorizing domain members, an ontology adds formal specification of relationships and interactions between members and classes that can be used to draw inferences beyond mere categorization.  Taxonomies often are expressed as trees or tables because they are, in essence, a sorting of members into smaller and smaller groups by successive application of more and more specialized criteria.  Ontologies may be expressed as more generally connected graphs or even in one of several formal specification languages.  For instance, the World Wide Web Consortium’s Semantic Web project has defined the web ontology language, OWL, as part of an ambitious effort to enable computers to use and “understand” (i.e., reason about) the web.  A true cloud ontology could enhance interoperability by specifying roles, relationships, and interactions between cloud domain members so completely and formally as to constitute (or at least facilitate creation of) APIs.  Cool, but much more difficult to achieve than a sort into taxonomic groups.
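A toy sketch may help show the difference in kind.  The taxonomy below is just a tree that sorts things into categories; the “ontology” adds typed relationships from which something new can be inferred.  The `runs_on` relation and the specific facts are hypothetical, chosen only to illustrate the mechanism, and a real ontology language like OWL is far richer than this.

```python
# Taxonomy: a tree that only sorts members into nested categories.
taxonomy = {
    "cloud": {
        "SaaS": {},
        "PaaS": {},
        "IaaS": {},
    }
}


def path_to(tree, target, trail=()):
    """Return the chain of categories leading to target, or None if absent."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return here
        found = path_to(children, target, here)
        if found:
            return found
    return None


# Ontology (toy version): typed relationships between members.
# These facts are hypothetical examples, not claims about any real stack.
triples = {
    ("SaaS", "runs_on", "PaaS"),
    ("PaaS", "runs_on", "IaaS"),
}


def infer_transitive(facts, relation):
    """Treat relation as transitive and return all implied (subject, object) pairs."""
    pairs = {(s, o) for s, r, o in facts if r == relation}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


print(path_to(taxonomy, "IaaS"))
print(sorted(infer_transitive(triples, "runs_on")))
```

The tree can only tell you where something is filed; the relationships let you derive a fact (here, an indirect dependency) that nobody wrote down.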

We already have one generally-accepted taxonomy in cloud-space, the SaaS/PaaS/IaaS, or “SPI” taxonomy.  It’s simple and clear, but it’s also informal and fails to account for many dimensions and elements of cloud-dom, including such “nuances” as delineating private or internal clouds from the external variety, providing places for components like service “governors”, and criteria for sorting into sub-categories of I, P, and S.  In fact, there are a lot more “aaS” categories out there.  David Linthicum convincingly lists 10 here (wisely, he doesn’t use the words “taxonomy” or “ontology”); his aaS “framework” includes storage, database, information, process, application, platform, integration, security, management, and testing, which William Vambenepe reorders and wickedly labels “SADIST-PIMP”.  Of course, many of these unmapped aspects of the cloud domain are also controversial and/or rapidly-evolving.  That makes figuring out whether they fit in a general cloud taxonomy (and if so, how) all the more important.

So, do the new proposed candidates for a cloud taxonomy/ontology help?  Sadly, not really.

The Youseff/Butrico/Da Silva paper, somewhat over-titled “Toward a Unified Ontology of Cloud Computing”, with authors from UCSB and IBM, falls far short of SADIST-PIMP as an enlightening taxonomy, much less blazing a trail toward an applicable ontology.  In addition to the subdivision of IaaS into “computation resources” (itself recursively labeled IaaS), “storage”, and “communications” (aren’t storage and communications also infrastructure?), the proposal adds two additional, somewhat discordant, layers to the standard SPI taxonomy: “kernel” and “HaaS” (hardware as a service).  Kernel refers to any and all software management of the underlying hardware, including hypervisors and operating systems, but the authors focus most on grid middleware, like Globus, as representative of this layer.  HaaS is epitomized, according to the authors, by hardware leases containing SLA terms, but the reference (a CNET article written from IBM’s press release) describes the complete outsourcing of Morgan Stanley’s IT to IBM.  The article calls it utility computing, and it may be (Morgan Stanley’s apps and infrastructure were all moving to centralized IBM data centers), but it doesn’t look much like HaaS.  Overall, the paper offers little new and useful classification structure and the embellishments actually detract from the clarity of SPI.

Chris Hoff, inspired by SADIST-PIMP and the Youseff/Butrico/Da Silva paper as reported by John Willis, bravely hosted something of a community effort at a “cloud taxonomy and ontology”, seeded by his own “mashup” of the predecessor material but withholding most explanations that might have clarified some otherwise very professional-looking illustrations.  Like the Youseff/Butrico/Da Silva paper, the net result is something of an embellishment of SPI, but perhaps a bit more useful as it also contains a representation of a cloud delivery stack (though not necessarily a correct or complete one).  Again, despite the title, it doesn’t really constitute a system of classification, so it’s hard to claim it’s a useful taxonomy (beyond the embedded SPI taxonomy aspect), and (probably inherently, as it is only an illustration and not a specification) it is not an ontology.  I might find more constructive criticism to offer, but the dearth of description and discussion of what it really means (beyond the blog’s comments, which were apparently truncated by TypePad) makes the diagram something of a Rorschach test.  Anyone discussing it may be revealing more about themselves than what the concepts suggested by the diagram might actually mean.

I can’t blame the authors for not producing more useful tools.  If he were alive, I’m certain Douglas Adams would describe the width and breadth of cloud computing as “big; vastly, hugely, mind-bogglingly big”.  Like real clouds (the water-based atmospheric phenomena), it’s not a static thing, and there is tremendous variety as well as lots of confusing things that appear cloud-like, but arguably may just be smoke.  I think a genuinely useful cloud taxonomy gets built by first enumerating the fundamental classification principles and defining distinguishing attributes (something I’ll take a whack at in a future post; in the meantime, what do you think that list might include?), then by parsing the membership of the cloud domain, adjusting and adding principles and attributes as necessary, and as we learn and the cloud evolves.  Taxonomy isn’t a static thing, either.  Ontology, in my opinion, will just have to wait.  The general, formal specification of the many manifestations of clouds and their components (and, by association, definition of generic cloud APIs) is premature.  Things need to condense more before ontology will be a productive exercise.

I do think some blame (a mild chastisement) is owed to anyone participating in the cloud taxonomy conversation who is not exercising appropriately-high levels of skepticism and insisting on well-defined and valid standards in their frameworks.  Taxonomies are thought-shaping tools and bad tools make for bad thinking.  One commenter on one of the many blogs echoing/amplifying the taxonomy conversation remarked that some of the diagrams were mere “marketecture” and others warned against special interests warping the framework to suit their own ends.  We should all be such critical thinkers.

The Cloudology Manifesto

I’m a skeptic about cloud computing (if you’re new to cloud computing, check out Wikipedia’s pretty-good definition — watch out for occasional gopher holes in the rest of the article, but hey, it’s Wikipedia — and sample some of the many fine cloud-related blogs).  In fact, I try to be a skeptic about most things, most of the time, though (like most well-intentioned people) sometimes I fail.  

Defining what I mean by “skeptic” can be as subtle as the Free Software Foundation’s definition of “free” (“free” as in “free speech”, not “free beer”).  I’m not a pessimist (certainly not about cloud computing), nor do I have anything against free beer.  The definition of skeptic to which I try to adhere is not “one who disbelieves”, but “one who applies rigorous principles of critical thinking” (sorry, not nearly as catchy as “free speech/not free beer”, but then I’m not Richard Stallman).

The rigorous principles of critical thinking are what is applied when following the scientific method, when carefully avoiding logical fallacies, when consuming media without being credulous, when mistrusting one’s own senses, emotions, and reactions, when keeping an open mind (even about one’s publicly-stated positions), and when divorcing one’s ego from one’s arguments and having the guts to reverse an opinion in the face of valid contrary evidence.  Critical thinking is being critical about one’s own thinking, which helps one examine the external world more accurately.  As my Communications 101 professor stated, it’s about having a sharpened “bullshit detector”, but it’s also about pointing it at yourself and even at the positions you hold dear.

I suck at critical thinking.  Almost everyone does.  Human brains are analog computers, much as we like to describe their functions today using terms from digital computing (a friend once ruefully described herself as  ”a 16-bit processor in a 32-bit world”, which, of course, also dates the exchange).  Analog computers suck at precise computation, but they can be absolutely fabulous at pattern recognition and classification, generalization, and approximating solutions to algorithmically-intractable problems.  Those strengths can also be fatal weaknesses.  The human brain is the result of billions of years of evolution, the most-fit survivor of competitive environments that bear little resemblance to today’s technologically enhanced society.  Our wetware was shaped by the need to find food, reproduce, and physically out-compete our predators and neighbors, not drive cars, program VCRs, or design and operate efficient IT-based applications and services infrastructure.

Worse, our poor analog computers, constantly struggling to analyze, sort, compress and re-compress, and imperfectly store a few representative bits out of a constant gigabytes-per-second stream of real-time sensory data from billions of analog input sensors, are swimming in a polluted soup of emotion-triggering hormones and self-generated mood-altering chemical messengers perpetually interfering with whatever approximations of logical high-order thought we manage to muster.  It’s a wonder we can think at all, and no wonder so much of human society is so goofy.  Pay no attention to that man behind the curtain, he’s having a continuous nervous breakdown.

Being so poorly equipped for logical thought, it’s amazing how well we’ve done as a species, augmenting our innate capabilities with language, mathematics, science, and technology.  It’s also not surprising, however, that even a field as fundamentally based on logic as computing (what is more irreducibly logical than the binary codes and Boolean logic at the heart of our digital world?) should also be subject to the same excesses of enthusiasm, hyperbole, and unexamined certitude we find in the rest of human society.  Even mathematicians can be zealots; why should IT be immune?

Which brings me back to cloud computing.  If you’ve been paying more than cursory attention recently, I’ll bet you’ve seen at least one breathless headline, press release, or blog entry that has raised your eyebrow or caused your own bullshit meter to bounce, even just a little.  I, myself, am of the opinion that cloud computing is both the greatest thing for IT since diced silicon and (too often) something that looks like an overinflated volume of insubstantial vapor.  Cloud computing is new, and that’s sparked something of a land rush to stake out market- and mind-share.  On the other hand, cloud computing encompasses many not-new predecessor concepts, like utility and grid computing, SOA, and Web 2.0, and there is an on-going struggle to work out just how and if it all fits together.  There are True Believers claiming practically everything is cloud computing (buy your toilet paper from Amazon?  That’s TPaaS!).  At the other end of the spectrum, there are those who say it’s all hype, all vapor (clouds are both a compelling and, sometimes, unfortunate metaphor).  Both are probably wrong, at least to a degree, and, as in judging Olympic skating, we would be wise to be suspicious of both excessively high and ridiculously low scores.

I work for a company that’s arguably been working on cloud infrastructure management software for over 5 years (though we didn’t realize that’s what it was until recently :^) and I’m doing everything I can to help propagate the cloud computing wave, yet at the same time, in conversations with customers and others, I often struggle to try to correct misconceptions and deflate overhyped expectations.  The efficiency, agility, and accountability that cloud computing can bring to an organization are incredibly valuable; it’s not just about saving money, it’s about growing the top line.  The potential pain, expense, and opportunity cost of a misguided or inept cloud computing adoption attempt is almost unbounded, and even doing nothing could cost you your business in the end as you are outcompeted by successful cloud adopters, efficiently running their company on lower cost, tighter turning, higher capacity IT infrastructures.

My goal with this blog is to exercise good skepticism, examining cloud computing with a thoughtful yet questioning eye as the wave continues to build and ultimately sweeps over IT (I almost said “crashes on the beach of IT”, but “sweeps” sounds more survivable, doesn’t it?).  I do have a particular PoV that’s been shaped by my history, but I promise to try to watch my own processing as closely as I examine others’.  While I am in the industry, I won’t be an unquestioning fanboy, but I won’t be a John C. Dvorak, either.  I hope you enjoy what I hope will be a stimulating conversation as we practice “cloudology” together.