It’s no secret that Cassatt (my benevolent employer) has been struggling to make a go of selling internal cloud management software. Our CEO, Bill Coleman, is on record stating that we “vastly underestimated the social and cultural challenge of Cassatt.” Surprisingly, this “challenge” often first manifests itself not as a conceptual “value of cloud computing” sort of issue, but as difficulty assessing the starting point: the current condition and usage of the data center in question. This lack of detailed self-knowledge can slow or completely stymie efforts to follow a successful PoC (Proof of Concept) technology demonstration with a fruitful deployment into production.
How does this happen? How can an IT organization, having recognized it has a data center efficiency problem (low utilization, high costs, rigid/brittle architecture, out of power/cooling/space, etc.) and vetted and tested potential solutions (and I’m not just talking cloud infrastructure management technology) to its satisfaction, find itself unable to productively roll them out where they will do the most good?
Perhaps a clue can be found on the tube (you know, television… like Hulu, only with more commercials). The favorite saying of fictional TV doctor/diagnostician extraordinaire Gregory House, MD, is “everybody lies”. In his case(s), he’s talking about patients’ universal propensity for falsifying or withholding critical medical history or symptom information that would be invaluable for diagnosing their disease, their untruthfulness generally attributable to protecting self-interest or avoiding embarrassment.
Sometimes, life imitates art (if one is willing to call television “art”, or, for that matter, IT infrastructure management “life”). As in any “House” episode, improving data center efficiency also involves a diagnosis phase, requiring collection of important deployment and operational data about the organization and utilization of resources and applications already running in the data center. The intended medicine might be virtualization, automated provisioning, active power management, creation of a full-on internal cloud, or merely the discovery and surgical removal of orphan servers. Whichever it is, discovering exactly what is running in the data center, what it’s running on, how the pieces are all connected, who’s using what, and how much can be a surprisingly difficult forensic exercise, and for reasons Dr. House would find familiar.
Take a look at these real-world examples of the kinds of roadblocks we’ve encountered as IT organizations tried to collect and analyze data about their applications and environments:
- Misleading: Customer A, considering active power management in their data center, polled server/app owners on what periods of time their servers might be unused and could be powered off. Cross checking revealed contradictory answers for the exact same apps/servers that varied so wildly that all data had to be thrown out.
- Uncooperative: Even though Customer B had a leading monitoring tool already in use in the data center capable of collecting the desired utilization and deployment data, the efficiency task force that wanted the info didn’t “own” the tool and couldn’t get the monitoring tool admins to turn it on everywhere or collect/report on results.
- Confidently wrong: In Customer C’s dev/test data center, management claimed that their global outsourced development team made round-the-clock use of the systems, handing off servers from one shift to the next and leaving virtually no opportunity for savings. A subsequent Active Profiling engagement revealed a large fraction of orphans, with the majority of all systems idle most of the time.
- Dodging the question: Customer D’s multi-data-center IT transformation effort, resourced by external professional services hired guns, planned a comprehensive database of servers/apps to guide server consolidation and deployment of virtualization, automated provisioning, and dynamic allocation. Their multi-page survey elicited few responses, and those few collected were inconsistently and sparsely filled out. End result: the entire transformation effort finally ground to a halt due to their inability to determine what to do where first.
As you can see, rarely is an inability to glean facts the result of anyone’s direct intent to mislead. Usually, it’s more like Obi-Wan Kenobi’s lame rationalization to an outraged Luke for obscuring the relationship between young Skywalker and the dark Lord Vader; people describe the world using statements that are “true, from a certain point of view”, and often can’t even see the need for the qualifier. This is one reason science (the practice of finding facts, not truth) can be so difficult. People aren’t machines (a good thing, but sometimes problematic) and aren’t calibrated. As I’ve said before, our perceptions are generalized and heavily influenced by our filters, and when we make assertions based on those generalizations, it’s probably not even fair to expect them to represent the facts, and too judgmental to call them “lies” just because they are erroneous.
Lessons learned? Data center efficiency projects need an accurate map and reliable profile of resources, apps, and utilization to guide implementation. Don’t underestimate the difficulty of obtaining that detailed info, nor the social and organizational roadblocks that can defeat seemingly reasonable efforts at data collection. If possible, find a passive way to capture information, one that neither requires the cooperation of app/resource owners (or maybe even entrenched IT operators and their existing tools) nor relies on human estimates. They’re probably wrong, possibly lying (but then, isn’t everybody?).
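For the curious, here is a minimal sketch of what checking owner claims against passively measured data might look like. It is not Cassatt's Active Profiling, just an illustration: the server names, the sample values, the `profile_servers` function, and both thresholds are hypothetical, and a real effort would look at far more than CPU samples.

```python
from dataclasses import dataclass
from typing import Dict, List

# Both thresholds are assumptions chosen for illustration only.
IDLE_CPU_PCT = 5.0          # a sample below this CPU % counts as idle
MOSTLY_IDLE_FRACTION = 0.9  # share of idle samples that marks a server "mostly idle"


@dataclass
class Profile:
    server: str
    idle_fraction: float
    mostly_idle: bool
    contradicts_owner: bool


def profile_servers(
    samples: Dict[str, List[float]],
    owner_says_busy: Dict[str, bool],
) -> List[Profile]:
    """Compare measured CPU samples with what owners claimed in a survey."""
    profiles = []
    for server, cpu in sorted(samples.items()):
        if not cpu:
            continue  # no measurements, so make no claim either way
        idle = sum(1 for pct in cpu if pct < IDLE_CPU_PCT) / len(cpu)
        mostly_idle = idle >= MOSTLY_IDLE_FRACTION
        # Flag only the case where the owner said "busy" but the data says idle.
        contradicts = mostly_idle and owner_says_busy.get(server) is True
        profiles.append(Profile(server, idle, mostly_idle, contradicts))
    return profiles


# Hypothetical data: CPU % samples per server, plus survey answers.
samples = {
    "build-01": [2.0, 1.5, 3.0, 0.5, 2.2, 1.0, 0.8, 1.1, 4.0, 2.5],
    "web-07": [35.0, 60.2, 48.9, 2.0, 51.3, 44.0, 39.5, 3.1, 57.7, 42.0],
}
owner_says_busy = {"build-01": True, "web-07": True}

for p in profile_servers(samples, owner_says_busy):
    print(p)
```

The point of the design is that the measurements never depend on anyone's answer; the survey data is used only afterwards, to show where perception and fact diverge.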