CIOTechOutlook >> Magazine >> March - 2015 issue

Tiered Infrastructure Maintenance Standards (TIMS) for Data Centre Operations

By

Billions of dollars have been spent building highly redundant data center facilities in order to deliver high availability IT solutions to an increasingly information-reliant world. These large investments have produced a variety of sophisticated facility infrastructure designs that are inherently reliable and progressively more energy efficient. However, no facility design, regardless of how well planned and constructed, can withstand the disruption of an improperly implemented Operations and Maintenance (O&M) program. Poor maintenance and risk mitigation processes can quickly undermine the facility design intent. It is therefore crucial to understand and evaluate how O&M programs are organized to achieve the level of performance for which the facility has been configured.
It can be exceedingly difficult for non-maintenance professionals to evaluate the quality and effectiveness of their data center maintenance program. The presence of activity by qualified individuals is not in itself a reliable indicator. Practices that deliver good results in non-critical facilities are not always suitable for high-availability environments, but may appear adequate until a service outage occurs.
To overcome this barrier, it is useful to develop a set of standards for describing maintenance levels that is specific to data center operations and other mission critical facilities.
Four Tiered Infrastructure Maintenance Standards (TIMS) have been established:
• TIMS-1: Run to Fail
• TIMS-2: Unstructured
• TIMS-3: Structured
• TIMS-4: Facilitated

TIMS-1 Run to Fail
This level of service reflects the old adage, “If it isn’t broken, don’t fix it.” Maintenance is purely reactive at this level; when equipment fails, a technician is summoned to perform the repair. In areas where the system has redundancy, there may be little or no impact to the critical load for an isolated failure.
Operating at TIMS-1 implies that the perceived cost of an outage is low compared to the cost of preventative maintenance. However, any perceived short-term savings in maintenance costs will likely be overshadowed in the long run by more costly outages and expensive repairs.
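The trade-off described above can be written as a rough comparison. The sketch below is only an illustration: the function name, its parameters, and any figures passed to it are hypothetical, and it makes the simplifying assumption that preventative maintenance would avoid all of the outages counted.

```python
def run_to_fail_is_cheaper(outages_per_year: float,
                           cost_per_outage: float,
                           repair_cost: float,
                           pm_cost_per_year: float) -> bool:
    """Compare the expected yearly cost of reactive repair with a PM budget.

    Simplifying assumption (not from the article): preventative
    maintenance would avoid every outage counted here.
    """
    # Each failure costs the business impact of the outage plus the repair itself.
    expected_reactive_cost = outages_per_year * (cost_per_outage + repair_cost)
    return expected_reactive_cost < pm_cost_per_year
```

With an honest estimate of outage cost for a mission critical facility, the comparison will usually come out against TIMS-1, which is the point the paragraph above makes.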

TIMS-2 Unstructured Maintenance
TIMS-2 maintenance is characterized by the performance of routine preventative maintenance tasks without an overlying set of processes and procedures to ensure effectiveness and predictability. The fact that it is commonly performed by qualified manufacturer's service representatives or trusted in-house technical staff can create a false sense of security. This approach may deliver adequate results in some environments, but it does not meet the expectations of mission critical data centers. Unfortunately, this level of service is the industry norm. Service contracts for preventative maintenance are commonly awarded to the lowest bidder, with the difference made up on lucrative follow-up corrective maintenance work.

TIMS-3 Structured Maintenance
Structured Maintenance is designed to maximize uptime by removing guesswork and minimizing the negative effects of human error. TIMS-3 level maintenance is a complicated task that requires discipline and experience to execute. Each component of the maintenance process is closely controlled; policies are established to control how information is gathered, acted upon and recorded, precisely managing how and when work is performed. Identifying and training qualified personnel is part of a formal program, as is supervision and performance evaluation.
Importantly, a facility is not required to have a high Uptime Tier rating to enact a Structured Maintenance program. Rather, the critical systems must simply be maintained to the program standards.

TIMS-4 Facilitated Maintenance
Facilitated Maintenance is the highest level of maintenance service. It combines a Structured Maintenance program with a system topology that facilitates maintenance by providing multiple power and cooling distribution paths with redundant components. Such a design allows individual pieces of equipment to be isolated and maintained without a disruption in services.
Data center operations achieve the highest possible level of reliability for their assets when Structured Maintenance is performed in this environment.
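The four levels described above can be summarized as a simple decision sequence. The following is a minimal sketch, assuming three yes/no observations about a facility; the class and field names are hypothetical checklist items chosen for illustration and are not part of the TIMS definitions themselves.

```python
from dataclasses import dataclass
from enum import IntEnum


class TIMS(IntEnum):
    RUN_TO_FAIL = 1
    UNSTRUCTURED = 2
    STRUCTURED = 3
    FACILITATED = 4


@dataclass
class MaintenanceProfile:
    # Hypothetical checklist items, named for this sketch only.
    preventive_maintenance: bool      # routine PM tasks are performed
    controlled_procedures: bool       # formal policies, training, record keeping
    concurrently_maintainable: bool   # redundant paths allow isolating equipment


def classify(profile: MaintenanceProfile) -> TIMS:
    """Map observed practices to the TIMS level they most closely match."""
    if not profile.preventive_maintenance:
        return TIMS.RUN_TO_FAIL        # purely reactive repair
    if not profile.controlled_procedures:
        return TIMS.UNSTRUCTURED       # PM without governing processes
    if not profile.concurrently_maintainable:
        return TIMS.STRUCTURED         # controlled program, limited topology
    return TIMS.FACILITATED            # controlled program plus supporting topology
```

For example, `classify(MaintenanceProfile(True, True, False))` returns `TIMS.STRUCTURED`. A real evaluation would rest on a detailed review of the O&M program rather than three flags.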
Now that you have a framework for evaluating maintenance effectiveness, how should you utilize it? If you have an existing data center (or centers), the first step is to perform a detailed evaluation of the O&M program. This can take some time, because it needs to be comprehensive and detailed to be accurate.
The next step is to correlate the level of maintenance with the acceptable level of risk for the facility. A low Tier, geographically redundant facility may tolerate a less stringent level of maintenance than a high Tier facility with very high availability requirements.
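One way to picture this correlation step is as a gap check between the evaluated maintenance level and the level the business can accept. The sketch below is illustrative only: the risk-tolerance labels and the minimum TIMS level assigned to each are assumptions made up for the example, not values given by the standards, and each organization would set its own.

```python
# Illustrative policy only: the minimum TIMS level (1-4) an organization
# might require for each level of risk tolerance. These values are assumptions.
MIN_TIMS_FOR_RISK_TOLERANCE = {
    "high": 2,    # outages are tolerable, e.g. geographically redundant sites
    "medium": 3,
    "low": 4,     # very high availability requirements
}


def maintenance_gap(current_tims: int, risk_tolerance: str) -> int:
    """Return how many TIMS levels the program falls short of the target."""
    if current_tims not in (1, 2, 3, 4):
        raise ValueError("TIMS level must be between 1 and 4")
    required = MIN_TIMS_FOR_RISK_TOLERANCE[risk_tolerance]
    # A program at or above the target has no gap.
    return max(0, required - current_tims)
```

A result of zero suggests the program already matches the stated risk tolerance; a positive result indicates how far the O&M program would need to be upgraded.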
Whether creating an O&M program for a brand new facility or upgrading an existing program, scope, budget, skills and impact are the fundamental considerations.

Conclusion
When evaluating the health of the mission-critical enterprise, the effectiveness of the maintenance program is one of the key components that must be factored in to determine the true measure of sustained reliability. Tiered Infrastructure Maintenance Standards offer a systematic approach to aligning the operations and maintenance effort with the business goals of the data center.
