The data center environment, which has traditionally operated in silos and been fairly static, is undergoing a business-driven evolution into a dynamic, interconnected ecosystem. This change is the result of the rapid adoption of technologies like virtualization, cloud computing and fabric-based computing. An additional, more intriguing driver is the need to align IT more closely with the business and enable a true IT services catalog approach with Business Service Management.
While IT systems, technologies and software are responding fairly well to the need to become dynamic and interconnected and to break down the silos, Facilities systems remain static and distinct from IT. Facilities systems, resources and personnel are typically managed separately, based on performance metrics not related to IT. This narrow view has caused Facilities teams to focus only on keeping IT systems highly available by any means, often by maintaining significant and wasteful buffer capacity. The result has been low efficiency, stranded capacity and poor utilization of power, cooling and space. Additionally, the rapid adoption of virtualization and cloud computing makes this traditional approach risky and inefficient, and will drive data centers to take a more holistic view of IT and Facilities in order to comprehensively address capacity and efficiency while ensuring high availability.
This inflexibility, coupled with rapidly increasing energy costs, economic pressure to contain expenses, regulatory compliance requirements and rising business expectations for IT to be more flexible and agile, is increasing the need to holistically manage all data center resources. The industry answer is quickly becoming Data Center Infrastructure Management (DCIM), a category which is expected to exceed $1B by 2015.
Today Facilities must provide IT with on-demand power, cooling and space management resources for the computing workloads that enable the delivery of business services in a flexible, agile manner. The inefficient allocation of Facilities resources to virtualized applications causes sub-optimal power consumption and a buildup of heat densities, creating unanticipated “hot spots” in server racks. That is a huge impediment to optimizing the data center for efficiency without increasing the risk of failure. As data center managers adopt virtualization to increase resource utilization and control operating costs, predicting the demand on Facilities systems at any given time becomes more difficult. Data center managers compensate for this by adding buffer capacity on top of the existing unused capacity (which has already been reserved as a buffer against overload at peak demand), resulting in 20% or more of stranded capacity and low utilization rates. Getting more out of the significant investments organizations have made in their data centers will be the primary driver for the adoption of new DCIM strategies. To reduce or eliminate much of their stranded capacity, data center managers need integrated monitoring and management of all resources that provides detailed insight into data center operations. This includes their Facilities resources (power distribution, backup generators, cooling systems, space management) accurately mapped to their IT resources (servers, storage, switches) in order to increase data center efficiency. All of this needs to be accomplished with a single DCIM platform to be efficient and effective.
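As a rough illustration of how stranded capacity can be quantified once Facilities power budgets are mapped to measured IT demand, consider the following minimal Python sketch. The names Rack, stranded_kw and stranded_fraction, the 10% safety margin and the sample figures are all hypothetical and chosen only for the example; they do not come from any particular DCIM product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rack:
    """Hypothetical record pairing a Facilities power budget with IT demand."""
    name: str
    provisioned_kw: float  # power Facilities has allocated to the rack
    peak_demand_kw: float  # highest measured IT draw for the rack


def stranded_kw(rack: Rack, safety_margin: float = 0.10) -> float:
    """Power that is provisioned but not needed even at peak plus a margin."""
    needed = rack.peak_demand_kw * (1.0 + safety_margin)
    return max(0.0, rack.provisioned_kw - needed)


def stranded_fraction(racks: List[Rack], safety_margin: float = 0.10) -> float:
    """Share of total provisioned power that is stranded across all racks."""
    total = sum(r.provisioned_kw for r in racks)
    if total == 0:
        return 0.0
    return sum(stranded_kw(r, safety_margin) for r in racks) / total


# Illustrative figures only (assumed, not measured data).
racks = [
    Rack("A1", provisioned_kw=10.0, peak_demand_kw=6.0),
    Rack("A2", provisioned_kw=10.0, peak_demand_kw=8.0),
]
print(f"Stranded share of provisioned power: {stranded_fraction(racks):.0%}")
```

With these assumed numbers, 4.6 kW of the 20 kW provisioned is stranded, which is the kind of figure that only becomes visible when Facilities and IT data sit in the same view.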
Once visibility into this information is obtained, a “closed loop control system” capability is required to take appropriate action. This allows data center managers to operate the critical infrastructure as a whole, instead of as individual parts or domains. Such a “closed loop” solution allows the distributed infrastructure elements to be managed based on knowledge of both the state of those elements and the impact that any change has on them. The data center infrastructure management solution will become a “closed loop control system”, or true DCIM platform, if it utilizes one ‘central source of truth’: one database that contains all relevant information about the data center and its status.
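The closed loop idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how any DCIM platform is implemented: the SourceOfTruth store, the Zone record, the 25 °C target and the proportional gain are hypothetical, and a real system would act through the interfaces of the actual cooling equipment rather than by setting a field.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Zone:
    inlet_temp_c: float  # latest measured inlet temperature
    cooling_pct: float   # current cooling output, 0 to 100


@dataclass
class SourceOfTruth:
    """Hypothetical single store holding the state of every monitored zone."""
    zones: Dict[str, Zone] = field(default_factory=dict)


def control_step(store: SourceOfTruth, target_c: float = 25.0,
                 gain: float = 5.0) -> Dict[str, float]:
    """One pass of the loop: read state, decide on a change, apply it."""
    actions = {}
    for name, zone in store.zones.items():
        error = zone.inlet_temp_c - target_c       # observe the current state
        new_pct = zone.cooling_pct + gain * error  # simple proportional rule
        new_pct = min(100.0, max(0.0, new_pct))    # keep output within limits
        if new_pct != zone.cooling_pct:
            zone.cooling_pct = new_pct             # act and record in the store
            actions[name] = new_pct
    return actions


store = SourceOfTruth({
    "hot": Zone(inlet_temp_c=29.0, cooling_pct=50.0),
    "ok": Zone(inlet_temp_c=25.0, cooling_pct=40.0),
    "cold": Zone(inlet_temp_c=20.0, cooling_pct=10.0),
})
actions = control_step(store)
print(actions)
```

Because every decision reads from and writes back to the same store, the next pass of the loop sees the effect of the previous action, which is what distinguishes a closed loop from one-way monitoring.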
Real-time capabilities for DCIM are critical to ensure that Facilities resources can respond dynamically to IT workloads and enable a dynamic data center ecosystem. With the increased interdependency between IT and Facilities, it is critical to see both historical and real-time data on changes in resource consumption. Real-time capabilities are also critical for rapid problem management in a dynamic data center ecosystem. While Facilities can anticipate some of IT's resource needs from a baseline capacity plan, real-time capabilities are required to respond dynamically to business needs. Detailed monitoring and measurement of data center performance, utilization and energy consumption are crucial for data center managers to make accurate judgments and ensure proper planning.
The holistic management capabilities in a DCIM solution should address the following tasks:
Inventory Management to keep track of all of the assets
Change Planning to plan and execute changes with confidence
Site Management to fully track and report on the health of the devices
Power System Management for complete management of the data center’s power system
Energy Insights to provide a view into energy consumption, plus data center inefficiencies
Virtual Integration Management to collect a detailed, real-time catalog of all virtual machines in the data center
Cooling System Management for complete management of the cooling system
The ability to have holistic visibility into all data center resources in real time, and to execute closed loop control over all the data center activities defined above, brings the ability to See, Decide and Act to DCIM. It also enables predictive capabilities and data center optimization by providing data center personnel with the visibility and control to optimize performance while maintaining or improving availability. Data center management can now become truly proactive, as personnel can anticipate potential failures and automatically shift compute and physical resources to eliminate downtime while increasing resource utilization to optimize efficiency across the data center.
While real-time data collection capabilities in DCIM are very desirable to increase agility and flexibility and to reduce risk, they need to be implemented in a very efficient manner. Collecting up to 10,000 data points per second from across all data center resources could add significant management overhead to the platform without the right architecture. The ability to leverage Complex Event Processing capabilities and push intelligence out to the edge could substantially improve efficiency and make real-time collection financially viable for DCIM customers. In addition to the economics, the scalability and flexibility of the implementation become very important, as data points could scale to several million for even moderately sized environments. Typically, combined hardware/software solutions are more elegant than software-only solutions for real-time data collection. Additional capabilities to be considered for DCIM are true real-time capacity management, the interdependencies between the logical and physical layers, and powerful visualization tools that offer a rich, visual view of the infrastructure and can guide design and change management.
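To make the idea of pushing intelligence to the edge more concrete, the following Python sketch shows one simple way a collector near the sensors might reduce raw readings before forwarding them. It is only an illustration under assumptions: the function summarize_at_edge, the window size and the alarm threshold are hypothetical, and this is plain windowed aggregation rather than a full Complex Event Processing engine.

```python
from statistics import mean
from typing import Dict, List


def summarize_at_edge(samples: List[float], window: int,
                      alarm_threshold: float) -> List[Dict[str, object]]:
    """Collapse each window of raw readings into one summary record."""
    summaries = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        summaries.append({
            "min": min(chunk),
            "max": max(chunk),
            "mean": mean(chunk),
            # Flag the window so the central platform can react immediately.
            "alarm": max(chunk) > alarm_threshold,
        })
    return summaries


# Illustrative temperature readings (assumed values, in degrees Celsius).
readings = [21.0, 21.5, 22.0, 22.5, 30.0, 31.0, 29.0, 30.0]
summaries = summarize_at_edge(readings, window=4, alarm_threshold=27.0)
for record in summaries:
    print(record)
```

Here eight raw readings become two summary records, one of which carries an alarm flag. The same pattern applied at scale is one way the volume reaching the central platform could be kept manageable while still surfacing the events that matter.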