Dr. Han Wang, Principal Architect at Inspur
Digital transformation across industries is driving a cloud data center construction and innovation boom, giving rise to a huge cloud service market that comprises public, private, and hybrid cloud environments. But as many technological innovations of data centers focus on the cooling, power supply, and management of hyperscale data centers, not enough attention is being paid to innovating small and medium-sized data centers to address the needs of the users of private and hybrid clouds.
The OpenRMC Project within the Open Compute Project (OCP), led by Inspur and Intel® with contributions from Microsoft and Wiwynn, introduces a rack management solution that integrates hardware and software to help data centers improve construction efficiency, simplify operations management, and enhance operational efficiency.
The intelligent era calls for open, automated operations and maintenance capacity
Operations and maintenance are an integral part of data center operations, and are growing increasingly complex. As the intelligent era unfolds, the diversity and complexity of application loads in data centers are increasing, new technologies such as artificial intelligence (AI) and containers are being introduced, and computing resources are becoming heterogeneous and pooled. And, in addition to traditional CPUs, accelerator computing units such as GPUs and FPGAs are now playing an increasingly important role in server systems.
To improve the reliability and availability of their data centers, and to reduce business interruptions caused by software and hardware failures or system upgrades, users are looking to enable automated deployment, automated inspection, in-depth fault diagnosis, and intelligent alarms as they strive to provide effective support for critical businesses and data.
Meanwhile, as the processing performance of CPUs and GPUs, the core components of computing resources, gradually advances beyond Moore's Law, the adoption of multiple cores and advanced processes is driving up the energy consumption of processors and servers.
Cooling and power supply energy consumption accounts for a considerable part of data center operational costs, putting enormous cost pressures on companies. Thus, higher energy utilization, as well as green and energy-efficient design, are essential for boosting data center competitiveness and striking a balance between environmental and economic benefits.
But due to the difficulty of monitoring the performance and power consumption of servers in real time and at a fine granularity, traditional data center operations have failed to achieve the desired energy efficiency.
OpenRMC makes monitoring power consumption much easier and more accurate. The aggregated power consumption of all the equipment in the rack can be reported in real time, along with aggregated performance metrics. This data is necessary to measure energy consumption accurately and to determine how efficiently computing resources are being used.
From the above analysis, we can tell that almost 30% of the rack's power capacity is over-reserved as backup. With the accurate telemetry and power control functions of OpenRMC, power utilization and rack density could be improved by 15% to 25%.
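The arithmetic behind a figure like "30% over-reserved" can be sketched in a few lines. The following is a minimal illustration only, not OpenRMC code: the `NodeReading` type, the `rack_power_summary` function, the node names, the wattages, and the 2000 W rack budget are all hypothetical values chosen so that the reserved fraction comes out at 30%.

```python
from dataclasses import dataclass


@dataclass
class NodeReading:
    """One power sample from a node (hypothetical structure)."""
    node_id: str
    power_watts: float


def rack_power_summary(readings, rack_budget_watts):
    """Aggregate per-node power and compare it with the rack's provisioned budget."""
    if rack_budget_watts <= 0:
        raise ValueError("rack_budget_watts must be positive")
    total = sum(r.power_watts for r in readings)
    headroom = rack_budget_watts - total
    return {
        "total_watts": total,
        "headroom_watts": headroom,
        # Share of the provisioned budget actually drawn.
        "utilization": total / rack_budget_watts,
        # Share of the budget held back and not drawn.
        "reserved_fraction": headroom / rack_budget_watts,
    }


# Illustrative numbers only; real readings would come from rack telemetry.
readings = [
    NodeReading("node-1", 350.0),
    NodeReading("node-2", 420.0),
    NodeReading("node-3", 380.0),
    NodeReading("node-4", 250.0),
]
summary = rack_power_summary(readings, rack_budget_watts=2000.0)
print(summary)
```

With real-time readings in place of these sample values, an operator could see how much of the reserved capacity is actually idle and decide whether more nodes can safely be placed in the rack.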
Automated data center operations are essential to reduce energy consumption and optimize server resource allocation. In recent years, OCP has made major advances in delivering higher computing density per unit space, reducing vendor lock-in through unified specifications, and quickly responding to unexpected application demands.
To achieve this, the design and delivery of a flexible and modular rack solution for data centers holds the key.
Due to limitations in terms of product, technology, and capabilities, the deployment of automated data center operations and energy conservation equipment is still in the early stages. Given the boom of hybrid and private clouds and the particularity of users' needs for operations optimization and system design, there has emerged a universal and pressing need to help data centers elevate their operations capabilities.
OpenRMC, a promising rack management solution for data centers
In response to the growing and pressing need for enhanced automated data center operations and system availability, as well as reduced energy consumption, Inspur has initiated and led the establishment of the OpenRMC Project under OCP. The Project aims to lead the industry in providing a rack management solution that combines software and hardware and is built on open-source management features.
A crucial issue addressed by OpenRMC is enhancing openness and usability. In a traditional data center, the server node is the most important unit of management and control. Only when each node works stably and efficiently can the systems in the whole rack be coordinated and used in an orderly way. The BMC on the server node is the key to managing each server. Implemented as an SoC, the BMC uses its abundant I/O to connect the many sensors in the various subsystems and obtains the information needed to control the environment. OpenRMC uses the BMC of each node as the basic unit of management and control, supports the IPMI and Redfish interfaces, and implements management functions such as remote power control, Serial over LAN, monitoring of host CPU and memory operating status, and switching hard disk LEDs on and off.
In terms of software and communication interfaces, in addition to supporting the common IPMI interface standard and different commercial BMCs, such as iLO and DRAC, OpenRMC also supports the OpenBMC open-source management software architecture. This architecture builds the SoC system on the Linux kernel, and its application layer is likewise composed of modular software packages. As a result, BMC management systems are built on a unified API, and the BMC management functions of a new device can be developed and deployed in a very short time.
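To make remote power control over Redfish more concrete, here is a minimal sketch that builds, but deliberately does not send, a reset request for a node's BMC. It relies on the standard DMTF Redfish `ComputerSystem.Reset` action and the `X-Auth-Token` session header rather than on any OpenRMC-specific API. The host name, system ID, token, and the `build_reset_request` helper are hypothetical, and the set of reset types is only a subset of the values defined by Redfish.

```python
import json
import urllib.request

# Subset of the ResetType values defined by the Redfish schema.
VALID_RESET_TYPES = {
    "On", "ForceOff", "GracefulShutdown",
    "GracefulRestart", "ForceRestart", "PowerCycle",
}


def build_reset_request(bmc_host, system_id, reset_type, token):
    """Build a Redfish ComputerSystem.Reset request without sending it."""
    if reset_type not in VALID_RESET_TYPES:
        raise ValueError(f"unsupported ResetType: {reset_type}")
    url = (
        f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    body = json.dumps({"ResetType": reset_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # token from a Redfish session (assumed)
        },
    )


# Hypothetical BMC address, system ID, and token, for illustration only.
req = build_reset_request("bmc.example.net", "1", "GracefulShutdown", "example-token")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` with suitable TLS settings; that step is left out here because the system ID and the supported reset types vary by BMC and should be read from the service first.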
Key Modules and Software Architecture
Inspur has developed a system-level management suite based on OpenRMC for rack management. The suite gives users a firm grasp of the condition of every component and device in the rack by simultaneously monitoring system equipment such as servers and storage units, modules such as power supplies, fans, and network switches, and the ambient temperature. The suite also displays this information on visualization devices to meet the needs of automated operations. On this basis, Inspur has defined northbound management interface specifications that cover all the equipment in the rack and has contributed them to OCP. The move aims to promote seamless connection and effective communication between northbound presentation and southbound management within the OCP framework.
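One way to picture the relationship between southbound monitoring and northbound presentation is a simple health roll-up: per-component states collected from the rack are reduced to a single rack-level status for display. The sketch below is an assumption-laden illustration, not the Inspur suite or the contributed specification. The `Health` levels mirror the OK/Warning/Critical values used by Redfish, while the `rack_health` function and the component names are hypothetical.

```python
from enum import IntEnum


class Health(IntEnum):
    """Health levels ordered by severity (mirrors Redfish OK/Warning/Critical)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def rack_health(components):
    """Reduce per-component health to the worst state plus the components in it.

    `components` maps a component name to its Health value.
    """
    if not components:
        return Health.OK, []
    worst = max(components.values())
    if worst == Health.OK:
        return worst, []
    offenders = sorted(name for name, h in components.items() if h == worst)
    return worst, offenders


# Hypothetical southbound snapshot covering the component types named above.
snapshot = {
    "server-1": Health.OK,
    "storage-1": Health.OK,
    "psu-2": Health.WARNING,
    "fan-3": Health.OK,
    "switch-1": Health.OK,
    "ambient-temp": Health.WARNING,
}
status, offenders = rack_health(snapshot)
print(status.name, offenders)
```

A northbound interface could expose just `status` for a dashboard tile and the `offenders` list for drill-down, keeping per-device protocol details on the southbound side.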
Intel® and Microsoft are also actively promoting the innovation and application of OpenRMC. In 2014, Intel® released the Intel® RSD (Rack Scale Design), a reference design intended to promote the technology for resource pooling and flexible deployment in data centers to improve resource utilization. As one of the sponsors of the OpenRMC project, Intel® has open-sourced the RSD rack management module and management APIs (RSD RMM REST API) and contributed them to the OCP OpenRMC project. It has also provided the reference code and methods for obtaining the parameters of key functions and components, such as chassis, power supply, and cooling.
Microsoft Azure is one of the largest public clouds in the world. As an owner of hyperscale data centers and a provider of cloud computing services, Microsoft has contributed its open-source server specifications, Open CloudServer (OCS) and Project Olympus, to the OCP community. It has shared its own experience in data center management with the community and proposed several different RMC hardware implementation methods. The company has also provided suggestions for the modular software design of OpenRMC firmware, as well as examples of accessing, managing, and monitoring the status of rack-level components.
The code and hardware reference designs contributed by OCP Project members have greatly diversified the use cases of OpenRMC and brought innovation to the automated operations ecosystem. The members' involvement also provides an underlying platform and lends credibility that supports the wide adoption of OpenRMC functions.
With OpenRMC, a rack management system based on open-source technologies can be scaled to help large-scale as well as small and medium-sized data centers integrate heterogeneous equipment and achieve automated, fine-grained operations. In this way, data centers can reduce their IT operations costs, simplify management, and improve efficiency.
Learn more about OpenRMC: https://www.opencompute.org/projects/openrmc