Top 5 Data Centre Operations Management Mistakes To Avoid
The importance of data centres is constantly increasing, with new large-scale facilities being announced almost every month. Operators and owners often direct most of their attention toward data centre design, but effective operations are just as critical to achieving high availability.
Research suggests that, on average, downtime costs more than $5,000 per minute, and human error is frequently named as the biggest cause of data centre failure. This makes it all the more crucial that operators take note of the common mistakes that lead to costly incidents:
The FIVE Common Mistakes:
1. Insufficient training and skill development for staff
Having a well-trained and skilled operations staff can help reduce overhead costs, turnover, and downtime incidents.
However, staffing and skills shortages are widely regarded as a concern in the data centre industry: 22% of respondents to AFCOM’s State of the Data Centre survey in 2019 indicated that they have difficulty filling roles for facility technicians, engineers, and operators. These concerns continue to grow as the required skills advance along with rapidly evolving technology, society, and business.
With human error being the most common cause of data centre downtime, it is incredibly important for businesses to provide effective training and skill development for operations staff. With this in place, operations teams understand how to safely manage and maintain the data centre and know how to react when there is an incident. Beyond that, they can identify weaknesses, prevent errors from happening, and further optimise the data centre environment.
NextTech Learning can be your partner in ensuring that the skill sets of your staff and organisation are up to date and benchmarked against global standards, by providing the necessary skill development and effective training opportunities. NextTech Learning offers the latest courses in data centre management in partnership with EPI training solutions.
2. Poor communication between operations departments and design teams
Aligning teams with goals and plans is paramount for any business, and the data centre industry is no exception. From an operations point of view, a poorly designed data centre can impair the ability to maintain and manage the facility efficiently and effectively. This could seriously affect operational cost, safety, and security, and could lead to unplanned and possibly costly downtime or, at a minimum, incremental expenses to correct issues. All of this could damage a company’s investment, ROI, and reputation.
Data centre operations teams are under continuous and increasing pressure to meet growing demands for flexibility, speed, and capacity, as well as enabling infrastructure for cloud computing, mobile technology, and virtualisation. To achieve this, operations teams must be involved in the design phase so that the data centre is tailored to the requirements of advancing technologies and the client’s business.
Involving operations teams in the design process helps ensure that the significant total cost of ownership is taken into account from a maintenance and management perspective, whilst at the same time optimising resources and increasing safety and security.
3. Inadequate risk mitigation and management
Operations teams face a multitude of risks within the data centre, including loss of power and cooling, natural disasters and fires, and cybersecurity threats.
The organisation should have appropriate risk management policies and procedures that are reviewed and updated on a regular, planned basis to ensure they remain fit for purpose. A risk management plan that is poorly written or out of date gives data centre operators and owners a false sense of security that everything is under control.
Staff should be well trained, and their ability to deal with emergencies should be tested through emergency drills and ongoing training programs. After real emergencies, reviews should be conducted and lessons learned should be documented, serving as input to further improve the risk management plans and emergency procedures.
4. Weak policies and lack of integration between processes
Non-integrated processes are a common issue in many data centres across the world. This occurs when different departments are not aligned on their objectives and business interests due to poor communication across teams.
For example, a facility management department may have generator and UPS maintenance scheduled while an IT department has planned a database migration from one system to another. If a UPS system went down during maintenance in the middle of this migration, the result could be disastrous.
Organisations and teams must align their policies, procedures, and processes to maximise cohesion and harmony between departments. This could be achieved by setting up standard or emergency operating procedures, method of procedure libraries, and vendor management plans. These same procedures should be implemented for effective change management.
It is recommended that these processes match the criticality of the business and the maturity of the data centre.
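The scheduling clash described in this section (UPS maintenance overlapping a database migration) is the kind of issue a shared change calendar can surface early. The following is only a minimal sketch of such a check, not a prescribed tool: the `PlannedWork` record, the `find_conflicts` helper, the team labels, and the dates are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class PlannedWork:
    team: str          # hypothetical label, e.g. "Facilities" or "IT"
    description: str
    start: datetime
    end: datetime


def find_conflicts(items):
    """Return pairs of planned work from different teams whose time windows overlap."""
    conflicts = []
    for a, b in combinations(items, 2):
        # Two windows overlap when each one starts before the other ends.
        overlaps = a.start < b.end and b.start < a.end
        if overlaps and a.team != b.team:
            conflicts.append((a, b))
    return conflicts


# Illustrative schedule only; the dates and activities are made up.
schedule = [
    PlannedWork("Facilities", "UPS maintenance",
                datetime(2024, 3, 9, 22, 0), datetime(2024, 3, 10, 2, 0)),
    PlannedWork("IT", "Database migration",
                datetime(2024, 3, 10, 0, 0), datetime(2024, 3, 10, 6, 0)),
    PlannedWork("Facilities", "Generator test",
                datetime(2024, 3, 12, 9, 0), datetime(2024, 3, 12, 11, 0)),
]

for a, b in find_conflicts(schedule):
    print(f"Conflict: {a.team} '{a.description}' overlaps {b.team} '{b.description}'")
```

In practice, a check like this would sit inside the change management process, so that overlapping work from different departments is reviewed before either activity is approved.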
5. Inefficient document management and change procedures
Without detailed and well-managed documentation, well-intentioned staff are at risk of making unintended mistakes. These mistakes can be repeated and compounded when there is no proper process to document lessons learned and implement the resulting changes.
Effective documentation should include the agreed operating procedures, fully detailed as-built design drawings, emergency response procedures, equipment lists, and more.
Ideally, these documents are digitally accessible or printed out. For printed documentation, however, adequate procedures must be in place to ensure that the copy in someone’s drawer always reflects the true current state.
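One simple way to keep printed copies honest is to record which revision each copy carries and compare it against a master register. The sketch below assumes such a register exists. The `PrintedCopy` record, the `stale_copies` helper, and the sample documents and locations are hypothetical, not part of any specific document management system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrintedCopy:
    document: str   # document title
    location: str   # where the printed copy is kept
    revision: int   # revision number printed on the copy


def stale_copies(master_revisions, printed_copies):
    """Return printed copies that are older than, or missing from, the master register."""
    stale = []
    for copy in printed_copies:
        current = master_revisions.get(copy.document)
        # Flag copies of unknown documents as well as outdated revisions.
        if current is None or copy.revision < current:
            stale.append(copy)
    return stale


# Illustrative data only.
master = {"Emergency response plan": 4, "UPS operating procedure": 7}
copies = [
    PrintedCopy("Emergency response plan", "Control room", 4),
    PrintedCopy("UPS operating procedure", "Plant room drawer", 5),
]

for copy in stale_copies(master, copies):
    print(f"Replace '{copy.document}' at {copy.location} (revision {copy.revision})")
```

Running a comparison like this after every approved change gives a concrete list of printed copies to replace.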
Improving operations management helps minimise downtime. Success depends on two factors: ensuring staff have the right competences through training, and having effective processes in place. For example, the Certified Data Centre Facilities Operations Manager (CDFOM) training course offered by NextTech Learning is set up to fully prepare managers in their journey towards optimising data centre operations.
As mentioned previously, NextTech is a proud partner of EPI®, delivering select courses from the EPI® portfolio of globally accredited, certified data centre professional training courses that enable organisations to educate and align their staff, such as:
- DCFC®: Data Centre Foundation Certificate
- CDFOM®: Certified Data Centre Facilities Operations Manager
- CDFOS®: Certified Data Centre Facilities Operations Specialist
- CDMS®: Certified Data Centre Migration Specialist
- CDCS®: Certified Data Centre Specialist
- CDCP®: Certified Data Centre Professional
- CNCDP®: Certified Network Cable Design Professional