May 09 2018

Why We Build Servers That Are Failure-Proof

In today's hyper-connected economy, it's almost unimaginable that there once was a time when employees and customers were willing to accept that vital IT services could sometimes become temporarily unavailable due to hardware failures. Even more surprising is the fact that this wasn't during the stone age of modern computing, i.e. the 1980s and 1990s – in fact, that era lasted until maybe 10 to 15 years ago, before we took the 24/7 accessibility of data and computing resources for granted. Since then, outages linked to weak or malfunctioning hardware have been widely derided as signs of incompetence on the part of the teams overseeing IT infrastructures and services. Fujitsu's PRIMERGY servers can help them avoid such critique.

In recent years, IT departments all over the world have had to deal with a massive increase in business- and mission-critical services. From email through ERP to real-time analytics, there's hardly an application or use case that doesn't qualify, and it often seems like new ones are added every day. At the same time, both internal and external user groups have become less tolerant of failures or outages than they were in the past, occasionally flocking to support forums and heaping disdain on data center staff before pressured admins can identify the root cause of an outage, let alone repair it. This behavior pattern occurs almost irrespective of whether the IT service in question is truly mission- or business-critical in nature – such as batch processing in financial transaction systems or BI workloads in retail – or whether a temporary suspension may be acceptable – for example, if a groupware server goes down for a few minutes and other communication channels are still working. As a consequence, IT teams who wish to avoid the scorn will look for hardware that is failure-proof to begin with, like Fujitsu's family of PRIMERGY servers.

Why Servers Fail
The reasons for a server to crash are about as manifold and varied as for every other piece of machinery, e.g. for cars or TV sets that stop working as intended without a warning sign. These reasons can typically be lumped into one of three groups:

  • Low product quality and/or limited capacity – the server itself or its components are not up to the task at hand: processors, memory modules and storage media may be weak, outdated or too small
  • Environmental factors – server rooms and data center buildings are too small, improperly ventilated or located in inappropriate areas, causing systems to overheat and collapse
  • Poor management and maintenance – systems are basically left unattended so that operating systems, drivers, firmware etc. remain unpatched and/or become obsolete

While most ICT vendors have little if any influence over external factors or work ethics, there are quite a few things they can do to improve product quality and provide the tools IT teams need to ensure that all systems and applications are running smoothly.

The Importance of Quality
Fujitsu has set itself very high quality standards for both hardware and software products. Hence, when building PRIMERGY servers, we follow strict guidelines for production, quality tests and quality assurance. In a nutshell, we

  • Design systems that pair powerful, first-class components with innovative technologies, e.g. for cooling, to combine performance, efficiency, ease of use, and affordability
  • Ensure that our suppliers deliver top-notch components and work closely with partners and suppliers to continually improve products
  • Offer a comprehensive range of maintenance and support services to help customers ensure system availability and business continuity

Fig. 1: Product Maintenance Services, performed by certified support engineers, are part of the PRIMERGY solution package

If all of this sounds a little abstract, that's because it is. So let's take a more detailed look at these aspects:

  • System design and production – all PRIMERGY servers are built around a set of advanced core components, such as CPUs from Intel's Xeon® Processor Scalable Family, DDR4 RAM (supporting advanced ECC, memory scrubbing and SDDC), PCIe SSDs and, occasionally, Pascal- or Volta-based GPGPUs from NVIDIA. To these, we add our own modules, including mainboards and redundant, hot-pluggable fans and power supplies. Together, these building blocks ensure that customers can build highly effective, reliable and agile platforms for running databases, ERP and CRM, batch processes, real-time data analytics, VDI, CAD etc. Finally, we offer a selection of sophisticated cooling technologies – namely, Cool-safe® ATD and Cool-Central® LCT – that help our servers endure high ambient temperatures of up to 45 °C (113 °F) and stay cool under stress.
  • Quality assurance – rigorous testing of both single components and finished systems has always been a cornerstone of our production process. These tests are conducted at our Augsburg facility, where candidates must undergo electromagnetic compatibility and interference immunity tests that rule out electrical malfunctions and data loss, as well as a visit to the climate lab, where we simulate extreme changes in ambient temperature while running various benchmark tests. These tests are also carried out over the entire lifecycle of our systems – and always after new third-party components are added.
  • On the software side, our ServerView Management Suite adds a wide variety of administration, maintenance and integration capabilities that provide all necessary functions for fail-safe, flexible and automated 24/7 server operations and improve end-user productivity. The ServerView Operations Manager includes functions such as PDA (Prefailure Detection and Analysis), which detects errors early, and ASR&R (Automatic Server Reconfiguration and Restart), which enables a proper shutdown or reboot with automatic disabling of defective parts.
  • Another key software element is our fifth-generation Integrated Remote Management Controller (iRMC S5), which helps to increase security and administrator productivity with features like integrated Embedded Lifecycle Management (eLCM) functions that allow for comprehensive remote management via protected HTTPS connections; new profile templates that accelerate server deployment; a unified API that simplifies operation and management of heterogeneous infrastructures; and a modernized GUI that supports 'administration-on-the-go' via mobile devices.
  • In addition to cutting-edge products, Fujitsu delivers worldwide Product Maintenance Services. This service portfolio includes diagnosis and elimination of hardware faults, either remotely or on site, performed by a team of certified support engineers. The aim is to keep data centers and system deployments as productive, efficient and secure throughout a system's entire lifecycle as they were on day one.
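
To make the idea behind prefailure detection and automatic reconfiguration a little more tangible, here is a minimal Python sketch. It is purely illustrative and is not how ServerView, PDA or ASR&R are implemented: the component names, the error counters, the thresholds and both function names are assumptions made up for this example. The sketch flags any component whose count of corrected errors has reached a limit, then works out which components would stay enabled after a restart.

```python
from dataclasses import dataclass


@dataclass
class ComponentReading:
    name: str                # hypothetical label, e.g. "DIMM-1A"
    correctable_errors: int  # corrected errors seen in the current window
    threshold: int           # hypothetical limit before the part is flagged


def detect_prefailures(readings):
    """Return the names of components whose error count reached the threshold."""
    return [r.name for r in readings if r.correctable_errors >= r.threshold]


def plan_restart(all_components, flagged):
    """Return the components that stay enabled once flagged parts are disabled."""
    flagged_set = set(flagged)
    return [c for c in all_components if c not in flagged_set]


# Made-up sample data: one memory module has crossed its threshold.
readings = [
    ComponentReading("DIMM-1A", correctable_errors=2, threshold=10),
    ComponentReading("DIMM-1B", correctable_errors=14, threshold=10),
    ComponentReading("FAN-3", correctable_errors=0, threshold=1),
]

flagged = detect_prefailures(readings)
enabled = plan_restart([r.name for r in readings], flagged)

print("flagged:", flagged)
print("enabled after restart:", enabled)
```

The point of the pattern is that a part is taken out of service while its errors are still being corrected, before they turn into an outage.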

Conclusion
"Quality matters" – that's been the underlying principle of our production process since Fujitsu first started building Intel-based servers in 1994. Since then, we've found numerous methods to develop, build and deliver systems that are almost failure-proof, that combine excellent performance with reliability, flexibility and exhaustive management functions, and that come at a reasonable price. Customers who run PRIMERGY servers won't have to worry about performance slumps or shaky applications – they can instead focus on running their business.

About the Author:

Timo Lampe

Product Marketing Manager, Global Marketing Server, Fujitsu
