Nov 16 2015

Accelerator GPUs for Machine-Learning Optimization: NVIDIA Tesla M40 and M4


NVIDIA's new Tesla Accelerated Computing Platform is an end-to-end framework for hyperscale, 24/7 data centers, designed to accelerate the machine-learning workloads of web data centers and to enable what the firm calls "unprecedented" artificial intelligence (AI) applications. The platform consists of two GPUs – the Tesla M4 and M40 – and NVIDIA's Hyperscale Software Suite.

Within this combination, the two GPUs cover different tasks: the M40 accelerates the training of machine-learning models for advanced AI applications, while the M4 speeds up the inference side of deployed AI applications, thus increasing data center throughput.

The NVIDIA Tesla M4 is a compact form-factor GPU for demanding, high-growth web services applications such as video transcoding, image processing, and machine-learning inference, which it efficiently offloads from the CPU. NVIDIA claims that the M4 transcodes, enhances, and analyzes "up to 5x more simultaneous video streams compared with CPUs" and delivers "up to 10x better energy efficiency than a CPU" for video processing and machine-learning algorithms. Its second-generation Maxwell GPU with 1,024 NVIDIA CUDA cores and 4 GB of GDDR5 memory at 88 GB/s bandwidth has a theoretical single-precision compute capacity of 2.2 TFLOPS. The M4 offers a user-selectable power profile that caps consumption at just 50 to 70 W, and its low-profile PCIe form factor makes it suitable for very flat servers.
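The quoted figures can be sanity-checked with simple arithmetic. The sketch below assumes a boost clock of roughly 1.07 GHz and a 128-bit GDDR5 bus at 5.5 GT/s – neither is stated in the article, both are inferred from the published 2.2 TFLOPS and 88 GB/s numbers:

```python
# Back-of-envelope check of the Tesla M4's quoted specs.
# BOOST_CLOCK_GHZ and the 128-bit bus are assumptions inferred
# from NVIDIA's published figures, not stated in the article.

CUDA_CORES = 1024
FLOPS_PER_CORE_PER_CYCLE = 2   # one fused multiply-add per cycle
BOOST_CLOCK_GHZ = 1.072        # assumed

tflops = CUDA_CORES * FLOPS_PER_CORE_PER_CYCLE * BOOST_CLOCK_GHZ / 1000
print(f"Theoretical single-precision: {tflops:.2f} TFLOPS")  # ~2.20

bus_width_bits = 128           # assumed
transfer_rate_gtps = 5.5       # assumed effective GDDR5 rate
bandwidth_gbps = bus_width_bits / 8 * transfer_rate_gtps
print(f"Memory bandwidth: {bandwidth_gbps:.0f} GB/s")  # 88
```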

According to NVIDIA, the Tesla M40 GPU accelerator (pictured above) is "the world's fastest accelerator for deep learning training, purpose-built to dramatically reduce training time." The comparatively large Tesla M40 is optimized for the most sophisticated deep learning (DL) training workloads and for maximum uptime in the data center; it is a standard full-height card that occupies two slots. Behind its elongated cooler sit a GM200 GPU with 3,072 NVIDIA CUDA cores and 12 GB of GDDR5 memory with 288 GB/s bandwidth. The chip's peak single-precision compute rate of 7 TFLOPS comes at the price of a high power consumption of 250 W. So far, NVIDIA has disclosed one benchmark for the Tesla M40, based on the Caffe DL framework developed at the Berkeley Vision and Learning Center at UC Berkeley, a project sponsored by NVIDIA. According to this benchmark, a "GPU server" equipped with four Tesla M40 cards completes a Caffe DL training task in less than 10 hours (0.4 days), whereas a regular dual-socket server with two Intel® Xeon® E5-2697 v2 CPUs running at 2.7 GHz and 64 GB of RAM needs 5 days and 22 hours.
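The speedup implied by NVIDIA's benchmark figures works out as follows:

```python
# Speedup implied by NVIDIA's Caffe training benchmark.
cpu_hours = 5 * 24 + 22   # dual-Xeon baseline: 5 days 22 hours = 142 h
gpu_hours = 0.4 * 24      # four Tesla M40s: 0.4 days = 9.6 h

speedup = cpu_hours / gpu_hours
print(f"GPU server is ~{speedup:.1f}x faster")  # ~14.8x
```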

To make the most of the new hardware, customers need NVIDIA's Hyperscale Suite, a software collection optimized for both machine learning and video processing. The Hyperscale Suite offers tools specifically designed for web services deployments, among them cuDNN (a library of GPU-accelerated primitives for deep neural networks), GPU-accelerated FFmpeg multimedia software, the NVIDIA GPU REST Engine (for creating high-throughput, low-latency web services), and the NVIDIA Image Compute Engine (for accelerated image resizing).
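As a concrete illustration of the video-processing side, GPU-accelerated FFmpeg can offload both decoding and encoding to the card. The command below is a sketch only, assuming an FFmpeg build compiled with NVIDIA CUVID/NVENC support and an input file named input.mp4; the exact flags depend on the FFmpeg version and driver:

```shell
# Transcode a video with GPU decode (CUVID) and GPU encode (NVENC).
# Illustrative flags; requires an FFmpeg build with NVIDIA support.
ffmpeg -hwaccel cuvid -c:v h264_cuvid -i input.mp4 \
       -c:v h264_nvenc -b:v 5M output.mp4
```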

Since the Tesla M4 and M40 are data center products, they will be available only in complete server configurations from OEMs; at the moment, no retail version is planned. The Tesla M40 and the Hyperscale Suite will be available later this year; the Tesla M4 is scheduled for release in Q1/2016.

