Sep 27 2014

Data Deduplication with FUJITSU ETERNUS CS800, Pt. 3: The Advantages of Hardware


In recent years, data deduplication has become one of the most popular and, at the same time, most easily misunderstood storage technologies. Misled by industry buzz, customers invested heavily in deduplication-ready platforms, only to find that the new systems could hardly deliver the promised data reduction rates of 20:1 or more. The third part of our four-part blog series explains why it is useful to rely on hardware-based deduplication instead of software-only solutions.

The market for deduplication solutions is currently divided between two vendor camps. One camp favors an entirely software-based approach; the other insists that to function properly, the deduplication process must at some point be supported by special hardware capabilities. To regular users, this may seem like a purely academic discussion: they don't care how or at which level the function is implemented, so long as it helps to save time and money. Of course, one might always argue that it's better to have some sort of deduplication than none; but as we saw in the second part of this series, that is not a convincing argument. Instead, we need to look at the subject in more detail and weigh the pros and cons of each approach.

Software-Based Deduplication
Historically, deduplication was first developed as an add-on for existing large-scale backup solutions. Pioneering companies such as Avamar or Rocksoft invested a great deal of time in honing the algorithms that identify and filter out duplicate data. This explains why a comparatively large group of vendors (among them Microsoft and Symantec) initially believed that a software-based approach, i.e. the integration of such algorithms into their server and backup products, would deliver sufficient results. Moreover, it offered a fast track to implementation at a reasonable price: some of the earliest dedupe programs ran on regular x86 servers and supported a host of operating systems, including Mac OS X and various Linux/Unix flavors. Some of these arguments still sound reasonable today, especially the one about platform independence, which aligns particularly well with many IT departments' goal of avoiding vendor lock-in.

On the other hand, purely software-based deduplication suffers from limitations of its own. In fact, you could even argue that its greatest strength, the adaptability to general-purpose hardware, is also its greatest weakness. That's because deduplication is, after all, a complex task, and most standard servers simply don't offer the performance or capacity to execute it alongside other processes. Computer graphics offers an analogy: if a PC only runs standard office applications and a browser, then an integrated graphics processor is usually good enough; however, if it's used for image/video editing or other highly parallelized tasks, then a discrete graphics card is mandatory. Likewise, deduplication will always yield the best results on purpose-built systems, which is why many initial advocates of a software-based solution have moved on and are now offering their own appliances. What's more, deduplication has a great impact on storage and backup strategies, so despite its history it is not something you can simply add to a backup solution and forget about. That means a lot of effort must go into software and hardware integration (including planning, testing and rollout), or else the effects will be underwhelming and the cost higher than anticipated.

The Hardware-Based Approach
Fujitsu was among the earliest adopters of the technology: a good seven years ago, we delivered our first dedicated storage systems with dedupe capabilities. Since then, both deduplication itself and our solutions have matured significantly. Today, our Fujitsu Storage ETERNUS CS800 Data Protection Appliance brings the functionality to all environments where data are backed up to disk. The new, fifth generation of the appliance is based on the FUJITSU Server PRIMERGY RX300 model and combines the latest Intel® Xeon® E5 processor technology with extremely efficient algorithms optimized for variable-length block division. As a result, it achieves deduplication rates that let customers cut the disk capacity they need by up to 95% (the equivalent of a 20:1 reduction ratio) and save considerable money and floor space. Moreover, the amount of information that must be transferred to a second site or the main data center shrinks by a similar factor, enabling considerable savings on bandwidth.
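To make "variable-length block division" a little more concrete, here is a minimal Python sketch of content-defined chunking followed by duplicate elimination. It is an illustration under stated assumptions, not the algorithm used in the ETERNUS CS800: the window size, boundary mask, chunk limits and the function names chunk, deduplicate and restore are all invented for this example, and a production system would likely use a true rolling hash instead of rehashing every window.

```python
import hashlib
import random

# All parameters below are assumptions chosen for readability.
WINDOW = 16                       # bytes of context that decide a boundary
MASK = (1 << 8) - 1               # cut when the low 8 hash bits are zero
MIN_CHUNK, MAX_CHUNK = 64, 1024   # bounds on chunk length in bytes


def chunk(data: bytes) -> list:
    """Split data into variable-length chunks at content-defined boundaries."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        # Hash the last WINDOW bytes; the cut depends on content, not offset.
        window = data[i - WINDOW + 1 : i + 1]
        h = int.from_bytes(hashlib.blake2b(window, digest_size=4).digest(), "big")
        if (h & MASK) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])   # trailing partial chunk
    return chunks


def deduplicate(streams):
    """Store each distinct chunk once; keep a fingerprint list per stream."""
    store, recipes = {}, []
    for data in streams:
        recipe = []
        for c in chunk(data):
            fp = hashlib.sha256(c).hexdigest()
            store.setdefault(fp, c)   # only the first copy is kept
            recipe.append(fp)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe) -> bytes:
    """Rebuild a stream from its list of fingerprints."""
    return b"".join(store[fp] for fp in recipe)


# Two "backups": the second is the first with three bytes inserted.
rng = random.Random(0)
base = bytes(rng.randrange(256) for _ in range(20_000))
second = base[:100] + b"NEW" + base[100:]

store, recipes = deduplicate([base, second])
logical = len(base) + len(second)                 # bytes written by the backups
physical = sum(len(c) for c in store.values())    # bytes actually stored
ratio = logical / physical
savings = 1 - 1 / ratio                           # a 20:1 ratio would give 95%
print(f"ratio {ratio:.2f}:1, capacity saved {savings:.0%}")
```

Because the cut points depend on the content rather than on fixed byte offsets, the three inserted bytes only disturb the chunks around the insertion, and the rest of the second backup still matches chunks that are already stored. The last lines also show how a reduction ratio converts into capacity savings: a ratio of r:1 saves 1 - 1/r, so 20:1 corresponds to 95%.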

For maximum flexibility, each ETERNUS CS800 supports various configuration options or "models." More specifically, each model provides access to a common deduplication storage pool through multiple views that may include any combination of CIFS- or NFS-based NAS volumes, virtual tape libraries, and the Symantec-specific OpenStorage API, which writes data to Logical Storage Units (cf. Fig. 4 below). Since all views access a common storage pool, redundant data segments are eliminated across all data sets that are written to the appliance.

 

Fig. 4: Sharing a Deduplication Storage Pool
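As a rough illustration of the shared pool in Fig. 4, the following Python sketch lets two access paths write into one segment store. The class names DedupePool and View, the item names and the pre-cut segments are hypothetical and chosen only for this example; they do not describe the appliance's internal design or any real ETERNUS CS800 interface.

```python
import hashlib


class DedupePool:
    """Hypothetical shared segment store used by every view."""

    def __init__(self):
        self.segments = {}        # fingerprint -> segment bytes
        self.logical_bytes = 0    # bytes written by all views together

    def write(self, segment: bytes) -> str:
        self.logical_bytes += len(segment)
        fp = hashlib.sha256(segment).hexdigest()
        self.segments.setdefault(fp, segment)   # store only the first copy
        return fp

    @property
    def physical_bytes(self) -> int:
        return sum(len(s) for s in self.segments.values())


class View:
    """One access path (for example a NAS share or a virtual tape library)."""

    def __init__(self, name: str, pool: DedupePool):
        self.name, self.pool, self.catalog = name, pool, {}

    def backup(self, item: str, segments: list) -> None:
        # The view keeps only fingerprints; the data lives in the shared pool.
        self.catalog[item] = [self.pool.write(s) for s in segments]

    def restore(self, item: str) -> bytes:
        return b"".join(self.pool.segments[fp] for fp in self.catalog[item])


pool = DedupePool()
nas = View("nas-share", pool)
vtl = View("vtl", pool)

# The same report reaches the pool twice, through two different views.
report = b"quarterly report " * 200
nas.backup("fileserver/report.doc", [b"file header", report])
vtl.backup("mailserver/inbox", [b"mail envelope", report])

print(len(pool.segments), pool.logical_bytes, pool.physical_bytes)
```

The report segment is stored once even though it arrived through two different views, which is the effect a common pool is meant to achieve. With a separate store per view, each view would hold its own copy.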


In practical terms, this means the ETERNUS CS800 will recognize and deduplicate blocks regardless of their point of origin, i.e. the application, interface or source that was used to store or process the data. For example, the appliance will detect identical segments stored on print and file servers backed up via NAS and on email servers backed up by a VTL. This type of 'enhanced cross-platform approach' typically leads to better results than can be expected from competing appliances. Another major benefit is that customers get a preconfigured system with standardized interfaces that is ready to run out of the box and only needs to be connected to the existing backup solution to take effect. In other words, the Fujitsu ETERNUS CS800 yields better results than the competition at a reasonable price. Plus, it offers a variety of extras, such as integrated remote replication and cloud recovery features, which we will cover in the fourth and final part of this blog.

Walter Graf

Stefan Bürger

 

About the Author:

Walter Graf

Principal Consultant Storage at Fujitsu EMEIA

About the second Author:

Stefan Bürger

Head of Tape Automation at Fujitsu
