The de-duplication software that NetApp has added to the OnTap operating system that powers its NearStore and FAS filers is based on the A-SIS code that the company first began shipping two years ago, in its SnapVault for NetBackup tool.

De-dupe has already been heralded as the most important storage technology to emerge this decade, thanks to its ability to transform storage economics and make disk-to-disk and remote backup much more affordable.

For what NetApp says is only a 2% to 3% increase in the net cost of its disk arrays, customers will be able to reduce the volume of the data stored on those devices by around a third or more for file-level data.

For backup data, which by its nature contains a greater degree of duplication, NetApp is claiming about the same reduction ratio as vendors of de-duplication systems tailored only for backup data, typically around 20:1.

But that highlights a major difference between NetApp’s A-SIS – Advanced Single-instance Storage – and the virtual tape libraries, appliances and software-only systems from suppliers such as EMC, Symantec, Data Domain, Diligent, Quantum, Exagrid and Sepaton. Those are all designed to work only with backup data.

NetApp is really the first vendor to support data de-duplication across all tiers – primary, archive and backup – in the same platform. That’s a beautiful thing, said Tony Asaro, analyst at the Enterprise Strategy Group.

But NetApp’s customers are not simply going to pack a third more primary data, or 20 times more backup data, onto their disk arrays in return for a mere 2% or 3% increase in capital cost.

The process of breaking data into chunks and then using a hashing algorithm to identify duplicates imposes a processing overhead, and this is one of the reasons why de-dupe has so far been associated only with backup applications.
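The mechanism can be sketched in a few lines of Python. The fixed chunk size, the SHA-256 hash and the in-memory index below are illustrative assumptions for the sketch, not a description of any vendor's implementation:

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size, in bytes


def dedupe(data: bytes):
    """Split data into fixed-size chunks, hash each one, and keep
    only a single physical copy of each distinct chunk."""
    store = {}    # hash -> chunk: the single stored instance
    recipe = []   # ordered hashes needed to rebuild the original data
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # first time seen: store it
        recipe.append(digest)      # duplicates cost only a reference
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original data from the chunk store."""
    return b"".join(store[d] for d in recipe)


# A 40KB file made of one repeated block stores only one 4KB chunk.
data = b"A" * CHUNK_SIZE * 10
store, recipe = dedupe(data)
```

The hashing and index lookups are exactly the work that consumes processor cycles on every chunk, which is why where and when this loop runs matters so much.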

An issue for A-SIS is that the NetApp boxes will be doing the de-duplication processing in the background while they continue with their normal duties serving block-level and file-level data. That may be too much to ask of them.

Although NetApp says that de-dupe imposes only a 1% overhead on writes and none on reads, those percentages cover only the read and write operations themselves, and give no indication of how many processor cycles are consumed by the background de-duplication pass.

In a white paper removed from NetApp’s website only two months ago, the company advised customers to limit the use of A-SIS in order not to adversely affect performance.

NetApp yesterday told Computer Business Review the advice applied only to the de-duplication of data that had been snapshotted. The advice applies today, and the limitation will be removed in the future, said Ravi Chalaka, senior marketing director at NetApp.

Perhaps this is why SnapVault for NetBackup – the snapshot-making tool tailored to work with Symantec’s popular NetBackup software which until this week was the only vehicle for A-SIS – has sold so few copies.

NetApp admitted that SV for NBU has been implemented by only around 50 to 100 customers since it was launched two years ago. The storage vendor argued that this number is low simply because the tool works only with version 6.0 of NetBackup, which itself has still not been widely taken up by customers.

Not so, says Symantec, which claims that NetBackup 6.0 has been implemented by tens of thousands of customers since it was launched in 2005.

All that NetApp would say to defend A-SIS was that it was not the cause of the slow sales for SV for NBU, which in any case has seen several performance improvements over the last two years and is beginning to see sales growth.

Asaro joined NetApp in defending A-SIS. NetApp is saying that A-SIS is not recommended for high-performance environments. But I think they’re being over-cautious – it is after all a background process, Asaro said.

NetApp said that the beta testers for the new implementation of A-SIS have included a surgical equipment maker that saw a 38% reduction in the space needed for file archives, and an oil drilling company that was storing large volumes of file data, and saw a 35% reduction in storage requirements.

Meanwhile a large manufacturer backing up databases to NearStore saw a 50:1 volume reduction, and a bank backing up VMware server images saw an 88% cut in volume.

NetApp is not going to put any other de-dupe vendor out of business overnight, as the rest of the pack have all implemented de-dupe specifically for use with backup systems, each in a different way.

NetApp’s system is an out-of-band, or post-processing de-dupe system. Suppliers such as Diligent and Data Domain that have adopted the alternative in-band architecture will argue that their systems – offering VTL and NAS interfaces – complete the de-duplication process at the same time that data is being ingested, and so can begin piping backup data offsite to safety at a secondary DR location immediately it arrives.
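The architectural difference can be caricatured in a short sketch. The data structures below are illustrative assumptions, not either camp's actual design: in-band systems reduce the data as it is ingested, while post-processing systems land the raw data first and fold duplicates afterwards.

```python
import hashlib

BLOCK = 4096  # illustrative block size


def inline_ingest(store, data, block=BLOCK):
    """In-band: de-dupe as data arrives. What lands on disk is
    already reduced, so it is immediately ready to replicate."""
    recipe = []
    for i in range(0, len(data), block):
        chunk = data[i:i + block]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe


def post_process(staging, store, block=BLOCK):
    """Out-of-band: raw data is written in full first, then a
    later background pass folds the duplicates."""
    recipes = [inline_ingest(store, data, block) for data in staging]
    staging.clear()  # raw copies are released only after the pass
    return recipes
```

The sketch makes the trade-off visible: the in-band path pays the hashing cost at ingest time, while the out-of-band path defers it but must hold full-size data, and cannot ship the reduced form offsite, until the background pass completes.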

Other out-of-band suppliers are likely to argue that NetApp will not be able to match the data reduction levels that they deliver, because A-SIS can only work with the fixed 4KB blocks used by NetApp’s WAFL file system.
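Why fixed-block alignment caps the reduction ratio can be seen in a toy example. Only the 4KB block size comes from the article; the hashing and sample data are illustrative. Appending new data leaves existing block boundaries intact, so earlier blocks still match, but inserting even a single byte at the front shifts every boundary and defeats fixed-block matching entirely:

```python
import hashlib
import os

BLOCK = 4096  # fixed block size, as used by WAFL


def block_hashes(data, block=BLOCK):
    """Hash each fixed-size block of the data in order."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


base = os.urandom(BLOCK * 4)
appended = base + os.urandom(BLOCK)   # new data added at the end
prepended = os.urandom(1) + base      # one byte inserted at the front

orig = set(block_hashes(base))
# Appending preserves alignment: all 4 original blocks still match.
matches_append = len(orig & set(block_hashes(appended)))
# Prepending shifts every boundary by one byte: nothing matches.
matches_prepend = len(orig & set(block_hashes(prepended)))
```

Variable-length, content-defined chunking schemes re-synchronise their chunk boundaries after such an insertion, which is the basis of the higher reduction ratios those suppliers claim.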

Symantec’s PureDisk and EMC’s Avamar systems are software applications that use de-dupe to make it practical to back up remote offices over the network.