Archival, Data Restore & Migration
An Approach Paper to Archival, Data Restore & Migration
Whether structured or unstructured, transient or persistent, electronic data is something no industry or business can survive without today. Large and small businesses together generate many petabytes of data each day, and their corporate IT teams struggle continually to store and manage it. As a rule of thumb, roughly 20% of the data generated by any business or industry needs to be HOT, i.e. always available, while about 80% resides mainly as COLD data. Remarkably, this ratio has hardly changed over the past three decades or so.
To stay within budget, most businesses today use high-performance but expensive storage for HOT data, known as primary storage, and cheaper, moderately performing storage for COLD data, known as secondary storage. The market today has many competent vendors offering reliable and cost-effective options for both primary and secondary storage.
A much larger portion of stored data exists beyond HOT and COLD: archival storage. While HOT and COLD data are needed to run a business, archival storage is needed to meet its compliance requirements. Retention policies typically mandate multiyear retention periods (e.g. 7 years), so businesses must find ways to comply at an affordable price. Accordingly, it has been a widely adopted best practice to keep archival data on the cheapest available medium, such as DAT or LTO tapes.
Besides compliance, every business has another important requirement: DR (Disaster Recovery), i.e. how resilient the business IT setup is to disasters and how well the business can survive disaster conditions. An integral part of DR is the DR audit, also known as a Business Continuity Audit (BCA). A typical BCA use case is the complete or significant loss of the primary and secondary datacenters, which means both HOT and COLD data are inaccessible. In such an event, the only reliance is on the archival data.
The top challenges with archival data stored on DAT or similar tapes (LTO) are as follows.
- Traditional tape-based archival mechanisms have no concept of mirroring, which means an up-to-date index of the archived contents needs to be available. If it is not, restore and correlation can become very time consuming.
- It may be practically impossible to read and restore the contents of all tapes regularly, and the number of tapes may grow enormously over time. As a result, at any given point in time nobody knows whether all the tapes are readable.
- Because of vendor-dependent variants of tape drives, tape formats, etc., the mere act of reading the tapes may turn into a tedious project running for weeks or months.
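To make the first two challenges concrete, here is a minimal sketch of what an up-to-date index with periodic readability checks could look like. It is illustrative only and not taken from any particular archival product: the names TapeRecord, TapeCatalog, locate, verify and overdue, the record schema, and the 365-day verification interval are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class TapeRecord:
    """One archived file as recorded in the catalog (hypothetical schema)."""
    tape_id: str
    path: str
    sha256: str
    archived_on: date


@dataclass
class TapeCatalog:
    """In-memory index of which file lives on which tape (illustrative only)."""
    records: list[TapeRecord] = field(default_factory=list)
    last_verified: dict[str, date] = field(default_factory=dict)

    def add(self, tape_id: str, path: str, content: bytes, archived_on: date) -> None:
        # Store a checksum at archive time so later reads can be validated.
        digest = hashlib.sha256(content).hexdigest()
        self.records.append(TapeRecord(tape_id, path, digest, archived_on))
        # Writing the tape counts as its first verification.
        self.last_verified.setdefault(tape_id, archived_on)

    def locate(self, path: str) -> list[str]:
        # Answer "which tapes hold this file?" without mounting any tape.
        return [r.tape_id for r in self.records if r.path == path]

    def verify(self, tape_id: str, path: str, content_read_back: bytes, today: date) -> bool:
        # Compare data read back from tape against the catalogued checksum.
        for r in self.records:
            if r.tape_id == tape_id and r.path == path:
                ok = hashlib.sha256(content_read_back).hexdigest() == r.sha256
                if ok:
                    # Simplification: one successful spot check marks the tape as verified.
                    self.last_verified[tape_id] = today
                return ok
        return False  # File is not catalogued on that tape.

    def overdue(self, today: date, max_age_days: int = 365) -> list[str]:
        # Tapes whose last successful read check is older than the assumed interval.
        cutoff = today - timedelta(days=max_age_days)
        return sorted(t for t, d in self.last_verified.items() if d < cutoff)
```

With such a catalog, a restore starts with a lookup such as catalog.locate(path) rather than a scan of every tape, and catalog.overdue(today) gives operators a bounded list of tapes to spot-check instead of leaving readability unknown. A real deployment would persist the catalog away from the tapes themselves and verify more than one file per tape.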
The retention periods (e.g. 7 years) are long enough to cause significant obsolescence of software and hardware technologies. This means that even if the data can be read from tapes, there is no guarantee that the data types can be decoded, or that the actual data can