All businesses need a storage strategy for their data, and it’s not unusual for contracts to stipulate data retention periods measured in decades. Holding onto data for this long presents challenges for archiving, where information needs to be stored and later retrieved in a readable format.
There are many industries in which the long-term storage of data may be required, including healthcare, infrastructure, nuclear, mining, and legal. Long-term data storage presents significant challenges for both end-users and storage providers.
Archiving, the long-term storage of information, has become an important aspect of many modern enterprises. Even when the information is no longer needed for day-to-day operations, there may be legislative or project requirements for it to be retained for reference.
In some sectors, data may need to be stored for decades or even longer.
“I looked at the industries of our clients, and for the US we provide long-term data storage for local governments, hospitals, energy companies, and construction companies,” says Jakob Østergaard, chief technical officer at data recovery firm Keepit.
“We even have a food bank and a synagogue.”
For example, the pharmaceutical industry keeps records of the research and development of new treatments to protect its intellectual property (IP) and prove due process. “IP lasts for twenty years and can still be legally held twelve years after that, as well as another year for good measure. That’s thirty-three years,” says Lasse Thomsen, information systems manager for Oxford University Innovation.
Archives need to be searchable, and information has to be easily retrievable in a readable format. This can be challenging for file formats that are no longer supported, or for data stored on older media that are not compatible with current systems.
Benefits and drawbacks of long-term storage mediums
Choosing the right storage medium for backups, particularly over time frames of years or even decades, is easier said than done. A range of options has been available in the past, and many remain to choose from today, but there is no single perfect option.
Microfiche used to be a popular choice for archiving data. It offered compact and stable information storage but was very difficult to search and required a bulky microfiche reader.
Organizations later turned to optical discs or magnetic tapes to archive their projects, often with off-site storage. While magnetic tape’s lifespan of 10 to 50 years has served it relatively well so far, the growing age of companies in the industry, and of their archives, has led many leaders to search for better data storage alternatives. Data stored on magnetic tape is also slow to access, because tape has to be read sequentially.
It is claimed that DVDs can last up to 200 years (in ideal conditions), but this has not yet been proven. DVDs also have a limited capacity, so large projects may need to be stored on multiple discs, an additional complication. Some might be tempted to archive projects on external hard disk drives (HDDs), but these may only last three to five years.
Flash storage has pros and cons, but is most useful in the short-to-medium term. Even the best solid-state drives (SSDs) are unsuitable for long-term storage. This is because SSDs store data using electrical charge in semiconductor cells and charge can dissipate in drives that aren’t powered for several years.
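To make the trade-off between media lifespan and retention period concrete, the following minimal Python sketch counts how many times an archive would need to be copied onto fresh media over the thirty-three-year pharmaceutical example. It assumes each copy is replaced exactly when it reaches the low end of the lifespan ranges quoted above (10 years for magnetic tape, three years for external HDDs) and leaves out the unproven 200-year DVD figure. Real migration schedules would usually build in a safety margin and would also have to account for reader hardware becoming obsolete.

```python
import math

# Conservative (low-end) lifespans quoted in this article, in years.
# These are rough figures for illustration, not vendor specifications.
MEDIA_LIFESPAN_YEARS = {
    "magnetic tape": 10,  # quoted range: 10-50 years
    "external HDD": 3,    # quoted range: 3-5 years
}


def media_generations(retention_years: int, lifespan_years: int) -> int:
    """Number of successive media copies needed to cover the retention period,
    assuming each copy is replaced exactly when it reaches lifespan_years."""
    return math.ceil(retention_years / lifespan_years)


def migrations_needed(retention_years: int, lifespan_years: int) -> int:
    """Copy-to-new-media operations required after the initial write."""
    return media_generations(retention_years, lifespan_years) - 1


if __name__ == "__main__":
    retention = 33  # the pharmaceutical IP retention example
    for medium, life in MEDIA_LIFESPAN_YEARS.items():
        print(f"{medium}: {migrations_needed(retention, life)} migrations "
              f"over {retention} years")
```

Under these assumptions, tape would need three migrations and external HDDs ten over the 33 years. Using the optimistic end of each quoted range reduces both figures, but never removes the need to plan for them.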
Having the appropriate hardware available to read optical discs or magnetic tape may also be a challenge. Thirty years ago, 3.5” floppy disks were widely used but, with the exception of Japan, they are now only found behind old filing cabinets or in dusty lofts. Floppy disk drives are even rarer, so businesses would struggle to read the old format even if disks were found.
Software is constantly evolving alongside file formats. As specialist software is unlikely to be supported in thirty years, it’s important that archives are future-proofed to ensure information remains readable.
One example is PDF/A, an archival version of Adobe’s PDF format that embeds all of the information needed to display a document within the file itself, so everything is preserved “as printed”. As a result, PDF/A files are typically much larger than ordinary PDFs and take up more storage space, but they offer a degree of future-proofing that ordinary PDFs do not.
Some cloud storage providers are now offering storage contracts guaranteeing file storage for a hundred years. This solves a lot of problems for end-user organizations, such as no longer having to worry about the storage medium.
But when evaluating storage providers, IT leaders need to consider where archives will be physically located. Storing multiple copies of information in the same location is not recommended, in case the area suffers a disaster such as a fire or flood. Even with multiple secure locations, leaders need a robust disaster recovery strategy.
If the data is export-controlled, storing an archive in another country would count as an export, and this also needs to be factored into plans.
Regional data protection regulations also need to be considered. The protections afforded by the General Data Protection Regulation (GDPR) can make Europe an attractive location for data storage, but the right to be forgotten could be a challenge, since individuals may request the erasure of personal data that an archive is otherwise expected to retain. “I wanted our archive to be co-located somewhere different, ideally in Europe,” says Thomsen. “GDPR was also ticking the box for us.”
Investing in the right storage infrastructure
Depending on the quantity of data to be stored, organizations may need to upgrade their telecommunications infrastructure to provide sufficient upload bandwidth. The initial cloud migration may take several days, or even longer.
“We have over eight TB of data, which is a massive amount of data to keep,” says Thomsen. “It took a while for data to sort of get to a synchronized state – after maybe a month we had pretty much caught up.”
Even if a contract offers data storage for a hundred years, this depends on the provider’s ability to stay afloat for a century or more. All a storage provider can ever guarantee is that it will store information for as long as it remains in business.
“The average lifetime of a Fortune 500 company is 21 years now and it’s been dropping since the 1960s, so there’s no hard guarantee,” says Østergaard. “There are no special measures in the contracts where we commit to storing the data someplace special, and no one can guarantee that.”
As such, any organization relying on a cloud storage provider for long-term storage should make sure it can export its archives or migrate them to a different provider if the need arises.
Whilst cloud storage is currently a popular storage solution, research into optical discs is offering an interesting alternative. Scientists in China have created an optical disc that can store up to 125 TB of data, as reported by LiveScience, equivalent to the combined capacity of about 15,000 DVDs.
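The comparison with DVDs is easy to sanity-check. The short Python sketch below divides the quoted 125 TB by nominal DVD capacities, assuming decimal units (1 TB = 1,000 GB). The result suggests the figure of about 15,000 corresponds to dual-layer discs (8.5 GB); single-layer discs (4.7 GB) would need roughly 26,600.

```python
import math

# Nominal DVD capacities in decimal gigabytes.
DVD_CAPACITY_GB = {"single-layer": 4.7, "dual-layer": 8.5}


def dvds_needed(capacity_tb: float, dvd_gb: float) -> int:
    """Whole DVDs needed to hold capacity_tb terabytes (1 TB = 1,000 GB)."""
    return math.ceil(capacity_tb * 1_000 / dvd_gb)


if __name__ == "__main__":
    for kind, gb in DVD_CAPACITY_GB.items():
        print(f"{kind}: {dvds_needed(125, gb):,} discs")
```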
Although this condenses an archive into a single disc, making data retrieval simpler, ensuring that devices capable of reading the disc still exist in several decades’ time could be a challenge. The disc’s long-term stability has also not yet been proven.
Microsoft’s ongoing research into glass data storage, dubbed Project Silica, aims to produce a low-cost storage medium that could securely hold data for hundreds or even thousands of years. If realized, the solution would see enterprise data etched onto plates of quartz glass using an ultrafast laser and then stored on an archival wall like a huge record collection. Because the project is still at the research stage, it could be many more years before businesses adopt it as their first choice for long-term data storage.
Long-term data storage, especially when required to meet contractual or legal obligations, can be challenging and requires careful research and selection. Above all, businesses must recognize that legacy file formats and hardware are likely to be phased out. Ultimately, some firms may decide that information too important to trust solely to electronic storage should still be kept as paper copies. As the sector continues to innovate in this space, enterprises will be first in line to adopt technologies that can offer even longer, more stable data storage.