Disaster Recovery: Are You Prepared?
By Jerry Hodgen
September 22, 2005
Scores of IT infrastructure professionals have been working night and day to recover data centers that were either devastated or put out of service by Hurricane Katrina. The fortunate organizations, those that were prepared for such a catastrophic disaster, recovered their systems at alternative locations weeks ago and are now planning an orderly migration back to their old data center or, in some cases, to a new one.
The less fortunate ones, which did not have a solid disaster recovery plan, are still struggling to deliver mission-critical and financial data to their respective business teams.
Disaster recovery planning is priority one
This should be a wake-up call for all of us. Expeditious recovery from a disaster that disrupts the processing of, and access to, business-critical data and financial systems should be the top priority of any organization. Furthermore, this is a never-ending priority project, because as our IT systems evolve and grow, our recovery plans must keep pace.
Those of you who are fortunate enough to reside in geographical areas that are not prone to natural disasters such as hurricanes, tornadoes, earthquakes and floods should not fall into complacency. Numerous other events, such as fire, disruption of electrical power and failure of cooling systems, can take a data center out of service. The list of potential data center disasters goes on for miles.
Dedicated resources and executive buy-in are key
In my mind, the first step any reasonably sized company should take is to create a full-time Disaster Recovery Manager position. This position should be filled by a seasoned IT veteran who reports directly to the CIO and has an open door to the CEO and senior business operations executives. A Disaster Recovery Manager who meets these criteria and has the ear of the higher-ups will be sure to develop and maintain an airtight recovery plan for the business.
The next step is to identify a steering team of business process owners and information technology infrastructure and system owners. This team, based on the direction of business operations, will determine which systems will be recovered in the event of a catastrophe, and in what order.
Now the grunt work begins. Depending on the size of the business and the magnitude of its information technology systems, the planning will require a tremendous amount of foresight, investigation, and documentation. Most businesses have gravitated away from single mainframe processing to client/server and distributed systems architectures. This being the case, a myriad of system and hardware dependencies must be documented and accounted for in any disaster recovery plan.
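One way to make those documented dependencies usable is to record them in a form a program can sort. The sketch below is only an illustration: the system names and the dependency map are hypothetical, and it assumes Python 3.9 or later for the standard-library graphlib module. It derives a recovery order in which every system is brought up only after the systems it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each system maps to the set of systems it depends on.
# A real plan would build this map from the documented dependencies.
DEPENDENCIES = {
    "network": set(),
    "storage": {"network"},
    "database": {"storage"},
    "app_server": {"database"},
    "web_portal": {"app_server"},
}


def recovery_order(dependencies):
    """Return the systems in an order where every dependency comes first.

    TopologicalSorter raises graphlib.CycleError if two systems depend on
    each other, which usually signals a mistake in the documentation.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, system in enumerate(recovery_order(DEPENDENCIES), start=1):
        print(f"{step}. {system}")
```

The steering team's business priorities would still decide which systems are in scope and which come first when several could be recovered at the same time; the sort only guarantees that nothing is started before its prerequisites.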
Where and how will the recovery take place?
By this time, the application support and infrastructure teams should have determined which applications and hardware platforms must be recovered, as well as the network connectivity those platforms require. Now is the time to determine where the recovery will occur. Depending on the size of the company, this could be another company-owned data center at a different location. In most cases, however, the company will need to contract with a third party that specializes in disaster recovery.
Typically these third-party companies have data centers at numerous global locations as well as mobile data centers that can be deployed at a moment's notice. Sungard sits atop the industry and has years of experience. As a matter of fact, Sungard has played and continues to play a key role in the recovery of data centers that were put out of service or destroyed by Katrina.
Testing is crucial to success
Now that a disaster recovery plan has been created and recovery centers have been identified, the first order of the day should be TEST, TEST, TEST. You cannot undergo enough testing.
You should start slowly and deliberately by testing each hardware platform and its resident systems or modules individually. Once you have documented and worked out all of the issues with recovering that platform, you should continue to build upon it until you have recovered and linked all of the platforms that make up the mission-critical systems the company requires to function.
This testing should be as true to life as possible. It should include business operations associates at other locations remotely accessing the recovered systems and executing the transactions they would normally perform.
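The build-up described above can be written down as a simple staged test plan. The following is a minimal sketch, not a prescribed procedure: the platform names are hypothetical, and it assumes the platforms are listed in the order they will be recovered. Each stage adds one platform to everything recovered in earlier stages.

```python
def staged_test_plan(platforms):
    """Return cumulative test stages: stage n covers the first n platforms."""
    return [platforms[:n] for n in range(1, len(platforms) + 1)]


# Hypothetical platforms, listed in the order they would be recovered.
PLATFORMS = ["database server", "application server", "web front end"]

if __name__ == "__main__":
    for number, stage in enumerate(staged_test_plan(PLATFORMS), start=1):
        newest = stage[-1]
        print(f"Stage {number}: recover {newest}; verify together: {', '.join(stage)}")
```

Issues found at a stage would be documented and resolved before the next platform is added, so that any new failure can be traced to the newest addition.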
My hat goes off to all of my Information Technology colleagues on the Gulf Coast who have gone above and beyond to recover and restore the information technology systems they support. Above all, I hope that their families and loved ones are safe and out of harm's way.