Because even the most thorough disaster preparedness plan won’t be able to justify the cost of including every mission process – especially for small organizations with limited resources – it is important to inventory and prioritize critical processes for the entire organization.
Organizations should tier processes based on their importance to operations. For example, processes that need to be resumed within 24 hours to prevent serious mission impact, such as citizen service delivery, or that will have a major effect on stakeholders could receive an “A” rating, while those that need to be resumed within 72 hours could receive a “B” rating, followed by those “C” functions that can be restored in more than 72 hours.
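The tiering rule above can be sketched in a few lines of Python. This is only an illustration: the `Process` class, its field names, and the sample inventory are hypothetical, and the 24-hour and 72-hour cutoffs are simply the example thresholds from the text, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    max_downtime_hours: float  # longest tolerable outage before serious impact
    major_stakeholder_impact: bool = False


def assign_tier(process: Process) -> str:
    """Return "A", "B" or "C" using the example thresholds from the text."""
    if process.max_downtime_hours <= 24 or process.major_stakeholder_impact:
        return "A"
    if process.max_downtime_hours <= 72:
        return "B"
    return "C"


def prioritize(processes):
    """Sort by tier first, then by how soon each process must be resumed."""
    return sorted(processes, key=lambda p: (assign_tier(p), p.max_downtime_hours))


# Hypothetical inventory, for illustration only.
inventory = [
    Process("Archive digitization", 240),
    Process("Citizen service delivery", 12),
    Process("Payroll", 72),
    Process("Public records portal", 96, major_stakeholder_impact=True),
]

for process in prioritize(inventory):
    print(assign_tier(process), process.name)
```

An organization would replace the sample inventory with its own list and adjust the cutoffs to match its mission requirements.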
In addition, several software packages can help an agency or institution assess its disaster preparedness and map out strategies that fit the organization’s needs and goals.
2. Take steps to protect data.
Aside from people, information is the single most critical asset for virtually any organization. Organizations should back up data frequently to ensure records are kept, and consider upgrading the backup equipment to a faster version to reduce the time it takes to complete a backup cycle. Automated, remote backup services are available from many vendors.
Organizations should also store multiple copies of data off site and a long distance from the primary data center. Outsourcing this service may make sense for small and mid-sized organizations that do not currently operate in a suitable, alternative location.
There are a few different approaches to backing up data that are increasingly affordable for smaller agencies and institutions. They include:
- Tape Rotation: Information on servers is copied to storage media (typically tapes) on a set schedule. These tapes are then removed to an offsite location for safe storage. This is the most basic approach to data backup
- Data Replication: Information on servers in one location is copied – either in real time or on a set schedule – to servers in another location. As a result, the data in one location has an exact mirror image in another location – often at a great distance. The off-site server then takes over operations if the primary server is damaged
- Appliance Backup: Like data replication, the information on servers in one location is copied – either in real time or on a set schedule – to a storage appliance in another location. This does allow for a mirror image of the data on the server, but does not include offsite facilities should the primary server infrastructure be destroyed
- Data Vaulting Facilities: Information on servers is copied to an on-site central depository, which is then replicated to an off-site data vaulting facility typically owned by a third-party organization
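Whichever approach is chosen, a backup is only useful if the copy matches the original. The sketch below shows one way to copy a file to several destinations and confirm each copy with a SHA-256 checksum. It is a minimal illustration, not a replacement for a backup product: the function names are hypothetical, and the local folders merely stand in for on-site and off-site storage targets.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: List[Path]) -> Dict[Path, bool]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copies contents and file metadata
        results[copy] = sha256_of(copy) == expected
    return results


if __name__ == "__main__":
    # Demonstration in a throwaway directory; the paths are stand-ins only.
    with tempfile.TemporaryDirectory() as root:
        source = Path(root) / "records.db"
        source.write_bytes(b"example records")
        targets = [Path(root) / "onsite", Path(root) / "offsite"]
        for copy, verified in back_up(source, targets).items():
            print(copy.parent.name, "verified" if verified else "MISMATCH")
```

Any copy reported as a mismatch should be treated as a failed backup and repeated before the cycle is considered complete.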
Once data is backed up, organizations will need to carry out a practical and well-tested plan to retrieve the information. The same IT architecture should frame both the organization’s disaster recovery site and the primary data center, reducing complications. If, for example, the organization uses a wide-area network (WAN), the Internet, an intranet portal and telephones to provide citizen services, the same infrastructure should be built at its backup facility.
Organizations focus so much on protecting and backing up network server data that they often fail to take steps to ensure their employees can remotely access that data if they are unable to work in the office. Remote-access software, such as products provided by Citrix and Microsoft, can enable employees to access networked server or desktop information offsite.
3. Review power options.
Organizations should add uninterruptible power supplies (UPS) for critical servers, network connections, and selected personal computers to keep the most essential applications running.
In addition, cooling systems should be supported by backup generators. Computer rooms can heat up quickly if computers operate on backup power without adequate, precision cooling. Monitoring for heat and humidity also is essential in critical computer rooms. Heat is the biggest threat to UPS battery life, and temperature increases can reduce the lifespan of network equipment by half – and also cause unplanned system interruptions when agency operations are most critical.
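Heat and humidity monitoring usually comes down to comparing sensor readings against limits and raising an alert when a reading falls outside them. The sketch below illustrates that check. The limit values are placeholders chosen for the example, not recommendations; the real figures should come from the equipment vendors' specifications. The `Reading` class and the sensor names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    sensor: str
    temperature_c: float
    humidity_pct: float


# Placeholder limits (assumptions); replace with the equipment vendors' figures.
MAX_TEMPERATURE_C = 27.0
MIN_HUMIDITY_PCT = 20.0
MAX_HUMIDITY_PCT = 80.0


def check_reading(reading: Reading) -> List[str]:
    """Return one alert message for each limit the reading violates."""
    alerts = []
    if reading.temperature_c > MAX_TEMPERATURE_C:
        alerts.append(
            f"{reading.sensor}: temperature {reading.temperature_c:.1f} C "
            f"exceeds {MAX_TEMPERATURE_C:.1f} C"
        )
    if not MIN_HUMIDITY_PCT <= reading.humidity_pct <= MAX_HUMIDITY_PCT:
        alerts.append(
            f"{reading.sensor}: humidity {reading.humidity_pct:.0f}% is outside "
            f"{MIN_HUMIDITY_PCT:.0f}-{MAX_HUMIDITY_PCT:.0f}%"
        )
    return alerts


# Hypothetical readings, for illustration only.
for sample in [Reading("rack-1", 22.0, 45.0), Reading("rack-2", 31.5, 85.0)]:
    for alert in check_reading(sample):
        print(alert)
```

In practice the readings would come from the room's environmental sensors, and the alerts would be routed to whoever is responsible for the facility.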
Having a power backup system does not eliminate the requirement to regularly inspect and maintain the power infrastructure. System administrators should periodically ensure that automatic transfer switches are configured so that there is little lag time to disrupt UPS power to computer systems. At the same time, they should take the opportunity to conduct regular battery inspections and replacement. Like flashlight and smoke detector batteries, UPS systems need to be inspected before they are needed.
Finally, if the system must stay operational, building redundancy into the power system is another proven means to ensure power system reliability and, therefore, network availability. Redundancy enables maintenance of a UPS module without affecting power to connected equipment. It also increases fault tolerance.
4. Identify and appoint a cross-functional preparedness team and a recovery team.
Organizations should pull together a cross-functional team from appropriate departments that can include computer operations, applications development, server and systems administration, facilities, key service departments, data security, physical security and network operations. This team can identify and prioritize critical processes, design the overall process for recovery, select an outside service provider, conduct tests, identify members of the recovery team and document the plan.
The cross-functional preparedness team will select the recovery team, which will participate in recovery activities after any declared disaster. While the recovery team can be similar to the cross-functional preparedness team, its members should not be identical, even within a small organization. Additional members should include the executive sponsor (e.g., CIO or COO), key stakeholder representatives (e.g., community liaison), and representatives from outside service providers.
5. Document, test and update the disaster preparedness plan.
The cross-functional preparedness team should document a disaster preparedness plan that clearly defines the role of each individual on both the cross-functional preparedness and recovery teams. Documentation should include updated configuration diagrams of the hardware, software and network components to be used in the recovery. The plan should include logistical details, including travel to backup sites, and even who has spending authority for emergency needs. This plan also should include lists of emergency contacts and instructions.
Once complete, the plan should be tested to ensure that it will be accurate and effective in an emergency. The true value of a continuity plan can be assessed only if rigorous testing is carried out in a realistic environment. That means testing the plan in an environment that simulates the series of events likely to occur in an actual emergency. It also is important that the tests be carried out by the people who would be responsible for those activities in a crisis. While an organization is likely to make mistakes during such testing, it is best to experience, identify and address these errors well in advance of a real emergency.
Because change is constant within most organizations, and because the organizations are increasingly dependent on information systems, it also pays to update the plan regularly. Products and services designed to help in the event of an emergency also change, as does their method of delivery. A business continuity plan must keep pace with these changes for it to be useful in the event of a disruptive emergency, and tests must be conducted regularly to ensure organizational preparedness.
6. Consider telecommunications alternatives.
Key to any organization’s disaster preparedness plan is a contingency plan for telecommunications. Alternative communications vehicles, including wireless phones and satellite phones, should be considered.
Power for communications is just as important as it is for the rest of an organization’s IT infrastructure, so it is important to become familiar with the local telephone system’s emergency power capabilities and limitations. Organizations may want to investigate auxiliary power sources such as an uninterruptible power supply or battery back-up, either of which can be coupled with a surge protector. If on-premises telecommunications equipment uses software voice mail or a call accounting system, the software should be backed up regularly so valuable information about the system’s configuration is not lost if it goes down. Copies should be stored both on-site and off-site.
In addition, various telecommunications services can help organizations quickly restore communications connectivity:
- If the agency uses an 800 number for critical functions such as order taking or citizen services, this number can be terminated, or rerouted to another telephone number. A plan should be in place for answering those calls as well
- Call forwarding is an optional feature offered by the local phone company. A main telephone number can be forwarded to another office location, depending on anticipated call volume, or to an employee’s home. Calls can even be forwarded to cellular phones. Organizations may want to have call forwarding permanently installed on their main business telephone number so it can be easily activated in the event of an emergency
- In an emergency, the ability to place long distance calls can be greatly restricted. To minimize disruptions, organizations should maintain relationships with multiple service providers, enabling access with one network if another is down
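The multiple-provider idea in the last item amounts to a simple failover rule: try each provider in order of preference and use the first one that responds. The sketch below shows that logic in Python. The provider names and health checks are hypothetical stand-ins; a real check might place a test call or probe a network link.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each provider is a (name, health_check) pair; the check returns True if usable.
Provider = Tuple[str, Callable[[], bool]]


def first_available(providers: Sequence[Provider]) -> Optional[str]:
    """Return the name of the first provider whose health check passes."""
    for name, is_up in providers:
        try:
            if is_up():
                return name
        except Exception:
            # A check that raises is treated the same as a provider that is down.
            continue
    return None


# Hypothetical providers, for illustration only.
providers = [
    ("primary-carrier", lambda: False),   # simulated outage
    ("secondary-carrier", lambda: True),
    ("satellite-link", lambda: True),
]

print(first_available(providers))
```

Listing the providers in order of preference keeps the most economical option in use whenever it is available.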
7. Form tight relationships with vendors.
A strong relationship with hardware, software, network, and service vendors can help expedite recovery, as these vendor contacts often can work to ensure priority replacement of critical telecommunications equipment, personal computers, servers and network hardware in the event of a disaster. This is especially important for small- and medium-size organizations, which may lack the resources that larger companies can tap in an emergency.