Posts Tagged ‘Data’

4 steps to effective Disaster Recovery planning

by Marc Watley on August 23, 2010

Question: A wildfire 10 miles from your company headquarters is raging out of control. The fire captain has just ordered everyone in your building to evacuate. All staff have safely left the premises, and now you are likewise heading out, taking one final look at your datacenter – still humming away, unsuspecting. You have offsite data storage but no offsite server infrastructure, applications, etc.

What do you do?

I’m paraphrasing from a not-so-great movie here – Speed (Keanu may have been good in The Matrix, but the predictable tête-à-tête between his character and Dennis Hopper’s in Speed still makes me chuckle) – but IT executives today are, in fact, increasingly faced with the threat of disasters, whether natural (such as a wildfire) or man-made (e.g. some ding-dong crashing a vehicle into your datacenter). I may be taking a bit of creative license here, but this could not be a more serious issue. (Recall those horrible wildfires in the San Diego, California area a few years back? The example above was culled from situations experienced during that period.)

As organizations – and their customers – increasingly rely on databases, servers, and IP-connected applications and data sources, the responsibility for maintaining continuity of the business infrastructure, and limiting costly downtime in the event of a disaster, is paramount.

Though many organizations had active disaster recovery (DR) projects on the books a few years ago, the global financial crunch of the last 20 or so months has wreaked havoc on IT budgets everywhere; only now are many of these DR projects once again taking priority.

If you’re thinking that you can ‘wait it out’ and disaster won’t strike on your watch, think again. Apparently, some 93 percent of organizations have had to execute on their disaster recovery plans. Yep. This according to an annual DR survey from Symantec last year.  A few more points from this survey:

  • On average, it takes companies [with active DR plans] three hours to achieve skeleton operations after an outage, and four hours to be fully up and running
  • The average annual budget for DR initiatives is $50 million (including backup, recovery, clustering, archiving, spare servers, replication, tape, services, DR plan development and offsite costs)
  • Virtualization has caused 64 percent of organizations worldwide to reevaluate their DR plans

Whether your organization is a small, recently funded startup or well entrenched in the Fortune 100, designing, implementing, and testing a DR plan is an endeavor that takes dedication, careful planning, and time (the entire process can take weeks or even months). There are many excellent resources available that provide knowledge and detail on the individual steps of a DR planning initiative. (Cisco’s DR Best Practices site or Disaster Recovery are great places to begin, by the way.) What follows is a high-level, best-practices overview of the planning process:

Executive Sponsorship

This first step of a successful DR plan involves two key components: one is to secure plan sponsorship and engagement from senior company leadership – CEO, COO, CIO, etc. The other is to establish a planning team that is representative of all functional units of the organization – sales, operations, finance, IT, etc. This step is the catalyst for a smooth planning initiative, and requires focus and patience. (The ability to herd cats wouldn’t hurt, either.) It may also be helpful to reduce the impact on internal resources by leveraging outside help from a consulting firm well-versed in DR planning.

Information Gathering

This portion of the planning process – information gathering, due diligence and assessment – is the most involved and most time-consuming, and a true test of teamwork across the organization.

The first step in this part of a DR planning initiative is performing a Business Impact Analysis (BIA), which helps to assess the overall risk to normal business operations (and revenue flow) should disaster strike right this second. The BIA typically consists of identifying and ranking all critical business systems, analyzing the impact of an interruption on those systems, and, most importantly, establishing the maximum length of time critical systems can remain unavailable without causing irreparable harm to the business. This length of time is also known as Maximum Tolerable Downtime (MTD). Working backwards from the MTD allows an acceptable Recovery Point Objective (RPO) and Recovery Time Objective (RTO) to be set for each critical system.
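To make that arithmetic concrete, here is a minimal sketch of how a planning team might record and sanity-check those figures. The system names and hour values below are hypothetical, not drawn from any particular BIA.

```python
# Hypothetical figures for a few critical systems identified in a BIA.
# MTD = maximum tolerable downtime, RTO = recovery time objective,
# RPO = recovery point objective (all values in hours).
CRITICAL_SYSTEMS = {
    "order-processing": {"mtd": 4,  "rto": 2,  "rpo": 0.25},
    "customer-portal":  {"mtd": 8,  "rto": 4,  "rpo": 1},
    "reporting":        {"mtd": 48, "rto": 24, "rpo": 12},
}

def review_objectives(systems):
    """Flag any system whose recovery time objective does not fit
    inside its maximum tolerable downtime."""
    for name, o in systems.items():
        ok = o["rto"] < o["mtd"]
        status = "OK" if ok else "RTO exceeds MTD - revisit recovery strategy"
        print(f"{name:18} MTD={o['mtd']}h RTO={o['rto']}h RPO={o['rpo']}h  {status}")

if __name__ == "__main__":
    review_objectives(CRITICAL_SYSTEMS)
```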

With the BIA in hand, the next steps are conducting a risk assessment and developing the recovery strategy. The risk assessment will help to determine the probability of a critical system becoming severely disrupted, identify vulnerabilities, and document whether these risks are acceptable to the organization. Engagement from the entire planning team is necessary in order to accurately review and record details for critical records, systems, processing requirements, support teams, vendors, etc. – all needed in order to develop the recovery strategy.

Also important in the recovery strategy is identifying the recovery infrastructure and outsourcing options – ideally alternate datacenter facilities from which critical systems and data can be recovered in the event of a serious interruption. This, as they say, is the point at which the bacon hits the frying pan: many organizations are leveraging the power and abundance of Cloud-based IT resources to lower infrastructure costs, and Cloud is particularly applicable for DR. In fact, there are more than a few services that provide continuous data protection. This is typically accomplished via unobtrusive software agents residing on each server in a datacenter; the agents connect to a black box also residing in the datacenter, which incrementally takes images of each server, de-duplicates the data, then replicates it via secure WAN to a remote data store – ultimately providing on-demand recovery (via secure web console) from the remote location at any time. Companies such as nScaled, iland, and Simply Continuous offer such services and can even help build a business case to illustrate the ROI for such a service. Point is, do thy homework and explore whether Cloud services such as these might be a sound fit for your organization’s DR plan.
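None of those vendors’ agents are shown here, but the pipeline just described – snapshot, de-duplicate, replicate – can be sketched at a very high level. The block size and directory paths below are purely illustrative assumptions; a real agent would add encryption, scheduling, and image-level consistency.

```python
import hashlib
from pathlib import Path

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks - an illustrative choice, not a vendor default

def snapshot_and_replicate(source_dir, block_store, seen_hashes):
    """One highly simplified protection cycle: read each file in blocks,
    skip blocks that have already been replicated (de-duplication), and
    copy only new blocks to the 'remote' store - a local directory here,
    standing in for a secure WAN transfer to an offsite data store."""
    store = Path(block_store)
    store.mkdir(parents=True, exist_ok=True)
    new_blocks = 0
    for path in Path(source_dir).rglob("*"):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                digest = hashlib.sha256(block).hexdigest()
                if digest in seen_hashes:
                    continue  # identical block already offsite - skip it
                (store / digest).write_bytes(block)
                seen_hashes.add(digest)
                new_blocks += 1
    return new_blocks

# Hypothetical usage: run on a schedule or on file-change events.
# replicated = snapshot_and_replicate("/var/data", "/mnt/offsite-store", set())
```

The point of the sketch is simply that incremental, de-duplicated replication is what makes continuous protection affordable over a WAN link; recovery then becomes a matter of reassembling server images from the offsite store.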

Planning and Testing

Armed with a full impact analysis, risk assessment, recovery goals, and outsourcing options, the actual DR plan can now be developed. The DR plan is a living document that identifies the criteria for invoking the plan, procedures for operating the business in contingency mode, steps for recovering lost data, and criteria and procedures for returning to normal business operations. A key activity in this step is to identify in the DR plan a recovery team (which should consist of both primary and alternate personnel from each business unit) and to define recovery processes and procedures at the business-unit level. Also important is to ensure the DR plan itself is available offsite – both via the web and in permanent media form (print, CD-ROM, etc.).

Equally important to having a DR plan is regular testing. This step includes designing disaster/disruption scenarios and developing and documenting action plans for each scenario. Conducting these tests regularly, with full operational participation, is key to a plan that actually works when it is invoked.
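As a rough illustration, the scenarios and their action plans can be kept in a simple structure that the recovery team walks through during each exercise, with outcomes recorded for the next plan revision. Everything below (scenario names, owners, notes) is hypothetical.

```python
from datetime import date

# Hypothetical disaster/disruption scenarios mapped to action plans and owners.
SCENARIOS = [
    {"scenario": "Primary datacenter offline (fire, evacuation)",
     "action_plan": "Fail critical systems over to the recovery site",
     "owner": "IT operations"},
    {"scenario": "Ransomware on file servers",
     "action_plan": "Restore from the last clean offsite recovery point",
     "owner": "Security / IT"},
]

def record_test(scenario, passed, notes=""):
    """Capture the outcome of a tabletop or full operational test so the
    results can feed back into the next revision of the DR plan."""
    return {
        "date": date.today().isoformat(),
        "scenario": scenario["scenario"],
        "owner": scenario["owner"],
        "passed": passed,
        "notes": notes,
    }

results = [record_test(s, passed=True, notes="met RTO") for s in SCENARIOS]
for r in results:
    print(r)
```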

Ongoing Plan Evaluation

A DR plan is only effective if it is continually kept in lock-step with changes within the organization. Such changes include infrastructure, technology, and procedures – all of which must be kept under constant review, with the DR plan updated accordingly. DR plan testing should likewise be evaluated on a regular basis, and any necessary adjustments made (to systems, applications, vendors, established procedures, etc.).

So there you have it – four key building blocks to tailoring a DR plan for your organization.  Of course, if the ‘disaster’ arrives in the form of a city-sized asteroid hurtling towards Earth, needless to say any plan will likely not make much difference. Anything short of such a global catastrophe, however, and a well-developed and maintained DR plan will keep employees and customers connected and business moving forward, with minimum downtime.

Again, this is by no means a complete recipe for designing and implementing a DR plan, but is instead meant to serve as a high-level overview…offered as food for thought. I encourage you to learn more, explore options, and ask for help if needed – whatever it takes to thoroughly prepare your organization for the worst, should the worst ever occur. To loosely paraphrase our man Keanu once again, from another of his, er, more questionable films from back in the day – Johnny Mnemonic – this is one topic where you absolutely, positively don’t want to “get caught in the 404”.

Data Isn’t Information

by Wayne Turmel on August 10, 2009

Readers of this site are very tech savvy – in fact (without sounding too flattering) I’d suggest that we are among the most technically proficient workers in the world. I would also submit that many of us don’t use technology properly. I don’t mean our fingers don’t fly and we can’t multi-task-web-cam-Google-group like a rock star. What I mean is we send more data than information.

Here are a few examples to clarify further: You check your email inbox or your project collaboration site. There’s the spreadsheet you wanted with the numbers you need to complete your task. That’s data. The problem is that the person who sent you those numbers didn’t tell you that they were put together at the last minute because they’d be in trouble if they were late, that they are only based on someone’s best guess or that the minute they hit “send” someone called with a last-minute correction. That’s context and it’s what turns data into information you can actually use.

There is an old model that talks about the learning and communication hierarchy:

[Figure: the DIKW hierarchy – Data, Information, Knowledge, Wisdom]

Data (the raw numbers or facts) turns into… Information (what it means), which, when we apply it effectively to our real-life problems, we turn into… Knowledge (how we apply this contextual information to move the project/company/species forward), and finally… Wisdom (how we use this knowledge in the most far-reaching, strategic and positive way).

In today’s lightning-paced work world, data is constantly flowing. We have all kinds of tools that allow us to get the numbers/project status/debugs anywhere in the world in seconds. The problem is not with the delivery of data; it’s how the data is processed and turned into action once it arrives.

We need context in order to understand all the subtleties of what the data means and what to do with it. Context is established when we seek answers to questions like:

  • Why is this data important?
  • Where did it come from?
  • What are you supposed to do with it?
  • Who sent it and how much do you trust them?
  • Who will use it and why should they trust you?
In other words, the data and the tools that send it are useless without the human dynamic, which brings us back to technology. We have all the technology we need to send the data and create context; we just don’t use it as well as we might.

Take, for example, an Agile team that wants to hold its scrum and get back to “the important stuff”. You don’t waste time on social niceties and “fluff”. Effective web meetings are held to under 10 minutes, the way they should be. IMs are held to ten words or less, and anything more social than a “Hi, are you busy?” is considered unimportant. But if you don’t have social conversation, or allow time to get to know each other, do you really know what’s going on with your teammates? Do you know who’s having trouble, who’s really doing more than their share, and who can really give you insight into the data you’ve just received?

I hear so many times that webcams are a waste of good bandwidth; that time zones mean it’s easier to just hit “send” and go to bed, knowing that the folks in Bucharest or Bangalore or Boston are professionals and will know what to do with it when it arrives; that group collaboration sites don’t need pictures of the teammates on whom your job depends, and “why does it matter what Mary or Karim look like as long as the work gets done?”

It matters. The human component matters, and we ignore the tools – and more importantly the techniques – that let us build those human connections at our peril.

I’d like to end with a thought-provoking question: What are you doing for your team (and what help is your company giving you) to learn to send data as well as turn it into information?

——

It is about the data

by Thomas Frasher on June 16, 2009

Many new online offerings are intended to work directly or indirectly with customer-provided data. Acquisition, aggregation, and data security are but some of the concerns that online offerings must take into account if they are to have a successful business.

1. Acquisition – This is the first and most important step; without it working reliably, no amount of feature development will make the product successful. The ease of acquisition setup and the consistency of data acquisition are of paramount importance if the offering provider expects to minimize early abandonment of the application. Here are some scenarios and the likely results for each:

a. Data acquisition is easy to set up – the offering can see rapid adoption and very high growth rates.

b. Setup is moderately complex, either in detail or in procedure – the abandonment rate may still remain low, as long as value is established early.

c. Setting up the data acquisition is very complex, uses unexplained terms, or has cryptic workflows – the initial abandonment rate will be high.

d. Setup is easy but the data collection is unreliable – the abandonment rate will still be high.

Clearly, from the scenarios above, what matters to the customer is the data in the offering and its ability to address their present and future concerns. Anything that impedes, challenges, or thwarts those concerns is reason enough for abandonment.
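To put scenario (d) in code: much of what separates reliable from unreliable collection is disciplined retrying and refusing to pass incomplete data downstream. The sketch below assumes a fetch_source callable as a hypothetical stand-in for whatever connector an offering actually uses.

```python
import time

def acquire(fetch_source, retries=3, backoff_seconds=2):
    """Hypothetical wrapper around a data-acquisition connector: retry
    transient failures with increasing backoff, and refuse to hand empty
    results downstream."""
    for attempt in range(1, retries + 1):
        try:
            records = fetch_source()
            if not records:
                raise ValueError("source returned no records")
            return records
        except Exception:
            if attempt == retries:
                raise  # surface the failure instead of silently losing data
            time.sleep(backoff_seconds * attempt)

# Hypothetical usage, where fetch_bank_transactions is whatever connector
# the offering provides:
# data = acquire(fetch_bank_transactions)
```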

2. Aggregation or ETL – This is the second step, and it MUST be flawless. ETL refers to Extract, Transform and Load: the process by which the customer’s data is uploaded into the offering’s database. Any failure at this point reflects very poorly on the application, and once that happens, customers no longer trust the details they see in the offering. What’s worse, the lost trust is difficult, if not impossible, to win back.
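As a minimal sketch of that flow, assuming the customer’s data arrives as a CSV upload and with field names like “account” and “amount” chosen purely for illustration, the three stages might look like this:

```python
import csv

def extract(csv_path):
    """Extract: read the customer's uploaded file as-is."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: normalize types and set aside rows that cannot be trusted,
    rather than loading bad data and eroding the customer's confidence in
    what they see in the offering."""
    clean = []
    for row in rows:
        try:
            clean.append({"account": row["account"].strip(),
                          "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # a real ETL job would log and report these rows
    return clean

def load(rows, database):
    """Load: write validated rows into the offering's data store
    (a plain list stands in for the database here)."""
    database.extend(rows)
    return len(rows)

# Hypothetical usage:
# db = []
# loaded = load(transform(extract("customer_upload.csv")), db)
```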

3. Data Security – The customer’s data must ALWAYS be secure, and the customer should be able to verify the securing mechanism and how they are connected to the offering website (security certificate, SSL connection, etc.). Online offerings will only get one chance to fail on this point. As has been seen over the past few years, failure to secure customer data, as in the case of Heartland Payment Systems, is very damaging to the public identity of the company and, depending on the type of data compromised, has large legal consequences.
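On the transport side, the minimum bar is refusing to move customer data over anything other than a verified TLS (SSL) connection. Here is a small sketch using Python’s standard library; the hostname in the usage comment is a placeholder.

```python
import socket
import ssl

def verified_connection(host, port=443):
    """Open a TLS connection that verifies the server's certificate and
    hostname against the system trust store; a mismatch raises an error
    instead of silently falling back to an insecure link."""
    context = ssl.create_default_context()  # certificate + hostname checks on
    sock = socket.create_connection((host, port), timeout=10)
    return context.wrap_socket(sock, server_hostname=host)

# Hypothetical usage - the hostname is a placeholder:
# tls = verified_connection("data.example-offering.com")
# print(tls.version(), tls.getpeercert()["subject"])
```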

Suffice it to say that, as online offerings continue to be a substantial area of growth for companies in the coming years, paying due attention (or not) to acquiring, aggregating, and securing data will be the thin line between abandonment and success… after all, it is about the data!