Backups, Snapshots and Arrays, oh my!

There seems to be some confusion about what qualifies as a backup and what doesn't. I'd like to take a minute to clear this up.

Backup: According to Merriam-Webster.com, a backup is defined as "a copy of computer data"... emphasis on the word "copy".

First, let me cover what snapshots are in almost every system that uses them. A snapshot goes through certain steps:
1) Snapshot is requested
2) System (be it a virtualization system, OS or array, henceforth referred to collectively as "system") quiesces the data to commit any buffered writes
3) Data is locked by the system (now referred to as source)
4) Change log is created (referred to as "new data")
5) As data changes are requested, they are written to the change log, and pointers are created that reference the source overlaid with the new data. As more data is updated, the change log can grow up to the size of the original source data
6a) New snapshot (multiple snapshot config): change log is frozen like the original source data and new change log is created which points to the combination of source plus the previous new data.
6b) Data committed: Source data is updated with new data, thereby overwriting the old information
6c) Data reverted: Change log is destroyed and only source data remains

(This sounds a lot like a database with transaction logs doesn't it?)
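The steps above can be sketched in a few lines of Python. This is only a toy model to show the idea: the `Volume` class, its method names, and the use of dictionaries as "blocks" are all my own illustration, not how any particular hypervisor, OS or array implements snapshots.

```python
class Volume:
    """Toy block store with change-log snapshots (hypothetical, not a vendor design)."""

    def __init__(self, blocks):
        self.source = dict(blocks)  # the original data, locked once a snapshot exists
        self.layers = []            # change logs, oldest first; the last one is active

    def snapshot(self):
        # Steps 1-4 and 6a: freeze what exists and start a new, empty change log.
        self.layers.append({})

    def write(self, block, data):
        # Step 5: while a snapshot is open, new data lands in the active change log.
        if self.layers:
            self.layers[-1][block] = data
        else:
            self.source[block] = data

    def read(self, block):
        # The newest change log wins; otherwise fall back to the source.
        for layer in reversed(self.layers):
            if block in layer:
                return layer[block]
        return self.source[block]

    def commit(self):
        # Step 6b: fold the change logs into the source, overwriting the old data.
        for layer in self.layers:
            self.source.update(layer)
        self.layers.clear()

    def revert(self):
        # Step 6c: destroy the change logs (all of them, in this toy); only the source remains.
        self.layers.clear()


vol = Volume({0: "alpha", 1: "beta"})
vol.snapshot()
vol.write(1, "BETA")
assert vol.read(1) == "BETA"    # reads see source plus changes
assert vol.source[1] == "beta"  # the source was never copied or moved
vol.revert()
assert vol.read(1) == "beta"
```

Notice that everything lives inside the one `Volume` object. Lose that object and you lose the source and every snapshot with it.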

This is an oversimplified version of what happens, but it still pretty much covers it. Notice what's missing? So how is this not a backup? The data is never copied/moved. It always resides in its original spot and there is only the original+changes.

"But Charles, how's that any different than an array backup?"

Well, it's the same but isn't. If we're only talking about an array creating a snapshot, then that's not really a backup. If the array fails, data is lost. If a controller fails, data is lost. If the RAID tolerance is exceeded, the data is lost.

BUT if you add replication and retention into the mix, then that snapshot is taken and duplicated, along with its historical data, to another location, whether that's a different array in a different rack or one across the planet. Now you have redundancy.

"Can't you accomplish that with just replicating the data?"
Yes and no. You accomplish the copy part, but since you're only replicating the current data, you have no retention. In the event of data corruption, the corruption gets replicated too and the data is lost (think CryptoLocker/WannaCry).
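
Here's a rough sketch of why replication alone falls short. The `Replica` and `BackupStore` classes, the `keep` parameter and the day labels are made up for illustration; real products handle scheduling, retention policies and storage very differently.

```python
import copy


class Replica:
    """Mirror: only ever holds the latest state of the source (hypothetical)."""

    def __init__(self):
        self.data = {}

    def sync(self, source):
        self.data = copy.deepcopy(source)


class BackupStore:
    """Separate destination that keeps several dated copies (hypothetical)."""

    def __init__(self, keep):
        self.keep = keep    # how many historical copies to retain
        self.versions = []  # (label, copy) pairs, oldest first

    def run(self, label, source):
        self.versions.append((label, copy.deepcopy(source)))
        while len(self.versions) > self.keep:
            self.versions.pop(0)  # prune the oldest copy beyond the retention count

    def restore(self, label):
        for name, data in self.versions:
            if name == label:
                return copy.deepcopy(data)
        raise KeyError(label)


source = {"report.doc": "quarterly numbers"}
replica, backups = Replica(), BackupStore(keep=3)

replica.sync(source)
backups.run("monday", source)

source["report.doc"] = "ENCRYPTED"  # ransomware hits the source
replica.sync(source)                # replication faithfully copies the damage
backups.run("tuesday", source)

assert replica.data["report.doc"] == "ENCRYPTED"
assert backups.restore("monday")["report.doc"] == "quarterly numbers"
```

The replica is a copy, but it only knows about "now". The backup store still has Monday.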

Backup solutions are backup solutions because by design, they make a copy from a source to a different location and have the ability for historical retention.

To put it another way, would you make a copy of your hard drive and store that copy on the same hard drive? No, that wouldn't be a very good plan, now would it?

So to summarize, I define backups as a copy of data from a source to a destination with historical retention. Anything else and you're asking for problems.

If you have any questions or feel something is in error, please contact me below:

Charles
@whitehattechs
