
Re: Backing up entire disk -- raw disk v. filesystem v. tree



* TOP POST SECTION ...

Stepping back from specific products/utilities to the underlying
"technology," there are basically three approaches:

1.  "raw disk" -- sector by sector copy of everything
2.  "filesystem" -- only the filesystem meta-data and data in use
3.  "tree" -- a select set of directories

#1 is typically done with utilities like "dd".  If you are backing up a
slice (partition), then it must not have a read/write mounted
filesystem.  If you are backing up the entire disk, including the disk
label (partition table), then no slice (partition) may have a
read/write mounted filesystem.  Otherwise the backup, and anything
restored from it, could be inconsistent.  A "raw disk" backup is
typically filesystem agnostic.
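
As a minimal sketch of approach #1, the following copies a scratch image
file sector by sector with dd and verifies the result.  The regular file
stands in for a real disk or slice device; the file names, sizes and the
conv= options are assumptions chosen for illustration, not a
recommendation for any particular system.

```shell
#!/bin/sh
set -eu

# Scratch directory; a regular file stands in for the disk or slice.
work=$(mktemp -d)
src="$work/disk.img"       # hypothetical stand-in for a slice device
bak="$work/disk.img.bak"   # the "raw disk" backup

# Create a 1 MiB "disk" (2048 sectors of 512 bytes) to copy.
dd if=/dev/urandom of="$src" bs=512 count=2048 2>/dev/null

# Sector-by-sector copy.  conv=noerror,sync continues past read errors
# and pads a short or failed block with zeros so offsets stay aligned.
dd if="$src" of="$bak" bs=512 conv=noerror,sync 2>/dev/null

# Verify that the copy is byte-for-byte identical to the source.
cmp "$src" "$bak" && echo "backup matches source"

# Remove "$work" when done.
```

On a real system the source would be an unmounted (or read-only
mounted) slice or whole-disk device instead of the scratch file.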

#2 is typically done with utilities like "dump" -- including
filesystem-specific utilities (dump for ext2/ext3, xfsdump, ufsdump,
etc...).  They only copy the meta-data and data that is used -- so the
result is fairly compact.  Some filesystem dumps are "safe" in how they
back up -- meaning they _can_ capture the filesystem in a "consistent
state" even if mounted.  Some are semi-consistent, meaning some files
(data) could be corrupted, but the dump as a whole (crucial meta-data)
is consistent.  A "filesystem" backup is typically very filesystem
_dependent_.

#3 is typically done with "ustar" streaming utilities like cpio, tar and
their direct replacement, pax, as well as "ustar" compatible utilities
like GNU tar (full "ustar" compatibility varies, long story), afio (my
favorite, cpio with per-file compression), star (the only full
POSIX-2001/SUSv3 "ustar" compliant utility in release) and the like. 
These back up portions of the filesystem into an OS/filesystem-agnostic
format, although varying meta-data storage may cause information to be
lost (i.e., the format doesn't support some capability of the filesystem
it was backed up from).  pax is a good utility to learn, because it's
included with NT5+ (2000, XP, 2003) as well as all newer UNIX/Linux
systems.

In addition to "off-line" streaming backups that put all files into one
big file/stream -- aka "archiving" -- there are "on-line" backup
utilities such as "rsync."  You can use "rsync" to back up from one
system to another, and only "deltas" will be passed (the portions of
files that changed).  Rsync is from one of the Samba co-founders (whose
PhD thesis covers the algorithm).

I also have a backup method that writes files into an .iso (ISO9660
Yellow Book) aka "CD/DVD Image" while adding per-file compression.  It
gives you the best of both worlds -- direct access to the files on CD
(_not_ requiring you to "unarchive" tar or something else first), while
compressing for space.  To read more, see the 2002 April issue of "Sys
Admin" magazine:
  http://www.samag.com/documents/sam0204c/  
 
Some 3rd party utilities -- e.g., Ghost, DriveCopy/Image, etc... -- are
a hybrid of #1 and #2.  They can do #1, and they can do #2 for
filesystems they understand.  There are several Linux equivalents that
bundle similar functionality into a single program -- including
resizing, moving and copying NTFS filesystems (which is safe to do, as
long as you don't modify the underlying meta-data, which is _never_
safe to do with NTFS other than with the registry-SAM that created it).

- Overcoming consistency issues

Raw disk backups are not consistent at all unless the filesystems/media
are off-line, and there are few options to do them on-line outside of
total virtualization (mega-$$$).

Filesystem dumps may offer some consistency built-in.  Others may not.
Some OSes offer volume management and "snapshots" which keep track of
changes to the filesystem from the point of snapshot forward -- aka a
"reverse delta" (if you know what that is).  I.e., a "snapshot" of a
filesystem holds the original meta-data/data blocks that have changed
since the snapshot was taken, while the "on-line" filesystem is the
latest.  You then back up from the snapshot -- be it a "dump" or "tree"
backup -- which the volume management makes appear as a "complete
filesystem" by taking the current filesystem state and applying the
reverse deltas to get back to the state at the time of the snapshot.

Veritas, IBM, HP and many other vendors offer snapshot capability for
NTFS, Ext3, etc...  Some OSes come with their own -- sometimes
snapshots, sometimes "off-line mirrors" (full copies that are part of a
RAID-1 mirror-set, but can be taken "off-line"), etc...  Linux LVM and
LVM2 offer GPL snapshot capability c/o Red Hat's Sistina acquisition.

- A note on recoverability and compression ...

When it comes to dd, any byte error can render the backup useless.
You may be able to recover with a "fsck" but it's a mega-PITA.

Most "dump" formats will only lose data affected by the bytes that have
errors, unless a major "meta-data" block is hit and then an entire
portion of the format could be lost.

Streaming formats like "ustar" are very recoverable, and far more 
recoverable than "directory" formats like PKZip.

Unfortunately, when you use a compressor, you destroy the ability of the
underlying format to recover itself, relying on the recoverability of
the compressor.

Most tape backup systems with compression use a block algorithm with a
good recovery logic.  As such, they are fairly recoverable, isolating
errors to only those blocks with an error.  So as long as the underlying
backup format is a recoverable format, you should be okay.

LZ77 (gzip, also used by PKZip) is virtually unrecoverable after the
point of error, even though it is supposedly a block compressor.

BWT (bzip2) is a bit better at block recovery, but it's not ideal.

LZO (lzop) is not very recoverable at all, like gzip.

A replacement for cpio called "afio" writes "ustar" backups with
per-file compression of each file in the backup.  This improves
recoverability drastically while adding compression, and the result is
still fully "ustar" compliant (if a cpio/tar/pax version extracts the
files, they will just be compressed).
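
The per-file compression idea can be sketched with plain gzip and tar.
This is not afio itself and not its on-disk format, only an
illustration of compressing each file separately and then archiving
without an outer compressor, so that damage to one compressed member
does not take the rest of the archive with it.  All paths and file
names are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/src" "$work/stage" "$work/out"
printf 'first file\n'  > "$work/src/a.txt"
printf 'second file\n' > "$work/src/b.txt"

# Compress each file on its own into a staging tree...
for f in "$work"/src/*; do
    gzip -c "$f" > "$work/stage/$(basename "$f").gz"
done

# ...then archive the already-compressed files with no outer compressor.
tar -cf "$work/backup.tar" -C "$work/stage" .

# A plain tar extract yields the individually compressed files,
# each of which can be decompressed independently of the others.
tar -xf "$work/backup.tar" -C "$work/out"
gzip -dc "$work/out/b.txt.gz"
```

Contrast this with "tar | gzip", where a single bad byte in the
compressed stream can make everything after it unreadable.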

I wrote a HOWTO on afio (starts a few paragraphs into the post):  
http://www.matrixlist.com/pipermail/leaplist/2002-December/026072.html  

- Restoring

Restoration varies on approach.

Raw disk formats like "dd" can be mounted as a virtual filesystem on the
loopback device.  E.g., for an image of a single ext3 slice:

  mount -t ext3 rootfs.dd /mnt/rootfs -o loop,ro

You can then copy files from there.

Filesystem formats like "[xxx]dump" typically let you restore in a
relative path in a semi-interactive fashion.  E.g., I can cd to /tmp or
another directory, use "[xxx]restore -i" and then add/extract files into
paths under /tmp.

Tree formats, i.e. streaming formats such as "ustar," typically omit the
leading slash of absolute paths by default so they are stored as
relative paths.  So you can cd to /tmp or another directory, and extract
the tree or portions of the tree.
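
A small sketch of such a relative-path restore with tar follows.  The
directory layout and file names are made up for the example; the point
is only that members stored as relative paths can be extracted, whole
or in part, under whatever directory you cd into.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/tree/etc" "$work/restore"
printf 'hostname=demo\n' > "$work/tree/etc/demo.conf"
printf 'unrelated\n'     > "$work/tree/etc/other.conf"

# Archive the tree; -C makes the member names relative (etc/...).
tar -cf "$work/tree.tar" -C "$work/tree" etc

# List the archive to see the relative member names.
tar -tf "$work/tree.tar"

# Restore just one file beneath a different directory.
( cd "$work/restore" && tar -xf "$work/tree.tar" etc/demo.conf )

cat "$work/restore/etc/demo.conf"
```

Only etc/demo.conf is recreated under the restore directory; the
original tree is left untouched.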


* BOTTOM POST SECTION ... (everyone quoted explicitly)

From: clangin@siu.edu <clangin@siu.edu>
> Greetings,
> Is it possible to back up and restore an entire disk?

From: fiaid@quasi-sane.com
> What OS are we talking here?  Solaris is easy as crap, linux could be just 
> as easy.  And how do you want to do this?  Are you looking for a hot 
> backup in the sense that the system is running or are you going to get a 
> point in time backup or some sort of a cold backup?

From: clangin@siu.edu
> SuSE.  Yes, a cold mirror is the right idea.  Thanks!

How does SuSE Cold Mirror work?  Rsync?


From: Casey Boone <caseyboone@gmail.com>
> well you could use dd and copy it bit for bit to another disk

From: clangin@siu.edu
> I'm just learning about dd.

From: Casey Boone <caseyboone@gmail.com>
> are you wanting something to use while the disk is "online" (ie, you
> have booted off of it and are running the system currently) or do you
> want something you can use offline?

See my discussion above for backing up and restoring "on-line."


From: clangin@siu.edu
> If the first disk became unusable, would I be able to just physically
> swap disks and continue at the spot where the first disk was copied
> via dd?  (Assuming the jumper was changed to master on the replacement
> disk.)

You could.  But dd is typically not an ideal method of recovery.
A "dump" of filesystems would be far better.


From: clangin@siu.edu <clangin@siu.edu>
> It seems to me like when the data was restored, that
> it would be overwriting the files of active processes,
> and I don't understand if that would be possible.

From: fiaid@quasi-sane.com
> This should only come into play if you are running some sort of a database 
> or a system that does active file writes or changes.  A webserver should 
> be easy to backup as it is pretty static.  Also remember that a lot of the 
> files for active processes are held in memory rather than on disk, or in 
> the sense of swap they are there, but virtually.

Most databases and other programs that maintain databases have their own
"dump" program.  They take a snapshot of their internal, consistent
structure and write it to a file.  You then back up that copy.

In fact, if you do a volume/filesystem "snapshot," you typically want to
either off-line the database or create a database dump _before_ you
snapshot.  If you took the database off-line, then don't bring it back
on-line until _after_ the volume/filesystem snapshot is taken.


From: clangin@siu.edu <clangin@siu.edu>
> If the above is not possible, would this be possible?
> Make an identical second disk.  If the first disk
> is corrupted, just swap disks.

From: fiaid@quasi-sane.com
> I did this all the time with Solaris,there is a cold spare script floating 
> around on the web.  I'll try and find it and link it out here for you.  
> But, we did this with all of our production machines and every time the OS 
> was updated or a significant change was made it was allowed a burn in 
> period and then would be mirrored to the cold mirror.
> http://www.blacksheepnetworks.com/security/resources/coldmirroring20010306.html
> This is the script that we used for Solaris, but you can use it as a 
> starting point to create something for use in linux or whatever OS you are 
> in.

Looks like you are doing point volume management with the "pv" tools.
Linux LVM/LVM2 has similar tools, although I doubt they are as capable
at this time.


From: Robert G. (Doc) Savage <dsavage@peaknet.net>
> Chet,
> dd is the preferred utility for doing bit-for-bit disk copies. The best
> way to do this is to first download and burn a Helix 1.5 CD, then boot
> from it on the source machine. Helix is a forensic version of Knoppix
> that will locate all partitions on your source machine for you. All of
> its utilities (dd, netstat, ps, ...) are statically compiled and run
> from the CD. Helix will allow you to create images of the whole disk, or
> of each partition unmounted (the only safe and meaningful way).

Having a CD bootable distro you trust is golden.

I've used Mindi Boot (from Mondo Rescue) to create a bootable CD for my
system, although I don't totally trust the other portions of the project
(i.e., Mondo Rescue itself).


From: Robert G. (Doc) Savage <dsavage@peaknet.net>
> If you lack space on your source machine to hold the images (very
> likely), you can set up a netcat listener on a target machine that does.
> Then start dd and pipe its output through a local netcat session to the
> listener: ... cut excellent netcat howto ...

As Doc pointed out, with a streaming or blocking program, you can stream
output to a remote system.  I show several examples of this with afio in
my HOWTO (see the link above) -- including to a remote tape.  You'll
want to use the "buffer" utility to buffer writes to any remote
tape/optical drive -- otherwise performance will be crap.
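
The streaming idea can be sketched with a plain pipe.  Here both ends
run locally so the example is self-contained; the paths are
hypothetical, and the remote variant mentioned in the comment is an
assumption about how you might adapt it, not a tested recipe.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/src/data" "$work/dest"
printf 'payload\n' > "$work/src/data/file.txt"

# The producer writes the archive to stdout and the consumer reads it
# from stdin, so no intermediate archive file is ever stored.
# To go remote, the right-hand side could instead be something like
#   ssh user@backuphost 'cd /restore && tar -xf -'
# where user@backuphost and /restore are placeholders.
tar -cf - -C "$work/src" data | ( cd "$work/dest" && tar -xf - )

# Confirm that the streamed copy matches the original.
cmp "$work/src/data/file.txt" "$work/dest/data/file.txt" \
    && echo "stream copy ok"
```

The same shape works with dd, cpio or afio as the producer, since each
can write its output to stdout.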

netcat, rsh/ssh, etc... they can do a lot!

Some utilities, like rsync, have built-in rsh/ssh support (that's
typically how it works, although you can set up rsync as a service/port).



-- 
Bryan J. Smith                                  b.j.smith@ieee.org 
------------------------------------------------------------------ 
"Communities don't have rights. Only individuals in the community
 have rights. ... That idea of community rights is firmly rooted
 in the 'Communist Manifesto.'" -- Michael Badnarik


