Disclaimer: what I am writing here will apply to most flavors of Linux, and even most UNIXes, but there might be a thing or two that is specific to RedHat 7.1, which is what I am using now. It has GNU tar 1.13.19 and gzip (GNU zip) 1.3. I'll try to keep it as general as I can.

tar combines several files into a single file. It is often used in conjunction with gzip, which compresses the resulting file. Zip on Windows and StuffIt on Mac OS usually do both of these in one step. tar can also do both in one step if given the proper arguments. Speaking of which, StuffIt and most Windows unzipping utilities handle tarred and gzipped files just fine.

As with everything, there are a million ways to do this. Here is one example. (By the way, this page is mostly here just to remind myself how I like to use it.) Imagine you're in your home directory and have a directory called 'pictures' that you want to tar and gzip. Issue this command:

tar cfzv pics.tgz pictures/*

tar is the command that we're all here to study,
cf will create a file with the name specified (pics.tgz),
z will zip the archive,
v will list ('verbose') the files being archived (optional),
and pictures/* means 'everything in the directory pictures'.
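
For comparison, here is a rough sketch of the same job done the long way: tar first, then gzip as a separate step. The directory and file names are made up for the demonstration, and it all happens in a scratch directory from `mktemp` so it won't touch your real files.

```shell
# Scratch area with a hypothetical 'pictures' directory (names are made up).
work=$(mktemp -d)
cd "$work"
mkdir pictures
echo "pretend this is a picture" > pictures/a.jpg
echo "and this is another one" > pictures/b.jpg

# One step: tar and gzip together.
tar cfz pics.tgz pictures/*

# Two steps: tar first, then gzip the result.
# gzip replaces pics2.tar with pics2.tar.gz.
tar cf pics2.tar pictures/*
gzip pics2.tar

# Both archives should list the same files.
one_step=$(tar tzf pics.tgz)
two_step=$(tar tzf pics2.tar.gz)
echo "$one_step"
echo "$two_step"
```

The two listings should match, which is all the 'z' really buys you: one command instead of two.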

Tarred and gzipped files are often named with '.tar.gz' or '.tgz' extensions. Like I said, there are a million options and a million ways to do this; this is just the one I usually use. Two things are worth pointing out. First, as shown, this command is recursive, so if you have subdirectories inside of 'pictures', they and their contents will be in the archive. When the archive is unpacked, the directory structure will be restored. Second, you aren't stuck with getting everything. If you make it 'pictures/*.gif', you'll get only GIFs; if you say 'pictures/*.jpg', you'll get just the JPEGs, etc. Including 'v' is a nice way to see right away which files are being included, and you can even pipe the whole thing through 'less' if there's a lot to see:
tar zcvf jpegs.tgz pictures/*.jpg | less
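
Here's a small sketch of both points, plus the unpacking step. The file names and the 'restored' directory are invented for the example, and I'm assuming a tar that accepts `-C` to pick the directory to unpack into (GNU tar does).

```shell
# Scratch area with a hypothetical 'pictures' directory and one subdirectory.
work=$(mktemp -d)
cd "$work"
mkdir -p pictures/vacation
echo "jpeg one" > pictures/one.jpg
echo "gif one" > pictures/one.gif
echo "jpeg two" > pictures/vacation/two.jpg

# Archive only the JPEGs sitting directly inside 'pictures'.
tar zcf jpegs.tgz pictures/*.jpg
jpeg_list=$(tar ztf jpegs.tgz)
echo "$jpeg_list"

# Archive everything; the 'vacation' subdirectory comes along automatically.
tar zcf pics.tgz pictures/*

# Unpack into a separate directory: x extracts, -C says where.
mkdir restored
tar zxf pics.tgz -C restored
ls -R restored
```

Note that the shell, not tar, expands 'pictures/*.jpg', so that pattern only matches files directly inside 'pictures' and misses the JPEG down in 'vacation'.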

Like I said, 'tar' connects files into a single file, and 'gzip' compresses things. I have a directory called 'test' which contains two files, 1.txt and 2.txt:

-rw-rw-r--    1 brian    brian          71 Jul 30 00:12 1.txt
-rw-rw-r--    1 brian    brian          68 Jul 30 00:12 2.txt

Contents of 1.txt:
This is file number 1, which starts with a 'T' and ends with a period.

Contents of 2.txt:
Here we have file 2, which starts with an 'H' and ends with a bang!

Now I'll issue `tar cf test.tar test/*` without the 'z' option and here's what I get:
-rw-rw-r--    1 brian    brian       10240 Jul 30 00:15 test.tar
10,240 bytes! I had to ask around about that one (and I'm disappointed in myself that I didn't notice such an obvious multiple of 1024), but tar, by default, writes in 10k (10x1024) blocks, so even a tiny archive gets padded out to that size.
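
If you want to see where the space goes, here's a sketch that rebuilds the two test files and does the arithmetic. It assumes the usual tar layout: a 512-byte header per file, file data rounded up to the next 512 bytes, and two 512-byte blocks of zeros at the end. The padding out to 10k is what I'd expect from GNU tar; other tars may pad differently.

```shell
# Recreate the two small test files in a scratch directory.
work=$(mktemp -d)
cd "$work"
mkdir test
echo "This is file number 1, which starts with a 'T' and ends with a period." > test/1.txt
echo "Here we have file 2, which starts with an 'H' and ends with a bang!" > test/2.txt

tar cf test.tar test/*
tar_size=$(wc -c < test.tar | tr -d ' ')
echo "test.tar is $tar_size bytes"

# Two files, each with a 512-byte header and 512 bytes of (mostly empty) data,
# followed by two 512-byte end-of-archive blocks.
used=$(( 2 * (512 + 512) + 2 * 512 ))
echo "actually used: $used bytes; anything beyond that is padding"
```

So only about 3k of the archive is doing any work, and most of that is headers and rounding.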

Zipping the file works pretty well. Here's what you get from the whole command, `tar cfvz test.tgz test/*`:

-rw-rw-r--    1 brian    brian         221 Jul 30 00:25 test.tgz
221 bytes is more than the sum of the two original files, but that's still not much overhead. In just about any nontrivial case you'll come out ahead. I just turned 42k worth of text into a 9k .tgz file.
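
To put numbers on that: the two originals add up to 71 + 68 = 139 bytes, so the 221-byte archive carries 82 bytes of overhead. Here's a rough sketch for trying it on something bigger. The 'notes' directory and its contents are made up, and the text is deliberately repetitive, so it will compress better than real writing would.

```shell
# Build a repetitive text file of a few tens of kilobytes in a scratch directory.
work=$(mktemp -d)
cd "$work"
mkdir notes
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog" >> notes/big.txt
    i=$((i + 1))
done

# Same files, once without and once with the 'z' option.
tar cf notes.tar notes/*
tar cfz notes.tgz notes/*

orig_size=$(wc -c < notes/big.txt | tr -d ' ')
tar_size=$(wc -c < notes.tar | tr -d ' ')
tgz_size=$(wc -c < notes.tgz | tr -d ' ')
echo "original: $orig_size  tar: $tar_size  tgz: $tgz_size"
```

The plain .tar should come out a bit bigger than the original and the .tgz a good deal smaller.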

If you `cat` that file, you'll get a bunch of gibberish (including a system bell or two) but only about 100 characters. Running `wc` on either file indicates that there are a bunch of hidden characters, because the character count does indeed match the byte count shown by `ls -l`. I'm sure there's tons of info out there about how 'tar' and 'gzip' work; I'm just rambling on about stuff I discovered while playing with it one night. Mostly, I just wanted to have a place to remind myself of the syntax, because I can never remember it.

One interesting thing: if you run `less` on either file (the .tar or the .tgz), you'll get what looks like the results of `ls -l` on the original files. By the way, have you ever read the man page for `less`? Geez, it's like a whole operating system! All I ever use it for is to look at files when they're too big to fit onto one screen or for the equivalent of DOS's `dir /p`. `man less | wc -l` shows it at 1,782 lines. Yowza.
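
My guess is that `less` is handing the archive to a preprocessor behind the scenes, so it may not behave that way on every system. You can get the same kind of listing straight from tar with the 't' option. A small sketch, using made-up file contents in a scratch directory:

```shell
# Build a small .tar and .tgz to look inside.
work=$(mktemp -d)
cd "$work"
mkdir test
echo "file one" > test/1.txt
echo "file two" > test/2.txt
tar cf test.tar test/*
tar cfz test.tgz test/*

# t lists the contents instead of creating; v adds the ls -l style details.
tar tvf test.tar
# Add z for the gzipped archive.
tar tzvf test.tgz

# Without v you get just the names.
names=$(tar tf test.tar)
echo "$names"
```

This is also a handy way to check what's in an archive before unpacking it.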

One final note: as far as I can tell, it doesn't matter what order you list tar's option letters in, at least in the dashless form I've been using. Some people have told me it does, but when I tell them it seems to work fine in any order, they just say "Oh." (They may be thinking of the form with a leading dash, like `tar -cfzv`, where 'f' grabs whatever comes right after it as the file name and so has to go last.) I only listed them in that order so I could explain them in the order they occurred and have it make some sense. You're more likely to see `tar zcvf` just because (I guess) when you say the letters, they roll off the tongue a bit better that way. Doing a Google search for 'zcvf' yields 3740 results compared to 207 matches for 'cfzv'. If anyone wants to search on the other 28 options ([24 ways to arrange 4 letters] - [the 2 I used] + [6 ways to arrange 3 letters, since 'v' is optional]) and send me the results, go right ahead. :-)
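
If you'd rather test the ordering claim than take my word for it, here's a sketch that tries a few arrangements of the same four letters. The arrangements and file names are picked arbitrarily, and it assumes a tar that accepts the dashless option style.

```shell
# Scratch area with a hypothetical 'pictures' directory.
work=$(mktemp -d)
cd "$work"
mkdir pictures
echo "a" > pictures/a.jpg
echo "b" > pictures/b.jpg

# Same four letters in a few different orders (no leading dash).
# The 'v' listing is thrown away so it doesn't clutter the screen.
for opts in cfzv zcvf vzfc fvcz; do
    tar "$opts" "pics-$opts.tgz" pictures/* > /dev/null
done

# Every archive should list the same two files.
for opts in cfzv zcvf vzfc fvcz; do
    echo "$opts: $(tar tzf "pics-$opts.tgz" | tr '\n' ' ')"
done
```

Each line of output should show the same two file names, whatever order the letters were in.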

[Brian Ashe's home page] [brianashe.com/linux]