created Tuesday, 19 March 2002
updated Monday, 14 October 2002
cpio is an archive program, sort of like tar. It is commonly available on Unix & Unix-like systems, including Gnu/Linux.
This article is a quick introduction for using cpio.
To extract files from an archive, use the -i (copy in) command line option for cpio. That will tell cpio to read an archive from stdin & to extract the files from it.
So, assuming the archive is compressed, do this:
bzcat dir.cpio.bz2 |cpio -i
(If you're confused or concerned about my use of bzip2, you might want to read my short section about bzip2 and gzip near the end of this article, then come back here & continue reading.)
cpio creates archives differently than tar. Where tar automatically recurses into subdirectories, cpio reads from stdin a list of files & directories to archive; it does not automatically recurse into directories.
To create an archive, give cpio the -o (copy out) command line option. cpio will read a list of files & directories from stdin, create the archive, & write the archive to stdout.
A good way to generate the list of files is the find program.
To archive everything in a directory, compress it with bzip2, & write the results to a file, do this:
find dir -print |cpio -o |bzip2 >dir.cpio.bz2
That's the generic way to create an archive. On a Gnu/Linux system, you might get a lot of ugly warnings about i-node numbers being truncated. The archive will be fine, but it's never good to have unnecessary messages in the output; the eye-sore might prevent you from seeing important error messages. To prevent all those warnings, type this:
find dir -print |cpio -o -Hnewc |bzip2 >dir.cpio.bz2
A potential problem is that "-Hnewc" is not portable to all implementations of cpio. So either you must know when it's okay to use it, or you must avoid using it & put up with the gratuitous warning messages.
So far, we've created archives of all files in a directory tree. In other words, we've reproduced the functionality of tar but at the cost of more keystrokes. Not very impressive. Since cpio reads a list of files from stdin, we can do a lot more.
If you want to create a distribution archive of your source code, leaving out object files (*.o), backup files (*~), and CVS & RCS directories, just take advantage of the features of find that you already know & love.
find dir \
    -name "*.o" -o \
    -name "*~" -o \
    -name CVS -prune -o \
    -name RCS -prune -o \
    -print \
  |cpio -o -Hnewc |bzip2 >dir.cpio.bz2
(I've broken the example into multiple lines for readability. You'd either type the command on a single command line, or you'd break it into multiple lines, as I've done, by including the back-slashes (\) literally.)
Need to backup just the files that have changed since your last backup yesterday? Trivial!
find dir -ctime -1 -print \
  |cpio -o -Hnewc |bzip2 >dir.cpio.bz2
By using find to generate the list of files, you can make cpio archive any combination of files you want. It's easy to use find from your own shell scripts, too, or you could even use your own programs to generate the list of file names. cpio achieves great flexibility by leaving the file-selection responsibilities to another program.
Some (most? all?) cpio implementations are able to access file systems & tapes through a cpio server on another host. A benefit there is that you can use cpio to archive files from one host but write the archive file to, say, the tape drive on another host. I've found this useful in cases where I needed to backup large amounts of data to a tape drive, but the tape drive was on a server that didn't have enough disk space to hold a temporary copy of the entire archive, so I had to go directly to tape.
To use this feature, use the -O (that's a capital O) command line option in conjunction with the user@host:pathname method of specifying the destination file. See "man cpio" for details.
Similarly, you can use the -I command line option to extract files from tape archives mounted on servers.
As cool as it sounds, this feature has some drawbacks. System-specific command line options & device-file names are often necessary. For example, you might have to force special block sizes with -B or --block-size, or you might have to use system-specific device file names, such as /dev/st/n0a1bf00a or something similarly incomprehensible. Also, some systems treat the communication between the client (your cpio process) & the server as text, so non-text characters & end-of-lines get mangled. In other words, it sometimes just doesn't work.
In those cases, I've often made it work by using rsh and dd explicitly. In other words:
find . -print |cpio -o -Hnewc \
  |rsh server dd bs=32k of=/dev/st0
(The values for block size (bs) & output file (of) are system-specific, of course, & might differ for you.)
I don't mean this article to persuade people to use cpio instead of tar. tar is fine; I mean mostly to help people learn to use cpio if they are faced with such an archive (probably because that's what I usually give to people unless they instruct me differently). Nevertheless, I can't help but do some comparisons.
The main advantage cpio has over tar is that it's easier to archive only some of the files in a directory. That benefit comes to us because cpio reads a list of files to archive instead of assuming it should recurse into directories & archive all files. Modern implementations of tar have similar features, but they are not as flexible as the file-selection features of find or of your own program. What's more, you have to learn the file-selection language of tar, whereas you already know the file-selection language of find, & that knowledge can be applied to any file-selection task that's appropriate for find. In other words, you must know find anyway, so why not re-use that knowledge with your archiver (cpio) instead of learning a less capable, less general archiver-specific system?
cpio archives are usually slightly smaller than tar files.
bash-2.04$ for D in phil skeleton camano tigris; do
> (cd /space/gene-1/src; find $D -print |cpio -o -Hnewc |bzip2 -9) >$D.cpio.bz2
> (cd /space/gene-1/src; tar cf - $D |bzip2 -9) >$D.tar.bz2
> done
10903 blocks
1783 blocks
13996 blocks
504 blocks
bash-2.04$ ls -l
total 3332
-rw-rw---- 1 gene gene 861535 Mar 19 18:26 camano.cpio.bz2
-rw-rw---- 1 gene gene 866475 Mar 19 18:27 camano.tar.bz2
-rw-rw---- 1 gene gene 663206 Mar 19 18:26 phil.cpio.bz2
-rw-rw---- 1 gene gene 662529 Mar 19 18:26 phil.tar.bz2
-rw-rw---- 1 gene gene 110668 Mar 19 18:26 skeleton.cpio.bz2
-rw-rw---- 1 gene gene 111623 Mar 19 18:26 skeleton.tar.bz2
-rw-rw---- 1 gene gene 46374 Mar 19 18:27 tigris.cpio.bz2
-rw-rw---- 1 gene gene 46633 Mar 19 18:27 tigris.tar.bz2
You can see that the cpio archives are usually smaller, but here's a table to show the relative sizes. The right-most column gives the size of the cpio archive as a fraction of the size of the tar archive. Numbers below 1 mean that the cpio archive was smaller.
base name | cpio size (bytes) | tar size (bytes) | relative size (cpio / tar) |
---|---|---|---|
camano | 861535 | 866475 | 0.994 |
phil | 663206 | 662529 | 1.001 |
skeleton | 110668 | 111623 | 0.991 |
tigris | 46374 | 46633 | 0.994 |
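The relative column is just the cpio size divided by the tar size, rounded to three places. This small shell function (relsize is a hypothetical name) reproduces the column from the sizes in the ls -l listing above, & you could hand it the output of wc -c to compare archives of your own.

```shell
#!/bin/sh
# relsize NAME CPIO_BYTES TAR_BYTES: print NAME & cpio/tar to 3 places.
# LC_ALL=C keeps the decimal point a "." regardless of locale.
relsize() {
    LC_ALL=C awk -v b="$1" -v c="$2" -v t="$3" \
        'BEGIN { printf "%s %.3f\n", b, c / t }'
}

# Sizes copied from the ls -l listing.
relsize camano   861535 866475
relsize phil     663206 662529
relsize skeleton 110668 111623
relsize tigris   46374  46633
```

For your own archives, something like relsize x "$(wc -c <x.cpio.bz2)" "$(wc -c <x.tar.bz2)" would do.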
A disadvantage of cpio, compared to tar, is that you must type more characters to use it. Even the simplest case, recursively archiving a directory tree in the default archive format, requires more typing. Compare these two command lines:
find . -print |cpio -o >../archive.cpio
tar cf ../archive.tar .
"cpio" stands for "copy in, copy out". The copy part comes from "cp", which is the Unix copy program.
cpio comes to us from AT&T from the early 1980s, if not earlier. It is not used often; tar has that honor, but I found cpio because I was forced to exchange files between two systems that had incompatible versions of tar. The systems' administrator was unwilling to update the tar implementations, so I had to find an alternative. cpio worked just fine, & since then, I have not found a Unix or Unix-like system that had a cpio that could not work with some other Unix's cpio. In other words, cpio archives appear to be very portable.
To achieve that portability when you create archives, always use -Hnewc or the default archive format (no -H option at all) unless specific experience shows that another -H value is required.
Ignore the pass-through (-p) function of cpio. Use find instead.
On MS-DOS (including Windows), where pipes are treated as text, always use the -O or -I command line option to specify the output archive or the input archive. That's a real bummer, but life sucks. (More specifically, MS-DOS (which includes Windows) is naïve.)
Many modern implementations of cpio (and tar) are able to read or write all manner of archive file formats. Don't use those features; they are not portable. Use cpio for archives in the cpio format. Use tar for archives in the tar format.
Similarly, some modern implementations allow you to instruct cpio to run your compression program on the archive. Don't do this; it is not portable. Instead, run the compression program separately & explicitly. On Unix, use a pipe to connect cpio & the compressor.
I prefer bzip2, so I've used it in my examples. gzip would work just as well. The two programs even share most of the important command line options. So you could substitute gzip wherever you see bzip2, & you could substitute gzcat or zcat wherever you see bzcat.