created Tuesday, 19 March 2002
updated Monday, 14 October 2002
cpio is an archive program, sort of like tar. It is commonly available on Unix & Unix-like systems, including Gnu/Linux.
This article is a quick introduction to using cpio.
To extract files from an archive, use the -i (copy in) command line option for cpio. That will tell cpio to read an archive from stdin & to extract the files from it.
So, assuming the archive is compressed, do this:
bzcat dir.cpio.bz2 |cpio -i
(If you're confused or concerned about my use of bzip2, you might want to read my short section, "bzip2 or gzip?", then come back here & continue reading this article.)
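Before extracting an archive someone else made, it can be handy to list its contents first. Here is a minimal sketch, assuming a cpio that accepts -t (list the contents) & -d (create directories as needed), which GNU cpio & the traditional implementations do. The scratch directory & file names are invented for the example.

```shell
# Build a small throw-away archive so the example is self-contained.
work=$(mktemp -d)
mkdir -p "$work/src/dir/sub"
echo hello >"$work/src/dir/sub/a.txt"
(cd "$work/src" && find dir -print | cpio -o | bzip2 >"$work/dir.cpio.bz2")

# List the contents without extracting anything (-t).
listing=$(bzcat "$work/dir.cpio.bz2" | cpio -it 2>/dev/null)
echo "$listing"

# Extract into an empty directory; -d creates any missing directories.
mkdir "$work/out"
(cd "$work/out" && bzcat "$work/dir.cpio.bz2" | cpio -id)
```

cpio extracts relative to the current directory, so changing into an empty directory first keeps the extracted files away from anything you already have.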
cpio creates archives differently than tar. Where tar automatically recurses into subdirectories, cpio reads from stdin a list of files & directories to archive; it does not automatically recurse into directories.
To create an archive, give cpio the -o (copy out) command line option. cpio will read a list of files & directories from stdin, create the archive, & write the archive to stdout.
A good way to generate the list of files is the find program.
To archive everything in a directory, compress it with bzip2, & write the results to a file, do this:
find dir -print |cpio -o |bzip2 >dir.cpio.bz2
That's the generic way to create an archive. On a Gnu/Linux system, you might get a lot of ugly warnings about i-node numbers being truncated. The archive will be fine, but it's never good to have unnecessary errors in the output; the eye-sore might prevent you from seeing important error messages. To prevent all those warnings, type this:
find dir -print |cpio -o -Hnewc |bzip2 >dir.cpio.bz2
A potential problem is that "-Hnewc" is not portable to all implementations of cpio. So either you must know when it's okay to use it or you must avoid using it & suffer with the gratuitous warning messages.
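In a script, one way around that choice is to probe for the option first. The sketch below is only that: it assumes that a cpio which does not understand -Hnewc exits with a non-zero status, which is true of GNU cpio but is not something I can promise for every implementation. The variable names are made up.

```shell
work=$(mktemp -d)
mkdir -p "$work/dir"
echo data >"$work/dir/file.txt"

# Probe: archive one small file with -Hnewc & throw the result away.
if (cd "$work" && echo dir/file.txt | cpio -o -Hnewc >/dev/null 2>&1); then
    fmt=-Hnewc
else
    fmt=        # fall back to the default format & live with the warnings
fi

# $fmt is deliberately unquoted so that an empty value adds no argument.
(cd "$work" && find dir -print | cpio -o $fmt | bzip2 >"$work/dir.cpio.bz2")
echo "archive format option: ${fmt:-(default)}"
```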
So far, we've created archives of all files in a directory tree. In other words, we've reproduced the functionality of tar but at the cost of more key strokes. Not very impressive. Since cpio reads a list of files from stdin, we can do a lot more.
If you want to create a distribution archive of your source code, leaving out object files (*.o), backup files (*~), and CVS & RCS directories, just take advantage of the features of find that you already know & love.
find dir \
-name "*.o" -o \
-name "*~" -o \
-name CVS -prune -o \
-name RCS -prune -o \
-print \
|cpio -o -Hnewc |bzip2 >dir.cpio.bz2
(I've broken the example into multiple lines for readability. You'd either type the command on a single command line, or you'd break it into multiple lines, as I've done, by including the back-slashes (\) literally.)
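Because find & cpio are separate programs, you can run the find half by itself to preview exactly which names cpio would receive. The sketch below builds a tiny scratch tree (all the file names are invented) & runs the same find expression over it.

```shell
work=$(mktemp -d)
mkdir -p "$work/dir/CVS" "$work/dir/RCS"
touch "$work/dir/foo.c" "$work/dir/foo.o" "$work/dir/foo.c~" \
      "$work/dir/CVS/Entries" "$work/dir/RCS/foo.c,v"

# Same expression as in the archive command, without the cpio stage.
# sort only makes the order predictable; cpio does not need it.
list=$(cd "$work" && find dir \
    -name "*.o" -o \
    -name "*~" -o \
    -name CVS -prune -o \
    -name RCS -prune -o \
    -print | LC_ALL=C sort)
echo "$list"
```

Only dir & dir/foo.c survive the filter. Once the list looks right, put the cpio & bzip2 stages back on the end of the pipeline.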
Need to back up just the files that have changed since your last backup yesterday? Trivial!
find dir -ctime -1 -print \
|cpio -o -Hnewc |bzip2 >dir.cpio.bz2
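"-ctime -1" selects files whose status changed within the last 24 hours, which only approximates "since my last backup". A variation, sketched here with invented file names, keeps a stamp file & uses find's -newer test. Note that -newer compares modification times, not the status-change times that -ctime looks at.

```shell
work=$(mktemp -d)
mkdir "$work/dir"
echo old >"$work/dir/old.txt"
touch -t 200001010000 "$work/dir/old.txt"   # pretend this file is old
touch -t 200101010000 "$work/last-backup"   # pretend the last backup ran later
echo new >"$work/dir/new.txt"               # changed since that backup

(
    cd "$work" || exit 1
    touch this-backup                       # note the time before scanning
    find dir -type f -newer last-backup -print |
        cpio -o -Hnewc | bzip2 >incr.cpio.bz2
    mv this-backup last-backup              # the reference for next time
)
```

The -type f test leaves directories out of the list, so such an archive contains files only; extract it with "cpio -id" so that missing directories are created.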
By using find to generate the list of files, you can make cpio archive any combination of files you want. It's easy to use find from your own shell scripts, too, or you could even use your own programs to generate the list of file names. cpio achieves great flexibility by leaving the file-selection responsibilities to another program.
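To illustrate the "your own program" case: anything that prints one path per line will do. In this sketch, printf stands in for such a program & writes a manifest file; the file names are invented.

```shell
work=$(mktemp -d)
mkdir -p "$work/dir/docs"
echo readme >"$work/dir/README"
echo notes  >"$work/dir/docs/notes.txt"
echo junk   >"$work/dir/scratch.tmp"

# The list of names; printf plays the part of your own program.
(cd "$work" && printf '%s\n' dir/README dir/docs/notes.txt >manifest.txt)

# cpio archives exactly the listed paths & nothing else.
(cd "$work" && cpio -o -Hnewc <manifest.txt >picked.cpio)

# Show what went into the archive.
(cd "$work" && cpio -it <picked.cpio)
```

The archive holds the two listed files but not scratch.tmp. Since the manifest names no directories, use "cpio -id" when extracting so that dir & dir/docs get created.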
Some (most? all?) cpio implementations are able to access file systems & tapes through a cpio server on another host. A benefit there is that you can use cpio to archive files from one host but write the archive file to, say, the tape drive on another host. I've found this useful in cases where I needed to back up large amounts of data to a tape drive, but the tape drive was on a server that didn't have enough disk space to hold a temporary copy of the entire archive, so I had to go directly to tape.
To use this feature, use the -O (that's a capital O) command line option in conjunction with the user@host:pathname method of specifying the destination file. See "man cpio" for details.
Similarly, you can use the -I command line option to extract files from tape archives mounted on servers.
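The -O & -I options work with ordinary local files, too, which makes it easy to try them before involving a second host. This is a sketch using a scratch directory; in the commented remote form, "user", "host", & the device path are placeholders, not values to copy.

```shell
work=$(mktemp -d)
mkdir "$work/dir"
echo data >"$work/dir/file.txt"

# Write the archive with -O instead of redirecting stdout.
(cd "$work" && find dir -print | cpio -o -Hnewc -O "$work/dir.cpio")

# Read it back with -I instead of redirecting stdin; -t only lists.
cpio -it -I "$work/dir.cpio"

# The remote form has the same shape:
#   find dir -print | cpio -o -Hnewc -O user@host:/dev/tape
```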
As cool as it sounds, this feature has some drawbacks. System-specific command line options & device-file names are often necessary. For example, you might have to force special block sizes with -B or --block-size, or you might have to use system-specific device file names, such as /dev/st/n0a1bf00a or something similarly incomprehensible.
Also, systems sometimes behave as though the communication between the client (your cpio process) & the server is treated as text, so non-text characters & end-of-lines get mangled. In other words, it sometimes just doesn't work.
In those cases, I've often made it work by using rsh and dd explicitly. In other words:
find . -print |cpio -o -Hnewc \
|rsh server dd bs=32kb of=/dev/st0
(The values for block size (bs) & output file (of) are system-specific, of course, & might differ for you.)
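The same pipeline can be rehearsed on one machine by letting an ordinary file stand in for the tape device. The sketch below does that & also shows the return trip, with dd feeding the archive back to cpio. The file name tape.img is invented, & the ssh line in the comment is an assumption on my part: on systems without rsh, ssh usually fills the same role.

```shell
work=$(mktemp -d)
mkdir "$work/dir"
echo data >"$work/dir/file.txt"

# Stand-in for the tape device so the example runs anywhere.
tape="$work/tape.img"

# Writing. With a real server, the dd would run on the other host:
#   ... | ssh server dd bs=32k of=/dev/st0
(cd "$work" && find dir -print | cpio -o -Hnewc | dd bs=32k of="$tape" 2>/dev/null)

# Reading back: dd hands the archive to cpio, which only lists it (-t).
dd bs=32k if="$tape" 2>/dev/null | cpio -it
```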
I don't mean this article to persuade people to use cpio instead of tar. tar is fine; I mean mostly to help people learn to use cpio if they are faced with such an archive (probably because that's what I usually give to people unless they instruct me differently). Nevertheless, I can't help but do some comparisons.
The main advantage cpio has over tar is that it's easier to archive only some of the files in a directory. That benefit comes to us because cpio reads a list of files to archive instead of assuming it should recurse into directories & archive all files. Modern implementations of tar have similar features, but they are not as flexible as the file-selection features of find or of your own program. What's more, you have to learn the file-selection language of tar, whereas you already know the file-selection language of find, & that knowledge can be applied to any file-selection task that's appropriate for find. In other words, you must know find anyway, so why not re-use that knowledge with your archiver (cpio) instead of learning a less capable, less general archiver-specific system?
cpio archives are usually a little smaller than tar files, at least after compression with bzip2.
bash-2.04$ for D in phil skeleton camano tigris; do
> (cd /space/gene-1/src; find $D -print |cpio -o -Hnewc |bzip2 -9) >$D.cpio.bz2
> (cd /space/gene-1/src; tar cf - $D |bzip2 -9) >$D.tar.bz2
> done
10903 blocks
1783 blocks
13996 blocks
504 blocks
bash-2.04$ ls -l
total 3332
-rw-rw---- 1 gene gene 861535 Mar 19 18:26 camano.cpio.bz2
-rw-rw---- 1 gene gene 866475 Mar 19 18:27 camano.tar.bz2
-rw-rw---- 1 gene gene 663206 Mar 19 18:26 phil.cpio.bz2
-rw-rw---- 1 gene gene 662529 Mar 19 18:26 phil.tar.bz2
-rw-rw---- 1 gene gene 110668 Mar 19 18:26 skeleton.cpio.bz2
-rw-rw---- 1 gene gene 111623 Mar 19 18:26 skeleton.tar.bz2
-rw-rw---- 1 gene gene 46374 Mar 19 18:27 tigris.cpio.bz2
-rw-rw---- 1 gene gene 46633 Mar 19 18:27 tigris.tar.bz2
You can see that three of the four cpio archives are smaller, but here's a table to show the relative sizes. The right-most column shows the size of the cpio archive relative to the tar archive. Numbers below 1 indicate that the cpio archive was smaller.
| base name | cpio size (bytes) | tar size (bytes) | relative size (cpio / tar) |
|---|---|---|---|
| camano | 861535 | 866475 | 0.994 |
| phil | 663206 | 662529 | 1.001 |
| skeleton | 110668 | 111623 | 0.991 |
| tigris | 46374 | 46633 | 0.994 |
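The relative column is just the cpio size divided by the tar size, rounded to three places. If you want to repeat the arithmetic for your own archives, a short awk command will do it. The byte counts here are the ones from the ls -l listing.

```shell
# Each input line: base name, cpio bytes, tar bytes.
ratios=$(printf '%s\n' \
    'camano 861535 866475' \
    'phil 663206 662529' \
    'skeleton 110668 111623' \
    'tigris 46374 46633' |
    LC_ALL=C awk '{ printf "%s %.3f\n", $1, $2 / $3 }')
echo "$ratios"
```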
A disadvantage of cpio, compared to tar, is that you must type more characters to use it. Even the simplest case, recursively archiving a directory tree using the default archive format, requires more typing. Observe the differences between these two command lines:
find . -print |cpio -o >../archive.cpio
tar cf ../archive.tar .
"cpio" stands for "copy in, copy out". The copy part comes from "cp", which is the Unix copy program.
cpio comes to us from AT&T from the early 1980s, if not earlier. It is not used often; tar has that honor, but I found cpio because I was forced to exchange files between two systems that had incompatible versions of tar. The systems' administrator was unwilling to update the tar implementations, so I had to find an alternative. cpio worked just fine, & since then, I have not found a Unix or Unix-like system that had a cpio that could not work with some other Unix's cpio. In other words, cpio archives appear to be very portable.
To achieve that portability when you create archives, always use -Hnewc or the default archive format (no -H option at all) unless specific experience shows that another -H value is required.
Ignore the pass-through (-p) function of cpio. Use find instead.
On MS-DOS (including Windows), where pipes are treated as text, always use the -O or -I command line option to specify the output archive or the input archive. That's a real bummer, but life sucks. (More specifically, MS-DOS (which includes Windows) is naïve.)
Many modern implementations of cpio (and tar) are able to read or write all manner of archive file formats. Don't use these; they are not portable. Use cpio for archives in the cpio format. Use tar for archives in the tar format.
Similarly, some modern implementations allow you to instruct cpio to run your compression program on the archive. Don't do this; it is not portable. Instead, run the compression program separately & explicitly. On Unix, use a pipe to connect cpio & the compressor.
I prefer bzip2, so I've used it in my examples. gzip would work just as well. The two programs even share most of the important command line options. So you could substitute gzip wherever you see bzip2, & you could substitute gzcat or zcat wherever you see bzcat.
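For completeness, here is the create-&-list round trip with gzip swapped in. It is a sketch on a scratch directory with invented file names. It uses "gzip -dc" for decompression, which does the same job as gzcat or zcat & is available wherever gzip is.

```shell
work=$(mktemp -d)
mkdir "$work/dir"
echo data >"$work/dir/file.txt"

# Same pipeline as in the bzip2 examples, with gzip as the compressor.
(cd "$work" && find dir -print | cpio -o -Hnewc | gzip >"$work/dir.cpio.gz")

# Decompress to stdout & let cpio list the archive.
gzip -dc "$work/dir.cpio.gz" | cpio -it
```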
End.
Copyright © 2002 by Gene Michael Stover. Permission to copy, store, & view this document unmodified & in its entirety is granted. All other rights are reserved.
$Header: /home/gene/library/website/docsrc/cpio-howto/RCS/index.html,v 395.1 2008/04/20 17:25:55 gene Exp $