by Gene Michael Stover
Sunday, 27 January 2002
There is sometimes disagreement among programmers about which method of IPC to use in Unix applications. Here I report measurements of throughput, programming difficulty, code sharing, & run-time flexibility for some IPC methods.
Pipes: One program writes to stdout, another program reads from stdin, & you connect the two with the pipe operator ('|') in a shell script. Pipes can be considered a method of communication between software components in which simple programs are the components & the shell is the scripting language. In this respect, pipes are in the same family as COM & Corba.
Temporary files: One program writes to stdout, another program reads from stdin, & you connect the two by running the first program with its output redirected to a temporary file, then running the second program with its input redirected from the temporary file. (Then you remove the file, in case that matters.) This is closely related to pipes & can be considered a kind of piping.
Message queues: See getmsg, msgsnd, & msgrcv in the Unix manual.
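To give a feel for what message queue IPC involves, here is a minimal, hedged sketch of sending & receiving one chunk through a System V message queue. Both ends are shown in one program for brevity; this is not the qgenerator/qsink code used in the tests, & the key & error handling are simplified.

    /* A minimal sketch of System V message queue IPC. Both the sending &
     * receiving ends are in one program for brevity; the key is made up. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct chunk {
        long mtype;          /* required by msgsnd/msgrcv; must be > 0 */
        char mtext[1024];    /* one 1024-byte chunk, as in the tests   */
    };

    int main(void)
    {
        struct chunk c;
        int qid;

        /* Connection start-up: both sides must agree on the key. */
        qid = msgget((key_t) 0xBEEF, IPC_CREAT | 0600);
        if (qid < 0) { perror("msgget"); return 1; }

        /* Sender side: one msgsnd call per chunk. */
        c.mtype = 1;
        memset(c.mtext, 'x', sizeof c.mtext);
        if (msgsnd(qid, &c, sizeof c.mtext, 0) < 0) { perror("msgsnd"); return 1; }

        /* Receiver side: block until a message of any type arrives. */
        if (msgrcv(qid, &c, sizeof c.mtext, 0, 0) < 0) { perror("msgrcv"); return 1; }

        /* Connection tear-down: remove the queue. */
        msgctl(qid, IPC_RMID, NULL);
        return 0;
    }

Notice that message boundaries, the queue key, & queue removal are all the program's problem; that point comes up again later.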
Naïve software developers believe that the most important thing about software is run-time speed. Let's satisfy their curiosity now.
I did some performance tests on two machines on my network, & here are the raw numbers.
Each row in a table is one "test". Each test (row in the table) shows the run-times for each of the IPC mechanisms I tested on one host. For example, the first row in the table (after the column headings) shows the results of running the message queue IPC method, the temporary file IPC method, the Unix socket IPC method, the pipe IPC method, plus two others (sanity checks & curiosity satisfiers) on Palsy.
The first column is the host name. Ebola is a 200 MHz Pentium with 32 MB of RAM running OpenBSD 2.7. Palsy is a 750 MHz Pentium 3 with 128 MB of RAM running Red Hat Linux 7.1 (Linux kernel 2.4, methinks).
Columns 2 through 7 are times for the IPC mechanisms. For each IPC mechanism in a test, a generator program uses the IPC mechanism to send 512 megabytes of randomly generated, printable characters to a sink program. The time required for the generator to send everything & the sink to receive everything is the result of using that IPC mechanism in that test. The times include connection start-up & tear-down.
I chose 512 megabytes as the amount of data for each test. I chose to send randomly generated printable characters because it's a way to simulate real-world data being transmitted, but with a small development cost & a small run-time cost.
Regardless of the IPC mechanism, the generator transmitted data to the sink in chunks of fixed size. In all cases, the chunk size was 1024 bytes. I chose this because larger chunk sizes tend to produce higher throughput, but the largest chunk size that would work on Ebola was 1024. Besides that, the purpose of these tests is to compare the throughputs of the IPC mechanisms, not to find any particular mechanism's maximum throughput. I used the same chunk size for all IPC mechanisms in the hope of giving all of them the same advantage, which is equivalent to giving no advantage to any of them.
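To make that concrete, here is a minimal sketch of a chunked generator writing to stdout under the assumptions above (1024-byte chunks, 512 megabytes, random printable characters). It is an illustration only, not the actual generator.c from the tests, which factors this work into shared files.

    /* A sketch of a generator that writes fixed-size chunks of random
     * printable characters to stdout (illustration only). */
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK_SIZE  1024                      /* chunk size used in the tests   */
    #define TOTAL_BYTES (512L * 1024L * 1024L)    /* 512 megabytes, as in the tests */

    int main(void)
    {
        char chunk[CHUNK_SIZE];
        long sent;
        int  i;

        for (sent = 0; sent < TOTAL_BYTES; sent += CHUNK_SIZE) {
            /* Fill the chunk with random printable characters. */
            for (i = 0; i < CHUNK_SIZE; i++)
                chunk[i] = ' ' + rand() % 95;     /* ASCII 32..126 */

            /* Write one chunk; stdout may be a pipe, a file, or /dev/null. */
            if (fwrite(chunk, 1, CHUNK_SIZE, stdout) != CHUNK_SIZE) {
                perror("fwrite");
                return 1;
            }
        }
        return 0;
    }

Run by itself it floods the terminal, so in practice its output is piped or redirected, which is exactly what the tests below do.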
The IPC mechanisms are named in the first column. They are:

msgq: The generator sent its data to the sink through a message queue. In this test, the generator was src/qgenerator.c, & the sink was src/qsink.c.
tmp file: The generator wrote to stdout. I redirected that to a temporary file. When the generator was done, I ran the sink & redirected its input from the temporary file. Then I deleted the file. ("I" didn't do all that manually; it was in a shell script.) In this test, the generator was src/generator.c, & the sink was src/sink.c.

unix: The generator wrote to a socket in the Unix domain, & the sink read from that socket. In this test, the generator was src/ugenerator.c, & the sink was src/usink.c.
>/dev/null: The generator's stdout was redirected to /dev/null. There was no sink. In this test, the generator was src/generator.c.
|sink: The generator's stdout was piped into the sink's stdin. In this test, the generator was src/generator.c, & the sink was src/sink.c.
|cat >/dev/null: The generator's stdout was piped into cat, & cat's output was redirected to /dev/null. This was sort of to get a feel for the efficiency of the src/sink.c program. In this test, the generator was src/generator.c, & cat served as the sink.
Generator/sink pairs were separate programs run from a simple shell script. (Those are the src/run-*.sh scripts in the source code.) A wrapper shell script (src/throughput.sh) timed the shell scripts that ran the generator/sink pairs. Timing was done by running the common Unix date program before & after the pair-running script, then taking the difference. The resolution of this method of timing is 1 second.
I ran multiple tests on each host so that noise would average out when I analyzed the results. Noise might come from the (lack of) resolution of the timing mechanism & from periodic background processes run by the operating system. (During the tests, the hosts were not running any user programs at all, & I didn't even access the common LAN from other hosts, but common & standard Unix operating system processes were still running.)
(I shouldn't need to point this out, but I have a sinking feeling that I should: Since the table reports run-times, smaller numbers indicate higher throughput, & "Higher throughput" is a kind of better performance.)
Finally, here's the table of performance results.
All times are in seconds.

| hostname | msgq | tmp file | unix | pipe | \|cat >/dev/null | >/dev/null |
|---|---|---|---|---|---|---|
| palsy | 67 | 101 | 67 | 66 | 66 | 64 |
| palsy | 67 | 101 | 67 | 66 | 66 | 64 |
| palsy | 67 | 101 | 67 | 65 | 66 | 65 |
| palsy | 67 | 101 | 69 | 66 | 66 | 64 |
| palsy | 68 | 101 | 69 | 66 | 66 | 64 |
| palsy | 67 | 103 | 68 | 66 | 66 | 64 |
| palsy | 67 | 101 | 69 | 65 | 66 | 65 |
| palsy | 68 | 100 | 69 | 66 | 66 | 65 |
| ebola | 479 | 432 | 371 | 356 | 355 | 340 |
| ebola | 477 | 434 | 370 | 354 | 357 | 340 |
| ebola | 478 | 432 | 369 | 353 | 356 | 340 |
| ebola | 478 | 432 | 370 | 352 | 356 | 340 |
I tracked development times on a wall clock while I wrote the generator/sink pairs of programs. Here is a table of development times, but notice that the first pair of programs, generator and sink, is used in four IPC methods. So, depending on your demands, their development cost might amortize to 8.5 minutes per IPC method.
| method | program pair | development time (hh:mm) |
|---|---|---|
| pipe, tmp file, \|cat, /dev/null | generator.c, sink.c | 0:34 |
| msgq | qgenerator.c, qsink.c | 1:24 |
| Unix socket | ugenerator.c, usink.c | 1:03 |
Well, I'm surprised. I expected message queues to have a marginally higher throughput than the other methods of IPC. It looks like pipes are the fastest method of IPC that I tested. In hindsight, that makes sense: pipes are the original method of IPC on Unix, so implementors have had plenty of time to optimize them. What's more, the other benefits of pipes make them a common method of IPC on Unix, which gives implementors still more motivation to optimize them.
Because each IPC mechanism was used the same number of times in each combination of hostname, data size (always 512 megabytes), & chunk size (always 1024 bytes), we can obtain an estimate of total work for each IPC mechanism over all the tests by summing each column from the first table (the one that showed run times for each test). These sums are the cumulative work for each IPC mechanism over all the tests, & they are valid as long as each test summed includes all the IPC mechanisms. The sums produce the first row in the following table.
From those sums, we can get a relative measure of efficiency. We do that by finding the largest sum. (Remember that the sums are seconds, so larger numbers mean less throughput.) That largest sum turns out to belong to the tmp file (temporary file) IPC method, & it is 2539 seconds. We divide that number by each of the sums. The quotient for an IPC method is that IPC method's efficiency compared to the slowest of the methods. For example, the Unix socket sum is 2025 seconds, so its relative throughput is 2539 / 2025, or about 1.25. (The efficiency of the tmp file method is 1.0, since its sum is the numerator in all the divisions.) These quotients are the second row in the following table.
Here's that table, keeping roughly the same columns as the previous run-time table for easy reading. The first row holds the sums; each entry is a number of seconds. The second row holds the quotients; they are multiples of the throughput of the tmp file method of IPC. Larger values in the second row indicate higher throughput.
| over-all | msgq | tmp file | unix | pipe | \|cat >/dev/null | >/dev/null |
|---|---|---|---|---|---|---|
| sum (sec) | 2450 | 2539 | 2025 | 1941 | 1952 | 1875 |
| throughput (relative) | 1.03 | 1.00 | 1.25 | 1.30 | 1.30 | 1.35 |
From the second row, you can see that pipes are the fastest method of IPC. (Redirecting to /dev/null has a higher throughput, but it's not exactly a method of IPC because the data never reaches another program; it just disappears into /dev/null.) Unix sockets are a close second in throughput, with message queues being only slightly faster than temporary files.
Think about flexibility. To put us on the same page, let me tell you how the test programs were implemented.
I had to write six programs: generator writes to stdout, & sink reads from stdin; qgenerator writes to a message queue, & qsink reads from that queue; ugenerator writes to a socket in the Unix domain, & usink reads from that socket.
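To show what the socket pair's extra work looks like, here is a minimal, hedged sketch of the client (generator) side of a Unix domain socket connection. The socket path is made up, it assumes a sink is already listening there, & it is not the actual ugenerator.c.

    /* A sketch of the generator side of a Unix domain socket connection
     * (illustration only; the socket path is made up, & a listening
     * sink is assumed to exist already). */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    int main(void)
    {
        struct sockaddr_un addr;
        char chunk[1024];
        int fd;

        /* Connection start-up: code the pipe-able generator never needs. */
        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, "/tmp/throughput.sock", sizeof addr.sun_path - 1);

        if (connect(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
            perror("connect");
            return 1;
        }

        /* Once connected, sending a chunk is an ordinary write. */
        memset(chunk, 'x', sizeof chunk);
        if (write(fd, chunk, sizeof chunk) < 0) { perror("write"); return 1; }

        /* Connection tear-down. */
        close(fd);
        return 0;
    }

Notice how much of the program is connection start-up & tear-down rather than generating data; that difference matters in the discussion below.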
qgenerator & qsink are good for exactly one thing: reading & writing the message queue they share. You can't send their data through any of the standard Unix utilities such as wc, awk, or any of the hundreds of others. You can't put their data in a file or an e-mail message. Their computations, their code, & their data are bound to their communication method.
Nearly the same can be said for ugenerator & usink, though they benefit from some code reuse.
The same cannot be said of generator & sink. That single pair of programs implements two IPC methods (piping & redirecting) without any increase in complexity in the source code. (Look at the source code if you don't believe me. The source for generator.c is simpler than that of the other generators, & the source for sink.c is simpler than that of the other sinks.)
How can this be? It's because their IPC mechanism is external to them. They read stdin or write stdout, & other programs can redirect or pipe that where they want. generator & sink could be combined with standard Unix utilities such as wc, file compressors, & e-mail. They can even be used over a network via rsh, telnet, or some other pipe-based command-line networking utility (which would itself be flexible due to the same pipe IPC mechanism that generator & sink use).
Code reuse occurs on three levels: source code, object code (.o files & linkable libraries), & executable code (the program files you actually run).
Source code reuse occurs when the developer types one set of source code & uses it in different places. A common example of this is templates in C++ (which I did not use in these tests). It also happens when you use macros in C. Notice that with C++ templates & C macros, the same source code is (probably) used to produce different chunks of object code. For example, if I write a function template in C++ & then call it with an integer argument in one place & a char * argument in another place, the function is compiled to object code twice: once as a function of integers & a second time as a function of char *. Source code reuse is the weakest form of code reuse; it saves you the least.
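As a small illustration (not code from the tests), one C macro definition is reused at the source level, yet each use expands & compiles into its own bit of object code:

    /* Source code reuse with a C macro: the macro is written once, but
     * each use below is expanded & compiled separately. */
    #include <stdio.h>

    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    int main(void)
    {
        int    i = MAX(3, 7);        /* expands to an integer comparison       */
        double d = MAX(2.5, 1.5);    /* expands to a floating-point comparison */

        printf("%d %g\n", i, d);
        return 0;
    }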
Object code reuse occurs when you compile a chunk of source code once & link with it. This implies that the source code was reused. So with object code reuse, the developer wrote the source code once & the compiler compiled it once. Object code reuse is more cost-effective than source code reuse because the developer wrote & debugged just one set of source code, & the compiler compiled it just once.
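For instance, a shared routine can be compiled once & linked into several programs. The sketch below is hypothetical; the names are made up, & the real g.c shared by the test programs surely differs.

    /* random_fill.c -- a hypothetical shared routine, compiled once &
     * linked into several programs, in the same spirit as the g.c file
     * the test programs share.
     *
     *     cc -c random_fill.c                           # compile once
     *     cc -o generator  generator.c  random_fill.o   # reuse the object code
     *     cc -o qgenerator qgenerator.c random_fill.o   # reuse it again
     */
    #include <stdlib.h>
    #include <stddef.h>

    /* Fill buf with n randomly chosen printable ASCII characters. */
    void random_fill(char *buf, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            buf[i] = ' ' + rand() % 95;
    }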
Executable code reuse occurs when a single program can be used for more than one purpose. When a shell script takes some filter program (such as cat or gzip), pipes some other program's output through it, & then pipes yet another program's output through it again, that's executable code reuse. Here's an example:
    #!/bin/sh
    #
    # Here we use the 'gzip' executable
    #
    ls |gzip >tmpfile
    #
    # Here we use the 'gzip' executable
    # again
    #
    w |gzip >tmpfile2
Sure, that's a trivial example, but consider the benefits if you're reusing complex functionality, like rsh, an e-mail program, or a complex data-analysis program. Consider also the benefits of this kind of reuse: it happens at the command line or in a shell script, so it doesn't require as much developer expertise, time, or effort as writing the equivalent C, C++, or Java code.
So much for code reuse in theory. Here's how my test programs reused code.
All the generator programs shared the code that generated the random characters. That code is in the g.c file. This was object code reuse, & it was a trivial case of it.
All the generator programs also shared the looping code that sent the generated characters. That code is the function APP_Loop in the file app.c. They also shared the command line parsing code (function APP_ParseCommandLine) and some initialization & clean-up code (functions APP_Init and APP_Uninit). All this is object code reuse, & that's a good thing.
The pair of message queue programs (qgenerator and qsink) were not able to share much more code. They shared some between each other, but no more.
The Unix socket generator (ugenerator) was able to share a lot of code with the pipe-able generator (generator) because a connected socket can be written with the same standard C FILE * functions. This is an argument that basic C (or C++) I/O promotes object code reuse. (Notice that this was more reuse than was achieved with the message queue programs.)
The pipe-able programs (generator and sink) even allow for executable code reuse. That generator's output can be piped to other programs or redirected to a file. That sink's input can come from other programs or a file. It's not such a great advantage to these particular programs because they are toys, but what if you have programs that do non-trivial analysis of data, or that allow other programs to communicate over a network (which is what rsh, ssh, telnet, rcp, e-mail programs, & about a bazillion others do)? What if the program that can be reused is a data compressor (such as gzip)? Or an error-corrector? All these programs allow for executable code reuse, & as I said already, that reuse doesn't require as much developer expertise (or time or effort) as does writing the C, C++, or Java code to do the same thing.
Anyway, so the pipe method of IPC (which includes redirecting to files) promotes executable code reuse (which implies object code reuse & source code reuse), whereas the other methods only allow for object code reuse & source code reuse.
The pipe-able programs, generator & sink, are the cleanest. All generator does is make new characters & write them. All sink does is read characters & discard them. Neither program needs to worry about connection start-up or tear-down. Neither program needs to worry about message boundaries. Few things in life could be simpler than these two programs. Their functionality is completely divorced from the details of communication.
While this isn't a big deal with these two programs, it could be a big deal with programs that have more complex functionality. If a developer is writing a program with complex functionality, he'll be less productive if he needs to worry about message boundaries or connection start-up & tear-down. He'll have to worry about these things if he's using message queues or sockets for IPC.
The other programs (qgenerator, qsink, ugenerator, & usink) have code that deals with the IPC mechanisms. The Unix domain socket programs (ugenerator & usink) are the simplest of these because, after setting up their connection, they call the same code that the pipe-able programs do. The message queue programs have to deal with connection start-up, then with some unique message-transmission code, & then with connection tear-down. Their functionality is coupled to their IPC mechanism more closely than in the pipe-able programs.
Unix application developers have many methods of IPC available. This plethora of choices can make for a difficult decision.
Programs which read & write stdin & stdout, so that they can be piped together or redirected to or from files, excel over programs that use other methods of IPC in throughput, ease of programming, code sharing, & run-time flexibility.
There are few, if any, business reasons for application developers to prefer message queues to reading & writing stdin & stdout as a method of IPC.
At this time (Sunday, 27 January 2002), the source code I used in these tests is online at throughput-6.cpio.bz2. I make no guarantees about it remaining available there.
After downloading:

    bzcat throughput-6.cpio.bz2 |cpio -i
    cd throughput-6
    ./configure
    make
    go

The results will be written to doc/`hostname`.txt.
A relevant, good book is:
Mike Gancarz, The Unix Philosophy. (1995) Digital Press; Newton, MA. ISBN 1-55558-123-4.
End.