by Gene Michael Stover
Sunday, 27 January 2002
There is sometimes disagreement among programmers about which method of IPC to use in Unix applications. Here I report measurements of throughput, programming difficulty, code sharing, & run-time flexibility for some IPC methods.
Pipes: One program writes to stdout, another program reads from stdin, & you connect the two with the pipe operator ('|') in a shell script. Pipes can be considered a method of communication between software components in which simple programs are the components & the shell is the scripting language. In this respect, pipes are in the same family as COM & Corba.
Temporary files: One program writes to stdout, another program reads from stdin, & you connect the two by running the first program with its output redirected to a temporary file, then running the second program with its input redirected from the temporary file. (Then you remove the file, in case that matters.) This is closely related to pipes & can be considered a kind of piping.
Message queues: See getmsg, msgsnd, & msgrcv in the Unix manual.
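To give a feel for what message queue IPC involves, here is a minimal, hedged sketch of sending & receiving one chunk through a System V message queue. Both ends are shown in one program for brevity; this is not the qgenerator/qsink code used in the tests, & the key & error handling are simplified.

    /* A minimal sketch of System V message queue IPC. Both the sending &
     * receiving ends are in one program for brevity; the key is made up. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct chunk {
        long mtype;          /* required by msgsnd/msgrcv; must be > 0 */
        char mtext[1024];    /* one 1024-byte chunk, as in the tests   */
    };

    int main(void)
    {
        struct chunk c;
        int qid;

        /* Connection start-up: both sides must agree on the key. */
        qid = msgget((key_t) 0xBEEF, IPC_CREAT | 0600);
        if (qid < 0) { perror("msgget"); return 1; }

        /* Sender side: one msgsnd call per chunk. */
        c.mtype = 1;
        memset(c.mtext, 'x', sizeof c.mtext);
        if (msgsnd(qid, &c, sizeof c.mtext, 0) < 0) { perror("msgsnd"); return 1; }

        /* Receiver side: block until a message of any type arrives. */
        if (msgrcv(qid, &c, sizeof c.mtext, 0, 0) < 0) { perror("msgrcv"); return 1; }

        /* Connection tear-down: remove the queue. */
        msgctl(qid, IPC_RMID, NULL);
        return 0;
    }

Notice that message boundaries, the queue key, & queue removal are all the program's problem; that point comes up again later.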
Naïve software developers believe that the most important thing about software is run-time speed. Let's satisfy their curiosity now.
I did some performance tests on two machines on my network, & here are the raw numbers.
Each row in a table is one "test". Each test (row in the table) shows the run-times for each of the IPC mechanisms I tested on one host. For example, the first row in the table (after the column headings) shows the results of running the message queue IPC method, the temporary file IPC method, the Unix socket IPC method, the pipe IPC method, plus two others (sanity checks & curiosity satisfiers) on Palsy.
The first column is the host name. Ebola is a 200 MHz Pentium with 32 MB of RAM running OpenBSD 2.7. Palsy is a 750 MHz Pentium 3 with 128 MB of RAM running Red Hat Linux 7.1 (Linux kernel 2.4, methinks).
Columns 2 through 7 are times for the IPC mechanisms. For each IPC mechanism in a test, a generator program uses the IPC mechanism to send 512 megabytes of randomly generated, printable characters to a sink program. The time required for the generator to send everything & the sink to receive everything is the result of using that IPC mechanism in that test. The times include connection start-up & tear-down.
I chose 512 megabytes as the amount of data for each test. I chose to send randomly generated printable characters because it's a way to simulate real-world data being transmitted, but with a small development cost & a small run-time cost.
Regardless of the IPC mechanism, the generator transmitted data to the sink in chunks of fixed size. In all cases, the chunk size was 1024 bytes. I chose this because larger chunk sizes tend to produce higher throughput, but the largest chunk size that would work on Ebola was 1024. Besides that, the purpose of these tests is to compare the throughputs of the IPC mechanisms, not to find any particular mechanism's maximum throughput. I used the same chunk size for all IPC mechanisms in the hope of giving all of them the same advantage, which is equivalent to giving no advantage to any of them.
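To make that concrete, here is a minimal sketch of a chunked generator writing to stdout under the assumptions above (1024-byte chunks, 512 megabytes, random printable characters). It is an illustration only, not the actual generator.c from the tests, which factors this work into shared files.

    /* A sketch of a generator that writes fixed-size chunks of random
     * printable characters to stdout (illustration only). */
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK_SIZE  1024                      /* chunk size used in the tests   */
    #define TOTAL_BYTES (512L * 1024L * 1024L)    /* 512 megabytes, as in the tests */

    int main(void)
    {
        char chunk[CHUNK_SIZE];
        long sent;
        int  i;

        for (sent = 0; sent < TOTAL_BYTES; sent += CHUNK_SIZE) {
            /* Fill the chunk with random printable characters. */
            for (i = 0; i < CHUNK_SIZE; i++)
                chunk[i] = ' ' + rand() % 95;     /* ASCII 32..126 */

            /* Write one chunk; stdout may be a pipe, a file, or /dev/null. */
            if (fwrite(chunk, 1, CHUNK_SIZE, stdout) != CHUNK_SIZE) {
                perror("fwrite");
                return 1;
            }
        }
        return 0;
    }

Run by itself it floods the terminal, so in practice its output is piped or redirected, which is exactly what the tests below do.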
The IPC mechanisms are named in the first column. They are:

msgq: The generator sent its data to the sink through a message queue. In this test, the generator was src/qgenerator.c, & the sink was src/qsink.c.
tmp file: The generator wrote to stdout. I redirected that to a temporary file. When the generator was done, I ran the sink & redirected its input from the temporary file. Then I deleted the file. ("I" didn't do all that manually; it was in a shell script.) In this test, the generator was src/generator.c, & the sink was src/sink.c.

unix: The generator wrote to a socket in the Unix domain, & the sink read from that socket. In this test, the generator was src/ugenerator.c, & the sink was src/usink.c.
>/dev/null: The generator's stdout was redirected to /dev/null. There was no sink. In this test, the generator was src/generator.c.
|sink: The generator's stdout was piped into the sink's stdin. In this test, the generator was src/generator.c, & the sink was src/sink.c.
|cat >/dev/null: The generator's stdout was piped into cat, & cat's output was redirected to /dev/null. This was sort of to get a feel for the efficiency of the src/sink.c program. In this test, the generator was src/generator.c, & cat served as the sink.
Generator/sink pairs were separate programs run from a simple shell script. (Those are the src/run-*.sh scripts in the source code.) A wrapper shell script (src/throughput.sh) timed the shell scripts that ran the generator/sink pairs. Timing was done by running the common Unix date program before & after the pair-running script, then taking the difference. The resolution of this method of timing is 1 second.
I ran multiple tests on each host so that noise would average out when I analyzed the results. Noise might come from the (lack of) resolution of the timing mechanism & from periodic background processes run by the operating system. (During the tests, the hosts were not running any user programs at all, & I didn't even access the common LAN from other hosts, but common & standard Unix operating system processes were still running.)
(I shouldn't need to point this out, but I have a sinking feeling that I should: Since the table reports run-times, smaller numbers indicate higher throughput, & "Higher throughput" is a kind of better performance.)
Finally, here's the table of performance results.
All times are in seconds.

| hostname | msgq | tmp file | unix | pipe | \|cat >/dev/null | >/dev/null |
|---|---|---|---|---|---|---|
| palsy | 67 | 101 | 67 | 66 | 66 | 64 |
| palsy | 67 | 101 | 67 | 66 | 66 | 64 |
| palsy | 67 | 101 | 67 | 65 | 66 | 65 |
| palsy | 67 | 101 | 69 | 66 | 66 | 64 |
| palsy | 68 | 101 | 69 | 66 | 66 | 64 |
| palsy | 67 | 103 | 68 | 66 | 66 | 64 |
| palsy | 67 | 101 | 69 | 65 | 66 | 65 |
| palsy | 68 | 100 | 69 | 66 | 66 | 65 |
| ebola | 479 | 432 | 371 | 356 | 355 | 340 |
| ebola | 477 | 434 | 370 | 354 | 357 | 340 |
| ebola | 478 | 432 | 369 | 353 | 356 | 340 |
| ebola | 478 | 432 | 370 | 352 | 356 | 340 |
I tracked development times on a wall clock while I wrote the generator/sink pairs of programs. Here is a table of development times, but notice that the first pair of programs, generator and sink, is used in four IPC methods. So, depending on your demands, their development cost might amortize to 8.5 minutes per IPC method.
| method | program pair | development time (hh:mm) |
|---|---|---|
| pipe, tmp file, \|cat, /dev/null | generator.c, sink.c | 0:34 |
| msgq | qgenerator.c, qsink.c | 1:24 |
| Unix socket | ugenerator.c, usink.c | 1:03 |
Well, I'm surprised. I expected message queues to have a marginally higher throughput than the other methods of IPC. It looks like pipes are the fastest method of IPC that I tested. In hindsight, that makes sense: pipes are the original method of IPC on Unix, so implementors have had plenty of time to optimize them. What's more, the other benefits of pipes make them a common method of IPC on Unix, which gives implementors still more motivation to optimize them.
Because each IPC mechanism was used the same number of times in each combination of hostname, data size (always 512 megabytes), & chunk size (always 1024 bytes), we can obtain an estimate of total work for each IPC mechanism over all the tests by summing each column from the first table (the one that showed run times for each test). These sums are the cumulative work for each IPC mechanism over all the tests, & they are valid as long as each test summed includes all the IPC mechanisms. The sums produce the first row in the following table.
From those sums, we can get a relative measure of efficiency. We do that by finding the largest sum. (Remember that the sums are seconds, so larger numbers mean less throughput.) That largest sum turns out to belong to the tmp file (temporary file) IPC method, & it is 2539 seconds. We divide that number by each of the sums. The quotient for an IPC method is that IPC method's efficiency compared to the slowest of the methods. For example, the Unix socket sum is 2025 seconds, so its relative throughput is 2539 / 2025, or about 1.25. (The efficiency of the tmp file method is 1.0, since its sum is the numerator in all the divisions.) These quotients are the second row in the following table.
Here's that table, keeping roughly the same columns as the previous run-time table for easy reading. The first row holds the sums; each entry is a number of seconds. The second row holds the quotients; they are multiples of the throughput of the tmp file method of IPC. Larger values in the second row indicate higher throughput.
| over-all | msgq | tmp file | unix | pipe | \|cat >/dev/null | >/dev/null |
|---|---|---|---|---|---|---|
| sum (sec) | 2450 | 2539 | 2025 | 1941 | 1952 | 1875 |
| throughput (relative) | 1.03 | 1.00 | 1.25 | 1.30 | 1.30 | 1.35 |
From the second row, you can see that pipes are the fastest method of IPC. (Redirecting to /dev/null has a higher throughput, but it's not exactly a method of IPC because the data never reaches another program; it just disappears into /dev/null.) Unix sockets are a close second in throughput, with message queues being only slightly faster than temporary files.
Think about flexibility. To put us on the same page, let me tell you how the test programs were implemented.
I had to write six programs: generator writes to stdout, & sink reads from stdin; qgenerator writes to a message queue, & qsink reads from that queue; ugenerator writes to a socket in the Unix domain, & usink reads from that socket.
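To show what the socket pair's extra work looks like, here is a minimal, hedged sketch of the client (generator) side of a Unix domain socket connection. The socket path is made up, it assumes a sink is already listening there, & it is not the actual ugenerator.c.

    /* A sketch of the generator side of a Unix domain socket connection
     * (illustration only; the socket path is made up, & a listening
     * sink is assumed to exist already). */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    int main(void)
    {
        struct sockaddr_un addr;
        char chunk[1024];
        int fd;

        /* Connection start-up: code the pipe-able generator never needs. */
        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, "/tmp/throughput.sock", sizeof addr.sun_path - 1);

        if (connect(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
            perror("connect");
            return 1;
        }

        /* Once connected, sending a chunk is an ordinary write. */
        memset(chunk, 'x', sizeof chunk);
        if (write(fd, chunk, sizeof chunk) < 0) { perror("write"); return 1; }

        /* Connection tear-down. */
        close(fd);
        return 0;
    }

Notice how much of the program is connection start-up & tear-down rather than generating data; that difference matters in the discussion below.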
qgenerator & qsink are good for exactly one thing: reading & writing the message queue they share. You can't send their data through any of the standard Unix utilities such as wc, awk, or any of the hundreds of others. You can't put their data in a file or an e-mail message. Their computations, their code, & their data are bound to their communication method.
Nearly the same can be said for ugenerator & usink, though they benefit from some code reuse.
The same cannot be said of generator & sink. That single pair of programs implements two IPC methods (piping & redirecting) without any increase in complexity in the source code. (Look at the source code if you don't believe me. The source for generator.c is simpler than that of the other generators, & the source for sink.c is simpler than that of the other sinks.)
How can this be? It's because their IPC mechanism is external to them. They read stdin or write stdout, & other programs can redirect or pipe that where they want. generator & sink could be combined with standard Unix utilities such as wc, file compressors, & e-mail. They can even be used over a network via rsh, telnet, or some other pipe-based command-line networking utility (which would itself be flexible due to the same pipe IPC mechanism that generator & sink use).
Code reuse occurs on three levels: source code, object code (.o files & linkable libraries), & executable code (the program files you actually run).
Source code reuse occurs when the developer types one set of source code & uses it in different places. A common example of this is templates in C++ (which I did not use in these tests). It also happens when you use macros in C. Notice that with C++ templates & C macros, the same source code is (probably) used to produce different chunks of object code. For example, if I write a function template in C++ & then call it with an integer argument in one place & a char * argument in another place, the function is compiled to object code twice: once as a function of integers & a second time as a function of char *. Source code reuse is the weakest form of code reuse; it saves you the least.
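As a small illustration (not code from the tests), one C macro definition is reused at the source level, yet each use expands & compiles into its own bit of object code:

    /* Source code reuse with a C macro: the macro is written once, but
     * each use below is expanded & compiled separately. */
    #include <stdio.h>

    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    int main(void)
    {
        int    i = MAX(3, 7);        /* expands to an integer comparison       */
        double d = MAX(2.5, 1.5);    /* expands to a floating-point comparison */

        printf("%d %g\n", i, d);
        return 0;
    }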
Object code reuse occurs when you compile a chunk of source code once & link with it. This implies that the source code was reused. So with object code reuse, the developer wrote the source code once & the compiler compiled it once. Object code reuse is more cost-effective than source code reuse because the developer wrote & debugged just one set of source code, & the compiler compiled it just once.
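For instance, a shared routine can be compiled once & linked into several programs. The sketch below is hypothetical; the names are made up, & the real g.c shared by the test programs surely differs.

    /* random_fill.c -- a hypothetical shared routine, compiled once &
     * linked into several programs, in the same spirit as the g.c file
     * the test programs share.
     *
     *     cc -c random_fill.c                           # compile once
     *     cc -o generator  generator.c  random_fill.o   # reuse the object code
     *     cc -o qgenerator qgenerator.c random_fill.o   # reuse it again
     */
    #include <stdlib.h>
    #include <stddef.h>

    /* Fill buf with n randomly chosen printable ASCII characters. */
    void random_fill(char *buf, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            buf[i] = ' ' + rand() % 95;
    }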
Executable code reuse occurs when a single program can be used for more than one purpose. When a shell script takes some filter program (such as cat or gzip), pipes some other program's output through it, & then pipes yet another program's output through it again, that's executable code reuse. Here's an example:
    #!/bin/sh
    #
    # Here we use the 'gzip' executable
    #
    ls |gzip >tmpfile
    #
    # Here we use the 'gzip' executable
    # again
    #
    w |gzip >tmpfile2
Sure, that's a trivial example, but consider the benefits if you're reusing complex functionality, like rsh, an e-mail program, or a complex data-analysis program. Consider also the benefits of this kind of reuse: it happens at the command line or in a shell script, so it doesn't require as much developer expertise, time, or effort as writing the equivalent C, C++, or Java code.
So much for code reuse in theory. Here's how my test programs reused code.
All the generator programs shared the code that generated the random characters. That code is in the g.c file. This was object code reuse, & it was a trivial case of it.
All the generator programs also shared the looping code that sent the generated characters. That code is the function APP_Loop in the file app.c. They also shared the command line parsing code (function APP_ParseCommandLine) and some initialization & clean-up code (functions APP_Init and APP_Uninit). All this is object code reuse, & that's a good thing.
The pair of message queue programs (qgenerator and qsink) were not able to share much more code. They shared some between each other, but no more.
The Unix socket generator (ugenerator) was able to share a lot of code with the pipe-able generator (generator) because a connected socket can be written with the same standard C FILE * functions. This is an argument that basic C (or C++) I/O promotes object code reuse. (Notice that this was more reuse than was achieved with the message queue programs.)
The pipe-able programs (generator and sink) even allow for executable code reuse. That generator's output can be piped to other programs or redirected to a file. That sink's input can come from other programs or a file. It's not such a great advantage to these particular programs because they are toys, but what if you have programs that do non-trivial analysis of data, or that allow other programs to communicate over a network (which is what rsh, ssh, telnet, rcp, e-mail programs, & about a bazillion others do)? What if the program that can be reused is a data compressor (such as gzip)? Or an error-corrector? All these programs allow for executable code reuse, & as I said already, that reuse doesn't require as much developer expertise (or time or effort) as does writing the C, C++, or Java code to do the same thing.
Anyway, so the pipe method of IPC (which includes redirecting to files) promotes executable code reuse (which implies object code reuse & source code reuse), whereas the other methods only allow for object code reuse & source code reuse.
The pipe-able programs, generator & sink, are the cleanest. All generator does is make new characters & write them. All sink does is read characters & discard them. Neither program needs to worry about connection start-up or tear-down. Neither program needs to worry about message boundaries. Few things in life could be simpler than these two programs. Their functionality is completely divorced from the details of communication.
While this isn't a big deal with these two programs, it could be a big deal with programs that have more complex functionality. If a developer is writing a program with complex functionality, he'll be less productive if he needs to worry about message boundaries or connection start-up & tear-down. He'll have to worry about these things if he's using message queues or sockets for IPC.
The other programs (qgenerator, qsink, ugenerator, & usink) have code that deals with the IPC mechanisms. The Unix domain socket programs (ugenerator & usink) are the simplest of these because, after setting up their connection, they call the same code that the pipe-able programs do. The message queue programs have to deal with connection start-up, then with some unique message-transmission code, & then with connection tear-down. Their functionality is coupled to their IPC mechanism more closely than in the pipe-able programs.
Unix application developers have many methods of IPC available. This plethora of choices can make for a difficult decision.
Programs which read & write stdin & stdout, so that they can be piped together or redirected to or from files, excel over programs that use other methods of IPC in throughput, ease of programming, code sharing, & run-time flexibility.
There are few, if any, business reasons for application developers to prefer message queues to reading & writing stdin & stdout as a method of IPC.
At this time (Sunday, 27 January 2002), the source code I used in these tests is online at throughput-6.cpio.bz2. I make no guarantees about it remaining available there.
After downloading:

    bzcat throughput-6.cpio.bz2 |cpio -i
    cd throughput-6
    ./configure
    make
    go

The results will be written to doc/`hostname`.txt.
A relevant, good book is:
Mike Gancarz, The Unix Philosophy. (1995) Digital Press; Newton, MA. ISBN 1-55558-123-4.
End.