Copyright 1995

TCP/IP Sockets

This page describes software for internal use at the GRASP and HMS labs of the University of Pennsylvania.
1 Introduction
2 A simple example of transferring data
3 A library
4 Setting up the system
5 A complete example
6 Frequently asked questions
7 Acknowledgements

1

Introduction

Sockets provide a means of communication between two processes. The processes may be on the same machine or on different machines on opposite sides of the country. In either case the software you need is the same - the system handles all of the lower-level details, so you need not worry about where the machines physically are.

Here we describe a simple means of using sockets to pass arbitrary data in both directions between two running processes. While there are more complicated possibilities, it is this simple case which occurs most often in code written here at the lab.

The basic idea is simple:

  1. Create a connection between the processes
  2. Read/write data
  3. Shut down the connection

Within each process, creating the connection is similar to opening a file using the open() call. What you end up with is an integer (a file descriptor) identifying the open stream. Reading and writing is then just a matter of calling the system read/write routines and passing them that integer.

2

A simple example of transferring data

Say you have a structure:
struct stuffStruct
{
  int integer;
  float real;
  char string[20];
} first;
and you wish to send it from process A to B. Then the actual C code for process A is simply:
write(streamNumber, &first, sizeof(first));
and the code for process B is simply:
read(streamNumber, &first, sizeof(first));
Correct code is a little more complicated, since we should check the return values from read and write to look for errors and, on a socket, for transfers which moved fewer bytes than we asked for. Nevertheless the basic idea is not very complicated.
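The extra checking mentioned above can be sketched roughly as follows. This is only an illustration, not the lab library's code: the helper names writeAll and readAll are made up here. Besides testing for errors, the loops allow for the fact that read and write on a socket may move fewer bytes than requested.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly size bytes to fd.
   Returns 0 on success, -1 on error. */
int writeAll(int fd, const void *buffer, size_t size)
{
  const char *p = buffer;

  while (size > 0)
    {
      ssize_t n = write(fd, p, size);
      if (n < 0)
        {
          if (errno == EINTR) continue;   /* interrupted by a signal: retry */
          return -1;
        }
      p += n;                             /* skip what was already sent */
      size -= (size_t)n;
    }
  return 0;
}

/* Hypothetical helper: read exactly size bytes from fd.
   Returns 0 on success, -1 on error or if the other end closed early. */
int readAll(int fd, void *buffer, size_t size)
{
  char *p = buffer;

  while (size > 0)
    {
      ssize_t n = read(fd, p, size);
      if (n < 0)
        {
          if (errno == EINTR) continue;   /* interrupted by a signal: retry */
          return -1;
        }
      if (n == 0) return -1;              /* connection closed before all data arrived */
      p += n;
      size -= (size_t)n;
    }
  return 0;
}
```

With these, process A would call writeAll(streamNumber, &first, sizeof(first)) and process B would call readAll(streamNumber, &first, sizeof(first)), each checking for a non-zero return.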

3

A library

Since writing the code to open connections and check read/write for errors can become tedious we've written a library to handle those functions for you. This library is just a nice interface to the system routines - it does no additional copying of data being sent/received and it introduces no overhead (except perhaps the cost of a function call).

The library has just 5 functions:

int  jsSockAccept( int* socket, char* service, char* protocol);
int jsSockConnect( int* socket, char* rhost, char* service, char* protocol);
int    jsSockRead( int socket, void* buffer, int size);
int   jsSockWrite( int socket, void* buffer, int size);
int   jsSockClose( int socket );
The accept and connect routines are used to create the socket connection. From process A call jsSockAccept and from process B call jsSockConnect. Then data may be written and read by calling the jsSockRead and jsSockWrite routines. Finally, when you're all done with the connection, each process should call jsSockClose.
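If you're curious what routines of this kind hide, here is a rough sketch of the system calls behind an accept-style and a connect-style function. It is not the jsSocket source: the names listenOn and connectTo are invented for illustration, the sketch handles IPv4 and TCP only, and it uses getaddrinfo() to turn the host and service names into an address.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on the given service
   (a name from /etc/services, or a port number written as a string).
   Returns the listening descriptor, or -1 on failure. */
int listenOn(const char *service)
{
  struct addrinfo hints, *res;
  int s;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;         /* IPv4 only, to keep the sketch short */
  hints.ai_socktype = SOCK_STREAM;   /* TCP */
  hints.ai_flags = AI_PASSIVE;       /* accept connections on any local address */
  if (getaddrinfo(NULL, service, &hints, &res) != 0) return -1;

  s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
  if (s >= 0 && (bind(s, res->ai_addr, res->ai_addrlen) < 0 || listen(s, 1) < 0))
    {
      close(s);
      s = -1;
    }
  freeaddrinfo(res);
  return s;
}

/* Hypothetical helper: connect to the given service on the given host.
   Returns the connected descriptor, or -1 on failure. */
int connectTo(const char *host, const char *service)
{
  struct addrinfo hints, *res;
  int s;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo(host, service, &hints, &res) != 0) return -1;

  s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
  if (s >= 0 && connect(s, res->ai_addr, res->ai_addrlen) < 0)
    {
      close(s);
      s = -1;
    }
  freeaddrinfo(res);
  return s;
}
```

In this sketch the accepting process would call listenOn() and then accept(listener, NULL, NULL) to obtain the connected descriptor, while the other process calls connectTo(). Both then read and write on the descriptor they were given.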

Source code is currently in

~tele/sun4/src/socket/jsSocket.[ch] on grip.cis
and is freely available for use here in the lab - it compiles using gcc on the Sun and any of the compilers (even CC) on the SGI machines. There's no need for any special compiler switches.

You are encouraged to browse through the source code and see that it's doing what you expect - in particular check the jsSockRead and jsSockWrite functions and see how they are nothing but wrappers around the low-level system read and write functions.

4

Setting up the system

To use a socket connection you first have to ask your system administrator to provide one for you. This requires adding an entry to the system /etc/services file. For example, for the teleoperation project we have the following two sockets listed:
tele1		7511/tcp		
tele2		7512/tcp		
The first word on each line is the "service" - think of it as a textual designator for the socket. The number is the port number which the system uses to identify that socket, while "tcp" is the protocol we'll be using.
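To see how a program can turn a service name into its number, here is a small sketch using the standard getservbyname() call. The helper name servicePort is made up for this example, and whether the library does its lookup this way is an assumption.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stddef.h>

/* Hypothetical helper: look up a service in the system services database.
   Returns the port number in host byte order, or -1 if there is no entry. */
int servicePort(const char *service, const char *protocol)
{
  struct servent *entry = getservbyname(service, protocol);

  if (entry == NULL) return -1;                  /* no such service/protocol pair */
  return ntohs((unsigned short)entry->s_port);   /* s_port is stored in network byte order */
}
```

With the entries shown above installed, servicePort("tele1", "tcp") would be expected to return 7511. A name with no entry gives -1, which is a quick way to check that your administrator has added your service.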

Making use of someone else's entry in /etc/services is a very bad practice, since the person who owns it may be trying to use it from the other side of the country. If you are going to use sockets you must get your own entry.

5

A complete example

Here's example code for transferring the contents of a structure from process A to B and then back again. Remember that we need two programs - one for each process.

For process A the code is:

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>   /* strcpy() */
#include "jsSocket.h"

typedef struct stuffStruct
{
  int integer;
  float real;
  char string[20];
} stuffStruct;

int main(int argc, char *argv[])
{
  stuffStruct stuff;
  int sock;

  if( argc != 3)
    {
      fprintf(stderr, "Usage: %s service protocol\n", argv[0]);
      exit(-1);
    }

  /* Wait for process B to connect to the named service. */
  if( jsSockAccept(&sock, argv[1], argv[2]) ) exit(-1);

  /* Fill in the structure and send it to process B. */
  stuff.integer = 10;
  stuff.real = 0.5;
  strcpy(stuff.string, "test1");
  if( jsSockWrite(sock, &stuff, sizeof(stuff))) exit(-1);
  printf("1. Wrote: %d %f %s\n", stuff.integer, stuff.real, stuff.string);

  /* Wait for process B to send its own values back. */
  if(jsSockRead(sock, &stuff, sizeof(stuff))) exit(-1);
  printf("1. Read: %d %f %s\n", stuff.integer, stuff.real, stuff.string);

  jsSockClose(sock);
  return 0;
}

And for the second process the code is:
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>   /* strcpy() */
#include "jsSocket.h"

typedef struct stuffStruct
{
  int integer;
  float real;
  char string[20];
} stuffStruct;

int main(int argc, char *argv[])
{
  stuffStruct stuff;
  int sock;

  if( argc != 4)
    {
      fprintf(stderr, "Usage: %s machine service protocol\n",argv[0]);
      exit(-1);
    }

  /* Connect to process A, which must already be waiting. */
  if( jsSockConnect(&sock, argv[1], argv[2], argv[3])) exit(-1);

  /* Read the structure sent by process A. */
  if(jsSockRead(sock, &stuff, sizeof(stuff))) exit(-1);
  printf("2. Read: %d %f %s\n", stuff.integer, stuff.real, stuff.string);

  /* Change the values and send the structure back. */
  stuff.integer = 20;
  stuff.real = 1.0;
  strcpy(stuff.string, "test2");
  if( jsSockWrite(sock, &stuff, sizeof(stuff))) exit(-1);
  printf("2. Wrote: %d %f %s\n", stuff.integer, stuff.real, stuff.string);

  jsSockClose(sock);
  return 0;
}
Running the two processes is just a matter of supplying the appropriate command-line arguments so that each knows the name of the socket service and the protocol to use, and so that process B also knows the name of the machine to which it should connect.

For example, using the "tele1" socket with process A running on a machine called "marlin" would require running:

processA tele1 tcp
on the marlin machine and:
processB marlin tele1 tcp
on some other machine.

6

Frequently asked questions

What if the sending and receiving processes are on different types of machine?

The library code has been used and tested for communication between IRIX and SunOS machines without problems. It should work fine on other platforms so long as both the communicating processes are running on machines which store data in compatible ways. In particular it assumes the same byte ordering and the same floating-point format. Fortunately all the machines currently in the lab (and all those we are likely to purchase any time soon) do use the same (IEEE) floating point format and the same byte-ordering.

In the unlikely event that you do need to communicate between incompatible machines you have two choices - either add code yourself to flip the byte order around or use XDR (see the xdr man pages). However, either way it's messy and slow and best avoided if possible!
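For the do-it-yourself route, the standard htonl() and ntohl() calls convert 32-bit values to and from a fixed "network" byte order. The sketch below is one way this might look for the example structure. The wireStuff layout and the pack/unpack helper names are invented for illustration, and the code assumes a 32-bit int and a 32-bit IEEE float at both ends.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: the same fields as stuffStruct, but with the
   numbers stored as 32-bit values in network byte order. */
typedef struct wireStuff
{
  uint32_t integer;
  uint32_t realBits;   /* the bit pattern of the float */
  char string[20];
} wireStuff;

/* Convert host values into the wire format before writing. */
void packStuff(wireStuff *out, int integer, float real, const char *string)
{
  uint32_t bits;

  memcpy(&bits, &real, sizeof(bits));   /* assumes a 32-bit IEEE float */
  out->integer = htonl((uint32_t)integer);
  out->realBits = htonl(bits);
  memset(out->string, 0, sizeof(out->string));
  strncpy(out->string, string, sizeof(out->string) - 1);   /* always NUL-terminated */
}

/* Convert the wire format back into host values after reading.
   The string argument must have room for 20 characters. */
void unpackStuff(const wireStuff *in, int *integer, float *real, char *string)
{
  uint32_t bits = ntohl(in->realBits);

  *integer = (int)ntohl(in->integer);
  memcpy(real, &bits, sizeof(bits));
  memcpy(string, in->string, sizeof(in->string));
}
```

The sender would call packStuff() and write the wireStuff, and the receiver would read a wireStuff and call unpackStuff(). Character arrays need no conversion since they are sent one byte at a time.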

How do you send variable-length data?

One difficulty is that when you call read() you need to know how many bytes to expect. This makes it difficult to send and read things like regular C strings which could have arbitrary length. The solution is simple. Just send two messages. The first, of fixed size, encodes the number of bytes which should be expected in the second, variable-sized, message.
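The two-message idea might be sketched like this for C strings. The function names sendString and receiveString are made up for the example, the small writeAll/readAll helpers are included only to keep it self-contained, and sending the length in network byte order is a choice made here rather than something the library requires.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly size bytes; returns 0 on success, -1 on failure. */
static int writeAll(int fd, const void *buffer, size_t size)
{
  const char *p = buffer;
  while (size > 0)
    {
      ssize_t n = write(fd, p, size);
      if (n <= 0) return -1;
      p += n;
      size -= (size_t)n;
    }
  return 0;
}

/* Read exactly size bytes; returns 0 on success, -1 on failure. */
static int readAll(int fd, void *buffer, size_t size)
{
  char *p = buffer;
  while (size > 0)
    {
      ssize_t n = read(fd, p, size);
      if (n <= 0) return -1;
      p += n;
      size -= (size_t)n;
    }
  return 0;
}

/* Hypothetical helper: send a fixed-size length, then the characters. */
int sendString(int fd, const char *text)
{
  uint32_t length = (uint32_t)strlen(text);
  uint32_t header = htonl(length);      /* first message: always 4 bytes */

  if (writeAll(fd, &header, sizeof(header))) return -1;
  return writeAll(fd, text, length);    /* second message: length bytes */
}

/* Hypothetical helper: read the length, then that many characters.
   Returns a malloc'd NUL-terminated string (caller frees), or NULL.
   Real code should also reject unreasonably large lengths. */
char *receiveString(int fd)
{
  uint32_t header, length;
  char *text;

  if (readAll(fd, &header, sizeof(header))) return NULL;
  length = ntohl(header);
  text = malloc((size_t)length + 1);
  if (text == NULL) return NULL;
  if (readAll(fd, text, length))
    {
      free(text);
      return NULL;
    }
  text[length] = '\0';
  return text;
}
```

Because the receiver always knows how many bytes to ask for next, several strings can be sent one after another on the same connection without confusion.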

If you have a choice between several different fixed-size messages then you can send a single character to indicate which type of message will follow.
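A sketch of the single-character approach follows. The two message types, their tag characters 'P' and 'S', and all of the function names are invented for illustration. The transfer helpers are repeated so the example stands alone.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Two hypothetical fixed-size message types. */
typedef struct positionMsg { float x, y, z; } positionMsg;
typedef struct statusMsg { int code; char text[20]; } statusMsg;

/* Size of the message which follows a given tag, or 0 if the tag is unknown. */
size_t bodySize(char tag)
{
  switch (tag)
    {
    case 'P': return sizeof(positionMsg);
    case 'S': return sizeof(statusMsg);
    default:  return 0;
    }
}

/* Write exactly size bytes; returns 0 on success, -1 on failure. */
static int writeAll(int fd, const void *buffer, size_t size)
{
  const char *p = buffer;
  while (size > 0)
    {
      ssize_t n = write(fd, p, size);
      if (n <= 0) return -1;
      p += n;
      size -= (size_t)n;
    }
  return 0;
}

/* Read exactly size bytes; returns 0 on success, -1 on failure. */
static int readAll(int fd, void *buffer, size_t size)
{
  char *p = buffer;
  while (size > 0)
    {
      ssize_t n = read(fd, p, size);
      if (n <= 0) return -1;
      p += n;
      size -= (size_t)n;
    }
  return 0;
}

/* Send the tag character, then the message it announces. */
int sendTagged(int fd, char tag, const void *body)
{
  size_t size = bodySize(tag);

  if (size == 0) return -1;             /* refuse to send an unknown type */
  if (writeAll(fd, &tag, 1)) return -1;
  return writeAll(fd, body, size);
}

/* Read a tag, then the message it announces. The body buffer must be
   large enough for the biggest message type. Returns 0 or -1. */
int receiveTagged(int fd, char *tag, void *body)
{
  size_t size;

  if (readAll(fd, tag, 1)) return -1;
  size = bodySize(*tag);
  if (size == 0) return -1;             /* unknown type: cannot tell how much to read */
  return readAll(fd, body, size);
}
```

A convenient buffer for the receiver is a union of all the message types, which is automatically big enough for the largest. After receiveTagged() returns, a switch on the tag tells you which member of the union to look at.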

How can you receive data without waiting around?

In the above examples it was necessary for each process to know when to call read. However, in some cases you may not know when data is to be expected. There are several solutions to this:

One which I use with X/Xt code is to make use of XtAppAddInput(). This allows you to specify a stream and a function. When data arrives on that stream the system will automatically call the specified function. If you're not using X you can do something similar by writing your own code using the system select() function.
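For the select() route, a minimal sketch might look like the following. The helper name waitForData and the millisecond timeout parameter are choices made for this example only.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait up to timeoutMs milliseconds for fd to become
   readable. Returns 1 if a read would not block, 0 if the time ran out,
   -1 on error. */
int waitForData(int fd, int timeoutMs)
{
  fd_set readable;
  struct timeval timeout;
  int n;

  FD_ZERO(&readable);
  FD_SET(fd, &readable);                 /* watch just this one descriptor */
  timeout.tv_sec = timeoutMs / 1000;
  timeout.tv_usec = (timeoutMs % 1000) * 1000;

  /* The first argument is one more than the highest descriptor being watched. */
  n = select(fd + 1, &readable, NULL, NULL, &timeout);
  if (n < 0) return -1;
  return n > 0 ? 1 : 0;
}
```

A program could call waitForData(sock, 0) once per pass through its main loop and only call read when the result is 1. Note that select() also reports a descriptor as readable when the other end has closed the connection, in which case the read which follows returns 0.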

An alternative is to have the system send your program a signal (SIGIO) whenever input arrives on a socket. Another alternative is to spawn off a separate thread to handle I/O - this is particularly easy on the Iris since you can use sproc(). A final alternative is to set the sockets to be non-blocking and use polling. (See the man pages and network programming guides for more details.)
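As an illustration of the last option, here is a sketch of non-blocking polling using fcntl(). The helper names setNonBlocking and pollRead are made up, and the return convention of pollRead is just one reasonable choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode.
   Returns 0 on success, -1 on error. */
int setNonBlocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);

  if (flags < 0) return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: try to read without waiting.
   Returns the number of bytes read (possibly fewer than size),
   0 if nothing is available right now,
   -1 on error or if the other end has closed the connection. */
int pollRead(int fd, void *buffer, size_t size)
{
  ssize_t n = read(fd, buffer, size);

  if (n > 0) return (int)n;
  if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
    return 0;                            /* no data yet: try again later */
  return -1;
}
```

Since pollRead() may deliver only part of a message, the caller has to keep track of how many bytes of the current message have arrived so far and ask for the remainder on a later call.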

What does "Address already in use" mean?

This is usually caused by an abnormal termination of the last process to use that socket. If a socket is not formally shut down then it will stay around for a while even after the process which created it has died. This is a side effect of two things - the need to cope with possibly large variable transmission delays and the need for the system to guarantee that data is received in the right order by the right process. Basically, when a process dies the system prevents others from using the same socket until it is sure no more data may arrive for the old (and now dead) process.

This error should go away all by itself - just wait a minute or two and try again. If it doesn't disappear, check for any suspended processes (at either end) which may be holding the socket open. You can avoid having to wait by being careful to shut down each socket before terminating.
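Code which calls the system routines directly often sidesteps the wait by setting the standard SO_REUSEADDR option on the accepting socket before calling bind(). The sketch below shows the call. The helper name allowAddressReuse is made up, and nothing in this document says whether the lab library sets the option itself.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: allow bind() to succeed even while an old connection
   on the same port is still being cleaned up by the system.
   Call it after socket() and before bind(). Returns 0 on success, -1 on error. */
int allowAddressReuse(int sock)
{
  int on = 1;

  return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```

This does not let two running programs share a port in the way warned against earlier. It only relaxes the waiting period which follows a connection that has already ended.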

7

Acknowledgements

This document is the result of reading the network programming guides and asking questions of several different system managers including Mark Jason Dominus, Mark Foster, John Bradley and Gaylord Holder. They know what they're talking about so any mistakes in this document are mine alone.

The library code is derived from code by Janez Funda which is itself based on the examples from the networking manual.



Craig Sayers (sayers@grip.cis.upenn.edu) May 1995