

Networking Basics

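Computers running on the Internet communicate with each other using either the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP). Both belong to the transport layer of the four-layer Internet protocol stack:

  Application (HTTP, ftp, telnet, ...)
  Transport (TCP, UDP, ...)
  Network (IP, ...)
  Link (device driver, ...)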

When you write Java programs that communicate over the network, you are programming at the application layer. Typically, you don't need to concern yourself with the TCP and UDP layers; instead, you can use the classes in the java.net package, which provide system-independent network communication. However, you do need to understand the difference between TCP and UDP to decide which Java classes your programs should use.

When two applications want to communicate with each other reliably, they establish a connection and send data back and forth over that connection. This is analogous to making a telephone call: if you want to speak to Aunt Beatrice in Kentucky, a connection is established when you dial her phone number and she answers. You send data back and forth over the connection by speaking to one another over the phone lines. Like the phone company, TCP guarantees that data sent from one end of the connection actually gets to the other end, and in the same order it was sent; otherwise, an error is reported.


Definition: TCP is a connection-based protocol that provides a reliable flow of data between two computers.
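To make the definition concrete, here is a minimal sketch using java.net's ServerSocket and Socket over the loopback interface, so it needs no real network. The class name TcpEchoSketch and the echo behavior are illustrative assumptions, not a standard service. The client sends lines one at a time over a single connection and reads each echo back; TCP delivers the bytes reliably and in order, so the replies arrive in the sequence the lines were sent.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TcpEchoSketch {

    /** Sends each line over one TCP connection and returns the echoed lines, in order. */
    static List<String> echoOverTcp(List<String> lines) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the system to pick any free port for the listening socket.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread echoer = new Thread(() -> {
                // accept() blocks until the client "dials in" and the connection is established.
                try (Socket conn = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(conn.getOutputStream(), true, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.println(line); // echo each line back over the same connection
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            echoer.start();

            List<String> received = new ArrayList<>();
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true, StandardCharsets.UTF_8);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                for (String line : lines) {
                    out.println(line);
                    received.add(in.readLine());
                }
            } // closing the client ends the echoer's read loop
            echoer.join();
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        // Prints: [first, second, third]
        System.out.println(echoOverTcp(List.of("first", "second", "third")));
    }
}
```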

Applications that require a reliable, point-to-point channel use TCP to communicate. The Hypertext Transfer Protocol (HTTP), the File Transfer Protocol (ftp), and Telnet (telnet) are all examples of applications that require a reliable communication channel. The order in which data is sent and received over the network is critical to the success of these applications: when using HTTP to read from a URL, the data must be received in the order in which it was sent, otherwise you end up with a jumbled HTML file, a corrupt zip file, or some other invalid information.

For many applications, this guarantee of reliability is critical to transferring information successfully from one end of the connection to the other. However, other forms of communication don't require such strict guarantees and are in fact hindered by them, either because of the performance cost of the extra overhead or because a reliable connection defeats the purpose of the service altogether.

Consider, for example, a clock server that sends the current time to its client when requested to do so. If the client misses a packet, does it really make sense to resend the packet? No, because the time will no longer be correct by the time the client receives it. If the client makes two requests and receives packets from the server out of order, it doesn't really matter, because the client can figure out that the packets are out of order and request another one. A reliable channel here is unnecessary, degrades performance, and may reduce the usefulness of the service.
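The clock-server reasoning can be sketched with DatagramSocket. This is a minimal illustration under stated assumptions: the class UdpClockSketch, its one-byte request, the 500 ms timeout, and the reply format (milliseconds since the epoch as decimal text) are all made up for this example, and both ends run on the loopback interface. Note the recovery strategy: when a reply does not arrive in time, the client simply sends a new request instead of waiting for an old answer to be retransmitted.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class UdpClockSketch {

    /** Answers `requests` time requests on `socket`, replying to each sender's address and port. */
    static void serve(DatagramSocket socket, int requests) throws IOException {
        byte[] buf = new byte[64];
        for (int i = 0; i < requests; i++) {
            DatagramPacket request = new DatagramPacket(buf, buf.length);
            socket.receive(request);
            byte[] now = Long.toString(System.currentTimeMillis()).getBytes(StandardCharsets.US_ASCII);
            socket.send(new DatagramPacket(now, now.length, request.getAddress(), request.getPort()));
        }
    }

    /** Requests the time; after a timeout it sends a fresh request instead of awaiting a resend. */
    static long askTime(InetAddress host, int port, int maxAttempts) throws IOException {
        try (DatagramSocket client = new DatagramSocket()) {
            client.setSoTimeout(500); // assumed timeout in milliseconds
            byte[] reply = new byte[64];
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                client.send(new DatagramPacket(new byte[] {1}, 1, host, port)); // assumed 1-byte request
                DatagramPacket response = new DatagramPacket(reply, reply.length);
                try {
                    client.receive(response);
                    return Long.parseLong(new String(response.getData(), 0, response.getLength(),
                            StandardCharsets.US_ASCII));
                } catch (SocketTimeoutException lost) {
                    // A late answer would be stale anyway, so just ask again.
                }
            }
            throw new IOException("no reply after " + maxAttempts + " attempts");
        }
    }

    /** Runs a one-request clock server on loopback and queries it once. */
    static long demo() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket server = new DatagramSocket(0, loopback)) {
            Thread serverThread = new Thread(() -> {
                try {
                    serve(server, 1);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            serverThread.start();
            long time = askTime(loopback, server.getLocalPort(), 3);
            serverThread.join();
            return time;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server time (ms since epoch): " + demo());
    }
}
```

Because each reply is a fresh reading of the clock, a lost datagram costs one extra request rather than delivering a stale answer late.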

Another example of a service that doesn't need the guarantee of a reliable channel is the ping command. The whole point of ping is to test the communication between two programs over the network. In fact, ping needs to know about dropped or out-of-order packets to determine how good or bad the connection is, so a reliable channel would defeat the purpose of the service altogether.

The UDP protocol provides non-guaranteed communication between two applications on the network. UDP is not connection-based like TCP; rather, it sends independent packets of data, called datagrams, from one application to another. Sending datagrams is much like sending letters through the mail: the order of delivery is not important and is not guaranteed, and each message is independent of any other.


Definition: UDP is a protocol that sends independent packets of data, called datagrams, from one computer to another with no guarantees about arrival. UDP is not connection-based like TCP.

Ports

Generally speaking, a computer has a single physical connection to the network. All data destined for a particular computer arrives through that connection. However, the data may be intended for different applications running on the computer. So how does the computer know which application to forward data to? Through the use of ports.

Data transmitted over the Internet is accompanied by addressing information that identifies the computer and the port for which it is destined. The computer is identified by its 32-bit IP address (an IPv4 address; IPv6 addresses are 128 bits), which IP uses to deliver data to the right computer on the network. Ports are identified by a 16-bit number, which TCP and UDP use to deliver the data to the right application.
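In Java, the two pieces of addressing information travel together as a java.net.InetSocketAddress. The sketch below builds one from an IPv4 literal and shows that the raw address is 4 bytes, or 32 bits. The class AddressSketch is hypothetical, and 192.0.2.10 is simply an address from the block reserved for documentation. InetSocketAddress itself rejects ports outside the 16-bit range 0 to 65535 by throwing IllegalArgumentException.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class AddressSketch {

    /** Pairs an IP address (which computer) with a port (which application on it). */
    static InetSocketAddress destination(String ipLiteral, int port) throws UnknownHostException {
        // A numeric literal is only checked for format; no DNS lookup is performed.
        InetAddress ip = InetAddress.getByName(ipLiteral);
        // Throws IllegalArgumentException if port is outside 0..65535.
        return new InetSocketAddress(ip, port);
    }

    public static void main(String[] args) throws UnknownHostException {
        InetSocketAddress dest = destination("192.0.2.10", 8080);
        int addressBits = dest.getAddress().getAddress().length * 8;
        // Prints: 32-bit address 192.0.2.10, port 8080
        System.out.println(addressBits + "-bit address " + dest.getAddress().getHostAddress()
                + ", port " + dest.getPort());
    }
}
```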

In connection-based communication, an application establishes a connection with another application by binding a socket to a port number. This registers the application with the system to receive all data destined for that port. No two applications can bind to the same port: attempts to bind to a port that is already in use will fail.
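One way to observe this rule is to bind a ServerSocket and then try to bind a second one to the same port. In this hypothetical BindConflictSketch, both sockets use the loopback address, and the first passes port 0 so the system chooses any free port; the second bind is expected to fail with java.net.BindException.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class BindConflictSketch {

    /** Binds a socket, then reports whether a second bind to the same port is refused. */
    static boolean secondBindFails() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the system choose any free port for the first socket.
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false; // the second bind unexpectedly succeeded
            } catch (BindException alreadyInUse) {
                return true; // typically reported as "Address already in use"
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind refused: " + secondBindFails());
    }
}
```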

In datagram-based communication, the datagram packet contains the port number of its destination.
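A minimal DatagramSocket sketch shows the destination port riding inside the packet. The class DatagramPortSketch is hypothetical, and both sockets are bound to the loopback address with system-chosen ports. The sender never connects; it just constructs a DatagramPacket addressed to the receiver's address and port. On the receiving side, getPort() on the arriving packet reports the sender's port, which is what a server would use to address a reply.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class DatagramPortSketch {

    /** Sends one datagram to a receiver on loopback and returns the text that arrives. */
    static String sendOne(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket(0, loopback)) {
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            // No connection is set up: the destination address and port are part of the packet.
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));

            byte[] buf = new byte[1024];
            DatagramPacket arrived = new DatagramPacket(buf, buf.length);
            receiver.setSoTimeout(2000); // delivery is not guaranteed, so bound the wait
            receiver.receive(arrived);
            // On a received packet, getPort() is the sender's port: where a reply would go.
            System.out.println("from port " + arrived.getPort()
                    + ", sender bound to " + sender.getLocalPort());
            return new String(arrived.getData(), 0, arrived.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendOne("hello, datagram"));
    }
}
```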


Definition: The TCP and UDP protocols use ports to map incoming data to a particular process running on a computer.

Port numbers range from 0 to 65535, because ports are represented by 16-bit numbers. Port numbers 0 through 1023 are restricted: they are reserved for use by well-known services such as HTTP and ftp, and by other system services, and are therefore called well-known ports. Your applications should not attempt to bind to them.
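The port rules above are simple enough to encode directly. The helper below is a hypothetical illustration, not a java.net API; it checks only the numeric ranges described in this section.

```java
public class PortRangeSketch {

    enum Kind { INVALID, WELL_KNOWN, OTHER }

    /** Classifies a port number using only the ranges described in the text. */
    static Kind classify(int port) {
        if (port < 0 || port > 65535) {
            return Kind.INVALID; // does not fit in an unsigned 16-bit field
        }
        if (port <= 1023) {
            return Kind.WELL_KNOWN; // e.g. ftp (21) and HTTP (80)
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        for (int port : new int[] {21, 80, 1023, 1024, 8080, 65535, 65536, -1}) {
            System.out.println(port + " -> " + classify(port));
        }
    }
}
```

To stay clear of ports used by other applications, a program can also pass 0 to the ServerSocket or DatagramSocket constructor, letting the system choose a free port, and then read the chosen number back with getLocalPort().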

Through the classes in java.net, Java programs can use TCP or UDP to communicate over the Internet. The URL, URLConnection, Socket, and ServerSocket classes all use TCP to communicate over the network. The DatagramPacket and DatagramSocket classes use UDP.
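Of the TCP-based classes, URL hides the socket work entirely, but it still rides on a TCP connection. To show this without network access, the sketch below has a hypothetical single-shot responder built on ServerSocket stand in for a web server on 127.0.0.1; URL.openStream() connects to it over TCP and reads the body. The class name, the canned HTTP/1.0 response, and the message text are assumptions for illustration only; a real HTTP server must handle far more of the protocol than this.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UrlOverTcpSketch {

    /** Serves one canned HTTP/1.0 response from a local ServerSocket and reads it via URL. */
    static String fetchFromLocalServer(String body) throws Exception {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread responder = new Thread(() -> {
                try (Socket conn = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(conn.getInputStream(), StandardCharsets.US_ASCII));
                    // Skip the request line and headers, which end at the first empty line.
                    String line = in.readLine();
                    while (line != null && !line.isEmpty()) {
                        line = in.readLine();
                    }
                    byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
                    OutputStream out = conn.getOutputStream();
                    out.write(("HTTP/1.0 200 OK\r\nContent-Length: " + bytes.length
                            + "\r\nConnection: close\r\n\r\n").getBytes(StandardCharsets.US_ASCII));
                    out.write(bytes);
                    out.flush();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            responder.start();

            // openStream() opens a TCP connection to the host and port named in the URL.
            URL url = URI.create("http://127.0.0.1:" + server.getLocalPort() + "/").toURL();
            String text;
            try (InputStream stream = url.openStream()) {
                text = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
            }
            responder.join();
            return text;
        }
    }

    public static void main(String[] args) throws Exception {
        // Prints: hello over TCP
        System.out.println(fetchFromLocalServer("hello over TCP"));
    }
}
```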

