
www.UandiStar.org
KATRAGADDA INNOVATIVE TRUST FOR EDUCATION

NETWORK PROGRAMMING

Notes prepared by D. Teja Santosh, Assistant Professor, KPES, Shabad, R.R. District.


UNIT-I Introduction and TCP/IP


INTRODUCTION

When writing programs that communicate across a computer network, one must first invent a protocol, an agreement on how those programs will communicate. Before delving into the design details of a protocol, high-level decisions must be made about which program is expected to initiate communication and when responses are expected. For example, a Web server is typically thought of as a long-running program (or daemon) that sends network messages only in response to requests coming in from the network. The other side of the protocol is a Web client, such as a browser, which always initiates communication with the server. This organization into client and server is used by most network-aware applications.


OSI Model


A common way to describe the layers in a network is to use the International Organization for Standardization (ISO) open systems interconnection (OSI) model for computer communications. This is a seven-layer model whose layers map approximately onto the Internet protocol suite.

The sockets programming interfaces described are interfaces from the upper three layers (the "application") into the transport layer. Why do sockets provide the interface from the upper three layers of the OSI model into the transport layer? There are two reasons for this design: First, the upper three layers handle all the details of the application (FTP, Telnet, or HTTP, for example) and know little about the communication details. The lower four layers know little about the application, but handle all the communication details: sending data, waiting for acknowledgments, sequencing data that arrives out of order, calculating and verifying checksums, and so on. The second reason is that the upper three layers often form what is called a user process while the lower four layers are normally provided as part of the operating system (OS) kernel. Unix provides this separation between the user process and the kernel, as do many other contemporary operating systems. Therefore, the interface between layers 4 and 5 is the natural place to build the API.

APPLICATION LEVEL VIEW OF A SOCKET


KERNEL LEVEL VIEW OF A SOCKET (IPv4)


The Big Picture


IPv4 Internet Protocol version 4. IPv4, which we often denote as just IP, has been the workhorse protocol of the IP suite since the early 1980s. It uses 32-bit addresses. IPv4 provides packet delivery service for TCP, UDP, SCTP, ICMP, and IGMP.

IPv6 Internet Protocol version 6. IPv6 was designed in the mid-1990s as a replacement for IPv4. The major change is a larger address comprising 128 bits, to deal with the explosive growth of the Internet in the 1990s. IPv6 provides packet delivery service for TCP, UDP, SCTP, and ICMPv6. We often use the word "IP" as an adjective, as in IP layer and IP address, when the distinction between IPv4 and IPv6 is not needed.

TCP Transmission Control Protocol. TCP is a connection-oriented protocol that provides a reliable, full-duplex byte stream to its users. TCP sockets are an example of stream sockets. TCP takes care of details such as acknowledgments, timeouts, retransmissions, and the like. Most Internet application programs use TCP. Notice that TCP can use either IPv4 or IPv6.

UDP User Datagram Protocol. UDP is a connectionless protocol, and UDP sockets are an example of datagram sockets. There is no guarantee that UDP datagrams ever reach their intended destination. As with TCP, UDP can use either IPv4 or IPv6.

SCTP Stream Control Transmission Protocol. SCTP is a connection-oriented protocol that provides a reliable full-duplex association. The word "association" is used when referring to a connection in SCTP because SCTP is multihomed, involving a set of IP addresses and a single port for each side of an association. SCTP provides a message service, which maintains record boundaries. As with TCP and UDP, SCTP can use either IPv4 or IPv6, but it can also use both IPv4 and IPv6 simultaneously on the same association.

ICMP Internet Control Message Protocol. ICMP handles error and control information between routers and hosts. These messages are normally generated by and processed by the TCP/IP networking software itself, not user processes, although we show the ping and traceroute programs, which use ICMP. We sometimes refer to this protocol as ICMPv4 to distinguish it from ICMPv6.

IGMP Internet Group Management Protocol. IGMP is used with multicasting, which is optional with IPv4.

ARP Address Resolution Protocol. ARP maps an IPv4 address into a hardware address (such as an Ethernet address). ARP is normally used on broadcast networks such as Ethernet, token ring, and FDDI, and is not needed on point-to-point networks.

RARP Reverse Address Resolution Protocol. RARP maps a hardware address into an IPv4 address. It is sometimes used when a diskless node is booting.

ICMPv6 Internet Control Message Protocol version 6. ICMPv6 combines the functionality of ICMPv4, IGMP, and ARP.

BPF BSD packet filter. This interface provides access to the datalink layer. It is normally found on Berkeley-derived kernels.

DLPI Datalink provider interface. This interface also provides access to the datalink layer. It is normally provided with SVR4.

We use the terms "IPv4/IPv6 host" and "dual-stack host" to denote hosts that support both IPv4 and IPv6.

USER DATAGRAM PROTOCOL [UDP]: The User Datagram Protocol (UDP) provides a connectionless, unreliable transport service. Connectionless means that a communication session between hosts is not established before exchanging data. UDP is often used for communications that use broadcast or multicast Internet Protocol (IP) packets. The UDP packet delivery service is unreliable because it neither guarantees delivery of a data packet nor sends a notification if a packet is not delivered. Applications that use UDP must therefore supply their own reliability mechanisms if they need them. Despite these limitations, UDP is useful where its low overhead matters, for example for the broadcast and multicast traffic mentioned above.

Each UDP datagram has a length. The length of a datagram is passed to the receiving application along with the data.
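
The point about datagram length can be checked with a small experiment. The following is a minimal sketch, assuming a POSIX system with a working loopback interface; the function name udp_roundtrip_len, the 512-byte buffer, and the 2-second receive timeout are choices made for this illustration only.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Sends one datagram of `len` bytes to a receiver on the loopback
 * interface and returns the length recvfrom reports, or -1 on error.
 * (Hypothetical helper written for these notes.) */
ssize_t udp_roundtrip_len(const void *msg, size_t len)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 2, 0 };      /* do not wait forever if it is lost */
    char buf[512];
    ssize_t n = -1;
    int rx, tx;

    assert(len <= sizeof(buf));
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);          /* port 0: the kernel picks a free port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        goto out;

    /* No session is established first: the destination is given per datagram. */
    if (sendto(tx, msg, len, 0, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* recvfrom returns one whole datagram together with its length. */
    n = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);
out:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```

Calling udp_roundtrip_len("hello", 5) should report a length of 5 on a healthy loopback interface. Because UDP is unreliable, the sketch sets a receive timeout instead of assuming the datagram always arrives.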

TRANSMISSION CONTROL PROTOCOL [TCP]:

Connection oriented: An application requests a connection to a destination and uses that connection to transfer data.

Point-to-point: A TCP connection has exactly two endpoints (no broadcast or multicast).

Reliability: TCP delivers data without loss, duplication, or transmission errors. If delivery becomes impossible, TCP gives up and reports an error to the application rather than silently dropping data.

Full duplex: Endpoints can exchange data in both directions simultaneously.

Delivering TCP: TCP segments travel in IP datagrams. Internet routers look only at the IP header to forward datagrams. Each segment contains a sequence number.

Flow control: Flow control is necessary when one computer transmits data faster than another computer can receive it. It requires feedback from the receiving peer, which TCP provides through the receiver's advertised buffer space, the window.

Round-trip time: TCP contains algorithms to estimate the round-trip time (RTT) between a client and server dynamically so that it knows how long to wait for an acknowledgment. For example, the RTT on a LAN can be milliseconds, while across a WAN it can be seconds. TCP continuously re-estimates the RTT of a given connection because the RTT is affected by variations in network traffic.

TCP Connection Establishment

Three-Way Handshake
The following scenario occurs when a TCP connection is established:

1. The server must be prepared to accept an incoming connection. This is normally done by calling socket, bind, and listen and is called a passive open.

2. The client issues an active open by calling connect. This causes the client TCP to send a "synchronize" (SYN) segment, which tells the server the client's initial sequence number for the data that the client will send on the connection. Normally, there is no data sent with the SYN; it just contains an IP header, a TCP header, and possible TCP options (which we will talk about shortly).

3. The server must acknowledge (ACK) the client's SYN, and the server must also send its own SYN containing the initial sequence number for the data that the server will send on the connection. The server sends its SYN and the ACK of the client's SYN in a single segment.

4. The client must acknowledge the server's SYN.
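
The passive open and active open described in these steps correspond to a handful of socket calls. The following is a minimal sketch, assuming a POSIX system; it runs both ends over the loopback interface inside one process, and the function name tcp_loopback_pair is hypothetical. The SYN, SYN+ACK, and ACK segments themselves are exchanged by the kernel and are not visible to the program.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Builds one TCP connection over loopback inside a single process.
 * On success returns 0 and stores the two connected descriptors.
 * (Hypothetical helper written for these notes.) */
int tcp_loopback_pair(int *client_fd, int *server_fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd, cfd, sfd;

    assert(client_fd != NULL && server_fd != NULL);

    /* Passive open: socket, bind, listen. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);              /* kernel chooses a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    /* Active open: connect makes the client TCP send its SYN. The kernel
     * completes the handshake for a listening socket and queues the
     * connection, so connect can return before accept is called. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (cfd >= 0)
            close(cfd);
        close(lfd);
        return -1;
    }

    /* accept hands the established connection to the server side. */
    sfd = accept(lfd, NULL, NULL);
    close(lfd);
    if (sfd < 0) {
        close(cfd);
        return -1;
    }
    *client_fd = cfd;
    *server_fd = sfd;
    return 0;
}
```

After a successful call, bytes written to the client descriptor can be read from the server descriptor and vice versa. A real server and client would normally be separate processes on different hosts.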

TCP Connection Termination

1. One application calls close first, and we say that this end performs the active close. This end's TCP sends a FIN segment, which means it is finished sending data.

2. The other end that receives the FIN performs the passive close. The received FIN is acknowledged by TCP. The receipt of the FIN is also passed to the application as an end-of-file (after any data that may have already been queued for the application to receive), since the receipt of the FIN means the application will not receive any additional data on the connection.


3. Sometime later, the application that received the end-of-file will close its socket. This causes its TCP to send a FIN.

4. The TCP on the system that receives this final FIN (the end that did the active close) acknowledges the FIN.

Since a FIN and an ACK are required in each direction, four segments are normally required. We use the qualifier "normally" because in some scenarios, the FIN in Step 1 is sent with data. Also, the segments in Steps 2 and 3 are both from the end performing the passive close and could be combined into one segment.

Importance of TIME_WAIT State:

Undoubtedly, one of the most misunderstood aspects of TCP with regard to network programming is its TIME_WAIT state. The end that performs the active close goes through this state. The duration that this endpoint remains in this state is twice the maximum segment lifetime (MSL), sometimes called 2MSL.


Every implementation of TCP must choose a value for the MSL. The recommended value in RFC 1122 [Braden 1989] is 2 minutes, although Berkeley-derived implementations have traditionally used a value of 30 seconds instead. Since TIME_WAIT lasts 2MSL, its duration is between 1 minute (2 x 30 seconds) and 4 minutes (2 x 2 minutes). The MSL is the maximum amount of time that any given IP datagram can live in a network. We know this time is bounded because every datagram contains an 8-bit hop limit with a maximum value of 255. Although this is a hop limit and not a true time limit, the assumption is made that a packet with the maximum hop limit of 255 cannot exist in a network for more than MSL seconds.

The way in which a packet gets "lost" in a network is usually the result of routing anomalies. A router crashes or a link between two routers goes down, and it takes the routing protocols seconds or minutes to stabilize and find an alternate path. During that time period, routing loops can occur (router A sends packets to router B, and B sends them back to A) and packets can get caught in these loops. In the meantime, assuming the lost packet is a TCP segment, the sending TCP times out and retransmits the packet, and the retransmitted packet gets to the final destination by some alternate path. But sometime later (up to MSL seconds after the lost packet started on its journey), the routing loop is corrected and the packet that was lost in the loop is sent to the final destination. This original packet is called a lost duplicate or a wandering duplicate.

TCP must handle these duplicates.


THE FOLLOWING INFORMATION HAS BEEN TAKEN FROM:

http://sit.iitkgp.ernet.in/archive/teaching/internetTech/tcp/www.scit.wlv.ac.uk/%257Ejphb/comms/tcp.html

It should be noted that the exchange is really two independent exchanges, and it is possible to close the connection in one direction but not the other. This is known as a half close. The following example (due to Stevens) demonstrates the use of the half close. Consider the Unix command

rsh remote sort < datafile

The effect of this is that the local file datafile is sorted on the remote host and the results are transferred back to the local host. The data flow is shown in the following diagram.


The problem here is that the sort program on the remote host cannot start sorting until it has read all the data. The end of the data is indicated by the local host closing its sending direction of the connection, which the sort program sees as an EOF indication. However, the "back" direction must remain open for the return of data. Stevens suggests that the library call shutdown() be used with sockets programming to achieve a half close.

Once the final ACK has been sent on an active close, the port/connection cannot be released and re-used for the time period 2MSL. This is twice the maximum segment lifetime, and the constraint is imposed in case the final ACK is lost. If the final ACK is lost, the passive closing host will time out awaiting an ACK in response to its closing FIN and will resend the FIN. If this arrives before the 2MSL time has expired, there is no problem; after this time, the FIN does not appear to belong to whatever connection might exist between the two hosts.
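
The half close can be tried out on a single machine. The following is a minimal sketch, assuming a POSIX system with a loopback interface; the names tcp_pair and half_close_demo are hypothetical, and the "server" here only counts the bytes it receives instead of sorting them. Both ends run in one process, which works because the data is small enough to sit in the socket buffers.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: connects a client and a server socket over loopback. */
static int tcp_pair(int *c, int *s)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);
    int l = socket(AF_INET, SOCK_STREAM, 0);

    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */
    *c = socket(AF_INET, SOCK_STREAM, 0);
    if (l < 0 || *c < 0 ||
        bind(l, (struct sockaddr *)&a, sizeof(a)) < 0 || listen(l, 1) < 0 ||
        getsockname(l, (struct sockaddr *)&a, &len) < 0 ||
        connect(*c, (struct sockaddr *)&a, sizeof(a)) < 0 ||
        (*s = accept(l, NULL, NULL)) < 0) {
        if (l >= 0) close(l);
        if (*c >= 0) close(*c);
        return -1;
    }
    close(l);
    return 0;
}

/* Client sends `data`, half-closes, and then reads the server's reply.
 * Returns the byte count the server saw before EOF, or -1 on error. */
int half_close_demo(const char *data)
{
    int c, s, total = 0, reply = -1;
    char buf[64];
    ssize_t n;

    assert(data != NULL);
    if (tcp_pair(&c, &s) < 0)
        return -1;
    if (write(c, data, strlen(data)) != (ssize_t)strlen(data))
        goto out;

    /* Half close: a FIN is sent, but the client can still read. */
    if (shutdown(c, SHUT_WR) < 0)
        goto out;

    /* Server side: read returns 0 (EOF) once the FIN has arrived. */
    while ((n = read(s, buf, sizeof(buf))) > 0)
        total += (int)n;
    if (n < 0)
        goto out;

    /* The server-to-client direction is still open for the result. */
    if (write(s, &total, sizeof(total)) != (ssize_t)sizeof(total))
        goto out;
    if (read(c, &reply, sizeof(reply)) != (ssize_t)sizeof(reply))
        reply = -1;
out:
    close(c);
    close(s);
    return reply;
}
```

If close had been used in place of shutdown, the client would have given up its descriptor and could not have read the reply.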


RFC 793 defines MSL (Maximum Segment Lifetime) as 120 seconds, but some implementations use 30 or 60 seconds. It is, basically, the maximum time for which it is reasonable to wait for a segment: if a segment does not reach its destination within MSL, it probably will not get there at all, and it can be assumed to have been lost.


There are two reasons for the TIME_WAIT state:

1. To implement TCP's full-duplex connection termination reliably

2. To allow old duplicate segments to expire in the network

The first reason can be explained by assuming that the final ACK is lost. The server will resend its final FIN, so the client must maintain state information, allowing it to resend the final ACK. If it did not maintain this information, it would respond with an RST (a different type of TCP segment), which would be interpreted by the server as an error. If TCP is performing all the work necessary to terminate both directions of data flow cleanly for a connection (its full-duplex close), then it must correctly handle the loss of any of these four segments. This example also shows why the end that performs the active close is the end that remains in the TIME_WAIT state: because that end is the one that might have to retransmit the final ACK.

To understand the second reason for the TIME_WAIT state, assume we have a TCP connection between 12.106.32.254 port 1500 and 206.168.112.219 port 21. This connection is closed and then sometime later, we establish another connection between the same IP addresses and ports: 12.106.32.254 port 1500 and 206.168.112.219 port 21. This latter connection is called an incarnation of the previous connection since the IP addresses and ports are the same. TCP must prevent old duplicates from a connection from reappearing at some later time and being misinterpreted as belonging to a new incarnation of the same connection. To do this, TCP will not initiate a new incarnation of a connection that is currently in the TIME_WAIT state. Since the duration of the TIME_WAIT state is twice the MSL, this allows MSL seconds for a packet in one direction to be lost, and another MSL seconds for the reply to be lost. By enforcing this rule, we are guaranteed that when we successfully establish a TCP connection, all old duplicates from previous incarnations of the connection have expired in the network.
USEFUL LINKS FOR TIME_WAIT IMPORTANCE:
http://support.citrix.com/article/CTX117910
http://www.pcvr.nl/tcpip/tcp_time.htm

Port Numbers

ALLOCATION OF PORT NUMBERS

INTRODUCTION TO CONCURRENT SERVERS:

SOCKETPAIR: The socket pair for a TCP connection is the four-tuple that defines the two endpoints of the connection: the local IP address, local port, foreign IP address, and foreign port. A socket pair uniquely identifies every TCP connection on a network.


NOTE: FOR MORE INFORMATION ABOUT FIRST 6 UNITS, PLEASE GO THROUGH THE FOLLOWING LINK:
http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html

UNIT-II Socket Address Structure


Most socket functions require a pointer to a socket address structure as an argument. Each supported protocol suite defines its own socket address structure.

IPv4 Socket Address Structure (SAS)


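An IPv4 socket address structure, commonly called an "Internet socket address structure," is named sockaddr_in and is defined by including the <netinet/in.h> header. The POSIX definition of the IPv4 SAS is shown below. The sin_len member is present only on some systems (those derived from 4.4BSD); POSIX does not require it.

struct in_addr {
    in_addr_t s_addr;              /* 32-bit IPv4 address, network byte ordered */
};

struct sockaddr_in {
    uint8_t        sin_len;        /* length of structure (16) */
    sa_family_t    sin_family;     /* AF_INET */
    in_port_t      sin_port;       /* 16-bit TCP or UDP port number, network byte ordered */
    struct in_addr sin_addr;       /* 32-bit IPv4 address, network byte ordered */
    char           sin_zero[8];    /* unused */
};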

The diagrammatical representation of IPV4 SAS is:


Datatype, Description and Header File of IPV4 SAS Members

IMP NOTE: The 32-bit IPv4 address can be accessed in two different ways. For example, if serv is defined as an Internet socket address structure, then serv.sin_addr references the 32-bit IPv4 address as an in_addr structure, while serv.sin_addr.s_addr references the same 32-bit IPv4 address as an in_addr_t (typically an unsigned 32-bit integer). We must be certain that we are referencing the IPv4 address correctly, especially when it is used as an argument to a function, because compilers often pass structures differently from integers.

Socket address structures are used only on a given host: the structure itself is not communicated between different hosts, although certain fields (e.g., the IP address and port) are used for communication.
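
The two ways of referencing the address can be seen side by side in a short example. This is a minimal sketch, assuming a POSIX system; the function names fill_sockaddr_in and ipv4_as_host_integer are hypothetical, and sin_len is deliberately not touched because it does not exist on every system.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fills *serv for the dotted-decimal address `ip` and the host byte
 * ordered `port`. Returns 0 on success, -1 if `ip` is not valid IPv4. */
int fill_sockaddr_in(struct sockaddr_in *serv, const char *ip, uint16_t port)
{
    assert(serv != NULL && ip != NULL);
    memset(serv, 0, sizeof(*serv));      /* also clears sin_zero */
    serv->sin_family = AF_INET;
    serv->sin_port = htons(port);        /* stored in network byte order */

    /* Here the address is referenced as a structure: &serv->sin_addr. */
    if (inet_pton(AF_INET, ip, &serv->sin_addr) != 1)
        return -1;
    return 0;
}

/* Here the same address is referenced as an integer: sin_addr.s_addr. */
uint32_t ipv4_as_host_integer(const struct sockaddr_in *serv)
{
    return ntohl(serv->sin_addr.s_addr);
}
```

For the address 192.0.2.1, the integer view in host byte order is 0xC0000201, one byte per dotted-decimal component.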


Value-Result Arguments

Three functions, bind, connect, and sendto, pass a socket address structure from the process to the kernel. One argument to these three functions is the pointer to the socket address structure and another argument is the integer size of the structure. Since the kernel is passed both the pointer and the size of what the pointer points to, it knows exactly how much data to copy from the process into the kernel.


Four functions, accept, recvfrom, getsockname, and getpeername, pass a socket address structure from the kernel to the process, the reverse direction from the previous scenario. Two of the arguments to these four functions are the pointer to the socket address structure and a pointer to an integer containing the size of the structure. The size is passed as a pointer to an integer, rather than as an integer, because it is both a value when the function is called (it tells the kernel the size of the structure so that the kernel does not write past the end of the structure when filling it in) and a result when the function returns (it tells the process how much information the kernel actually stored in the structure). This type of argument is called a value-result argument.
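
The difference between the two directions shows up directly in the call signatures. The sketch below is illustrative only, assuming a POSIX system; the function name reported_addr_len is hypothetical. It offers the kernel a buffer that is larger than necessary and then looks at the length the kernel writes back.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns the address length the kernel reports for an IPv4 socket when
 * it is offered a larger buffer, or -1 on error. */
int reported_addr_len(void)
{
    struct sockaddr_in local;
    struct sockaddr_storage filled;        /* deliberately larger than needed */
    socklen_t len = sizeof(filled);        /* value: how much room we offer */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* Process to kernel (bind): the size is passed by value.
     * Kernel to process (getsockname): the size is passed by pointer. */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0 ||
        getsockname(fd, (struct sockaddr *)&filled, &len) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    assert(filled.ss_family == AF_INET);
    return (int)len;                       /* result: bytes actually stored */
}
```

On return, len no longer holds the size of the buffer that was offered; it holds the size of the IPv4 socket address structure that the kernel filled in.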


Byte Ordering Functions


Consider a 16-bit integer that is made up of 2 bytes. There are two ways to store the two bytes in memory: with the low-order byte at the starting address, known as little-endian byte order, or with the high-order byte at the starting address, known as big-endian byte order.

Network Byte Order: Big Endian Byte Order
Host Byte Order: Big Endian or Little Endian Byte Order

We must deal with these byte ordering differences as network programmers because networking protocols must specify a network byte order. For example, in a TCP segment, there is a 16-bit port number and a 32-bit IPv4 address. The sending protocol stack and the receiving protocol stack must agree on the order in which the bytes of these multibyte fields will be transmitted. The Internet protocols use big-endian byte ordering for these multibyte integers.

In theory, an implementation could store the fields in a socket address structure in host byte order and then convert to and from the network byte order when moving the fields to and from the protocol headers, saving us from having to worry about this detail. But both history and the POSIX specification say that certain fields in the socket address structures must be maintained in network byte order. Our concern is therefore converting between host byte order and network byte order. We use the following four functions to convert between these two byte orders: htons, htonl, ntohs, and ntohl.

In the names of these functions, h stands for host, n stands for network, s stands for short, and l stands for long. The terms "short" and "long" are historical artifacts from the Digital VAX implementation of 4.2BSD. We should instead think of s as a 16-bit value (such as a TCP or UDP port number) and l as a 32-bit value (such as an IPv4 address). Indeed, on the 64-bit Digital Alpha, a long integer occupies 64 bits, yet the htonl and ntohl functions operate on 32-bit values.

NOTE: These functions are used on the values kept in socket address structures (such as port numbers and IPv4 addresses), which must be stored in network byte order.
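
Host byte order and the effect of the conversion functions can be observed with a union that overlays a 16-bit integer on two bytes. The following is a minimal sketch; the function names host_is_little_endian and first_byte_on_wire are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Returns 1 if this host stores the low-order byte at the starting
 * address (little-endian), 0 if it stores the high-order byte there. */
int host_is_little_endian(void)
{
    union { uint16_t s; unsigned char c[2]; } un;

    un.s = 0x0102;
    return un.c[0] == 0x02;
}

/* Returns the byte found at the starting address once a 16-bit value has
 * been converted to network byte order. */
unsigned char first_byte_on_wire(uint16_t host_value)
{
    union { uint16_t s; unsigned char c[2]; } un;

    un.s = htons(host_value);
    assert(ntohs(un.s) == host_value);   /* the two conversions are inverses */
    return un.c[0];
}
```

Whatever the host byte order is, first_byte_on_wire(0x0102) yields the high-order byte 0x01, because network byte order is big-endian. On a big-endian host the conversion functions simply return their argument unchanged.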

Byte Manipulation Functions


There are two groups of functions that operate on multibyte fields, without interpreting the data, and without assuming that the data is a null-terminated C string. We need these types of functions when dealing with socket address structures because we need to manipulate fields such as IP addresses, which can contain bytes of 0, but are not C character strings.

The first group of functions, whose names begin with b (for byte), are from 4.2BSD and are still provided by almost any system that supports the socket functions: bzero, bcopy, and bcmp. The second group of functions, whose names begin with mem (for memory), are from the ANSI C standard and are provided with any system that supports an ANSI C library: memset, memcpy, and memcmp.

src might represent application space and dest might represent socket send buffer space (socket receive buffer space).
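
A short example shows why the mem functions are used instead of the C string functions for address fields. This is only a sketch using the ANSI C group (memset, memcpy, memcmp); the function names set_10_0_0_1 and same_endpoint are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Zeroes *dst and stores the IPv4 address 10.0.0.1 in it. */
void set_10_0_0_1(struct sockaddr_in *dst)
{
    /* Address bytes in network order; two of them are 0. */
    static const unsigned char raw[4] = { 10, 0, 0, 1 };

    assert(dst != NULL && sizeof(dst->sin_addr) == sizeof(raw));
    memset(dst, 0, sizeof(*dst));               /* same effect as bzero */
    dst->sin_family = AF_INET;
    /* memcpy copies all 4 bytes; strcpy would stop at the first 0 byte. */
    memcpy(&dst->sin_addr, raw, sizeof(raw));
}

/* Returns 1 if both structures hold the same IPv4 address and port. */
int same_endpoint(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return memcmp(&a->sin_addr, &b->sin_addr, sizeof(a->sin_addr)) == 0 &&
           a->sin_port == b->sin_port;
}
```

The 4.2BSD equivalents would be bzero in place of memset, bcopy in place of memcpy (note that bcopy takes the source first), and bcmp in place of memcmp.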

inet_aton, inet_addr, and inet_ntoa Functions


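These functions convert IPv4 addresses between a dotted-decimal string (for example, "206.168.112.96") and the 32-bit network byte ordered binary value that is stored in a socket address structure and sent on the network. They work only with IPv4.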
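
A round trip through inet_aton and inet_ntoa illustrates the conversion. This is a minimal sketch, assuming a system that provides inet_aton (it is common on Unix systems but is not part of POSIX); the function name ipv4_text_roundtrip is hypothetical.

```c
#define _DEFAULT_SOURCE   /* glibc: expose inet_aton under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Converts dotted-decimal text to binary and back again into `out`.
 * Returns 0 on success, -1 if `text` is not a valid IPv4 address. */
int ipv4_text_roundtrip(const char *text, char *out, size_t outlen)
{
    struct in_addr addr;

    assert(out != NULL && outlen >= INET_ADDRSTRLEN);
    if (inet_aton(text, &addr) == 0)       /* 0 means the string is invalid */
        return -1;

    /* inet_ntoa returns a pointer to a static buffer that the next call
     * overwrites, so the result is copied out immediately. */
    snprintf(out, outlen, "%s", inet_ntoa(addr));
    return 0;
}
```

Because inet_ntoa uses a static buffer, it is not safe to call from several threads at once or to use twice in one printf call.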

inet_pton and inet_ntop Functions


The following functions were introduced with IPv6 and convert addresses between presentation (text) form and numeric (binary) form. They work with IPv4 addresses as well; the family argument selects AF_INET or AF_INET6.


sock_ntop Function
A basic problem with inet_ntop is that it requires the caller to pass a pointer to a binary address. This address is normally contained in a socket address structure, requiring the caller to know the format of the structure and the address family.

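To solve this problem, sock_ntop() is used. It takes a pointer to a socket address structure as an argument, looks inside the structure, calls the appropriate function, and returns the presentation form of the address. Note that sock_ntop is a helper function written by Stevens, not a standard library function.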
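
The idea behind sock_ntop can be sketched as follows. This is not Stevens' implementation: the name my_sock_ntop, the caller-supplied buffer, and the "address:port" output format are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper modeled on the sock_ntop idea: looks at sa_family
 * and formats the address and port into buf. Returns buf, or NULL if the
 * family is not supported or the conversion fails. */
const char *my_sock_ntop(const struct sockaddr *sa, char *buf, size_t buflen)
{
    char ip[INET6_ADDRSTRLEN];

    assert(sa != NULL && buf != NULL);
    switch (sa->sa_family) {
    case AF_INET: {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

        if (inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof(ip)) == NULL)
            return NULL;
        snprintf(buf, buflen, "%s:%u", ip, (unsigned)ntohs(sin->sin_port));
        return buf;
    }
    case AF_INET6: {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

        if (inet_ntop(AF_INET6, &sin6->sin6_addr, ip, sizeof(ip)) == NULL)
            return NULL;
        snprintf(buf, buflen, "[%s]:%u", ip, (unsigned)ntohs(sin6->sin6_port));
        return buf;
    }
    default:
        return NULL;                   /* unknown address family */
    }
}
```

The caller no longer needs to know which family the structure holds; it passes the generic struct sockaddr pointer that the socket functions already use.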

readn, writen, and readline Functions


Stream sockets (e.g., TCP sockets) exhibit a behavior with the read and write functions that differs from normal file I/O. A read or write on a stream socket might input or output fewer bytes than requested, but this is not an error condition. The reason is that buffer limits might be reached for the socket in the kernel. All that is required to input or output the remaining bytes is for the caller to invoke the read or write function again. Some versions of Unix also exhibit this behavior when writing more than 4,096 bytes to a pipe. This scenario is always a possibility on a stream socket with read, but is normally seen with write only if the socket is nonblocking. Nevertheless, we always call our writen function instead of write, in case the implementation returns a short count.


The following functions overcome this problem.
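
A minimal sketch of readn and writen is given here, written in the style of the well-known versions from Stevens; details such as restarting after EINTR are choices made for this illustration. readline is omitted from the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Reads exactly n bytes unless EOF or an error occurs first.
 * Returns the number of bytes read (less than n only at EOF), or -1. */
ssize_t readn(int fd, void *vptr, size_t n)
{
    size_t nleft = n;
    char *ptr = vptr;

    assert(vptr != NULL);
    while (nleft > 0) {
        ssize_t nread = read(fd, ptr, nleft);

        if (nread < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: call read again */
            return -1;
        }
        if (nread == 0)
            break;                     /* EOF */
        nleft -= (size_t)nread;        /* short count: loop for the rest */
        ptr += nread;
    }
    return (ssize_t)(n - nleft);
}

/* Writes exactly n bytes. Returns n on success, or -1 on error. */
ssize_t writen(int fd, const void *vptr, size_t n)
{
    size_t nleft = n;
    const char *ptr = vptr;

    assert(vptr != NULL);
    while (nleft > 0) {
        ssize_t nwritten = write(fd, ptr, nleft);

        if (nwritten <= 0) {
            if (nwritten < 0 && errno == EINTR)
                continue;              /* interrupted: call write again */
            return -1;
        }
        nleft -= (size_t)nwritten;
        ptr += nwritten;
    }
    return (ssize_t)n;
}
```

Both functions work on any descriptor, so they can be tried on a pipe as well as on a TCP socket.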


Elementary TCP Sockets

Socket functions for elementary TCP client/server

Socket: socket (af, type, protocol);
Creates a socket on demand (placing it in an unconnected state) and returns an integer identifying the socket (the descriptor). Its arguments specify:
Address Family (af) - the protocol family the socket belongs to (for example, AF_INET for IPv4).
Type - type of communication:
SOCK_STREAM - connection-oriented
SOCK_DGRAM - connection-less
SOCK_RAW - access to low-level protocols or network interfaces.
Protocol - selects one protocol when a family offers several.

Bind: bind (socket, localaddr, addrlen);
A socket is created without any association to local or destination addresses, so a program uses bind to establish a local address for it.
Socket - integer descriptor of the socket.
Localaddr - structure that specifies the local address to be bound.
Addrlen - integer length of the address (in bytes).

Listen: listen (socket, qlength);
A server creates a socket, binds it to a well-known port, and waits for requests. So that requests arriving while the server is busy are not rejected, listen creates a queue for them and puts the socket in passive mode, listening for incoming connections. Listen only works with sockets using a reliable stream service.
Socket - integer descriptor.
Qlength - length of the request queue for that socket (historically limited to a maximum of 5).

Connect: connect (socket, destaddr, addrlen);
Binds a permanent destination to a socket, placing it in a connected state. Sockets using connection-less service do not have to use connect (they can specify the address in every datagram), but may.
Socket - socket descriptor.
Destaddr - sockaddr structure (which also includes the protocol port number) specifying the destination address.
Addrlen - length of the destination address (in bytes).


Accept: accept (socket, addr, addrlen);
Bind associates a socket with a port, but that socket is not connected to a foreign destination. When a request comes in, accept completes the setup by returning a new socket descriptor for that connection. It blocks until a connection request arrives.
Addr - pointer to the sockaddr structure.
Addrlen - pointer to integer size of address.

Close: (A system call from the traditional UNIX environment) close (socket descriptor);
When a client or server finishes with a socket, it calls close to deallocate its resources. Close decrements the reference count of the socket; the connection is terminated only when the count reaches 0, so it stays open while other processes still share the same socket.

Order of Socket System Calls:


Client Side (depends on connection type):
Socket
Connect
Write (may be repeated)
Read (may be repeated)
Close

Server Side (depends on connection type):
Socket
Bind
Listen
Accept
Read (may be repeated)
Write (may be repeated)
Close (go back to Accept)

Shutdown: shutdown (socket, direction);
The shutdown function applies to full-duplex sockets (connected using a TCP socket) and is used to partially close the connection.
Socket - socket descriptor of a connected socket.
Direction - direction in which shutdown is desired:
0 = terminate further input (SHUT_RD).
1 = terminate further output (SHUT_WR).
2 = terminate both input and output (SHUT_RDWR); unlike close, the descriptor itself is not released.

IMPORTANT NOTES:
File and Socket Descriptors: A socket is a generalized UNIX file access mechanism that provides an endpoint for communication. Descriptors (maintained in the descriptor tables) are kept per process by the operating system to point to internal data structures for files and sockets. Descriptors are small integer values.
File Descriptor: Bound to a file when open is called.
Socket Descriptor: Created using socket (the counterpart of open for files), which does not bind it to a destination.
Unconnected - UDP specifies the destination with every datagram.
Connected - TCP specifies the destination once, when the connection is established.


After a socket has been created (using socket), additional system calls are required to specify the details of its use.
Passive Socket - used by a server to wait for calls.
Active Socket - used by a client to initiate a connection.

Basic I/O Functions in UNIX:


UNIX and other operating systems provide a basic set of system functions used for I/O operations on files and other devices. Most operating systems provide similar variations to the five standard I/O operations that BSD UNIX uses.

I/O Functions:
Open - prepare for input / output.
Close - terminate the use of a device.
Write - transfer data from memory to an output device.
Read - transfer data from an input device to memory.
Lseek - set the position in a file or on a device (for example, a specific place on a disk) for the next read or write.

The Socket Interface: The Berkeley socket interface provides generalized functions that support network communication using many possible protocols. Socket calls refer to all TCP/IP protocols as a single protocol family (protocol suite). The calls allow a programmer to specify the type of service required, rather than the name of a specific protocol. The socket interface was created because an API (application program interface) for network connections is not standardized by the protocols themselves; its design lies outside the scope of a protocol suite.

Concurrent Servers

getsockname and getpeername Functions


These two functions return either the local protocol address associated with a socket (getsockname) or the foreign protocol address associated with a socket (getpeername).

#include <sys/socket.h>

int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);

Both return: 0 if OK, -1 on error

Notice that the final argument for both functions is a value-result argument: the caller sets *addrlen to the size of the socket address structure, and on return it holds the number of bytes actually stored. Both functions fill in the socket address structure pointed to by localaddr or peeraddr. We mentioned in our discussion of bind that the term "name" is misleading. These two functions return the protocol address associated with one of the two ends of a network connection, which for IPv4 and IPv6 is the combination of an IP address and port number. These functions have nothing to do with domain names.

These two functions are required for the following reasons:

After connect successfully returns in a TCP client that does not call bind, getsockname returns the local IP address and local port number assigned to the connection by the kernel.

After calling bind with a port number of 0 (telling the kernel to choose the local port number), getsockname returns the local port number that was assigned.

getsockname can be called to obtain the address family of a socket.

In a TCP server that binds the wildcard IP address, once a connection is established with a client (accept returns successfully), the server can call getsockname to obtain the local IP address assigned to the connection. The socket descriptor argument in this call must be that of the connected socket, and not the listening socket.

When a server is execed by the process that calls accept, the only way the server can obtain the identity of the client is to call getpeername.
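Two of these uses can be sketched together. The example below is a hedged illustration, not code from the notes: sockname_demo is a hypothetical name and both ends run in one process over loopback. The listening socket binds port 0 and learns its port with getsockname; the client never calls bind, so getsockname reports the port the kernel assigned, and getpeername on the server's connected socket should report the same port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if both views of the client port agree, -1 otherwise. */
int sockname_demo(void)
{
    struct sockaddr_in srv, local, peer;
    socklen_t len;

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1) return -1;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = 0;                               /* kernel chooses the port */
    if (bind(lfd, (struct sockaddr *)&srv, sizeof srv) == -1) return -1;
    if (listen(lfd, 1) == -1) return -1;

    len = sizeof srv;                               /* value-result argument */
    if (getsockname(lfd, (struct sockaddr *)&srv, &len) == -1) return -1;
    if (srv.sin_port == 0) return -1;               /* a real port was assigned */

    int cfd = socket(AF_INET, SOCK_STREAM, 0);      /* client: no bind */
    if (cfd == -1) return -1;
    if (connect(cfd, (struct sockaddr *)&srv, sizeof srv) == -1) return -1;
    int sfd = accept(lfd, NULL, NULL);
    if (sfd == -1) return -1;

    len = sizeof local;                             /* client's own address */
    if (getsockname(cfd, (struct sockaddr *)&local, &len) == -1) return -1;
    len = sizeof peer;                              /* client as seen by the server */
    if (getpeername(sfd, (struct sockaddr *)&peer, &len) == -1) return -1;

    int ok = local.sin_family == AF_INET
          && local.sin_port != 0
          && local.sin_port == peer.sin_port;
    close(cfd);
    close(sfd);
    close(lfd);
    return ok ? 0 : -1;
}
```

Note that len is reset before every call, because each call overwrites it with the length of the address it stored.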

UNIT-III TCP Client/Server Example

Introduction
Our simple example is an echo server that performs the following steps:
1. The client reads a line of text from its standard input and writes the line to the server.
2. The server reads the line from its network input and echoes the line back to the client.
3. The client reads the echoed line and prints it on its standard output.

Normal Startup(w.r.to socket pair)


To initiate communication between the client and the server, we first start the server, which calls socket(). The socket pair at the server is
SP = (IPs:Ps , IPc:Pc)
where
IPc = IP address of the client
IPs = IP address of the server
Pc = port number of the client
Ps = port number of the server

Next comes bind(), after which SP = (localhost:33600 , IPc:Pc). Then comes listen(), after which SP is still (localhost:33600 , IPc:Pc). [Write the wildcard character * for IPs, IPc, or Pc when it is not yet known.] The server has now performed a passive open, and the sequence is:

Server
socket() - SP = (IPs:Ps , IPc:Pc)
bind() - SP = (localhost:33600 , IPc:Pc)
listen() - SP = (localhost:33600 , IPc:Pc) or (*:33600 , *:*)

Now the client requests a connection to the server. It calls socket(), and its socket pair is SP = (IPc:Pc , IPs:Ps). The client then calls connect(), performing an active open, and the server's accept() returns the new connection. This is the ordinary three-way handshake; TCP's term "simultaneous open" refers to the rare case in which both ends perform an active open. The sequence is:

Client
socket() - SP = (IPc:Pc , IPs:Ps)

CONNECTION ESTABLISHMENT
At the client: connect() - SP = (localhost:33597 , x.y.z.w:33600)
At the server: accept() - SP = (localhost:33600 , a.b.c.d:33597)

At this point, normal startup of the client and server is complete.

The following steps take place with our Client/Server example:
1. The client calls str_cli, which will block in the call to fgets, because we have not typed a line of input yet.
2. When accept returns in the server, it calls fork and the child calls str_echo. This function calls readline, which calls read, which blocks while waiting for a line to be sent from the client.
3. The server parent, on the other hand, calls accept again, and blocks while waiting for the next client connection.

Normal Termination
We can follow through the steps involved in the normal termination of our client and server:
1. When we type our EOF character, fgets returns a null pointer and the function str_cli returns.
2. When str_cli returns to the client main function, the latter terminates by calling exit.
3. Part of process termination is the closing of all open descriptors, so the client socket is closed by the kernel. This sends a FIN to the server, to which the server TCP responds with an ACK. This is the first half of the TCP connection termination sequence. At this point, the server socket is in the CLOSE_WAIT state and the client socket is in the FIN_WAIT_2 state.
4. When the server TCP receives the FIN, the server child is blocked in a call to readline, and readline then returns 0. This causes the str_echo function to return to the server child main.
5. The server child terminates by calling exit.
6. All open descriptors in the server child are closed. The closing of the connected socket by the child causes the final two segments of the TCP connection termination to take place: a FIN from the server to the client, and an ACK from the client. At this point, the connection is completely terminated. The client socket enters the TIME_WAIT state.
7. Finally, the SIGCHLD signal is sent to the parent when the server child terminates. This occurs in this example, but we do not catch the signal in our code, and the default action of the signal is to be ignored. Thus, the child enters the zombie state. We can verify this with the ps command.

wait and waitpid Functions


We call the wait function to handle the terminated child.

#include <sys/wait.h>

pid_t wait(int *statloc);
pid_t waitpid(pid_t pid, int *statloc, int options);

Both return: process ID if OK, 0 or -1 on error

wait and waitpid both return two values: the return value of the function is the process ID of the terminated child, and the termination status of the child (an integer) is returned through the statloc pointer. There are three macros that we can call that examine the termination status and tell us if the child terminated normally, was killed by a signal, or was just stopped by job control. Additional macros let us then fetch the exit status of the child, or the value of the signal that killed the child, or the value of the job-control signal that stopped the child. We will use the WIFEXITED and WEXITSTATUS macros for this purpose. If there are no terminated children for the process calling wait, but the process has one or more children that are still executing, then wait blocks until the first of the existing children terminates.

waitpid gives us more control over which process to wait for and whether or not to block. First, the pid argument lets us specify the process ID that we want to wait for. A value of -1 says to wait for the first of our children to terminate. (There are other options, dealing with process group IDs, but we do not need them in this text.) The options argument lets us specify additional options. The most common option is WNOHANG. This option tells the kernel not to block if there are no terminated children.
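The status macros and the WNOHANG option can be tried with a single child. The following is a minimal sketch under stated assumptions: reap_child is a hypothetical name, and a pipe is used only to keep the child alive until the parent has made its non-blocking call.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with `code` and returns the
 * exit status the parent collects, or -1 if anything unexpected happens. */
int reap_child(int code)
{
    int fds[2], status;
    char c;

    if (pipe(fds) == -1) return -1;
    pid_t pid = fork();
    if (pid == -1) return -1;

    if (pid == 0) {                       /* child */
        close(fds[1]);
        if (read(fds[0], &c, 1) < 0)      /* blocks until the parent closes its end */
            _exit(127);
        _exit(code);
    }

    close(fds[0]);                        /* parent */

    /* The child is still running, so WNOHANG makes waitpid return 0 at once. */
    if (waitpid(pid, &status, WNOHANG) != 0) return -1;

    close(fds[1]);                        /* release the child */

    /* options = 0: block until this particular child terminates. */
    if (waitpid(pid, &status, 0) != pid) return -1;
    if (!WIFEXITED(status)) return -1;    /* it must have terminated normally */
    return WEXITSTATUS(status);
}
```

For example, reap_child(7) should return 7. Once waitpid has collected the status, the child no longer remains as a zombie.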

Termination of Server Process


We will now start our client/server and then kill the server child process. This simulates the crashing of the server process, so we can see what happens to the client. The following steps take place:
1. We start the server and client and type one line to the client to verify that all is okay. That line is echoed normally by the server child.
2. We find the process ID of the server child and kill it. As part of process termination, all open descriptors in the child are closed. This causes a FIN to be sent to the client, and the client TCP responds with an ACK. This is the first half of the TCP connection termination.
3. The SIGCHLD signal is sent to the server parent and handled correctly.
4. Nothing happens at the client. The client TCP receives the FIN from the server TCP and responds with an ACK, but the problem is that the client process is blocked in the call to fgets waiting for a line from the terminal.
5. Running netstat at this point shows the state of the sockets.

linux % netstat -a | grep 9877
tcp        0      0 *:9877              *:*                 LISTEN
tcp        0      0 localhost:9877      localhost:43604     FIN_WAIT2
tcp        1      0 localhost:43604     localhost:9877      CLOSE_WAIT

6. We can still type a line of input to the client. Here is what happens at the client starting from Step 1:

linux % tcpcli01 127.0.0.1        start client
hello                             the first line that we type
hello                             is echoed correctly
                                  here we kill the server child on the server host
another line                      we then type a second line to the client
str_cli: server terminated prematurely

When we type "another line," str_cli calls writen and the client TCP sends the data to the server. This is allowed by TCP because the receipt of the FIN by the client TCP only indicates that the server process has closed its end of the connection and will not be sending any more data. The receipt of the FIN does not tell the client TCP that the server process has terminated (which in this case, it has). When the server TCP receives the data from the client, it responds with an RST since the process that had that socket open has terminated. We can verify that the RST was sent by watching the packets with tcpdump.

7. The client process will not see the RST because it calls readline immediately after the call to writen, and readline returns 0 (EOF) immediately because of the FIN that was received in Step 2. Our client is not expecting to receive an EOF at this point, so it quits with the error message "server terminated prematurely."
8. When the client terminates, all its open descriptors are closed.

Crashing of Server Host


The following steps take place:
1. When the server host crashes, nothing is sent out on the existing network connections. That is, we are assuming the host crashes and is not shut down by an operator.
2. We type a line of input to the client; it is written by writen and is sent by the client TCP as a data segment. The client then blocks in the call to readline, waiting for the echoed reply.
3. If we watch the network with tcpdump, we will see the client TCP continually retransmitting the data segment, trying to receive an ACK from the server. Section 25.11 of TCPv2 shows a typical pattern for TCP retransmissions: Berkeley-derived implementations retransmit the data segment 12 times, waiting for around 9 minutes before giving up. When the client TCP finally gives up (assuming the server host has not been rebooted during this time, or, if the server host has not crashed but was unreachable on the network, assuming the host was still unreachable), an error is returned to the client process. Since the client is blocked in the call to readline, readline returns an error. Assuming the server host crashed and there were no responses at all to the client's data segments, the error is ETIMEDOUT. But if some intermediate router determined that the server host was unreachable and responded with an ICMP destination unreachable message, the error is either EHOSTUNREACH or ENETUNREACH.

Crashing and Rebooting of Server Host


The following steps take place:
1. We start the server and then the client. We type a line to verify that the connection is established.
2. The server host crashes and reboots. We type a line of input to the client, which is sent as a TCP data segment to the server host.
3. When the server host reboots after crashing, its TCP loses all information about connections that existed before the crash. Therefore, the server TCP responds to the received data segment from the client with an RST.
4. Our client is blocked in the call to readline when the RST is received, causing readline to return the error ECONNRESET.

Shutdown of Server Host


The previous two sections discussed the crashing of the server host, or the server host being unreachable across the network. We now consider what happens if the server host is shut down by an operator while our server process is running on that host. When a Unix system is shut down, the init process normally sends the SIGTERM signal to all processes (we can catch this signal), waits some fixed amount of time (often between 5 and 20 seconds), and then sends the SIGKILL signal (which we cannot catch) to any processes still running. This gives all running processes a short amount of time to clean up and terminate. If we do not catch SIGTERM and terminate, our server will be terminated by the SIGKILL signal. When the process terminates, all open descriptors are closed, and we then follow the same sequence of steps discussed in TERMINATION OF SERVER PROCESS. As stated there, we must use the select or poll function in our client to have the client detect the termination of the server process as soon as it occurs.

UNIT-IV I/O Multiplexing: The select and poll functions

Introduction
We saw our TCP client handling two inputs at the same time: standard input and a TCP socket. We encountered a problem when the client was blocked in a call to fgets (on standard input) and the server process was killed. The server TCP correctly sent a FIN to the client TCP, but since the client process was blocked reading from standard input, it never saw the EOF until it read from the socket (possibly much later). What we need is the capability to tell the kernel that we want to be notified if one or more I/O conditions are ready (i.e., input is ready to be read, or the descriptor is capable of taking more output). This capability is called I/O multiplexing and is provided by the select and poll functions. We will also cover a newer POSIX variation of the former, called pselect.

I/O multiplexing is typically used in networking applications in the following scenarios:

When a client is handling multiple descriptors (normally interactive input and a network socket), I/O multiplexing should be used.
It is possible, but rare, for a client to handle multiple sockets at the same time.
If a TCP server handles both a listening socket and its connected sockets, I/O multiplexing is normally used.
If a server handles TCP and UDP, I/O multiplexing is normally used.
If a server handles multiple services and perhaps multiple protocols, I/O multiplexing is normally used.

There are normally two distinct phases for an input operation:

1. Waiting for the data to be ready
2. Copying the data from the kernel to the process

For an input operation on a socket, the first step normally involves waiting for data to arrive on the network. When the packet arrives, it is copied into a buffer within the kernel. The second step is copying this data from the kernel's buffer into our application buffer.

I/O Models
The five I/O models available to us under UNIX are:
blocking I/O
nonblocking I/O
I/O multiplexing (select and poll)
signal-driven I/O (SIGIO)
asynchronous I/O (the POSIX aio_ functions)
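The difference between the first two models shows up in the first phase (waiting for data). As a minimal sketch, assuming a pipe in place of a socket and the hypothetical name nonblocking_read_demo, a descriptor marked O_NONBLOCK returns an error at once instead of putting the process to sleep:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a read on an empty nonblocking pipe fails
 * immediately with EAGAIN/EWOULDBLOCK, 0 if it does not, -1 on setup error. */
int nonblocking_read_demo(void)
{
    int fds[2];
    char c;

    if (pipe(fds) == -1) return -1;

    int flags = fcntl(fds[0], F_GETFL, 0);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    ssize_t n = read(fds[0], &c, 1);      /* no data yet: returns without waiting */
    int would_block = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    close(fds[0]);
    close(fds[1]);
    return would_block;
}
```

Without O_NONBLOCK the same read would block until data arrived. A program that simply retries the nonblocking read in a loop is polling, which wastes CPU time.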

BLOCKING I/O MODEL:

NONBLOCKING I/O MODEL:

I/O MULTIPLEXING

SIGNAL-DRIVEN I/O

ASYNCHRONOUS I/O MODEL

SELECT FUNCTION
select() - Synchronous I/O Multiplexing

This function is somewhat strange, but it's very useful. Take the following situation: you are a server and you want to listen for incoming connections as well as keep reading from the connections you already have. No problem, you say, just an accept() and a couple of recv()s. Not so fast, buster! What if you're blocking on an accept() call? How are you going to recv() data at the same time? "Use non-blocking sockets!" No way! You don't want to be a CPU hog. What, then?
select() gives you the power to monitor several sockets at the same time. It'll tell you which ones are ready for reading, which are ready for writing, and which sockets have raised exceptions, if you really want to know that.

This being said, in modern times select(), though very portable, is one of the slowest methods for monitoring sockets. One possible alternative is libevent, or something similar, that encapsulates all the system-dependent stuff involved with getting socket notifications. Without any further ado, I'll offer the synopsis of select():
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int select(int numfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

The function monitors "sets" of file descriptors; in particular readfds, writefds, and exceptfds. If you want to see if you can read from standard input and some socket descriptor, sockfd, just add the file descriptors 0 and sockfd to the set readfds. The parameter numfds should be set to the value of the highest file descriptor plus one. In this example, it should be set to sockfd+1, since it is assuredly higher than standard input (0). When select() returns, readfds will be modified to reflect which of the file descriptors you selected are ready for reading. You can test them with the macro FD_ISSET(), below. Before progressing much further, I'll talk about how to manipulate these sets. Each set is of the type fd_set. The following macros operate on this type:
FD_SET(int fd, fd_set *set);      Add fd to the set.
FD_CLR(int fd, fd_set *set);      Remove fd from the set.
FD_ISSET(int fd, fd_set *set);    Return true if fd is in the set.
FD_ZERO(fd_set *set);             Clear all entries from the set.
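The four macros can be exercised on their own, without calling select(). The following is a small sketch; the function name fdset_demo and the descriptor numbers 3, 4 and 5 are arbitrary choices for illustration (no descriptor needs to be open just to manipulate a set).

```c
#include <sys/select.h>

/* Hypothetical helper: returns 0 if the macros behave as the table says, -1 otherwise. */
int fdset_demo(void)
{
    fd_set set;

    FD_ZERO(&set);                        /* start from an empty set */
    if (FD_ISSET(3, &set)) return -1;

    FD_SET(3, &set);                      /* add descriptors 3 and 5 */
    FD_SET(5, &set);
    if (!FD_ISSET(3, &set) || !FD_ISSET(5, &set) || FD_ISSET(4, &set))
        return -1;

    FD_CLR(3, &set);                      /* remove 3; 5 must remain */
    if (FD_ISSET(3, &set) || !FD_ISSET(5, &set))
        return -1;

    FD_ZERO(&set);                        /* clear everything again */
    return FD_ISSET(5, &set) ? -1 : 0;
}
```

Forgetting the initial FD_ZERO() is a common mistake, because an fd_set declared as a local variable starts with unspecified contents.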

Finally, what is this weirded out struct timeval? Well, sometimes you don't want to wait forever for someone to send you some data. Maybe every 96 seconds you want to print "Still Going..." to the terminal even though nothing has happened. This time structure allows you to specify a timeout period. If the time is exceeded and select() still hasn't found any ready file descriptors, it'll return so you can continue processing.

The struct timeval has the following fields:

struct timeval {
    time_t      tv_sec;     // seconds
    suseconds_t tv_usec;    // microseconds
};

Just set tv_sec to the number of seconds to wait, and set tv_usec to the number of microseconds to wait. Yes, that's microseconds, not milliseconds. There are 1,000 microseconds in a millisecond, and 1,000 milliseconds in a second. Thus, there are 1,000,000 microseconds in a second. Why is it "usec"? The "u" is supposed to look like the Greek letter μ (mu) that we use for "micro". Also, when the function returns, timeout might be updated to show the time still remaining. This depends on what flavor of Unix you're running. Yay! We have a microsecond resolution timer! Well, don't count on it. You'll probably have to wait some part of your standard Unix timeslice no matter how small you set your struct timeval.
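Since the unit arithmetic is easy to get wrong, here is a short sketch of converting a timeout given in milliseconds into a struct timeval. The helper name ms_to_timeval is hypothetical, and a non-negative argument is assumed.

```c
#include <sys/time.h>

/* Hypothetical helper: converts a non-negative millisecond count to a timeval. */
struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;

    tv.tv_sec  = ms / 1000;              /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;     /* leftover milliseconds, as microseconds */
    return tv;
}
```

For a 2.5 second wait, ms_to_timeval(2500) gives tv_sec = 2 and tv_usec = 500000.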

Other things of interest: If you set the fields in your struct timeval to 0, select() will timeout immediately, effectively polling all the file descriptors in your sets. If you set the parameter timeout to NULL, it will never timeout, and will wait until the first file descriptor is ready. Finally, if you don't care about waiting for a certain set, you can just set it to NULL in the call to select(). The following code snippet waits 2.5 seconds for something to appear on standard input:
/*
** select.c -- a select() demo
*/

#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define STDIN 0  // file descriptor for standard input

int main(void)
{
    struct timeval tv;
    fd_set readfds;

    tv.tv_sec = 2;
    tv.tv_usec = 500000;

    FD_ZERO(&readfds);
    FD_SET(STDIN, &readfds);

    // don't care about writefds and exceptfds:
    select(STDIN+1, &readfds, NULL, NULL, &tv);

    if (FD_ISSET(STDIN, &readfds))
        printf("A key was pressed!\n");
    else
        printf("Timed out.\n");

    return 0;
}

If you're on a line buffered terminal, the key you hit should be RETURN or it will time out anyway. Now, some of you might think this is a great way to wait for data on a datagram socket and you are right: it might be. Some Unices can use select in this manner, and some can't. You should see what your local man page says on the matter if you want to attempt it.

Some Unices update the time in your struct timeval to reflect the amount of time still remaining before a timeout. But others do not. Don't rely on that occurring if you want to be portable. (Use gettimeofday() if you need to track time elapsed. It's a bummer, I know, but that's the way it is.)

What happens if a socket in the read set closes the connection? Well, in that case, select() returns with that socket descriptor set as "ready to read". When you actually do recv() from it, recv() will return 0. That's how you know the client has closed the connection.

One more note of interest about select(): if you have a socket that is listen()ing, you can check to see if there is a new connection by putting that socket's file descriptor in the readfds set. And that, my friends, is a quick overview of the almighty select() function.
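The closed-connection case can be checked without a network. The sketch below makes some assumptions: select_eof_demo is a hypothetical name, a UNIX-domain socketpair stands in for a TCP connection, and the one-second timeout is only a safety net so the call cannot hang.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if select() reports the socket readable and
 * recv() then returns 0 after the peer has closed; -1 otherwise. */
int select_eof_demo(void)
{
    int sv[2];
    char buf[8];
    fd_set readfds;
    struct timeval tv = { 1, 0 };         /* 1 second, 0 microseconds */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    close(sv[1]);                         /* the "client" hangs up */

    FD_ZERO(&readfds);
    FD_SET(sv[0], &readfds);
    int rv = select(sv[0] + 1, &readfds, NULL, NULL, &tv);

    /* Readable with nothing to read: recv() returning 0 means end-of-file. */
    int ok = (rv == 1 && FD_ISSET(sv[0], &readfds)
              && recv(sv[0], buf, sizeof buf, 0) == 0);

    close(sv[0]);
    return ok ? 0 : -1;
}
```

So "ready to read" means only that a read will not block; the program still has to inspect the return value of recv() to tell data from a hang-up.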

But, by popular demand, here is an in-depth example. Unfortunately, the difference between the dirt-simple example, above, and this one here is significant. But have a look, then read the description that follows it. This program acts like a simple multi-user chat server. Start it running in one window, then telnet to it ("telnet hostname 9034") from multiple other windows. When you type something in one telnet session, it should appear in all the others.
/*
** selectserver.c -- a cheezy multiperson chat server
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define PORT "9034"   // port we're listening on

// get sockaddr, IPv4 or IPv6:
void *get_in_addr(struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        return &(((struct sockaddr_in*)sa)->sin_addr);
    }

    return &(((struct sockaddr_in6*)sa)->sin6_addr);
}

int main(void)
{
    fd_set master;    // master file descriptor list
    fd_set read_fds;  // temp file descriptor list for select()
    int fdmax;        // maximum file descriptor number

    int listener;     // listening socket descriptor
    int newfd;        // newly accept()ed socket descriptor
    struct sockaddr_storage remoteaddr; // client address
    socklen_t addrlen;

    char buf[256];    // buffer for client data
    int nbytes;

    char remoteIP[INET6_ADDRSTRLEN];

    int yes=1;        // for setsockopt() SO_REUSEADDR, below
    int i, j, rv;

    struct addrinfo hints, *ai, *p;

    FD_ZERO(&master);    // clear the master and temp sets
    FD_ZERO(&read_fds);

    // get us a socket and bind it
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    if ((rv = getaddrinfo(NULL, PORT, &hints, &ai)) != 0) {
        fprintf(stderr, "selectserver: %s\n", gai_strerror(rv));
        exit(1);
    }

    for(p = ai; p != NULL; p = p->ai_next) {
        listener = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (listener < 0) {
            continue;
        }

        // lose the pesky "address already in use" error message
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int));

        if (bind(listener, p->ai_addr, p->ai_addrlen) < 0) {
            close(listener);
            continue;
        }

        break;
    }

    // if we got here, it means we didn't get bound
    if (p == NULL) {
        fprintf(stderr, "selectserver: failed to bind\n");
        exit(2);
    }

    freeaddrinfo(ai); // all done with this

    // listen
    if (listen(listener, 10) == -1) {
        perror("listen");
        exit(3);
    }

    // add the listener to the master set
    FD_SET(listener, &master);

    // keep track of the biggest file descriptor
    fdmax = listener; // so far, it's this one

    // main loop
    for(;;) {
        read_fds = master; // copy it
        if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
            perror("select");
            exit(4);
        }

        // run through the existing connections looking for data to read
        for(i = 0; i <= fdmax; i++) {
            if (FD_ISSET(i, &read_fds)) { // we got one!!
                if (i == listener) {
                    // handle new connections
                    addrlen = sizeof remoteaddr;
                    newfd = accept(listener,
                        (struct sockaddr *)&remoteaddr,
                        &addrlen);

                    if (newfd == -1) {
                        perror("accept");
                    } else {
                        FD_SET(newfd, &master); // add to master set
                        if (newfd > fdmax) {    // keep track of the max
                            fdmax = newfd;
                        }
                        printf("selectserver: new connection from %s on "
                            "socket %d\n",
                            inet_ntop(remoteaddr.ss_family,
                                get_in_addr((struct sockaddr*)&remoteaddr),
                                remoteIP, INET6_ADDRSTRLEN),
                            newfd);
                    }
                } else {
                    // handle data from a client
                    if ((nbytes = recv(i, buf, sizeof buf, 0)) <= 0) {
                        // got error or connection closed by client
                        if (nbytes == 0) {
                            // connection closed
                            printf("selectserver: socket %d hung up\n", i);
                        } else {
                            perror("recv");
                        }
                        close(i); // bye!
                        FD_CLR(i, &master); // remove from master set
                    } else {
                        // we got some data from a client
                        for(j = 0; j <= fdmax; j++) {
                            // send to everyone!
                            if (FD_ISSET(j, &master)) {
                                // except the listener and ourselves
                                if (j != listener && j != i) {
                                    if (send(j, buf, nbytes, 0) == -1) {
                                        perror("send");
                                    }
                                }
                            }
                        }
                    }
                } // END handle data from client
            } // END got new incoming connection
        } // END looping through file descriptors
    } // END for(;;)--and you thought it would never end!

    return 0;
}

Notice I have two file descriptor sets in the code: master and read_fds. The first, master, holds all the socket descriptors that are currently connected, as well as the socket descriptor that is listening for new connections. The reason I have the master set is that select() actually changes the set you pass into it to reflect which sockets are ready to read. Since I have to keep track of the connections from one call of select() to the next, I must store these safely away somewhere. At the last minute, I copy the master into the read_fds, and then call select(). But doesn't this mean that every time I get a new connection, I have to add it to the master set? Yup! And every time a connection closes, I have to remove it from the master set? Yes, it does. Notice I check to see when the listener socket is ready to read. When it is, it means I have a new connection pending, and I accept() it and add it to the master set. Similarly, when a client connection is ready to read, and recv() returns 0, I know the client has closed the connection, and I must remove it from the master set. If the client recv() returns non-zero, though, I know some data has been received. So I get it, and then go through the master list and send that data to all the rest of the connected clients.

And that, my friends, is a less-than-simple overview of the almighty select() function. In addition, here is a bonus afterthought: there is another function called poll() which behaves much the same way select() does, but with a different system for managing the file descriptor sets.
http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#select

POLL FUNCTION
poll()

Test for events on multiple sockets simultaneously

Prototypes
#include <sys/poll.h>

int poll(struct pollfd *ufds, unsigned int nfds, int timeout);

Description
This function is very similar to select() in that they both watch sets of file descriptors for events, such as incoming data ready to recv(), socket ready to send() data to, out-of-band data ready to recv(), errors, etc. The basic idea is that you pass an array of nfds struct pollfds in ufds, along with a timeout in milliseconds (1000 milliseconds in a second.) The timeout can be negative if you want to wait forever. If no event happens on any of the socket descriptors by the timeout, poll() will return. Each element in the array of struct pollfds represents one socket descriptor, and contains the following fields:
struct pollfd {
    int fd;         // the socket descriptor
    short events;   // bitmap of events we're interested in
    short revents;  // when poll() returns, bitmap of events that occurred
};


Before calling poll(), load fd with the socket descriptor (if you set fd to a negative number, this struct pollfd is ignored and its revents field is set to zero) and then construct the events field by bitwise-ORing the following macros:

POLLIN     Alert me when data is ready to recv() on this socket.
POLLOUT    Alert me when I can send() data to this socket without blocking.
POLLPRI    Alert me when out-of-band data is ready to recv() on this socket.

Once the poll() call returns, the revents field will be constructed as a bitwise-OR of the above flags, telling you which of those events actually occurred on that descriptor. Additionally, these other flags might be present:

POLLERR    An error has occurred on this socket.
POLLHUP    The remote side of the connection hung up.
POLLNVAL   Something was wrong with the socket descriptor fd; maybe it's uninitialized?

Return Value
Returns the number of elements in the ufds array that have had an event occur on them; this can be zero if the timeout occurred. Also returns -1 on error (and errno will be set accordingly).

Example
int s1, s2;
int rv;
char buf1[256], buf2[256];
struct pollfd ufds[2];

s1 = socket(PF_INET, SOCK_STREAM, 0);
s2 = socket(PF_INET, SOCK_STREAM, 0);

// pretend we've connected both to a server at this point
//connect(s1, ...)...
//connect(s2, ...)...

// set up the array of file descriptors.
//
// in this example, we want to know when there's normal or out-of-band
// data ready to be recv()'d...

ufds[0].fd = s1;
ufds[0].events = POLLIN | POLLPRI; // check for normal or out-of-band

ufds[1].fd = s2;
ufds[1].events = POLLIN; // check for just normal data

// wait for events on the sockets, 3.5 second timeout
rv = poll(ufds, 2, 3500);

if (rv == -1) {
    perror("poll"); // error occurred in poll()
} else if (rv == 0) {
    printf("Timeout occurred! No data after 3.5 seconds.\n");
} else {
    // check for events on s1:
    if (ufds[0].revents & POLLIN) {
        recv(s1, buf1, sizeof buf1, 0); // receive normal data
    }
    if (ufds[0].revents & POLLPRI) {
        recv(s1, buf1, sizeof buf1, MSG_OOB); // out-of-band data
    }

    // check for events on s2:
    if (ufds[1].revents & POLLIN) {
        recv(s2, buf2, sizeof buf2, 0); // read from s2 into its own buffer
    }
}

Socket Options
There are various ways to get and set the options that affect a socket:

The getsockopt and setsockopt functions
The fcntl function
The ioctl function

This chapter starts by covering the setsockopt and getsockopt functions, followed by an example that prints the default value of all the options, and then a detailed description of all the socket options. We divide the detailed descriptions into the following categories: generic, IPv4, IPv6, TCP, and SCTP. This detailed coverage can be skipped during a first reading of this chapter, and the individual sections referred to when needed. A few options are discussed in detail in a later chapter, such as the IPv4 and IPv6 multicasting options.


setsockopt(), getsockopt()
Set various options for a socket

Prototypes
#include <sys/types.h>
#include <sys/socket.h>

int getsockopt(int s, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen);

Description
Sockets are fairly configurable beasts. In fact, they are so configurable, I'm not even going to cover it all here. It's probably system-dependent anyway. But I will talk about the basics. Obviously, these functions get and set certain options on a socket. On a Linux box, all the socket information is in the man page for socket in section 7. (Type: "man 7 socket" to get all these goodies.) As for parameters, s is the socket you're talking about, level should be set to SOL_SOCKET. Then you set the optname to the name you're interested in. Again, see your man page for all the options, but here are some of the most fun ones:
SO_BINDTODEVICE
Bind this socket to a symbolic device name like eth0 instead of using bind() to bind it to an IP address. Type the command ifconfig under Unix to see the device names.

SO_REUSEADDR
Allows other sockets to bind() to this port, unless there is an active listening socket bound to the port already. This enables you to get around those "Address already in use" error messages when you try to restart your server after a crash.

SO_BROADCAST
Allows UDP datagram (SOCK_DGRAM) sockets to send and receive packets sent to and from the broadcast address. Does nothing (NOTHING!!) to TCP stream sockets! Hahaha!

As for the parameter optval, it's usually a pointer to an int indicating the value in question. For booleans, zero is false, and non-zero is true. And that's an absolute fact, unless it's different on your system. If there is no parameter to be passed, optval can be NULL.

The final parameter, optlen, is filled out for you by getsockopt() and you have to specify it for setsockopt(), where it will probably be sizeof(int). Warning: on some systems (notably Sun and Windows), the option can be a char instead of an int, and is set to, for example, a character value of '1' instead of an int value of 1. Again, check your own man pages for more info with "man setsockopt" and "man 7 socket"!

Return Value
Returns zero on success, or -1 on error (and errno will be set accordingly.)

Example
int optval;
socklen_t optlen;
const char *optval2;

// set SO_REUSEADDR on a socket to true (1):
optval = 1;
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);

// bind a socket to a device name (might not work on all systems):
optval2 = "eth1"; // 4 bytes long, so 4, below:
setsockopt(s2, SOL_SOCKET, SO_BINDTODEVICE, optval2, 4);

// see if the SO_BROADCAST flag is set:
optlen = sizeof optval; // must hold the size of optval before the call
getsockopt(s3, SOL_SOCKET, SO_BROADCAST, &optval, &optlen);
if (optval != 0) {
    printf("SO_BROADCAST enabled on s3!\n");
}

The following options are supported for setsockopt():

SO_DEBUG
Provides the ability to turn on recording of debugging information. This option takes an int value in the optval argument. This is a BOOL option.

SO_BROADCAST
Permits sending of broadcast messages, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.


SO_REUSEADDR
Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.

SO_KEEPALIVE
Keeps connections active by enabling periodic transmission of messages, if this is supported by the protocol. If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option takes an int value in the optval argument. This is a BOOL option.

SO_LINGER
Specifies whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option takes a linger structure in the optval argument.

SO_OOBINLINE
Specifies whether the socket leaves received out-of-band data (data marked urgent) in line. This option takes an int value in optval argument. This is a BOOL option.

SO_SNDBUF
Sets send buffer size information. This option takes an int value in the optval argument.

SO_RCVBUF
Sets receive buffer size information. This option takes an int value in the optval argument.

SO_DONTROUTE
Specifies whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option takes an int value in the optval argument. This is a BOOL option.

TCP_NODELAY
Specifies whether the Nagle algorithm used by TCP for send coalescing is to be disabled. This option takes an int value in the optval argument. This is a BOOL option.

For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.

The following options are supported for getsockopt():

SO_DEBUG
Reports whether debugging information is being recorded. This option stores an int value in the optval argument. This is a BOOL option.

SO_ACCEPTCONN
Reports whether socket listening is enabled. This option stores an int value in the optval argument. This is a BOOL option.

SO_BROADCAST
Reports whether transmission of broadcast messages is supported, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.

SO_REUSEADDR
Reports whether the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.

SO_KEEPALIVE
Reports whether connections are kept active with periodic transmission of messages, if this is supported by the protocol. If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option stores an int value in the optval argument. This is a BOOL option.

SO_LINGER
Reports whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option stores a linger structure in the optval argument.


SO_OOBINLINE
Reports whether the socket leaves received out-of-band data (data marked urgent) in line. This option stores an int value in optval argument. This is a BOOL option.

SO_SNDBUF
Reports send buffer size information. This option stores an int value in the optval argument.

SO_RCVBUF
Reports receive buffer size information. This option stores an int value in the optval argument.

SO_ERROR
Reports information about error status and clears it. This option stores an int value in the optval argument.

SO_TYPE
Reports the socket type. This option stores an int value in the optval argument.

SO_DONTROUTE
Reports whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option stores an int value in the optval argument. This is a BOOL option.

SO_MAX_MSG_SIZE
Maximum size of a message for message-oriented socket types (for example, SOCK_DGRAM). Has no meaning for stream-oriented sockets. This option stores an int value in the optval argument.

TCP_NODELAY
Reports whether the Nagle algorithm used by TCP for send coalescing is disabled. This option stores an int value in the optval argument. This is a BOOL option.

For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.

http://www.mkssoftware.com/docs/man3/setsockopt.3.asp http://www.mkssoftware.com/docs/man3/getsockopt.3.asp


fcntl()
Control socket descriptors

Prototypes
#include <unistd.h>
#include <fcntl.h>

int fcntl(int s, int cmd, long arg);

Description
This function is typically used to do file locking and other file-oriented stuff, but it also has a couple socket-related functions that you might see or use from time to time. Parameter s is the socket descriptor you wish to operate on, cmd should be set to F_SETFL, and arg can be one of the following commands. (Like I said, there's more to fcntl() than I'm letting on here, but I'm trying to stay socket-oriented.)
O_NONBLOCK
Set the socket to be non-blocking. See the section on blocking for more details.

O_ASYNC
Set the socket to do asynchronous I/O. When data is ready to be recv()'d on the socket, the signal SIGIO will be raised. This is rare to see, and beyond the scope of the guide. And I think it's only available on certain systems.

Return Value
Returns zero on success, or -1 on error (and errno will be set accordingly.) Different uses of the fcntl() system call actually have different return values, but I haven't covered them here because they're not socket-related. See your local fcntl() man page for more information.

Example
int s = socket(PF_INET, SOCK_STREAM, 0);

fcntl(s, F_SETFL, O_NONBLOCK); // set to non-blocking
fcntl(s, F_SETFL, O_ASYNC); // set to asynchronous I/O
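
One caution about the example above: F_SETFL replaces the descriptor's file status flags, so a second F_SETFL call with only O_ASYNC would clear O_NONBLOCK again. A common idiom is to fetch the current flags with F_GETFL first and OR the new flag in. The sketch below shows that idiom; the function name set_nonblocking is hypothetical, chosen just for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on O_NONBLOCK while preserving the other file status flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);  /* read the current flags */
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

The same pattern works for any descriptor, not only sockets, since fcntl() operates on file descriptors in general.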


UNIT-V Elementary UDP Sockets

Introduction
There are some fundamental differences between applications written using TCP versus those that use UDP. These are because of the differences in the two transport layers: UDP is a connectionless, unreliable, datagram protocol, quite unlike the connection-oriented, reliable byte stream provided by TCP. Nevertheless, there are instances when it makes sense to use UDP instead of TCP. Some popular applications are built using UDP: DNS, NFS, and SNMP, for example.

The below figure shows the function calls for a typical UDP client/server. The client does not establish a connection with the server. Instead, the client just sends a datagram to the server using the sendto function, which requires the address of the destination (the server) as a parameter. Similarly, the server does not accept a connection from a client. Instead, the server just calls the recvfrom function, which waits until data arrives from some client. recvfrom returns the protocol address of the client, along with the datagram, so the server can send a response to the correct client.

The figure also shows a timeline of the typical scenario that takes place for a UDP client/server exchange. We can compare this to the typical TCP exchange. We will also describe the new functions that we use with UDP sockets, recvfrom and sendto, and redo our echo client/server to use UDP. We will also describe the use of the connect function with a UDP socket, and the concept of asynchronous errors.
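
The sequence of calls described above (socket and bind on the server, then recvfrom and sendto, with no connect or accept) can be sketched roughly as follows. This is a minimal illustration on the loopback address, not the echo server from the textbook; the helper names udp_bind_loopback and udp_echo_once are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a UDP socket bound to 127.0.0.1 on a kernel-chosen port.
 * On success returns the descriptor and stores the bound address in *addr. */
int udp_bind_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int fd;

    assert(addr != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(0);  /* 0: let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) == -1 ||
        getsockname(fd, (struct sockaddr *)addr, &len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Server side of one exchange: wait for a datagram and echo it back to
 * whichever client sent it. Returns the number of bytes echoed, or -1. */
ssize_t udp_echo_once(int server_fd)
{
    char buf[512];
    struct sockaddr_storage from;
    socklen_t fromlen = sizeof from;  /* value-result argument */
    ssize_t n;

    n = recvfrom(server_fd, buf, sizeof buf, 0,
                 (struct sockaddr *)&from, &fromlen);
    if (n == -1)
        return -1;
    /* recvfrom filled in the client's address, so the reply goes there */
    return sendto(server_fd, buf, (size_t)n, 0,
                  (struct sockaddr *)&from, fromlen);
}
```

A client would call sendto() with the server's address and then recvfrom() to read the echoed reply; neither side ever calls connect() or accept().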


send(), sendto()
Send data out over a socket

Prototypes
#include <sys/types.h>
#include <sys/socket.h>

ssize_t send(int s, const void *buf, size_t len, int flags);
ssize_t sendto(int s, const void *buf, size_t len, int flags,
               const struct sockaddr *to, socklen_t tolen);


Description
These functions send data to a socket. Generally speaking, send() is used for TCP SOCK_STREAM connected sockets, and sendto() is used for UDP SOCK_DGRAM unconnected datagram sockets. With the unconnected sockets, you must specify the destination of a packet each time you send one, and that's why the last parameters of sendto() define where the packet is going. With both send() and sendto(), the parameter s is the socket, buf is a pointer to the data you want to send, len is the number of bytes you want to send, and flags allows you to specify more information about how the data is to be sent. Set flags to zero if you want it to be "normal" data. Here are some of the commonly used flags, but check your local send() man pages for more details:

MSG_OOB
Send as "out of band" data. TCP supports this, and it's a way to tell the receiving system that this data has a higher priority than the normal data. The receiver will receive the signal SIGURG and it can then receive this data without first receiving all the rest of the normal data in the queue.

MSG_DONTROUTE
Don't send this data over a router, just keep it local.

MSG_DONTWAIT
If send() would block because outbound traffic is clogged, have it return EAGAIN. This is like an "enable non-blocking just for this send." See the section on blocking for more details.

MSG_NOSIGNAL
If you send() to a remote host which is no longer recv()ing, you'll typically get the signal SIGPIPE. Adding this flag prevents that signal from being raised.

Return Value
Returns the number of bytes actually sent, or -1 on error (and errno will be set accordingly.) Note that the number of bytes actually sent might be less than the number you asked it to send! See the section on handling partial send()s for a helper function to get around this. Also, if the socket has been closed by either side, the process calling send() will get the signal SIGPIPE. (Unless send() was called with the MSG_NOSIGNAL flag.)

Example
int spatula_count = 3490;
char *secret_message = "The Cheese is in The Toaster";

int stream_socket, dgram_socket;
struct sockaddr_in dest;
int temp;

// first with TCP stream sockets:

// assume sockets are made and connected
//stream_socket = socket(...
//connect(stream_socket, ...

// convert to network byte order
temp = htonl(spatula_count);
// send data normally:
send(stream_socket, &temp, sizeof temp, 0);

// send secret message out of band:
send(stream_socket, secret_message, strlen(secret_message)+1, MSG_OOB);

// now with UDP datagram sockets:
//getaddrinfo(...
//dest = ... // assume "dest" holds the address of the destination
//dgram_socket = socket(...

// send secret message normally:
sendto(dgram_socket, secret_message, strlen(secret_message)+1, 0,
       (struct sockaddr*)&dest, sizeof dest);

recv(), recvfrom()
Receive data on a socket

Prototypes

#include <sys/types.h>
#include <sys/socket.h>

ssize_t recv(int s, void *buf, size_t len, int flags);
ssize_t recvfrom(int s, void *buf, size_t len, int flags,
                 struct sockaddr *from, socklen_t *fromlen);


Description
Once you have a socket up and connected, you can read incoming data from the remote side using recv() (for TCP SOCK_STREAM sockets) and recvfrom() (for UDP SOCK_DGRAM sockets). Both functions take the socket descriptor s, a pointer to the buffer buf, the size (in bytes) of the buffer len, and a set of flags that control how the functions work. Additionally, recvfrom() takes a struct sockaddr *from that will tell you where the data came from, and will fill in fromlen with the size of struct sockaddr. (You must also initialize fromlen to be the size of from or struct sockaddr.) So what wondrous flags can you pass into this function? Here are some of them, but you should check your local man pages for more information and what is actually supported on your system. You bitwise-or these together, or just set flags to 0 if you want it to be a regular vanilla recv().

MSG_OOB
Receive Out of Band data. This is how to get data that has been sent to you with the MSG_OOB flag in send(). As the receiving side, you will have had signal SIGURG raised telling you there is urgent data. In your handler for that signal, you could call recv() with this MSG_OOB flag.

MSG_PEEK
If you want to call recv() "just for pretend", you can call it with this flag. This will tell you what's waiting in the buffer for when you call recv() "for real" (i.e. without the MSG_PEEK flag). It's like a sneak preview into the next recv() call.

MSG_WAITALL
Tell recv() to not return until all the data you specified in the len parameter has arrived. It will ignore your wishes in extreme circumstances, however, like if a signal interrupts the call or if some error occurs or if the remote side closes the connection, etc. Don't be mad with it.

When you call recv(), it will block until there is some data to read. If you want to not block, set the socket to non-blocking or check with select() or poll() to see if there is incoming data before calling recv() or recvfrom().

Return Value
Returns the number of bytes actually received (which might be less than you requested in the len parameter), or -1 on error (and errno will be set accordingly.) If the remote side has closed the connection, recv() will return 0. This is the normal method for determining if the remote side has closed the connection. Normality is good, rebel!

Example
// stream sockets and recv()

struct addrinfo hints, *res;
int sockfd;
char buf[512];
int byte_count;

// get host info, make socket, and connect it
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC; // use IPv4 or IPv6, whichever
hints.ai_socktype = SOCK_STREAM;
getaddrinfo("www.example.com", "3490", &hints, &res);
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
connect(sockfd, res->ai_addr, res->ai_addrlen);

// all right! now that we're connected, we can receive some data!
byte_count = recv(sockfd, buf, sizeof buf, 0);
printf("recv()'d %d bytes of data in buf\n", byte_count);

// datagram sockets and recvfrom()

struct addrinfo hints, *res;
int sockfd;
int byte_count;
socklen_t fromlen;
struct sockaddr_storage addr;
char buf[512];
char ipstr[INET6_ADDRSTRLEN];

// get host info, make socket, bind it to port 4950
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC; // use IPv4 or IPv6, whichever
hints.ai_socktype = SOCK_DGRAM;
hints.ai_flags = AI_PASSIVE;
getaddrinfo(NULL, "4950", &hints, &res);
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
bind(sockfd, res->ai_addr, res->ai_addrlen);

// no need to accept(), just recvfrom():
fromlen = sizeof addr;
byte_count = recvfrom(sockfd, buf, sizeof buf, 0,
                      (struct sockaddr *)&addr, &fromlen);

printf("recv()'d %d bytes of data in buf\n", byte_count);
// inet_ntop() needs a pointer to the in_addr or in6_addr inside addr
printf("from IP address %s\n",
    inet_ntop(addr.ss_family,
        addr.ss_family == AF_INET ?
            (void *)&((struct sockaddr_in *)&addr)->sin_addr :
            (void *)&((struct sockaddr_in6 *)&addr)->sin6_addr,
        ipstr, sizeof ipstr));

Lost Datagrams
Our UDP client/server example is not reliable. If a client datagram is lost (say it is discarded by some router between the client and server), the client will block forever in its call to recvfrom in the function dg_cli, waiting for a server reply that will never arrive. Similarly, if the client datagram arrives at the server but the server's reply is lost, the client will again block forever in its call to recvfrom. A typical way to prevent this is to place a timeout on the client's call to recvfrom.

Just placing a timeout on the recvfrom is not the entire solution. For example, if we do time out, we cannot tell whether our datagram never made it to the server, or if the server's reply never made it back. If the client's request was something like "transfer a certain amount of money from account A to account B" (instead of our simple echo server), it would make a big difference as to whether the request was lost or the reply was lost.
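
One possible way to place a timeout on recvfrom, using the poll() function covered earlier in these notes, is sketched below. The function name recvfrom_timeout and the -2 return code for "timed out" are conventions invented for this sketch; other approaches (such as a socket receive timeout option or an alarm signal) exist and are not shown here.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* recvfrom() with a timeout in milliseconds.
 * Returns the number of bytes received, -1 on error,
 * or -2 if no datagram arrived before the timeout expired. */
ssize_t recvfrom_timeout(int fd, void *buf, size_t len, int timeout_ms,
                         struct sockaddr *from, socklen_t *fromlen)
{
    struct pollfd pfd;
    int rv;

    assert(timeout_ms >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;  /* wake up when a datagram is ready to read */
    pfd.revents = 0;

    rv = poll(&pfd, 1, timeout_ms);
    if (rv == -1)
        return -1;        /* poll() itself failed */
    if (rv == 0)
        return -2;        /* timed out: request or reply may have been lost */
    return recvfrom(fd, buf, len, 0, from, fromlen);
}
```

As the paragraph above notes, a -2 result only says that nothing arrived in time. It cannot tell the client whether the request or the reply was lost, so the application still has to decide whether resending is safe.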


connect Function with UDP


An asynchronous error is not returned on a UDP socket unless the socket has been connected. Indeed, we are able to call connect for a UDP socket. But this does not result in anything like a TCP connection: There is no three-way handshake. Instead, the kernel just checks for any immediate errors (e.g., an obviously unreachable destination), records the IP address and port number of the peer (from the socket address structure passed to connect), and returns immediately to the calling process.

Overloading the connect function with this capability for UDP sockets is confusing. If the convention that sockname is the local protocol address and peername is the foreign protocol address is used, then a better name would have been setpeername. Similarly, a better name for the bind function would be setsockname. With this capability, we must now distinguish between:

An unconnected UDP socket, the default when we create a UDP socket
A connected UDP socket, the result of calling connect on a UDP socket


With a connected UDP socket, three things change, compared to the default unconnected UDP socket:

1. We can no longer specify the destination IP address and port for an output operation. That is, we do not use sendto, but write or send instead. Anything written to a connected UDP socket is automatically sent to the protocol address (e.g., IP address and port) specified by connect.

2. We do not need to use recvfrom to learn the sender of a datagram, but read, recv, or recvmsg instead. The only datagrams returned by the kernel for an input operation on a connected UDP socket are those arriving from the protocol address specified in connect. Datagrams destined to the connected UDP socket's local protocol address (e.g., IP address and port) but arriving from a protocol address other than the one to which the socket was connected are not passed to the connected socket. This limits a connected UDP socket to exchanging datagrams with one and only one peer.

3. Asynchronous errors are returned to the process for connected UDP sockets. The corollary, as we previously described, is that unconnected UDP sockets do not receive asynchronous errors.
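
The first two points above can be illustrated with a small sketch: once connect has recorded the peer, the socket is used with send and recv and no address arguments. The helper name udp_connect is hypothetical, and the sketch assumes IPv4 only.

```c
#include <assert.h>
#include <arpa/inet.h>   /* htonl/htons for callers that fill in the peer */
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a UDP socket and connect() it to the given peer.
 * No packets are exchanged; the kernel only records the peer address.
 * Returns the descriptor, or -1 on error. */
int udp_connect(const struct sockaddr_in *peer)
{
    int fd;

    assert(peer != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    if (connect(fd, (const struct sockaddr *)peer, sizeof *peer) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After fd = udp_connect(&peer), a call such as send(fd, "ping", 4, 0) goes to the recorded peer, and recv(fd, buf, sizeof buf, 0) returns only datagrams that arrive from that peer.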


Lack of Flow Control with UDP


We observe two cases:

CASE 1: slow client, fast server

CASE 2: fast client, slow server

From TCP, we know the flow-control rule: at any moment, the sender must not overflow the receiver's buffer.

Based on this statement, we explain the concept like this:

With respect to the client: slow means the bit rate is low; fast means the bit rate is high.
With respect to the server: slow means the receiver buffer (window) size is small; fast means the receiver buffer (window) size is large.

In CASE 2, datagrams are lost to the maximum extent. This is the normal situation in UDP communication.

In CASE 1, the datagrams are delivered to the receiver, because the slow client cannot fill the fast server's receive buffer (UDP itself still provides no flow control). Consider the following example for CASE 2: The client sent 2,000 datagrams, but the server application received only 30 of these, for a 98% loss rate. There is no indication whatsoever to the server application or to the client application that these datagrams were lost. As we have said, UDP has no flow control and it is unreliable. It is trivial, as we have shown, for a UDP sender to overrun the receiver. If we look at the netstat output, the total number of datagrams received by the server host (not the server application) is 2,000 (73,208 - 71,208). The counter "dropped due to full socket buffers" indicates how many datagrams were received by UDP but were discarded because the receiving socket's receive queue was full (p. 775 of TCPv2). This value is 1,970 (3,941 - 1,971), which, when added to the 30 datagrams counted by the application, accounts for all 2,000 datagrams received by the host.

The following Output specifies this:

The first set of lines shows the counters before this communication, when none of the client's datagrams have been received yet.

The second set of lines shows the counters after the datagrams have been sent in the current communication.

This specifies clearly that there is lack of flow control with the UDP Service.
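
Since the drops above happen when the receiving socket's queue is full, one partial mitigation is to ask for a larger receive buffer with the SO_RCVBUF option described in the Socket Options section. The sketch below is only an illustration under that assumption: the helper name set_rcvbuf is hypothetical, the kernel may grant a different size than requested, and a bigger buffer reduces losses without adding flow control.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Request a receive buffer of `bytes` and report what the kernel granted.
 * Returns the size reported by getsockopt(), or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof actual;

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) == -1)
        return -1;
    /* read the value back: the system may round or cap the request */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;
}
```

A server would call this on its UDP socket before entering its receive loop. A fast enough sender can still overrun any finite buffer.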


Determining Outgoing Interface with UDP


A connected UDP socket can also be used to determine the outgoing interface that will be used to a particular destination. This is because of a side effect of the connect function when applied to a UDP socket: The kernel chooses the local IP address (assuming the process has not already called bind to explicitly assign this). This local IP address is chosen by searching the routing table for the destination IP address, and then using the primary IP address for the resulting interface.

In the above figure, UDP Client connects with the UDP Server using bind(). But, in order for the datagrams to move from UDP Client to UDP Server, they should move through intermediate routers. So, PEER System now becomes R1 but not UDP Server. This is because we are using connect() within the UDP communication.
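
The side effect described at the start of this section (connect on a UDP socket makes the kernel choose a local IP address) can be observed by calling getsockname afterwards. The following is a rough IPv4-only sketch; the function name local_addr_for is hypothetical, and the destination port 9 is an arbitrary assumption, since no datagram is actually sent.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Store in `out` the dotted-decimal local IPv4 address the kernel would use
 * to reach dest_ip. No datagram is sent. Returns 0 on success, -1 on error. */
int local_addr_for(const char *dest_ip, char *out, socklen_t outlen)
{
    struct sockaddr_in dest, local;
    socklen_t len = sizeof local;
    int fd, rc = -1;

    assert(dest_ip != NULL && out != NULL);
    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_port = htons(9);  /* arbitrary port (assumption); nothing is sent */
    if (inet_pton(AF_INET, dest_ip, &dest.sin_addr) != 1)
        return -1;             /* not a valid dotted-decimal address */

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    /* connect() consults the routing table and picks the local address;
     * getsockname() then reports which address was chosen */
    if (connect(fd, (struct sockaddr *)&dest, sizeof dest) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, outlen) != NULL)
        rc = 0;
    close(fd);
    return rc;
}
```

For a destination of 127.0.0.1 the reported local address would normally be the loopback address itself; for other destinations the result depends on the host's routing table.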


UNIT-VI Elementary Name and Address Conversions


All the examples so far in this text have used numeric addresses for the hosts (e.g., 206.6.226.33) and numeric port numbers to identify the servers (e.g., port 13 for the standard daytime server and port 9877 for our echo server). We should, however, use names instead of numbers for numerous reasons: Names are easier to remember; the numeric address can change but the name can remain the same; and with the move to IPv6, numeric addresses become much longer, making it much more error-prone to enter an address by hand. This chapter describes the functions that convert between names and numeric values: gethostbyname and gethostbyaddr to convert between hostnames and IPv4 addresses, and getservbyname and getservbyport to convert between service names and port numbers.

Domain Name System (DNS)


The DNS is used primarily to map between hostnames and IP addresses. A hostname can be either a simple name, such as solaris or freebsd, or a fully qualified domain name (FQDN), such as solaris.unpbook.com. Technically, an FQDN is also called an absolute name and must end with a period, but users often omit the ending period. The trailing period tells the resolver that this name is fully qualified and it doesn't need to search its list of possible domains.

Resource Records
Entries in the DNS are known as resource records (RRs). There are only a few types of RRs that we are interested in.

A

An A record maps a hostname into a 32-bit IPv4 address.

AAAA

A AAAA record, called a "quad A" record, maps a hostname into a 128-bit IPv6 address. The term "quad A" was chosen because a 128-bit address is four times larger than a 32-bit address.

PTR

PTR records (called "pointer records") map IP addresses into hostnames. For an IPv4 address, the 4 bytes of the 32-bit address are reversed, each byte is converted to its decimal ASCII value (0-255), and in-addr.arpa is then appended. The resulting string is used in the PTR query. For an IPv6 address, the 32 4-bit nibbles of the 128-bit address are reversed, each nibble is converted to its corresponding hexadecimal ASCII value (0-9, a-f), and ip6.arpa is appended.

MX

An MX record specifies a host to act as a "mail exchanger" for the specified host. In the example for the host freebsd above, two MX records are provided: The first has a preference value of 5 and the second has a preference value of 10. When multiple MX records exist, they are used in order of preference, starting with the smallest value.

CNAME

CNAME stands for "canonical name." A common use is to assign CNAME records for common services, such as ftp and www. If people use these service names instead of the actual hostnames, it is transparent when a service is moved to another host. For example, the following could be CNAMEs for our host linux:

    ftp IN CNAME linux.unpbook.com.
    www IN CNAME linux.unpbook.com.
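The way a PTR query name is built for an IPv4 address can be sketched as follows. The helper name ptr_name_v4 is an assumption made for this example; the resolver performs this step internally, so application code rarely needs it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the in-addr.arpa name for a dotted-decimal
 * IPv4 address. Returns 0 on success, -1 on bad input or a short buffer. */
int ptr_name_v4(const char *dotted, char *out, size_t outlen)
{
    struct in_addr addr;
    unsigned char b[4];
    int n;

    assert(dotted != NULL && out != NULL);
    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return -1;

    /* s_addr is in network byte order, so b[0] is the first dotted field */
    memcpy(b, &addr.s_addr, sizeof(b));

    /* reverse the four bytes, print each in decimal, append the domain */
    n = snprintf(out, outlen, "%u.%u.%u.%u.in-addr.arpa",
                 (unsigned)b[3], (unsigned)b[2],
                 (unsigned)b[1], (unsigned)b[0]);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

For the address 206.6.226.33 used earlier in these notes, the helper produces 33.226.6.206.in-addr.arpa.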

Resolvers and Name Servers


Organizations run one or more name servers, often the program known as BIND (Berkeley Internet Name Domain). Applications such as the clients and servers that we are writing in this text contact a DNS server by calling functions in a library known as the resolver. The common resolver functions are gethostbyname and gethostbyaddr, both of which are described in this chapter. The former maps a hostname into its IPv4 addresses, and the latter does the reverse mapping.

The figure below shows a typical arrangement of applications, resolvers, and name servers. We write the application code. On some systems, the resolver code is contained in a system library and is link-edited into the application when the application is built. On others, there is a centralized resolver daemon that all applications share, and the system library code performs RPCs to this daemon. In either case, application code calls the resolver code using normal function calls, typically calling the functions gethostbyname and gethostbyaddr.

The resolver code reads its system-dependent configuration files to determine the location of the organization's name servers. (We use the plural "name servers" because most organizations run multiple name servers, even though we show only one local server in the figure. Multiple name servers are absolutely required for reliability and redundancy.) The file /etc/resolv.conf normally contains the IP addresses of the local name servers.

It might be nice to use the names of the name servers in the /etc/resolv.conf file, since the names are easier to remember and configure, but this introduces a chicken-and-egg problem of where to go to do the name-to-address conversion for the server that will do the name and address conversion! The resolver sends the query to the local name server using UDP. If the local name server does not know the answer, it will normally query other name servers across the Internet, also using UDP. If the answers are too large to fit in a UDP packet, the resolver will automatically switch to TCP.

gethostbyname Function (Returns: IPV4 Address)


Host computers are normally known by human-readable names. All the examples that we have shown so far in this book have intentionally used IP addresses instead of names, so we know exactly what goes into the socket address structures for functions such as connect and sendto, and what is returned by functions such as accept and recvfrom. But, most applications should deal with names, not addresses. This is especially true as we move to IPv6, since IPv6 addresses (hex strings) are much longer than IPv4 dotted-decimal numbers. (The example AAAA record and ip6.arpa PTR record in the previous section should make this obvious.) The most basic function that looks up a hostname is gethostbyname. If successful, it returns a pointer to a hostent structure that contains all the IPv4 addresses for the host. However, it is limited in that it can only return IPv4 addresses.
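A minimal sketch of a gethostbyname lookup is shown here. The helper name first_ipv4 is an assumption for illustration; it returns only the first entry of h_addr_list, although a host may have several addresses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: stores the first IPv4 address of host in out as
 * dotted-decimal text. Returns 0 on success, -1 if the lookup fails. */
int first_ipv4(const char *host, char *out, socklen_t outlen)
{
    struct hostent *hp;
    struct in_addr addr;

    assert(host != NULL && out != NULL);

    hp = gethostbyname(host);
    if (hp == NULL || hp->h_addrtype != AF_INET || hp->h_addr_list[0] == NULL)
        return -1;              /* on failure the reason is in h_errno */

    /* h_addr_list holds binary addresses in network byte order */
    memcpy(&addr, hp->h_addr_list[0], sizeof(addr));
    if (inet_ntop(AF_INET, &addr, out, outlen) == NULL)
        return -1;
    return 0;
}
```

The hostent returned by gethostbyname points to static storage, so the sketch copies the address out before returning rather than keeping the pointer.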

gethostbyname differs from the other socket functions that we have described in that it does not set errno when an error occurs. Instead, it sets the global integer h_errno to one of the following constants defined by including <netdb.h>:

    HOST_NOT_FOUND
    TRY_AGAIN
    NO_RECOVERY
    NO_DATA (identical to NO_ADDRESS)
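Because h_errno is separate from errno, perror cannot describe a failed lookup. One possible way to report the four constants is sketched below; the function name resolver_error_text and the message wording are assumptions made for this example.

```c
#include <assert.h>
#include <netdb.h>

/* Hypothetical helper: maps an h_errno value to a short description. */
const char *resolver_error_text(int code)
{
    assert(code == code);   /* placeholder precondition: any int is accepted */

    switch (code) {
    case HOST_NOT_FOUND: return "host not found";
    case TRY_AGAIN:      return "temporary failure, try again later";
    case NO_RECOVERY:    return "non-recoverable name server error";
    case NO_DATA:        return "name is valid but has no address record";
    default:             return "unknown resolver error";
    }
}
```

A caller would typically pass h_errno right after gethostbyname returns a null pointer, for example fprintf(stderr, "%s\n", resolver_error_text(h_errno)). NO_ADDRESS has no case of its own because it is identical to NO_DATA.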

gethostbyaddr Function (Returns:Hostname)


The function gethostbyaddr takes a binary IPv4 address and tries to find the hostname corresponding to that address. This is the reverse of gethostbyname.

This function returns a pointer to the same hostent structure that we described with gethostbyname. The field of interest in this structure is normally h_name, the canonical hostname.

The addr argument is not a char*, but is really a pointer to an in_addr structure containing the IPv4 address. len is the size of this structure: 4 for an IPv4 address. The family argument is AF_INET. In terms of the DNS, gethostbyaddr queries a name server for a PTR record in the in-addr.arpa domain.
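The three arguments described above fit together as in the following sketch. The helper name reverse_lookup and its return codes are assumptions made for illustration only.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: looks up the canonical hostname for a dotted-decimal
 * IPv4 address. Returns 0 on success, -1 if no name was found, and -2 if
 * ip is not a valid IPv4 address. */
int reverse_lookup(const char *ip, char *out, size_t outlen)
{
    struct in_addr addr;
    struct hostent *hp;

    assert(ip != NULL && out != NULL && outlen > 0);

    if (inet_pton(AF_INET, ip, &addr) != 1)
        return -2;

    /* addr points to an in_addr, len is its size (4), family is AF_INET */
    hp = gethostbyaddr(&addr, sizeof(addr), AF_INET);
    if (hp == NULL)
        return -1;              /* the reason is in h_errno, not errno */

    snprintf(out, outlen, "%s", hp->h_name);
    return 0;
}
```

Whether a given address resolves depends on the local hosts file and on the PTR records published for it, so callers should expect the -1 case.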

getservbyname and getservbyport Functions (Returns: Port Number and Service Name)
Services, like hosts, are often known by names, too. If we refer to a service by its name in our code, instead of by its port number, and if the mapping from the name to port number is contained in a file (normally /etc/services), then if the port number changes, all we need to modify is one line in the /etc/services file instead of having to recompile the applications. The next function, getservbyname, looks up a service given its name.

The service name servname must be specified. If a protocol is also specified (protoname is a non-null pointer), then the entry must also have a matching protocol. Some Internet services are provided using either TCP or UDP, while others support only a single protocol (e.g., FTP requires TCP). If protoname is not specified and the service supports multiple protocols, it is implementation-dependent as to which port number is returned. Normally this does not matter, because services that support multiple protocols often use the same TCP and UDP port number, but this is not guaranteed.

The main field of interest in the servent structure is the port number. Since the port number is returned in network byte order, we must not call htons when storing this into a socket address structure.
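The byte-order point is easy to get wrong, so a short sketch may help. The helper name set_port_by_service is an assumption; the result depends on the entries in the local services database (normally /etc/services).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: stores the port of a named service in sa.
 * Returns 0 on success, -1 if the service is unknown. */
int set_port_by_service(struct sockaddr_in *sa, const char *serv,
                        const char *proto)
{
    struct servent *sp;

    assert(sa != NULL && serv != NULL);

    sp = getservbyname(serv, proto);    /* proto may be NULL */
    if (sp == NULL)
        return -1;

    /* s_port is already in network byte order: store it without htons */
    sa->sin_port = (in_port_t)sp->s_port;
    return 0;
}
```

For the daytime service mentioned earlier, ntohs(sa.sin_port) should give 13 on systems whose services database contains that entry.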

The next function, getservbyport, looks up a service given its port number and an optional protocol.
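A corresponding sketch for getservbyport follows. The helper name service_name_for_port is an assumption. Note that, unlike a port number typed by a user, the port argument passed to getservbyport must be in network byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: looks up the service name for a host-byte-order
 * port. Returns 0 on success, -1 if no matching entry exists. */
int service_name_for_port(int port, const char *proto,
                          char *out, size_t outlen)
{
    struct servent *sp;

    assert(out != NULL && outlen > 0);

    /* convert to network byte order before the lookup */
    sp = getservbyport(htons((uint16_t)port), proto);
    if (sp == NULL)
        return -1;

    snprintf(out, outlen, "%s", sp->s_name);
    return 0;
}
```

As with getservbyname, passing a null pointer for proto accepts an entry for any protocol.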

UNIT-VII INTER PROCESS COMMUNICATION


In computing, Inter-process communication (IPC) is a set of methods for the exchange of data among multiple threads in one or more processes. Processes may be running on one or more computers connected by a network. IPC methods are divided into methods for message passing, synchronization, shared memory, and remote procedure calls (RPC). The method of IPC used may vary based on the bandwidth and latency of communication between the threads, and the type of data being communicated.

File Locking
File locking provides a very simple yet incredibly useful mechanism for coordinating file accesses. Before I begin to lay out the details, let me fill you in on some file locking secrets:

There are two types of locking mechanisms: mandatory and advisory. Mandatory systems will actually prevent read()s and write()s to a file. Several Unix systems support them. Nevertheless, I'm going to ignore them throughout this document, preferring instead to talk solely about advisory locks.

With an advisory lock system, processes can still read and write from a file while it's locked. Useless? Not quite, since there is a way for a process to check for the existence of a lock before a read or write. See, it's a kind of cooperative locking system. This is easily sufficient for almost all cases where file locking is necessary. Since that's out of the way, whenever I refer to a lock from now on in this document, I'm referring to advisory locks. So there.

Now, let me break down the concept of a lock a little bit more. There are two types of (advisory!) locks: read locks and write locks (also referred to as shared locks and exclusive locks, respectively). The way read locks work is that they don't interfere with other read locks. For instance, multiple processes can have a file locked for reading at the same time. However, when a process has a write lock on a file, no other process can activate either a read or write lock until it is relinquished. One easy way to think of this is that there can be multiple readers simultaneously, but there can only be one writer at a time.

One last thing before beginning: there are many ways to lock files in Unix systems. System V likes lockf(), which, personally, I think sucks. Better systems support flock(), which offers better control over the lock, but still lacks in certain ways. For portability and for completeness, I'll be talking about how to lock files using fcntl(). I encourage you, though, to use one of the higher-level flock()-style functions if it suits your needs, but I want to portably demonstrate the full range of power you have at your fingertips. (If your System V Unix doesn't support the POSIX-y fcntl(), you'll have to reconcile the following information with your lockf() man page.)

Setting a lock
The fcntl() function does just about everything on the planet, but we'll just use it for file locking. Setting the lock consists of filling out a struct flock (declared in fcntl.h) that describes the type of lock needed, open()ing the file with the matching mode, and calling fcntl() with the proper arguments.
    struct flock fl;
    int fd;

    fl.l_type   = F_WRLCK;  /* F_RDLCK, F_WRLCK, F_UNLCK    */
    fl.l_whence = SEEK_SET; /* SEEK_SET, SEEK_CUR, SEEK_END */
    fl.l_start  = 0;        /* Offset from l_whence         */
    fl.l_len    = 0;        /* length, 0 = to EOF           */
    fl.l_pid    = getpid(); /* our PID                      */

    fd = open("filename", O_WRONLY);

    fcntl(fd, F_SETLKW, &fl);  /* F_GETLK, F_SETLK, F_SETLKW */

What just happened? Let's start with the struct flock since the fields in it are used to describe the locking action taking place. Here are some field definitions:
l_type

This is where you signify the type of lock you want to set. It's either F_RDLCK, F_WRLCK, or F_UNLCK if you want to set a read lock, write lock, or clear the lock, respectively.

l_whence

This field determines where the l_start field starts from (it's like an offset for the offset). It can be either SEEK_SET, SEEK_CUR, or SEEK_END, for beginning of file, current file position, or end of file.

l_start

This is the starting offset in bytes of the lock, relative to l_whence.

l_len

This is the length of the lock region in bytes (which starts from l_start, which is relative to l_whence).

l_pid

The process ID of the process dealing with the lock. Use getpid() to get this.
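The l_whence, l_start, and l_len fields also allow part of a file to be locked rather than all of it. The following is a minimal sketch under that assumption; the helper name lock_range is hypothetical, and it uses the non-blocking F_SETLK command so that a conflict is reported instead of waited for.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sets or clears a lock on len bytes starting at
 * byte offset start. type is F_RDLCK, F_WRLCK, or F_UNLCK; a len of 0
 * means "to end of file". Returns 0 on success, -1 with errno set. */
int lock_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl;

    assert(fd >= 0);

    memset(&fl, 0, sizeof(fl));
    fl.l_type   = type;
    fl.l_whence = SEEK_SET;     /* l_start counts from the start of file */
    fl.l_start  = start;
    fl.l_len    = len;

    return fcntl(fd, F_SETLK, &fl);
}
```

For example, lock_range(fd, F_WRLCK, 100, 50) asks for a write lock on bytes 100 through 149 only, leaving the rest of the file available to other processes.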

The next step is to open() the file, since fcntl() needs a file descriptor of the file that's being locked. Note that when you open the file, you need to open it in the same mode as you have specified in the lock, as shown in the table, below. If you open the file in the wrong mode for a given lock type, fcntl() will return -1 and errno will be set to EBADF.
    l_type     mode
    F_RDLCK    O_RDONLY or O_RDWR
    F_WRLCK    O_WRONLY or O_RDWR
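The effect of a mismatched open mode can be observed with a small experiment. This sketch assumes a file that already exists; the helper name try_lock is hypothetical, and it returns the errno value so that the EBADF case is visible to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: opens path with open_flags and tries to lock the
 * whole file with the given lock type. Returns 0 if the lock was set,
 * otherwise the errno value of the call that failed. */
int try_lock(const char *path, int open_flags, short type)
{
    struct flock fl;
    int fd;
    int err = 0;

    assert(path != NULL);

    fd = open(path, open_flags);
    if (fd == -1)
        return errno;

    memset(&fl, 0, sizeof(fl));
    fl.l_type   = type;
    fl.l_whence = SEEK_SET;     /* l_start = 0 and l_len = 0: whole file */

    if (fcntl(fd, F_SETLK, &fl) == -1)
        err = errno;

    close(fd);                  /* closing also releases the lock */
    return err;
}
```

With this helper, try_lock(path, O_RDONLY, F_WRLCK) is expected to return EBADF, while try_lock(path, O_RDWR, F_WRLCK) succeeds when no other process holds a conflicting lock.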

Finally, the call to fcntl() actually sets, clears, or gets the lock. See, the second argument (the cmd) to fcntl() tells it what to do with the data passed to it in the struct flock. The following list summarizes what each fcntl() cmd does:
F_SETLKW

This argument tells fcntl() to attempt to obtain the lock requested in the struct flock structure. If the lock cannot be obtained (since someone else has it locked already), fcntl() will wait (block) until the lock has cleared, then will set it itself. This is a very useful command. I use it all the time.

F_SETLK

This function is almost identical to F_SETLKW. The only difference is that this one will not wait if it cannot obtain a lock. It will return immediately with -1. This function can be used to clear a lock by setting the l_type field in the struct flock to F_UNLCK.

F_GETLK

If you want to only check to see if there is a lock, but don't want to set one, you can use this command. It looks through all the file locks until it finds one that conflicts with the lock you specified in the struct flock. It then copies the conflicting lock's information into the struct and returns it to you. If it can't find a conflicting lock, fcntl() returns the struct as you passed it, except it sets the l_type field to F_UNLCK.

In our above example, we call fcntl() with F_SETLKW as the argument, so it blocks until it can set the lock, then sets it and continues.
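The F_GETLK command can be wrapped in a small probe, sketched here. The helper name write_lock_blocker is an assumption for this example. It asks whether a write lock on the whole file would be granted and, if not, reports which process is in the way.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a write lock on the whole file could
 * be set right now, the PID of a process holding a conflicting lock
 * otherwise, or -1 if fcntl() itself fails. */
pid_t write_lock_blocker(int fd)
{
    struct flock fl;

    assert(fd >= 0);

    memset(&fl, 0, sizeof(fl));
    fl.l_type   = F_WRLCK;      /* the lock we would like to set */
    fl.l_whence = SEEK_SET;     /* l_start = 0 and l_len = 0: whole file */

    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;

    /* l_type is rewritten to F_UNLCK when nothing conflicts */
    return (fl.l_type == F_UNLCK) ? 0 : fl.l_pid;
}
```

Locks held by the calling process never count as conflicts, so the probe only detects locks held by other processes. The answer can also be stale by the time it is used, which is why F_SETLK or F_SETLKW is still needed to actually take the lock.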

Clearing a lock
Whew! After all the locking stuff up there, it's time for something easy: unlocking! Actually, this is a piece of cake in comparison. I'll just reuse that first example and add the code to unlock it at the end:
    struct flock fl;
    int fd;

    fl.l_type   = F_WRLCK;  /* F_RDLCK, F_WRLCK, F_UNLCK    */
    fl.l_whence = SEEK_SET; /* SEEK_SET, SEEK_CUR, SEEK_END */
    fl.l_start  = 0;        /* Offset from l_whence         */
    fl.l_len    = 0;        /* length, 0 = to EOF           */
    fl.l_pid    = getpid(); /* our PID                      */

    fd = open("filename", O_WRONLY);  /* get the file descriptor */
    fcntl(fd, F_SETLKW, &fl);  /* set the lock, waiting if necessary */
    .
    .
    .
    fl.l_type   = F_UNLCK;  /* tell it to unlock the region */
    fcntl(fd, F_SETLK, &fl); /* set the region to unlocked */

Now, I left the old locking code in there for high contrast, but you can tell that I just changed the l_type field to F_UNLCK (leaving the others completely unchanged!) and called fcntl() with F_SETLK as the command. Easy!

FILE LOCKING-A DEMO PROGRAM:


    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        /* fields are named because their order in struct flock is not
           the same on every system */
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 0 };
        int fd;

        fl.l_pid = getpid();

        if (argc > 1)
            fl.l_type = F_RDLCK;    /* any argument: ask for a read lock */

        if ((fd = open("lockdemo.c", O_RDWR)) == -1) {
            perror("open");
            exit(1);
        }

        printf("Press <RETURN> to try to get lock: ");
        getchar();
        printf("Trying to get lock...");
        fflush(stdout);             /* show the message before blocking */

        if (fcntl(fd, F_SETLKW, &fl) == -1) {
            perror("fcntl");
            exit(1);
        }

        printf("got lock\n");
        printf("Press <RETURN> to release lock: ");
        getchar();

        fl.l_type = F_UNLCK;  /* set to unlock same region */

        if (fcntl(fd, F_SETLK, &fl) == -1) {
            perror("fcntl");
            exit(1);
        }

        printf("Unlocked.\n");

        close(fd);

        return 0;
    }

Compile that puppy up and start messing with it in a couple windows. Notice that when one lockdemo has a read lock, other instances of the program can get their own read locks with no problem. It's only when a write lock is obtained that other processes can't get a lock of any kind.

Another thing to notice is that you can't get a write lock if there are any read locks on the same region of the file. The process waiting to get the write lock will wait until all the read locks are cleared. One upshot of this is that you can keep piling on read locks (because a read lock doesn't stop other processes from getting read locks) and any processes waiting for a write lock will sit there and starve. There isn't a rule anywhere that keeps you from adding more read locks if there is a process waiting for a write lock. You must be careful.

Practically, though, you will probably mostly be using write locks to guarantee exclusive access to a file for a short amount of time while it's being updated; that is the most common use of locks.

Shared Memory Introduction


Shared memory is the fastest form of IPC available. Once the memory is mapped into the address space of the processes that are sharing the memory region, no kernel involvement occurs in passing data between the processes. What is normally required, however, is some form of synchronization between the processes that are storing and fetching information to and from the shared memory region.

Flow of file data from client to server


Consider the normal steps involved in the client-server file copying program that we used as an example for the various types of message passing. The server reads from the input file. The file data is read by the kernel into its memory and then copied from the kernel to the process. The server writes this data in a message, using a pipe, FIFO, or message queue. These forms of IPC normally require the data to be copied from the process to the kernel. The client reads the data from the IPC channel, normally requiring the data to be copied from the kernel to the process. Finally, the data is copied from the client's buffer, the second argument to the write function, to the output file.

A total of four copies of the data are normally required. Additionally, these four copies are done between the kernel and a process, often an expensive copy (more expensive than copying data within the kernel, or copying data within a single process).

The problem with these forms of IPC (pipes, FIFOs, and message queues) is that for two processes to exchange information, the information has to go through the kernel.

Shared memory provides a way around this by letting two or more processes share a region of memory. The processes must, of course, coordinate or synchronize the use of the shared memory among themselves. (Sharing a common piece of memory is similar to sharing a disk file, such as the sequence number file used in all the file locking examples.)

Shared Memory System Calls

The sections that follow will explore the shared memory system calls and discuss how they were applied to this utility program. The discussion covers the following areas:

    Creating and Accessing Shared Memory
    Obtaining Information about Shared Memory
    Changing Shared Memory Attributes
    Attaching Shared Memory
    Detaching Shared Memory
    Destroying Shared Memory

Shared memory must be created, or it must be located if another process has already created it. The program is given an IPC ID to refer to when it has been created or located. Once you have this IPC ID, it is possible to inquire about the shared memory region attributes and change some of them, such as the ownership and permissions. Before shared memory can be read from or written to, it must be attached to the memory space of your current process. This involves the selection of a starting address for your shared memory region.

When a process is finished with a shared memory region, it is able to detach it from its memory space. Once all processes have finished with the shared memory region and detached it, the region can be destroyed to give the memory back to the kernel.

Creating and Accessing Shared Memory

Shared memory is created, or accessed if it already exists, using the shmget(2) function. Its function synopsis is as follows:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int shmget(key_t key, size_t size, int flag);

The argument key is the value of the IPC key to use, or the value IPC_PRIVATE. The size argument specifies the minimum size of the shared memory region required. The actual size created will be rounded up to a platform-specific multiple of a virtual memory page size. The flag option must contain the permission bits if shared memory is being created. Additional flags that may be used include IPC_CREAT and IPC_EXCL, when shared memory is being created.

The return value is the IPC ID of the shared memory region when the call is successful (this includes the value zero). The value -1 is returned if the call fails, with errno set.
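A minimal sketch of creating a region with shmget(2) is given below. The helper names shm_create_private and shm_destroy are assumptions, as is the choice of IPC_PRIVATE and of the permission bits 0600 (read and write for the owner only).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Hypothetical helper: creates a new private region of at least size
 * bytes. Returns the IPC ID (which may be 0), or -1 with errno set. */
int shm_create_private(size_t size)
{
    assert(size > 0);

    /* IPC_EXCL makes the call fail rather than return an existing region */
    return shmget(IPC_PRIVATE, size, IPC_CREAT | IPC_EXCL | 0600);
}

/* Hypothetical helper: marks the region for destruction so that the
 * sketch does not leave it behind after the process exits. */
int shm_destroy(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

Because 0 is a valid IPC ID, the result should be compared against -1 rather than tested for truth. A System V region persists after the creating process exits, which is why the sketch pairs creation with a destroy helper.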

Obtaining Information About Shared Memory

Attributes of the shared memory, including its permissions and actual size, are obtained using the shmctl(2) system call. Its function synopsis is as follows:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int shmctl(int shmid, int cmd, struct shmid_ds *buf);

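The argument shmid specifies the shared memory IPC ID, which is obtained from shmget(2). The cmd is a shmctl(2) command value, while buf is an argument used with certain commands. The valid commands for shmctl(2) are:

    IPC_STAT   copy the attributes of the region into the structure pointed to by buf
    IPC_SET    change attributes, such as ownership and permissions, from buf
    IPC_RMID   destroy the region (buf is not used)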
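Obtaining the attributes of a region can be sketched with the IPC_STAT command of shmctl(2). The helper name shm_region_size is an assumption; it reads only the size, but the same struct shmid_ds also carries the ownership and permission information in its shm_perm member.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Hypothetical helper: returns the size in bytes recorded for the region,
 * or -1 if shmctl() fails (errno is set by shmctl). */
long shm_region_size(int shmid)
{
    struct shmid_ds ds;

    assert(shmid >= 0);

    /* IPC_STAT copies the kernel's bookkeeping for the region into ds */
    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;

    return (long)ds.shm_segsz;
}
```

The permission bits could be read in the same way as ds.shm_perm.mode & 0777.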

Attaching Shared Memory

Shared memory must be attached to your process memory space before you can use it as memory. This is performed by calling upon shmat(2):

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    void *shmat(int shmid, const void *addr, int flag);

The argument shmid specifies the IPC ID of the shared memory that you want to attach to your process. The argument addr indicates the address that you want to use for this. A null pointer for addr specifies that the UNIX kernel should pick the address instead. The flag argument permits the option flag SHM_RND to be specified. Specify 0 for flag if no options apply.

When shmat(2) succeeds, a (void *) address is returned that represents the starting address of the shared memory region. If the function fails, the value (void *)(-1) is returned instead.
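The attach step can be sketched as a round trip through one region. Everything here is illustrative: the helper name shm_roundtrip is an assumption, and so is the use of the SHM_RDONLY flag, which these notes do not cover. The sketch attaches the same region twice in one process purely to show that both addresses refer to the same memory; it also detaches and destroys the region with shmdt(2) and the IPC_RMID command of shmctl(2) so that nothing is left behind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Hypothetical helper: writes msg through one attachment and reads it
 * back through a second one. Returns 0 on success, -1 on failure. */
int shm_roundtrip(const char *msg, char *out, size_t outlen)
{
    int shmid;
    int rc = -1;
    char *w, *r;

    assert(msg != NULL && out != NULL && strlen(msg) < outlen);

    shmid = shmget(IPC_PRIVATE, outlen, IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    w = shmat(shmid, NULL, 0);           /* NULL: the kernel picks the address */
    r = shmat(shmid, NULL, SHM_RDONLY);  /* second, read-only attachment */

    /* failure is reported as (void *)-1, not as a null pointer */
    if (w != (void *)-1 && r != (void *)-1) {
        strcpy(w, msg);                  /* store through the first mapping */
        strcpy(out, r);                  /* fetch through the second mapping */
        rc = 0;
    }

    if (w != (void *)-1)
        shmdt(w);
    if (r != (void *)-1)
        shmdt(r);
    shmctl(shmid, IPC_RMID, NULL);       /* give the memory back to the kernel */
    return rc;
}
```

In a real program the two attachments would normally belong to two different processes, which would also need some form of synchronization before reading what the other has written.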

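The combination of the addr and the flag option SHM_RND allows three possible ways for the memory region to be attached:

    addr is null                      the kernel chooses the address
    addr is non-null, no SHM_RND      the region is attached at exactly addr
    addr is non-null, with SHM_RND    addr is rounded down to a multiple of SHMLBA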

Detaching Shared Memory

Detaching shared memory is automatically performed when your process terminates. However, if you need to detach it before it terminates, you accomplish that with the shmdt(2) function:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int shmdt(const void *addr);

The shmdt(2) function simply accepts the address of the shared memory, as it was attached by shmat(2), in argument addr. The return value is 0 when successful. Otherwise, -1 is returned and errno holds the error code.

Destroying Shared Memory

Shared memory is destroyed with the IPC_RMID command of shmctl(2). The critical lines of code are repeated here for your convenience:

    /*
     * Destroy shared memory :
     */
    z = shmctl(shmid,IPC_RMID,NULL);

    if ( z == -1 ) {
        fprintf(stderr,"%s: shmctl(%d,IPC_RMID)\n",
            strerror(errno),shmid);
        exit(1);
    }

Notice that argument three (buf) is not required by the IPC_RMID command for shmctl(2). This code is exercised by the -r option of the globvar utility.

UNIT VIII REMOTE LOGIN


http://book.chinaunix.net/special/ebook/addisonWesley/APUE2/0201433079/ch18lev1sec1.html

Introduction
The handling of terminal I/O is a messy area, regardless of the operating system. The UNIX System is no exception. The manual page for terminal I/O is usually one of the longest in most editions of the programmer's manuals. With the UNIX System, a schism formed in the late 1970s when System III developed a different set of terminal routines from those of Version 7. The System III style of terminal I/O continued through System V, and the Version 7 style became the standard for the BSD-derived systems. As with signals, this difference between the two worlds has been conquered by POSIX.1. In this chapter, we look at all the POSIX.1 terminal functions and some of the platform-specific additions. Part of the complexity of the terminal I/O system occurs because people use terminal I/O for so many different things: terminals, hardwired lines between computers, modems, printers, and so on.

Terminal Line Discipline and Its Modes Overview


Terminal I/O has two modes:

1. Canonical mode input processing. In this mode, terminal input is processed as lines. The terminal driver returns at most one line per read request.
2. Noncanonical mode input processing. The input characters are not assembled into lines.

If we don't do anything special, canonical mode is the default. For example, if the shell redirects standard input to the terminal and we use read and write to copy standard input to standard output, the terminal is in canonical mode, and each read returns at most one line. Programs that manipulate the entire screen, such as the vi editor, use noncanonical mode, since the commands may be single characters and are not terminated by newlines. Also, this editor doesn't want processing by the system of the special characters, since they may overlap with the editor commands. For example, the Control-D character is often the end-of-file character for the terminal, but it's also a vi command to scroll down one-half screen.

The Version 7 and older BSD-style terminal drivers supported three modes for terminal input: (a) cooked mode (the input is collected into lines, and the special characters are processed), (b) raw mode (the input is not assembled into lines, and there is no processing of special characters), and (c) cbreak mode (the input is not assembled into lines, but some of the special characters are processed). Figure 18.20 shows a POSIX.1 function that places a terminal in cbreak or raw mode.

POSIX.1 defines 11 special input characters, 9 of which we can change. We've been using some of these throughout the text: the end-of-file character (usually Control-D) and the suspend character (usually Control-Z), for example. Section 18.3 describes each of these characters.

We can think of a terminal device as being controlled by a terminal driver, usually within the kernel. Each terminal device has an input queue and an output queue, shown in Figure 18.1.

Figure 18.1. Logical picture of input and output queues for a terminal device

There are several points to consider from this picture. If echoing is enabled, there is an implied link between the input queue and the output queue.


The size of the input queue, MAX_INPUT (see Figure 2.11), is finite. When the input queue for a particular device fills, the system behavior is implementation dependent. Most UNIX systems echo the bell character when this happens.

There is another input limit, MAX_CANON, that we don't show here. This limit is the maximum number of bytes in a canonical input line.

Although the size of the output queue is finite, no constants defining that size are accessible to the program, because when the output queue starts to fill up, the kernel simply puts the writing process to sleep until room is available.

We'll see how the tcflush function allows us to flush either the input queue or the output queue. Similarly, when we describe the tcsetattr function, we'll see how we can tell the system to change the attributes of a terminal device only after the output queue is empty. (We want to do this, for example, if we're changing the output attributes.) We can also tell the system to discard everything in the input queue when changing the terminal attributes. (We want to do this if we're changing the input attributes or changing between canonical and noncanonical modes, so that previously entered characters aren't interpreted in the wrong mode.)

Most UNIX systems implement all the canonical processing in a module called the terminal line discipline. We can think of this module as a box that sits between the kernel's generic read and write functions and the actual device driver (see Figure 18.2).

Figure 18.2. Terminal line discipline
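The queue handling described above (changing attributes only after the output queue has drained, or also discarding pending input) maps onto the optional-actions argument of tcsetattr and onto tcflush. The following is a minimal sketch; the helper names pick_action, apply_attrs, and discard_pending_input are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper: choose when tcsetattr(3) applies a change.
 * TCSADRAIN waits until all queued output has been transmitted.
 * TCSAFLUSH does the same and also discards unread input, which is
 * what we want when input attributes or the input mode change.
 */
int pick_action(int input_attrs_changed)
{
    return input_attrs_changed ? TCSAFLUSH : TCSADRAIN;
}

/* Apply new attributes to fd. Returns 0 on success, -1 on error. */
int apply_attrs(int fd, const struct termios *t, int input_attrs_changed)
{
    assert(t != NULL);

    if (!isatty(fd))
        return -1;                  /* fd does not refer to a terminal */

    return tcsetattr(fd, pick_action(input_attrs_changed), t);
}

/* Discard input that was received but has not been read yet. */
int discard_pending_input(int fd)
{
    return tcflush(fd, TCIFLUSH);
}
```

Both apply_attrs and discard_pending_input fail with -1 when the descriptor is not a terminal (a pipe, for example), so callers should check the return value rather than assume standard input is a terminal.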


All the terminal device characteristics that we can examine and change are contained in a termios structure. This structure is defined in the header <termios.h>, which we use throughout this chapter:

struct termios {
    tcflag_t  c_iflag;      /* input flags */
    tcflag_t  c_oflag;      /* output flags */
    tcflag_t  c_cflag;      /* control flags */
    tcflag_t  c_lflag;      /* local flags */
    cc_t      c_cc[NCCS];   /* control characters */
};

Roughly speaking, the input flags control the input of characters by the terminal device driver (strip eighth bit on input, enable input parity checking, etc.), the output flags control the driver output (perform output processing, map newline to CR/LF, etc.), the control flags affect the RS-232 serial lines (ignore modem status lines, one or two stop bits per character, etc.), and the local flags affect the interface between the driver and the user (echo on or off, visually erase characters, enable terminal-generated signals, job control stop signal for background output, etc.).

The type tcflag_t is big enough to hold each of the flag values and is often defined as an unsigned int or an unsigned long. The c_cc array contains all the special characters that we can change. NCCS is the number of elements in this array and is typically between 15 and 20 (since most implementations of the UNIX System support more than the 11 POSIX-defined special characters). The cc_t type is large enough to hold each special character and is typically an unsigned char.

Versions of System V that predated the POSIX standard had a header named <termio.h> and a structure named termio. POSIX.1 added an s to the names, to differentiate them from their predecessors.

Pseudo Terminal Overview


The term pseudo terminal implies that it looks like a terminal to an application program, but it's not a real terminal. The diagram shows the typical arrangement of the processes involved when a pseudo terminal is being used. The key points in this figure are the following.

Typical arrangement of processes using a pseudo terminal

Normally, a process opens the pseudo-terminal master and then calls fork. The child establishes a new session, opens the corresponding pseudo-terminal slave, duplicates the file descriptor to the standard input, standard output, and standard error, and then calls exec. The pseudo-terminal slave becomes the controlling terminal for the child process.

It appears to the user process above the slave that its standard input, standard output, and standard error are a terminal device. The process can issue all the terminal I/O functions on these descriptors. But since there is not a real terminal device beneath the slave, functions that don't make sense (change the baud rate, send a break character, set odd parity, etc.) are just ignored.

Anything written to the master appears as input to the slave and vice versa. Indeed, all the input to the slave comes from the user process above the pseudo-terminal master. This behaves like a bidirectional pipe, but with the terminal line discipline module above the slave, we have additional capabilities over a plain pipe.
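The sequence described above (open the master, fork, create a new session in the child, open the slave, duplicate it onto the standard descriptors, exec) can be sketched with the POSIX functions posix_openpt, grantpt, unlockpt, and ptsname. This is a minimal sketch rather than a complete implementation: the function name pty_spawn is hypothetical, and the code assumes a Linux-like system where the first terminal opened by a session leader becomes its controlling terminal.

```c
#define _XOPEN_SOURCE 600   /* exposes posix_openpt, grantpt, unlockpt, ptsname */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with its standard input, output,
 * and error connected to a pseudo-terminal slave.
 * On success returns the child's pid and stores the master descriptor
 * in *master_out; returns -1 on error.
 */
pid_t pty_spawn(int *master_out, char *const argv[])
{
    const char *slave_name;
    pid_t pid;
    int m;

    assert(master_out != NULL && argv != NULL && argv[0] != NULL);

    m = posix_openpt(O_RDWR | O_NOCTTY);        /* pseudo-terminal master */
    if (m == -1)
        return -1;
    if (grantpt(m) == -1 || unlockpt(m) == -1 ||
        (slave_name = ptsname(m)) == NULL) {
        close(m);
        return -1;
    }

    pid = fork();
    if (pid == -1) {
        close(m);
        return -1;
    }

    if (pid == 0) {                             /* child */
        int s;

        close(m);
        if (setsid() == -1)                     /* new session, no tty yet */
            _exit(127);

        /* Assumption: opening the slave here makes it the controlling
         * terminal. BSD-derived systems may also need ioctl(TIOCSCTTY). */
        s = open(slave_name, O_RDWR);
        if (s == -1)
            _exit(127);

        if (dup2(s, STDIN_FILENO) == -1 ||
            dup2(s, STDOUT_FILENO) == -1 ||
            dup2(s, STDERR_FILENO) == -1)
            _exit(127);
        if (s > STDERR_FILENO)
            close(s);

        execvp(argv[0], argv);
        _exit(127);                             /* exec failed */
    }

    *master_out = m;                            /* parent keeps the master */
    return pid;
}
```

The parent can then read the child's output from the master descriptor and write the child's input to it. Because the line discipline sits above the slave, output read from the master normally has each newline mapped to CR/LF, which a plain pipe would not do.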
http://book.chinaunix.net/special/ebook/addisonWesley/APUE2/0201433079/main.html