Category: C/C++
2011-12-26 17:19:58
tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
raw_socket = socket(PF_INET, SOCK_RAW, protocol);
udp_socket = socket(PF_INET, SOCK_DGRAM, protocol);
The programmer's interface is BSD sockets compatible. For more information on sockets, see socket(7).
An IP socket is created by calling the socket(2) function as socket(PF_INET, socket_type, protocol). Valid socket types are SOCK_STREAM to open a tcp(7) socket, SOCK_DGRAM to open a udp(7) socket, or SOCK_RAW to open a raw(7) socket to access the IP protocol directly. protocol is the IP protocol in the IP header to be received or sent. The only valid values for protocol are 0 and IPPROTO_TCP for TCP sockets, and 0 and IPPROTO_UDP for UDP sockets. For SOCK_RAW you may specify a valid IANA IP protocol defined in the RFC 1700 assigned numbers.
When a process wants to receive new incoming packets or connections, it should bind a socket to a local interface address using bind(2). Only one IP socket may be bound to any given local (address, port) pair. When INADDR_ANY is specified in the bind call, the socket will be bound to all local interfaces. When listen(2) or connect(2) is called on an unbound socket, the socket is automatically bound to a random free port with the local address set to INADDR_ANY.
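The binding rules above can be sketched as follows, assuming a Linux system with glibc. The helper names bind_any and local_port are hypothetical and exist only for this illustration. Binding to port 0 asks the kernel to choose a free port, and a second socket bound to the same (address, port) pair is expected to fail with EADDRINUSE.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: bind fd to INADDR_ANY on the given port
 * (host byte order). Port 0 lets the kernel pick a free port.
 * Returns 0 on success, -1 with errno set on failure. */
int bind_any(int fd, unsigned short port)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    return bind(fd, (struct sockaddr *)&sa, sizeof(sa));
}

/* Hypothetical helper: return the local port of a bound socket
 * in host byte order, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return -1;
    return ntohs(sa.sin_port);
}
```

A caller would typically create the socket with socket(PF_INET, SOCK_DGRAM, 0), call bind_any(fd, 0), and then use local_port(fd) to learn which port was assigned.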
A TCP local socket address that has been bound is unavailable for some time after closing, unless the SO_REUSEADDR flag has been set. Care should be taken when using this flag as it makes TCP less reliable.
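As a minimal sketch of setting the flag mentioned above (the helper names enable_reuseaddr and get_reuseaddr are hypothetical), SO_REUSEADDR is a SOL_SOCKET level option and should be set before calling bind(2):

```c
#include <sys/socket.h>

/* Hypothetical helper: enable SO_REUSEADDR on fd.
 * Returns 0 on success, -1 with errno set on failure. */
int enable_reuseaddr(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}

/* Hypothetical helper: return the current SO_REUSEADDR setting
 * (nonzero means enabled), or -1 on error. */
int get_reuseaddr(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) < 0)
        return -1;
    return val;
}
```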
ADDRESS FORMAT
An IP socket address (struct sockaddr_in) is defined as a combination of an IP interface address and a port number. The basic IP protocol does not supply port numbers; they are implemented by higher level protocols like udp(7) and tcp(7). On raw sockets sin_port is set to the IP protocol.
sin_family is always set to AF_INET. This is required; in Linux 2.2 most networking functions return EINVAL when this setting is missing. sin_port contains the port in network byte order. Port numbers below 1024 are called reserved ports. Only processes with effective user ID 0 or the CAP_NET_BIND_SERVICE capability may bind(2) to these ports. Note that the raw IPv4 protocol as such has no concept of a port; ports are only implemented by higher protocols like tcp(7) and udp(7).
sin_addr is the IP host address. The s_addr member of struct in_addr contains the host interface address in network byte order. in_addr should only be accessed using the inet_aton(3), inet_addr(3), and inet_makeaddr(3) library functions, or directly with the name resolver (see gethostbyname(3)). IPv4 addresses are divided into unicast, broadcast and multicast addresses. Unicast addresses specify a single interface of a host, broadcast addresses specify all hosts on a network, and multicast addresses address all hosts in a multicast group. Datagrams to broadcast addresses can only be sent or received when the SO_BROADCAST socket flag is set. In the current implementation, connection-oriented sockets are only allowed to use unicast addresses.
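The three address classes can be told apart with a few bit tests. The sketch below is an illustration only: the enum and the function classify_ipv4 are hypothetical names, and it recognizes only the limited broadcast address 255.255.255.255, because subnet-directed broadcast addresses depend on the netmask of the interface.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

enum addr_kind { ADDR_UNICAST, ADDR_BROADCAST, ADDR_MULTICAST };

/* Hypothetical helper: classify an IPv4 address stored in network
 * byte order. Subnet-directed broadcasts are reported as unicast
 * because the netmask is not known here. */
enum addr_kind classify_ipv4(struct in_addr a)
{
    uint32_t h = ntohl(a.s_addr);   /* compare in host byte order */

    if (h == INADDR_BROADCAST)
        return ADDR_BROADCAST;
    if ((h & 0xf0000000u) == 0xe0000000u)   /* 224.0.0.0/4 */
        return ADDR_MULTICAST;
    return ADDR_UNICAST;
}
```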
Note that the address and the port are always stored in network byte order. In particular, this means that you need to call htons(3) on the number that is assigned to a port. All address/port manipulation functions in the standard library work in network byte order.
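A short sketch of filling in a struct sockaddr_in with the byte-order rules above applied. The function name make_ipv4_addr is hypothetical; it takes the port in host byte order and converts it with htons(3), while inet_aton(3) already stores the address in network byte order.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill *sa from a dotted-quad string and a port
 * given in host byte order.
 * Returns 0 on success, -1 if the address string is malformed. */
int make_ipv4_addr(const char *ip, unsigned short port, struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;               /* must always be set */
    sa->sin_port = htons(port);             /* host to network byte order */
    if (inet_aton(ip, &sa->sin_addr) == 0)  /* result is in network order */
        return -1;
    return 0;
}
```

For example, make_ipv4_addr("127.0.0.1", 8080, &sa) prepares an address that can be passed to bind(2) or connect(2) after a cast to struct sockaddr *.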
There are several special addresses: INADDR_LOOPBACK (127.0.0.1) always refers to the local host via the loopback device; INADDR_ANY (0.0.0.0) means any address for binding; INADDR_BROADCAST (255.255.255.255) means any host and has the same effect on bind as INADDR_ANY for historical reasons.
SOCKET OPTIONS
IP supports some protocol-specific socket options that can be set with setsockopt(2) and read with getsockopt(2). The socket option level for IP is SOL_IP. A boolean integer flag is zero when it is false, otherwise true.
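The set/read pattern for integer-valued IP options can be sketched as below. The helper names ip_set_int and ip_get_int are hypothetical. The sketch passes IPPROTO_IP as the level, which on Linux has the same numeric value (0) as SOL_IP and is available on more systems.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: set an integer-valued IP-level socket option.
 * IPPROTO_IP equals SOL_IP (0) on Linux.
 * Returns 0 on success, -1 with errno set on failure. */
int ip_set_int(int fd, int opt, int val)
{
    return setsockopt(fd, IPPROTO_IP, opt, &val, sizeof(val));
}

/* Hypothetical helper: read an integer-valued IP-level socket option
 * into *val. Returns 0 on success, -1 with errno set on failure. */
int ip_get_int(int fd, int opt, int *val)
{
    socklen_t len = sizeof(*val);

    return getsockopt(fd, IPPROTO_IP, opt, val, &len);
}
```

For instance, ip_set_int(fd, IP_TTL, 42) followed by ip_get_int(fd, IP_TTL, &ttl) should leave 42 in ttl.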
IP_OPTIONS Sets or gets the IP options to be sent with every packet from this socket. The arguments are a pointer to a memory buffer containing the options and the option length. The setsockopt(2) call sets the IP options associated with a socket. The maximum option size for IPv4 is 40 bytes. See RFC791 for the allowed options. When the initial connection request packet for a SOCK_STREAM socket contains IP options, the IP options will be set automatically to the options from the initial packet with routing headers reversed. Incoming packets are not allowed to change options after the connection is established. The processing of all incoming source routing options is disabled by default and can be enabled by using the accept_source_route sysctl. Other options like timestamps are still handled. For datagram sockets, IP options can only be set by the local user. Calling getsockopt(2) with IP_OPTIONS puts the current IP options used for sending into the supplied buffer.
IP_PKTINFO Pass an IP_PKTINFO ancillary message that contains a pktinfo structure (struct in_pktinfo) that supplies some information about the incoming packet. This only works for datagram-oriented sockets. The argument is a flag that tells the socket whether the IP_PKTINFO message should be passed or not. The message itself can only be sent or retrieved as a control message with a packet using recvmsg(2) or sendmsg(2).
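Receiving the IP_PKTINFO control message could look like the following sketch, assuming Linux with glibc, where struct in_pktinfo is exposed when _GNU_SOURCE is defined. The function names enable_pktinfo and recv_with_pktinfo are hypothetical.

```c
#define _GNU_SOURCE            /* exposes struct in_pktinfo in glibc */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: ask the kernel to attach an IP_PKTINFO
 * control message to every received datagram. */
int enable_pktinfo(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Hypothetical helper: receive one datagram into buf and copy its
 * in_pktinfo into *info. Returns the payload length, or -1 on error
 * or when no IP_PKTINFO control message was attached. */
ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
                          struct in_pktinfo *info)
{
    union {
        char raw[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;          /* forces suitable alignment */
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *c;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.raw;
    msg.msg_controllen = sizeof(ctl.raw);

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    /* Walk the control messages and pick out the IP_PKTINFO one. */
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(info, CMSG_DATA(c), sizeof(*info));
            return n;
        }
    }
    return -1;
}
```

After a successful call, info->ipi_addr holds the destination address from the packet header and info->ipi_ifindex the index of the interface the packet arrived on.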
IP_RECVTTL When this flag is set, pass an IP_RECVTTL control message with the time-to-live field of the received packet as a byte. Not supported for SOCK_STREAM sockets.
IP_RECVOPTS Pass all incoming IP options to the user in an IP_OPTIONS control message. The routing header and other options are already filled in for the local host. Not supported for SOCK_STREAM sockets.
IP_RETOPTS Identical to IP_RECVOPTS but returns raw unprocessed options with timestamp and route record options not filled in for this hop.
IP_TOS Set or receive the Type-Of-Service (TOS) field that is sent with every IP packet originating from this socket. It is used to prioritize packets on the network. TOS is a byte. There are some standard TOS flags defined: IPTOS_LOWDELAY to minimize delays for interactive traffic, IPTOS_THROUGHPUT to optimize throughput, IPTOS_RELIABILITY to optimize for reliability, and IPTOS_MINCOST for "filler data" where slow transmission doesn't matter. At most one of these TOS values can be specified. Other bits are invalid and shall be cleared. Linux sends IPTOS_LOWDELAY datagrams first by default, but the exact behaviour depends on the configured queueing discipline. Some high priority levels may require an effective user ID of 0 or the CAP_NET_ADMIN capability. The priority can also be set in a protocol-independent way by the (SOL_SOCKET, SO_PRIORITY) socket option (see socket(7)).
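As an illustration, the sketch below marks a socket's traffic as low-delay and reads the value back. The helper names set_tos and get_tos are hypothetical; the IPTOS_* constants come from <netinet/ip.h> on glibc systems.

```c
#include <netinet/in.h>
#include <netinet/ip.h>     /* IPTOS_LOWDELAY and friends */
#include <sys/socket.h>

/* Hypothetical helper: set the TOS byte used for outgoing packets.
 * Returns 0 on success, -1 with errno set on failure. */
int set_tos(int fd, int tos)
{
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Hypothetical helper: return the current TOS value, or -1 on error. */
int get_tos(int fd)
{
    int tos = 0;
    socklen_t len = sizeof(tos);

    if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) < 0)
        return -1;
    return tos;
}
```

An interactive client might call set_tos(fd, IPTOS_LOWDELAY) right after creating the socket.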
IP_TTL Set or retrieve the current time-to-live field that is sent in every packet sent from this socket.
IP_HDRINCL If enabled, the user supplies an IP header in front of the user data. Only valid for SOCK_RAW sockets. See raw(7) for more information. When this flag is enabled, the values set by IP_OPTIONS, IP_TTL and IP_TOS are ignored.
IP_RECVERR (defined in <linux/errqueue.h>) Enable extended reliable error message passing. When enabled on a datagram socket, all generated errors will be queued in a per-socket error queue. When the user receives an error from a socket operation, the errors can be received by calling recvmsg(2) with the MSG_ERRQUEUE flag set. The sock_extended_err structure describing the error will be passed in an ancillary message with the type IP_RECVERR and the level SOL_IP. This is useful for reliable error handling on unconnected sockets. The received data portion of the error queue contains the error packet.
IP_MTU_DISCOVER Sets or retrieves the Path MTU Discovery setting for a socket. When enabled, Linux will perform Path MTU Discovery as defined in RFC1191 on this socket. The don't fragment flag is set on all outgoing datagrams. The system-wide default is controlled by the ip_no_pmtu_disc sysctl for SOCK_STREAM sockets, and discovery is disabled on all others. For non-SOCK_STREAM sockets it is the user's responsibility to packetize the data in MTU-sized chunks and to do the retransmits if necessary. The kernel will reject packets that are bigger than the known path MTU if this flag is set (with EMSGSIZE).
Path MTU discovery flags | Meaning |
IP_PMTUDISC_WANT | Use per-route settings. |
IP_PMTUDISC_DONT | Never do Path MTU Discovery. |
IP_PMTUDISC_DO | Always do Path MTU Discovery. |
When PMTU discovery is enabled, the kernel automatically keeps track of the path MTU per destination host. When the socket is connected to a specific peer with connect(2), the currently known path MTU can be retrieved conveniently using the IP_MTU socket option (e.g. after an EMSGSIZE error occurred). It may change over time. For connectionless sockets with many destinations, the new MTU for a given destination can also be accessed using the error queue (see IP_RECVERR). A new error will be queued for every incoming MTU update.
While MTU discovery is in progress, initial packets from datagram sockets may be dropped. Applications using UDP should be aware of this and should not let these drops influence their packet retransmit strategy.
To bootstrap the path MTU discovery process on unconnected sockets it is possible to start with a big datagram size (up to 64K-headers bytes long) and let it shrink by updates of the path MTU.
To get an initial estimate of the path MTU, connect a datagram socket to the destination address using connect(2) and retrieve the MTU by calling getsockopt(2) with the IP_MTU option.
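The procedure just described can be sketched as a single function. The name estimate_path_mtu is hypothetical; the sketch forces discovery on with IP_PMTUDISC_DO, connects a temporary UDP socket (which sends no packet), and reads IP_MTU.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the kernel's current path MTU estimate
 * towards *dst, or -1 on error. connect(2) on a datagram socket only
 * fixes the destination; nothing is transmitted. */
int estimate_path_mtu(const struct sockaddr_in *dst)
{
    int mode = IP_PMTUDISC_DO;
    int mtu = -1;
    socklen_t len = sizeof(mtu);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode)) < 0 ||
        connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0 ||
        getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        mtu = -1;
    close(fd);
    return mtu;
}
```

The value returned is only the current estimate; it may shrink later as MTU updates arrive for that destination.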
IP_MTU Retrieve the current known path MTU of the current socket. Only valid when the socket has been connected. Returns an integer. Only valid as a getsockopt(2).
IP_ROUTER_ALERT Pass all to-be-forwarded packets with the IP Router Alert option set to this socket. Only valid for raw sockets. This is useful, for instance, for user space RSVP daemons. The tapped packets are not forwarded by the kernel; it is the user's responsibility to send them out again. Socket binding is ignored; such packets are only filtered by protocol. Expects an integer flag.
IP_MULTICAST_TTL Sets or reads the time-to-live value of outgoing multicast packets for this socket. It is very important for multicast packets to set the smallest TTL possible. The default is 1, which means that multicast packets don't leave the local network unless the user program explicitly requests it. Argument is an integer.
IP_MULTICAST_LOOP Sets or reads a boolean integer argument that controls whether sent multicast packets should be looped back to the local sockets.
IP_ADD_MEMBERSHIP Join a multicast group. Argument is a struct ip_mreqn structure.
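The multicast options above might be used as in this sketch, assuming Linux with glibc, where struct ip_mreqn is exposed when _GNU_SOURCE is defined. The function names configure_multicast_sender and join_group are hypothetical.

```c
#define _GNU_SOURCE            /* exposes struct ip_mreqn in glibc */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: set the TTL of outgoing multicast packets and
 * choose whether they are looped back to local sockets.
 * Returns 0 on success, -1 with errno set on failure. */
int configure_multicast_sender(int fd, int ttl, int loop)
{
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop));
}

/* Hypothetical helper: join the multicast group given as a dotted
 * quad. An ifindex of 0 leaves the interface choice to the kernel.
 * Returns 0 on success, -1 on a malformed address or on failure. */
int join_group(int fd, const char *group, int ifindex)
{
    struct ip_mreqn mreq;

    memset(&mreq, 0, sizeof(mreq));
    if (inet_aton(group, &mreq.imr_multiaddr) == 0)
        return -1;
    mreq.imr_address.s_addr = htonl(INADDR_ANY);
    mreq.imr_ifindex = ifindex;
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
}
```

Whether join_group succeeds depends on the host having a multicast-capable interface and a suitable route, so callers should check its return value.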
ip_always_defrag When this boolean sysctl flag is enabled (not equal to 0), incoming fragments (parts of IP packets that arose when some host between origin and destination decided that the packets were too large and cut them into pieces) will be reassembled (defragmented) before being processed, even if they are about to be forwarded.
Only enable this if running either a firewall that is the sole link to your network or a transparent proxy; never turn it on for a normal router or host. Otherwise fragmented communication may be disturbed when the fragments travel over different links. Defragmentation also has a large memory and CPU time cost.
This is automagically turned on when masquerading or transparent proxying are configured.
neigh/* See arp(7).
IOCTLS
All ioctls described in socket(7) apply to ip. The ioctls to configure firewalling are documented in ipfw(4) from the ipchains package.
Ioctls to configure generic device parameters are described in netdevice(7).
NOTES
Be very careful with the SO_BROADCAST option; it is not privileged in Linux. It is easy to overload the network with careless broadcasts. For new application protocols it is better to use a multicast group instead of broadcasting. Broadcasting is discouraged.
Some other BSD sockets implementations provide IP_RCVDSTADDR and IP_RECVIF socket options to get the destination address and the interface of received datagrams. Linux has the more general IP_PKTINFO for the same task.
ERRORS
ENOTCONN The operation is only defined on a connected socket, but the socket wasn't connected.
EINVAL Invalid argument passed. For send operations this can be caused by sending to a blackhole route.
EMSGSIZE Datagram is bigger than an MTU on the path and it cannot be fragmented.
EACCES The user tried to execute an operation without the necessary permissions. These include: sending a packet to a broadcast address without having the SO_BROADCAST flag set; sending a packet via a prohibit route; modifying firewall settings without CAP_NET_ADMIN or effective user ID 0; binding to a reserved port without the CAP_NET_BIND_SERVICE capability or effective user ID 0.
EADDRINUSE Tried to bind to an address already in use.
ENOPROTOOPT and EOPNOTSUPP Invalid socket option passed.
EPERM User doesn't have permission to set high priority, change configuration, or send signals to the requested process or group.
EADDRNOTAVAIL A non-existent interface was requested or the requested source address was not local.
EAGAIN Operation on a non-blocking socket would block.
ESOCKTNOSUPPORT The socket is not configured or an unknown socket type was requested.
EISCONN connect(2) was called on an already connected socket.
EALREADY A connection operation on a non-blocking socket is already in progress.
ECONNABORTED A connection was closed during an accept(2).
EPIPE The connection was unexpectedly closed or shut down by the other end.
ENOENT SIOCGSTAMP was called on a socket where no packet arrived.
EHOSTUNREACH No valid routing table entry matches the destination address. This error can be caused by an ICMP message from a remote router or by the local routing table.
ENODEV Network device not available or not capable of sending IP.
ENOPKG A kernel subsystem was not configured.
ENOBUFS, ENOMEM Not enough free memory. This often means that the memory allocation is limited by the socket buffer limits, not by the system memory, but this is not 100% consistent.
Other errors may be generated by the overlaying protocols; see tcp(7), raw(7), udp(7) and socket(7).
VERSIONS
IP_PKTINFO, IP_MTU, IP_MTU_DISCOVER, IP_RECVERR and IP_ROUTER_ALERT are new options in Linux 2.2. They are also all Linux specific and should not be used in programs intended to be portable.
struct ip_mreqn is new in Linux 2.2. Linux 2.0 only supported ip_mreq.
The sysctls were introduced with Linux 2.2.
COMPATIBILITY
For compatibility with Linux 2.0, the obsolete socket(PF_INET, SOCK_PACKET, protocol) syntax is still supported to open a packet(7) socket. This is deprecated and should be replaced by socket(PF_PACKET, SOCK_RAW, protocol) instead. The main difference is the new sockaddr_ll address structure for generic link layer information instead of the old sockaddr_pkt.
BUGS
There are too many inconsistent error values.
The ioctls to configure IP-specific interface options and ARP tables are not described.
Some versions of glibc forget to declare in_pktinfo. Workaround currently is to copy it into your program from this man page.
Receiving the original destination address with MSG_ERRQUEUE in msg_name by recvmsg(2) does not work in some 2.2 kernels.
SEE ALSO
sendmsg(2), recvmsg(2), ipfw(4), netlink(7), raw(7), socket(7), tcp(7), udp(7)
RFC791 for the original IP specification.
RFC1122 for the IPv4 host requirements.
RFC1812 for the IPv4 router requirements.