Network Buffers And Memory Management
Reprinted with permission of Linux Journal, from issue 29, September 1996. Some changes have been made to accomodate the web. This article was originally written for the Kernel Korner column. The Kernel Korner series has included many other articles of interest to Linux kernel hackers, as well.
by Alan Cox
The HyperNews Linux KHG Discussion Pages
Network Buffers And Memory Management
Reprinted with permission of Linux Journal, from issue 29, September 1996. Some changes have been made to accomodate the web. This article was originally written for the Kernel Korner column. The Kernel Korner series has included many other articles of interest to Linux kernel hackers, as well.
by Alan Cox
The Linux operating system implements the industry-standard Berkeley socket API, which has its origins in the BSD unix developments (4.2/4.3/4.4 BSD). In this article, we will look at the way the memory management and buffering is implemented for network layers and network device drivers under the existing Linux kernel, as well as explain how and why some things have changed over time.
Core Concepts
The networking layer tries to be fairly object-oriented in its design, as indeed is much of the Linux kernel. The core structure of the networking code goes back to the initial networking and socket implementations by Ross Biro and Orest Zborowski respectively. The key objects are:
- Device or Interface:
- A network interface represents a thing which sends and receives packets. This is normally interface code for a physical device like an ethernet card. However some devices are software only such as the loopback device which is used for sending data to yourself.
- Protocol:
- Each protocol is effectively a different language of networking. Some protocols exist purely because vendors chose to use proprietary networking schemes, others are designed for special purposes. Within the Linux kernel each protocol is a seperate module of code which provides services to the socket layer.
- Socket:
- So called from the notion of plugs and sockets. A socket is a connection in the networking that provides unix file I/O and exists to the user program as a file descriptor. In the kernel each socket is a pair of structures that represent the high level socket interface and low level protocol interface.
- sk_buff:
- All the buffers used by the networking layers are sk_buffs. The control for these is provided by core low-level library routines available to the whole of the networking. sk_buffs provide the general buffering and flow control facilities needed by network protocols.
Implementation of sk_buffs
The primary goal of the sk_buff routines is to provide a consistent and efficient buffer handling method for all of the network layers, and by being consistent to make it possible to provide higher level sk_buff and socket handling facilities to all the protocols.
An sk_buff is a control structure with a block of memory attached. There are two primary sets of functions provided in the sk_buff library. Firstly routines to manipulate doubly linked lists of sk_buffs, secondly functions for controlling the attached memory. The buffers are held on linked lists optimised for the common network operations of append to end and remove from start. As so much of the networking functionality occurs during interrupts these routines are written to be atomic. The small extra overhead this causes is well worth the pain it saves in bug hunting.
We use the list operations to manage groups of packets as they arrive from the network, and as we send them to the physical interfaces. We use the memory manipulation routines for handling the contents of packets in a standardised and efficient manner.
At its most basic level, a list of buffers is managed using functions like this:
void append_frame(char *buf, int len)
{
struct sk_buff *skb=alloc_skb(len, GFP_ATOMIC);
if(skb==NULL)
my_dropped++;
else
{
skb_put(skb,len);
memcpy(skb->data,data,len);
skb_append(&my_list, skb);
}
}
void process_queue(void)
{
struct sk_buff *skb;
while((skb=skb_dequeue(&my_list))!=NULL)
{
process_data(skb);