Netstrings
D. J. Bernstein, djb@pobox.com
19970201
1. Introduction
A netstring is a self-delimiting encoding of a string. Netstrings are
very easy to generate and to parse. Any string may be encoded as a
netstring; there are no restrictions on length or on allowed bytes.
Another virtue of a netstring is that it declares the string size up
front. Thus an application can check in advance whether it has enough
space to store the entire string.
Netstrings may be used as a basic building block for reliable network
protocols. Most high-level protocols, in effect, transmit a sequence
of strings; those strings may be encoded as netstrings and then
concatenated into a sequence of characters, which in turn may be
transmitted over a reliable stream protocol such as TCP.
Note that netstrings can be used recursively. The result of encoding
a sequence of strings is a single string. A series of those encoded
strings may in turn be encoded into a single string. And so on.
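The recursion can be sketched in a few lines of C. The helper names ns_text and ns_nest_demo are hypothetical, not part of this document, and ns_text relies on %s, so it suits only NUL-free text. It is an illustration of nesting, not a general encoder.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, illustration only: writes the netstring for the
 * NUL-terminated text s into dst. Returns the encoded length, or -1 if
 * dst (of size cap) is too small. Not binary-safe because it uses %s. */
int ns_text(char *dst, size_t cap, const char *s)
{
    int n = snprintf(dst, cap, "%zu:%s,", strlen(s), s);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Encode "ab" and "cde" as a sequence, then encode that sequence again. */
int ns_nest_demo(char *out, size_t cap)
{
    char seq[64];
    int a = ns_text(seq, sizeof seq, "ab");
    if (a < 0) return -1;
    int b = ns_text(seq + a, sizeof seq - (size_t)a, "cde");
    if (b < 0) return -1;
    /* seq now holds the concatenation "2:ab,3:cde," (11 bytes) */
    return ns_text(out, cap, seq);   /* out: "11:2:ab,3:cde,," */
}
```

The outer netstring can be skipped or stored by a reader that knows nothing about the strings inside it, since only the outer length matters at that level.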
In this document, a string of 8-bit bytes may be written in two
different forms: as a series of hexadecimal numbers between angle
brackets, or as a sequence of ASCII characters between double quotes.
For example, <68 65 6c 6c 6f 20 77 6f 72 6c 64 21> is a string of
length 12; it is the same as the string "hello world!".
Although this document restricts attention to strings of 8-bit bytes,
netstrings could be used with any 6-bit-or-larger character set.
2. Definition
Any string of 8-bit bytes may be encoded as [len]":"[string]",".
Here [string] is the string and [len] is a nonempty sequence of ASCII
digits giving the length of [string] in decimal. The ASCII digits are
<30> for 0, <31> for 1, and so on up through <39> for 9. Extra zeros
at the front of [len] are prohibited: [len] begins with <30> exactly
when [string] is empty.
For example, the string "hello world!" is encoded as <31 32 3a 68
65 6c 6c 6f 20 77 6f 72 6c 64 21 2c>, i.e., "12:hello world!,". The
empty string is encoded as "0:,".
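As a minimal sketch of the definition, the following C function encodes a byte string into a caller-supplied buffer. The name ns_encode and its return convention are assumptions made for illustration; they do not come from this document.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encode src[0..len) as [len]":"[string]"," into dst.
 * Returns the number of bytes written, or 0 if dst (of size cap) is too
 * small. 0 is unambiguous because the shortest netstring, "0:,", is 3 bytes. */
size_t ns_encode(char *dst, size_t cap, const void *src, size_t len)
{
    char head[32];
    /* %zu prints decimal digits with no extra leading zeros; 0 prints as "0". */
    int n = snprintf(head, sizeof head, "%zu:", len);
    if (n < 0 || (size_t)n >= sizeof head) return 0;
    size_t hn = (size_t)n;
    /* Need hn + len + 1 bytes; written this way to avoid overflow. */
    if (cap < hn + 1 || cap - hn - 1 < len) return 0;
    memcpy(dst, head, hn);
    if (len > 0) memcpy(dst + hn, src, len);
    dst[hn + len] = ',';
    return hn + len + 1;
}
```

Called with "hello world!" and length 12, this writes the 16 bytes "12:hello world!,". Called with length 0, it writes "0:,".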
[len]":"[string]"," is called a netstring. [string] is called the
interpretation of the netstring.
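Recovering the interpretation from a netstring held in memory might look like the sketch below. The function ns_parse is hypothetical. It follows the definition strictly, so it rejects extra leading zeros, a missing colon or comma, and truncated input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parse one netstring at the start of in[0..n).
 * On success, stores the start and length of the interpretation and
 * returns the total number of bytes consumed. Returns 0 if the input
 * is malformed or incomplete. */
size_t ns_parse(const char *in, size_t n, const char **str, size_t *len)
{
    size_t i = 0, v = 0;
    if (n == 0 || in[0] < '0' || in[0] > '9') return 0;  /* [len] is nonempty */
    if (in[0] == '0' && n > 1 && in[1] >= '0' && in[1] <= '9')
        return 0;                                         /* extra leading zero */
    while (i < n && in[i] >= '0' && in[i] <= '9') {
        size_t d = (size_t)(in[i] - '0');
        if (v > (SIZE_MAX - d) / 10) return 0;            /* length overflows size_t */
        v = v * 10 + d;
        i++;
    }
    if (i >= n || in[i] != ':') return 0;
    i++;
    /* The remaining input must hold v string bytes plus the trailing comma. */
    if (n - i < 1 || n - i - 1 < v) return 0;
    if (in[i + v] != ',') return 0;
    *str = in + i;
    *len = v;
    return i + v + 1;
}
```

Because the return value is the number of bytes consumed, a caller can apply the function repeatedly to walk a concatenated sequence of netstrings.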
3. Sample code
The following C code starts with a buffer buf of length len and
prints it as a netstring.
   if (printf("%lu:",len) < 0) barf();
   if (fwrite(buf,1,len,stdout) < len) barf();
   if (putchar(',') < 0) barf();
The following C code reads a netstring and decodes it into a
dynamically allocated buffer buf of length len.
   /* note: scanf also accepts leading whitespace, a sign, and extra zeros */
   if (scanf("%9lu",&len) < 1) barf(); /* >999999999 bytes is bad */
   if (getchar() != ':') barf();
   buf = malloc(len + 1); /* malloc(0) is not portable */
   if (!buf) barf();
   if (fread(buf,1,len,stdin) < len) barf();
   if (getchar() != ',') barf();
Both of these code fragments assume that the local character set is
ASCII, and that the relevant stdio streams are in binary mode.
4. Security considerations
The famous Finger security hole may be blamed on Finger's use of the
CRLF encoding. In that encoding, each string is simply terminated by
CRLF. This encoding has several problems. Most importantly, it does
not declare the string size in advance. This means that a correct
CRLF parser must be prepared to ask for more and more memory as it is
reading the string. In the case of Finger, a lazy implementor found
this to be too much trouble; instead he simply declared a fixed-size
buffer and used C's gets() function. The rest is history.
In contrast, as the above sample code shows, it is very easy to
handle netstrings without risking buffer overflow. Thus widespread
use of netstrings may improve network security.
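To illustrate the point for a program that, like the Finger example, uses a fixed-size buffer, here is a hedged sketch of a reader that checks the declared length against the buffer size before reading any string bytes. The name ns_read_bounded is hypothetical. The 9-digit limit mirrors the sample code above, and the stream is assumed to be in binary mode.

```c
#include <stdio.h>

/* Hypothetical helper: read one netstring from in into buf (size cap).
 * Returns the string length, or -1 on malformed input, short input, or a
 * declared length larger than cap. No string byte is read until the
 * length has been checked. */
long ns_read_bounded(FILE *in, char *buf, size_t cap)
{
    size_t len = 0;
    int digits = 0;
    int c = getc(in);
    if (c == '0') {                      /* "0" must stand alone */
        digits = 1;
        c = getc(in);
    } else {
        while (c >= '0' && c <= '9') {
            if (++digits > 9) return -1; /* same 9-digit cap as the sample */
            len = len * 10 + (size_t)(c - '0');
            c = getc(in);
        }
    }
    if (digits == 0 || c != ':') return -1;
    if (len > cap) return -1;            /* too large: refuse before reading */
    if (fread(buf, 1, len, in) < len) return -1;
    if (getc(in) != ',') return -1;
    return (long)len;
}
```

Unlike gets(), the reader never writes past cap bytes. An oversized string is rejected as soon as its length prefix has been read.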