Category: LINUX

2009-12-07 15:31:33

Protocol Buffers
2009-03-11 14:20
Developer Guide

Welcome to the developer documentation for protocol buffers –
a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications.
This overview introduces protocol buffers and tells you what you need to do to get started
– you can then go on to follow the tutorials or delve deeper into protocol buffer encoding.
API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.

(Translator's note: .proto files are presumably language-independent definition files, similar to CORBA IDL files.)

What are protocol buffers?

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data –
think XML, but smaller, faster, and simpler. You define how you want your data to be structured once,
then you can use special generated source code to easily write and read your structured data to and from a variety of data streams
and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

(Translator's notes: the "automated mechanism" probably refers to the generated encoding and decoding code; "a variety of languages" means the three supported languages listed above; and updating a data structure without breaking deployed programs resembles hot deployment.)

How do they work?

You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files.
Each protocol buffer message is a small logical record of information, containing a series of name-value pairs.
Here's a very basic example of a .proto file that defines a message containing information about a person:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

As you can see, the message format is simple – each message type has one or more uniquely numbered fields,
and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes,
or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically.
You can specify optional fields, required fields, and repeated fields.
You can find more information about writing .proto files in the Protocol Buffer Language Guide.

Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes.
These provide simple accessors for each field (like query() and set_query()) as well as methods to serialize/parse the whole structure to/from raw bytes –
so, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person.
You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages.
(Translator's note: this workflow resembles compiling CORBA IDL.)
You might then write some code like this:

Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
// Write the message to a file in the binary wire format.
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);

Then, later on, you could read your message back in:

// Read the binary message back from the same file.
fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;
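
To give a feel for the accessor style used in the snippets above, here is a hand-written approximation of such a class for the name, id, and email fields. It is only a sketch: PersonSketch and its members are hypothetical names written for illustration, not the output of the protocol buffer compiler, and it leaves out serialization entirely.

```cpp
#include <cstdint>
#include <string>

// Hand-written stand-in for a generated message class (illustrative only).
class PersonSketch {
 public:
  // required string name = 1;
  const std::string& name() const { return name_; }
  void set_name(const std::string& value) { name_ = value; has_name_ = true; }
  bool has_name() const { return has_name_; }

  // required int32 id = 2;
  int32_t id() const { return id_; }
  void set_id(int32_t value) { id_ = value; has_id_ = true; }
  bool has_id() const { return has_id_; }

  // optional string email = 3;
  const std::string& email() const { return email_; }
  void set_email(const std::string& value) { email_ = value; has_email_ = true; }
  bool has_email() const { return has_email_; }

  // True once both required fields (name and id) have been set.
  bool IsInitialized() const { return has_name_ && has_id_; }

 private:
  std::string name_;
  std::string email_;
  int32_t id_ = 0;
  bool has_name_ = false;
  bool has_id_ = false;
  bool has_email_ = false;
};
```

Each field gets a getter, a setter, and a presence check, so application code never touches the encoded bytes directly.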

You can add new fields to your message formats without breaking backwards-compatibility;
old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format,
you can extend your protocol without having to worry about breaking existing code.
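
The reason old readers can cope with new fields is that every field on the wire carries its own number and wire type, so a reader can step over anything it does not recognize. The sketch below illustrates that idea under some assumptions: OldPerson, ReadVarint, and ParseOldPerson are hypothetical names, only wire types 0 (varint) and 2 (length-delimited) are handled, and unknown fields are simply dropped. It is not the library's parser.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Fields an "old" reader knows about (a subset of the Person message).
struct OldPerson {
  std::string name;   // field 1
  std::string email;  // field 3
};

// Reads a base-128 varint starting at *pos; returns false on truncated input.
bool ReadVarint(const std::string& in, size_t* pos, uint64_t* value) {
  *value = 0;
  for (int shift = 0; shift < 64 && *pos < in.size(); shift += 7) {
    const unsigned char byte = static_cast<unsigned char>(in[(*pos)++]);
    *value |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;  // High bit clear: last byte.
  }
  return false;
}

// Parses `in`, keeping fields 1 and 3 and skipping every other field.
bool ParseOldPerson(const std::string& in, OldPerson* person) {
  size_t pos = 0;
  while (pos < in.size()) {
    uint64_t tag = 0;
    if (!ReadVarint(in, &pos, &tag)) return false;
    const uint64_t field_number = tag >> 3;  // Upper bits: field number.
    const uint64_t wire_type = tag & 0x7;    // Low three bits: wire type.
    if (wire_type == 0) {
      // Varint payload: this reader knows no varint fields, so skip it.
      uint64_t ignored = 0;
      if (!ReadVarint(in, &pos, &ignored)) return false;
    } else if (wire_type == 2) {
      uint64_t length = 0;
      if (!ReadVarint(in, &pos, &length)) return false;
      if (length > in.size() - pos) return false;  // Truncated payload.
      const std::string payload = in.substr(pos, length);
      pos += length;
      if (field_number == 1) {
        person->name = payload;
      } else if (field_number == 3) {
        person->email = payload;
      }
      // Any other field number is unknown to this reader and is dropped.
    } else {
      return false;  // Wire types not covered by this sketch.
    }
  }
  return true;
}
```

Feeding this reader a message that also contains an id (field 2) and some later-added string field still yields the name and email, because the extra fields are skipped by length rather than interpreted.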

You'll find a complete reference for using generated protocol buffer code in the API Reference section,
and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.

Why not just use XML?

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

    * are simpler
    * are 3 to 10 times smaller
    * are 20 to 100 times faster
    * are less ambiguous
    * generate data access classes that are easier to use programmatically

(Translator's note: ASN.1 and CORBA in fact have similar code-generation tools.)

For example, let's say you want to model a person with a name and an email. In XML, you need to do:

    <person>
      <name>John Doe</name>
      <email>jdoe@example.com</email>
    </person>

while the corresponding protocol buffer message (in protocol buffer text format) is:

# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
  name: "John Doe"
  email: "jdoe@example.com"
}

When this message is encoded to the protocol buffer binary format (the text format above is just a convenient
human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse.
The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.
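
The 28-byte figure can be checked by hand. Assuming name is field 1 and email is field 3, as in the Person definition earlier, each string is written as a one-byte tag, a one-byte length, and the raw characters. The sketch below does that accounting; AppendVarint, AppendStringField, and EncodePersonSketch are hypothetical helper names, not part of the protocol buffer library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends `value` as a base-128 varint: 7 bits per byte, low bits first,
// with the high bit set on every byte except the last.
void AppendVarint(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Appends one length-delimited field (wire type 2): tag, byte length, bytes.
void AppendStringField(uint32_t field_number, const std::string& text,
                       std::string* out) {
  assert(field_number > 0);
  const uint64_t kLengthDelimited = 2;
  AppendVarint((static_cast<uint64_t>(field_number) << 3) | kLengthDelimited,
               out);
  AppendVarint(text.size(), out);
  out->append(text);
}

// Hand-encodes the name (field 1) and email (field 3) of a Person.
std::string EncodePersonSketch(const std::string& name,
                               const std::string& email) {
  std::string bytes;
  AppendStringField(1, name, &bytes);
  AppendStringField(3, email, &bytes);
  return bytes;
}
```

Calling EncodePersonSketch("John Doe", "jdoe@example.com") gives (1 + 1 + 8) + (1 + 1 + 16) = 28 bytes, against 69 bytes for the XML with whitespace removed.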

Also, manipulating a protocol buffer is much easier:

cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

Whereas with XML you would have to do something like:

cout << "Name: "
       << person.getElementsByTagName("name")->item(0)->innerText()
       << endl;
cout << "E-mail: "
       << person.getElementsByTagName("email")->item(0)->innerText()
       << endl;

However, protocol buffers are not always a better solution than XML –
for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML),
since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers,
at least in their native format, are not. XML is also – to some extent –
self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

(Translator's note: protocol buffers are also already widely used inside Google.)

--------------------------------------------------------------------------

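Google has published the structured-data description language it uses internally. After skimming its documentation, the characteristics of Protocol Buffers can be summarized as follows:

Protocol Buffers is a language-neutral, platform-neutral, extensible format for serializing structured data, usable in communications protocols, data storage, and similar areas. APIs are currently provided for three languages: C++, Java, and Python.

You describe your data structure once with Protocol Buffers, and can then easily read and write your structured data from a variety of languages and data streams.

Protocol Buffers is highly extensible and has good "backward" compatibility: you can upgrade a data structure without breaking deployed programs that rely on the "old" data format.

Protocol Buffers stores data in an efficient binary form that is 3 to 10 times smaller and 20 to 100 times faster than XML, with clearer semantics, and it needs nothing like an XML parser (the Protocol Buffers compiler turns a .proto file into data access classes that serialize and deserialize Protocol Buffers data).

Protocol Buffers also has shortcomings compared with XML. Because structure cannot easily be interleaved with text, Protocol Buffers is not suitable for modeling text-based markup documents such as HTML. In addition, XML is to some degree self-describing and can be read and edited directly by people. Protocol Buffers data cannot: it is stored in binary form, and without the .proto definition you cannot directly read anything out of it.

In short, Protocol Buffers is a lightweight, efficient format for structured data, well suited to data storage or as an RPC data exchange format.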
