The high-level syntax of HEVC contains numerous elements inherited from the network abstraction layer (NAL) of H.264/MPEG-4 AVC. The NAL provides the ability to map the video coding layer (VCL) data that represent the content of the pictures onto various transport layers, including RTP/IP, ISO MP4, and H.222.0/MPEG-2 Systems, and provides a framework for packet loss resilience. For general concepts of the NAL design such as NAL units, parameter sets, access units, the byte stream format, and packetized formatting, please refer to [9]–[11].
NAL units are classified into VCL and non-VCL NAL units according to whether they contain coded pictures or other associated data, respectively. In the HEVC standard, several VCL NAL unit types identifying categories of pictures for decoder initialization and random-access purposes are included. Table I lists the NAL unit types and their associated meanings and type classes in the HEVC standard.
TABLE I NAL Unit Types, Meanings, and Type Classes
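The VCL/non-VCL classification can be sketched in a few lines of Python. The two-byte header layout used here (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) and the rule that type values below 32 are VCL types are assumptions taken from a reading of the H.265 specification, not from the text above; the names `NalHeader`, `parse_nal_header`, and `is_vcl` are hypothetical.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    nal_unit_type: int  # 6-bit type code
    layer_id: int       # 6-bit nuh_layer_id
    temporal_id: int    # nuh_temporal_id_plus1 minus 1


def parse_nal_header(data: bytes) -> NalHeader:
    """Parse the assumed two-byte HEVC NAL unit header."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header needs two bytes")
    b0, b1 = data[0], data[1]
    if b0 & 0x80:
        raise ValueError("forbidden_zero_bit is set")
    nal_unit_type = (b0 >> 1) & 0x3F
    layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)
    temporal_id_plus1 = b1 & 0x07
    if temporal_id_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must be nonzero")
    return NalHeader(nal_unit_type, layer_id, temporal_id_plus1 - 1)


def is_vcl(header: NalHeader) -> bool:
    """Assumption: type codes 0..31 are VCL, 32..63 are non-VCL."""
    return header.nal_unit_type < 32
```

For example, under these assumptions the header bytes 0x40 0x01 parse to type 32 (a non-VCL unit) in layer 0 with temporal identifier 0.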
The following subsections present a description of the new capabilities supported by the high-level syntax.
A. Random Access and Bitstream Splicing Features
The new design supports special features to enable random access and bitstream splicing. In H.264/MPEG-4 AVC, a bitstream must always start with an instantaneous decoding refresh (IDR) access unit. An IDR access unit contains an independently coded picture, i.e., a coded picture that can be decoded without decoding any previous pictures in the NAL unit stream. The presence of an IDR access unit indicates that no subsequent picture in the bitstream will require reference to pictures prior to the picture that it contains in order to be decoded. The IDR picture is used within a coding structure known as a closed GOP (in which GOP stands for group of pictures).
The new clean random access (CRA) picture syntax specifies the use of an independently coded picture at the location of a random access point (RAP), i.e., a location in a bitstream at which a decoder can begin successfully decoding pictures without needing to decode any pictures that appeared earlier in the bitstream. This supports an efficient temporal coding order known as open GOP operation. Good support of random access is critical for enabling channel switching, seek operations, and dynamic streaming services. Some pictures that follow a CRA picture in decoding order and precede it in display order may contain interpicture prediction references to pictures that are not available at the decoder. These nondecodable pictures must therefore be discarded by a decoder that starts its decoding process at a CRA point. For this purpose, such nondecodable pictures are identified as random access skipped leading (RASL) pictures.
The location of splice points from different original coded bitstreams can be indicated by broken link access (BLA) pictures. A bitstream splicing operation can be performed by simply changing the NAL unit type of a CRA picture in one bitstream to the value that indicates a BLA picture and concatenating the new bitstream at the position of a RAP picture in the other bitstream. A RAP picture may be an IDR, CRA, or BLA picture, and both CRA and BLA pictures may be followed by RASL pictures in the bitstream (depending on the particular value of the NAL unit type used for a BLA picture).
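The retagging step of a splice can be sketched as a rewrite of the type field in the NAL unit header. This is a minimal sketch only: the numeric codes `CRA_NUT = 21` and `BLA_W_LP = 16` and the position of the 6-bit type field in the first header byte are assumptions taken from a reading of the H.265 specification, the function name `retag_cra_as_bla` is hypothetical, and the concatenation of the two bitstreams is left out.

```python
CRA_NUT = 21   # assumed type code for a CRA picture
BLA_W_LP = 16  # assumed type code for a BLA picture that may have leading pictures


def retag_cra_as_bla(nal_unit: bytes, bla_type: int = BLA_W_LP) -> bytes:
    """Return a copy of a CRA slice NAL unit whose type field indicates BLA.

    Only the 6-bit type field in the first header byte is rewritten;
    the payload and the remaining header bits are copied unchanged.
    """
    if len(nal_unit) < 2:
        raise ValueError("an HEVC NAL unit header needs two bytes")
    nal_type = (nal_unit[0] >> 1) & 0x3F
    if nal_type != CRA_NUT:
        raise ValueError("expected a CRA NAL unit, got type %d" % nal_type)
    # Keep bit 7 (forbidden_zero_bit) and bit 0 (top bit of the layer id).
    new_b0 = (nal_unit[0] & 0x81) | ((bla_type & 0x3F) << 1)
    return bytes([new_b0]) + nal_unit[1:]
```

A splicer built on this idea would apply the rewrite to every slice NAL unit of the CRA picture at the splice point before joining the streams.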
Any RASL pictures associated with a BLA picture must always be discarded by the decoder, as they may contain references to pictures that are not actually present in the bitstream due to a splicing operation. The other type of picture that can follow a RAP picture in decoding order and precede it in output order is the random access decodable leading (RADL) picture, which cannot contain references to any pictures that precede the RAP picture in decoding order. RASL and RADL pictures are collectively referred to as leading pictures (LPs). Pictures that follow a RAP picture in both decoding order and output order, which are known as trailing pictures, cannot contain references to LPs for interpicture prediction.
B. Temporal Sublayering Support
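The discard rules for leading pictures can be sketched as a filter over pictures in decoding order. This is a simplified model under stated assumptions: pictures carry a symbolic kind instead of a real NAL unit type, a leading picture is taken to be associated with the most recent RAP picture in decoding order, and the names `Picture` and `pictures_to_decode` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

RAP_KINDS = {"IDR", "CRA", "BLA"}


@dataclass(frozen=True)
class Picture:
    kind: str  # "IDR", "CRA", "BLA", "RASL", "RADL", or "TRAIL"
    poc: int   # picture order count, used here only as a label


def pictures_to_decode(stream: List[Picture], start: int) -> List[Picture]:
    """Return the pictures a decoder keeps when it starts at stream[start]."""
    if stream[start].kind not in RAP_KINDS:
        raise ValueError("decoding must start at a RAP picture")
    kept = []
    current_rap = stream[start]
    rap_is_first = True
    for index in range(start, len(stream)):
        pic = stream[index]
        if pic.kind in RAP_KINDS:
            current_rap = pic
            rap_is_first = index == start
        elif pic.kind == "RASL":
            cra_at_start = current_rap.kind == "CRA" and rap_is_first
            # RASL pictures of a BLA are always dropped; RASL pictures of a
            # CRA are dropped only when decoding started at that CRA.
            if current_rap.kind == "BLA" or cra_at_start:
                continue
        kept.append(pic)
    return kept
```

Starting at an earlier IDR picture keeps the RASL pictures of a later CRA picture, because their references are then available; starting at the CRA picture itself drops them while keeping the RADL and trailing pictures.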
Fig. 2. Example of a temporal prediction structure and the POC values, decoding order, and RPS content for each picture.
Similar to the temporal scalability feature in the H.264/MPEG-4 AVC scalable video coding (SVC) extension [12], HEVC specifies a temporal identifier in the NAL unit header, which indicates a level in a hierarchical temporal prediction structure. This was introduced to achieve temporal scalability without the need to parse parts of the bitstream other than the NAL unit header.
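Because the temporal identifier sits in the NAL unit header, a lower frame rate version of a stream can be obtained by dropping NAL units without parsing any payload. The sketch below assumes that the identifier is stored as nuh_temporal_id_plus1 in the low three bits of the second header byte (an assumption from a reading of the H.265 specification); `temporal_id` and `extract_sublayers` are hypothetical names.

```python
from typing import Iterable, List


def temporal_id(nal_unit: bytes) -> int:
    """Read the temporal identifier from the assumed header position."""
    if len(nal_unit) < 2:
        raise ValueError("an HEVC NAL unit header needs two bytes")
    return (nal_unit[1] & 0x07) - 1


def extract_sublayers(nal_units: Iterable[bytes], highest_tid: int) -> List[bytes]:
    """Keep only NAL units whose temporal identifier is at most highest_tid."""
    return [unit for unit in nal_units if temporal_id(unit) <= highest_tid]
```

Only the two header bytes of each unit are inspected, which is what lets a network element thin a stream cheaply.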
Under certain circumstances, the number of decoded temporal sublayers can be adjusted during the decoding process of one coded video sequence. A point in the bitstream at which the decoder can switch sublayers and begin decoding some higher temporal layers is indicated by the presence of temporal sublayer access (TSA) pictures and stepwise TSA (STSA) pictures. At the location of a TSA picture, it is possible to switch from decoding a lower temporal sublayer to decoding any higher temporal sublayer, and at the location of an STSA picture, it is possible to switch from decoding a lower temporal sublayer to decoding only one particular higher temporal sublayer (but not the further layers above that, unless they also contain STSA or TSA pictures).
C. Additional Parameter Sets
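The difference between TSA and STSA switching points can be sketched as a small state update. This is a simplified model, not the normative process: it assumes the decoder tracks the highest temporal sublayer it currently decodes, and that an up-switch is allowed only at a TSA or STSA picture lying in the sublayer immediately above that one. The function name `update_highest_tid` and the symbolic picture kinds are hypothetical.

```python
def update_highest_tid(current: int, target: int, kind: str, tid: int) -> int:
    """Return the highest sublayer to decode from this picture onward.

    current: highest temporal sublayer decoded so far
    target:  highest temporal sublayer the decoder would like to decode
    kind:    symbolic picture kind, e.g. "TSA", "STSA", or "TRAIL"
    tid:     temporal identifier of the picture being examined
    """
    if target <= current:
        return target           # switching down needs no special picture
    if tid != current + 1:
        return current          # not a switching point for this decoder state
    if kind == "TSA":
        return target           # may jump to any higher sublayer
    if kind == "STSA":
        return current + 1      # may step up by exactly one sublayer
    return current
```

In this model a decoder at sublayer 0 that wants sublayer 3 reaches it in one step at a TSA picture, but needs a further STSA or TSA picture in each higher sublayer if only STSA pictures are available.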
The video parameter set (VPS) has been added as metadata to describe the overall characteristics of coded video sequences, including the dependences between temporal sublayers. The primary purpose of this is to enable the compatible extensibility of the standard in terms of signaling at the systems layer, e.g., when the base layer of a future extended scalable or multiview bitstream needs to be decodable by a legacy decoder, which would ignore the additional information about the bitstream structure that is relevant only to an advanced decoder.
D. Reference Picture Sets and Reference Picture Lists
For multiple-reference picture management, a particular set of previously decoded pictures needs to be present in the decoded picture buffer (DPB) for the decoding of the remainder of the pictures in the bitstream. To identify these pictures, a list of picture order count (POC) identifiers is transmitted in each slice header. The set of retained reference pictures is called the reference picture set (RPS). Fig. 2 shows POC values, decoding order, and RPSs for an example temporal prediction structure.
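The role of the RPS can be sketched as a per-picture update of the DPB. This is a minimal sketch under stated assumptions: the DPB and the RPS are modeled as plain sets of POC values, pictures held only for later output are ignored, and the POC values in the usage note are invented for illustration rather than taken from Fig. 2. The names `apply_rps` and `walk_decoding_order` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def apply_rps(dpb: Set[int], rps: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Apply one RPS to the DPB.

    Returns (retained, missing): the pictures kept as references, and the
    pictures the RPS names that are absent from the DPB (for example lost).
    """
    return dpb & rps, rps - dpb


def walk_decoding_order(
    pictures: Iterable[Tuple[int, Set[int]]]
) -> Tuple[Set[int], List[Tuple[int, Set[int]]]]:
    """Process (poc, rps) pairs in decoding order.

    Returns the final DPB contents and a list of (poc, missing) reports.
    """
    dpb: Set[int] = set()
    losses: List[Tuple[int, Set[int]]] = []
    for poc, rps in pictures:
        dpb, missing = apply_rps(dpb, rps)
        if missing:
            losses.append((poc, missing))
        dpb.add(poc)  # the current picture becomes available as a reference
    return dpb, losses
```

For a hypothetical decoding order `[(0, set()), (8, {0}), (4, {0, 8}), (2, {0, 4, 8}), (6, {4, 8})]` the DPB ends up holding POCs 4, 6, and 8. If the picture with POC 8 never arrives, the next slice header that names it exposes the loss immediately, since the full set is stated in every slice header.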
As in H.264/MPEG-4 AVC, there are two lists that are constructed as lists of pictures in the DPB, and these are called reference picture list 0 and list 1. An index called a reference picture index is used to identify a particular picture in one of these lists. For uniprediction, a picture can be selected from either of these lists. For biprediction, two pictures are selected, one from each list. When a list contains only one picture, the reference picture index implicitly has the value 0 and does not need to be transmitted in the bitstream.
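The two lists and the implicit index can be sketched as follows. The ordering rule used here (list 0 starts with the closest earlier pictures in POC order, list 1 with the closest later ones) is an assumption modeled on a default list initialization and is not stated in the text above; limits on the number of active entries are ignored, and the names `build_reference_lists` and `ref_idx_is_signalled` are hypothetical.

```python
from typing import Iterable, List, Tuple


def build_reference_lists(current_poc: int, rps: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Build (list0, list1) of POC values from the reference picture set."""
    pocs = list(rps)
    # Earlier pictures, closest first.
    before = sorted((p for p in pocs if p < current_poc), reverse=True)
    # Later pictures, closest first.
    after = sorted(p for p in pocs if p > current_poc)
    return before + after, after + before


def ref_idx_is_signalled(ref_list: List[int]) -> bool:
    """With a single entry the index is implicitly 0 and is not transmitted."""
    return len(ref_list) > 1
```

For a current picture with POC 4 and a reference picture set {0, 2, 8}, this yields list 0 = [2, 0, 8] and list 1 = [8, 2, 0]; a reference picture index of 1 into list 0 then selects the picture with POC 0.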
The high-level syntax for identifying the RPS and establishing the reference picture lists for interpicture prediction is more robust to data losses than in the prior H.264/MPEG-4 AVC design, and is more amenable to such operations as random access and trick mode operation (e.g., fast-forward, smooth rewind, seeking, and adaptive bitstream switching). A key aspect of this improvement is that the syntax is more explicit, rather than depending on inferences from the stored internal state of the decoding process as it decodes the bitstream picture by picture. Moreover, the associated syntax for these aspects of the design is actually simpler than it had been for H.264/MPEG-4 AVC.