Reposted from:
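Codecs, data frames, media streams, and containers are the four basic concepts of a digital media processing system.
First, the terminology:
Container/File: a multimedia file in a specific format.
Stream: a continuous run of data along the time axis, such as a piece of audio, video, or subtitle data. It may be compressed or uncompressed; compressed data must be associated with a specific codec.
Frame/Packet: a media stream usually consists of a large number of data frames. For compressed data, a frame is the smallest unit the codec processes. Frames belonging to different streams are usually interleaved (multiplexed) within the container.
Codec: a codec converts between compressed data and raw data, one frame at a time.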
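To make the idea of interleaving concrete, here is a minimal sketch in plain C. It does not use FFmpeg; `DemoPacket`, `DemoStats`, and `demo_demux` are hypothetical names invented for this illustration. It walks a list of packets in container order and attributes each one to its stream by index.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_STREAMS 8

/* Hypothetical, simplified packet; this is NOT FFmpeg's AVPacket. */
typedef struct {
    int stream_index;   /* which stream the packet belongs to */
    long long pts;      /* timestamp in that stream's time base */
    int size;           /* payload size in bytes */
} DemoPacket;

/* Per-stream totals collected while walking the container. */
typedef struct {
    int packet_count[DEMO_MAX_STREAMS];
    long long byte_count[DEMO_MAX_STREAMS];
} DemoStats;

/* Visit the interleaved packets in container order and account
 * each one to the stream named by its stream_index. */
static DemoStats demo_demux(const DemoPacket *pkts, size_t n)
{
    DemoStats st = {{0}, {0}};
    for (size_t i = 0; i < n; i++) {
        int idx = pkts[i].stream_index;
        assert(idx >= 0 && idx < DEMO_MAX_STREAMS);
        st.packet_count[idx]++;
        st.byte_count[idx] += pkts[i].size;
    }
    return st;
}
```

With video packets on stream 0 and audio packets on stream 1 alternating in the input array, the returned totals separate the two streams again, which is essentially what a demuxer does with real data.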
In FFmpeg, the structures AVFormatContext, AVStream, AVCodecContext, AVCodec, and AVPacket abstract these basic elements. [Figure: relationships among these structures]
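AVCodecContext describes a codec context and holds the many parameters a codec needs. Some of the more important fields are covered below.
If you use libavcodec on its own, the caller must initialize this information. If you use the full FFmpeg library, it is initialized during the calls to avformat_open_input and avformat_find_stream_info, from the file header and from the header data inside the media streams. The main fields are: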
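extradata/extradata_size: this buffer holds extra information the decoder may need, and is filled in during av_read_frame. In general, the demuxer for the specific format fills extradata while reading the format header. If the demuxer does not do so, for example because the header carries no codec information at all, the corresponding parser goes on to look for it in the demultiplexed media stream. If no extra information is found, the buffer pointer is NULL.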
time_base:
width/height: the width and height of the video.
sample_rate/channels: the audio sample rate and number of channels.
sample_fmt: the raw audio sample format.
codec_name/codec_type/codec_id/codec_tag: codec information.
AVStream describes a media stream and is defined as follows:
- typedef struct AVStream {
- int index; /**< stream index in AVFormatContext */
- int id; /**< format-specific stream ID */
- AVCodecContext *codec; /**< codec context */
- /**
- * Real base framerate of the stream.
- * This is the lowest framerate with which all timestamps can be
- * represented accurately (it is the least common multiple of all
- * framerates in the stream). Note, this value is just a guess!
- * For example, if the time base is 1/90000 and all frames have either
- * approximately 3600 or 1800 timer ticks, then r_frame_rate will be 50/1.
- */
- AVRational r_frame_rate;
-
- ......
-
- /**
- * This is the fundamental unit of time (in seconds) in terms
- * of which frame timestamps are represented. For fixed-fps content,
- * time base should be 1/framerate and timestamp increments should be 1.
- */
- AVRational time_base;
-
- ......
-
- /**
- * Decoding: pts of the first frame of the stream, in stream time base.
- * Only set this if you are absolutely 100% sure that the value you set
- * it to really is the pts of the first frame.
- * This may be undefined (AV_NOPTS_VALUE).
- * @note The ASF header does NOT contain a correct start_time the ASF
- * demuxer must NOT set this.
- */
- int64_t start_time;
- /**
- * Decoding: duration of the stream, in stream time base.
- * If a source file does not specify a duration, but does specify
- * a bitrate, this value will be estimated from bitrate and file size.
- */
- int64_t duration;
-
- #if LIBAVFORMAT_VERSION_INT < (53<<16)
- char language[4]; /** ISO 639-2/B 3-letter language code (empty string if undefined) */
- #endif
-
- /* av_read_frame() support */
- enum AVStreamParseType need_parsing;
- struct AVCodecParserContext *parser;
-
- ......
-
- /* av_seek_frame() support */
- AVIndexEntry *index_entries; /**< Only used if the format does not
- support seeking natively. */
- int nb_index_entries;
- unsigned int index_entries_allocated_size;
-
- int64_t nb_frames; ///< number of frames in this stream if known or 0
-
- ......
-
- /**
- * Average framerate
- */
- AVRational avg_frame_rate;
- ......
- } AVStream;
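The main fields are explained below. Most of their values can be determined by avformat_open_input from the file header; missing information has to be obtained by calling avformat_find_stream_info, which reads frames and decodes them in software:
index/id: index is the stream's index. It is generated automatically, and the stream can be looked up in the AVFormatContext::streams table by it. id is the stream's identifier and depends on the container format; for MPEG TS, for example, id is the PID.
time_base: the stream's time base, a rational number (AVRational). The pts and dts of the media data in the stream are expressed in units of this time base. av_rescale/av_rescale_q can usually be used to convert between time bases.
start_time: the stream's start time in units of the stream time base, usually the pts of the first frame in the stream.
duration: the total duration of the stream, in units of the stream time base.
need_parsing: controls the parsing process for this stream.
nb_frames: the number of frames in the stream.
r_frame_rate/framerate/avg_frame_rate: frame-rate related.
codec: points to the AVCodecContext for this stream; created when avformat_open_input is called.
parser: points to the AVCodecParserContext for this stream; created when avformat_find_stream_info is called.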
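The arithmetic behind time-base conversion can be sketched without FFmpeg. The names below (`DemoRational`, `demo_rescale_q`, `demo_fps`) are hypothetical stand-ins, not the library's API; the real av_rescale_q additionally guards against integer overflow, which this sketch does not. The second helper reproduces the example from the r_frame_rate comment: with a time base of 1/90000, frames spaced 1800 ticks apart correspond to 50 frames per second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for AVRational: the fraction num/den. */
typedef struct {
    int num;
    int den;
} DemoRational;

/* Convert ts from time base `from` to time base `to`, rounding to the
 * nearest integer. Derivation: ts * from = x * to, so
 * x = ts * (from.num * to.den) / (from.den * to.num).
 * Simplified: no protection against 64-bit overflow. */
static int64_t demo_rescale_q(int64_t ts, DemoRational from, DemoRational to)
{
    assert(from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    int64_t num = (int64_t)from.num * to.den;
    int64_t den = (int64_t)from.den * to.num;
    int64_t prod = ts * num;
    if (prod >= 0)
        return (prod + den / 2) / den;
    return -((-prod + den / 2) / den);
}

/* Frame rate implied by a constant spacing of `ticks` time-base units
 * between consecutive frames: 1 / (ticks * tb). */
static double demo_fps(int64_t ticks, DemoRational tb)
{
    assert(ticks > 0 && tb.num > 0);
    return (double)tb.den / ((double)ticks * tb.num);
}
```

For instance, a pts of 3600 in a 1/90000 time base rescales to 40 in a 1/1000 (millisecond) time base, since 3600 / 90000 s = 0.04 s.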
AVFormatContext describes the composition and basic information of a media file or media stream. It is defined as follows:
- typedef struct AVFormatContext {
- const AVClass *av_class; /**< Set by avformat_alloc_context. */
- /* Can only be iformat or oformat, not both at the same time. */
- struct AVInputFormat *iformat;
- struct AVOutputFormat *oformat;
- void *priv_data;
- ByteIOContext *pb;
- unsigned int nb_streams;
- AVStream *streams[MAX_STREAMS];
- char filename[1024]; /**< input or output filename */
- /* stream info */
- int64_t timestamp;
- #if LIBAVFORMAT_VERSION_INT < (53<<16)
- char title[512];
- char author[512];
- char copyright[512];
- char comment[512];
- char album[512];
- int year; /**< ID3 year, 0 if none */
- int track; /**< track number, 0 if none */
- char genre[32]; /**< ID3 genre */
- #endif
-
- int ctx_flags; /**< Format-specific flags, see AVFMTCTX_xx */
- /* private data for pts handling (do not modify directly). */
- /** This buffer is only needed when packets were already buffered but
- not decoded, for example to get the codec parameters in MPEG
- streams. */
- struct AVPacketList *packet_buffer;
-
- /** Decoding: position of the first frame of the component, in
- AV_TIME_BASE fractional seconds. NEVER set this value directly:
- It is deduced from the AVStream values. */
- int64_t start_time;
- /** Decoding: duration of the stream, in AV_TIME_BASE fractional
- seconds. Only set this value if you know none of the individual stream
- durations and also dont set any of them. This is deduced from the
- AVStream values if not set. */
- int64_t duration;
- /** decoding: total file size, 0 if unknown */
- int64_t file_size;
- /** Decoding: total stream bitrate in bit/s, 0 if not
- available. Never set it directly if the file_size and the
- duration are known as FFmpeg can compute it automatically. */
- int bit_rate;
-
- /* av_read_frame() support */
- AVStream *cur_st;
- #if LIBAVFORMAT_VERSION_INT < (53<<16)
- const uint8_t *cur_ptr_deprecated;
- int cur_len_deprecated;
- AVPacket cur_pkt_deprecated;
- #endif
-
- /* av_seek_frame() support */
- int64_t data_offset; /** offset of the first packet */
- int index_built;
-
- int mux_rate;
- unsigned int packet_size;
- int preload;
- int max_delay;
-
- #define AVFMT_NOOUTPUTLOOP -1
- #define AVFMT_INFINITEOUTPUTLOOP 0
- /** number of times to loop output in formats that support it */
- int loop_output;
-
- int flags;
- #define AVFMT_FLAG_GENPTS 0x0001 ///< Generate missing pts even if it requires parsing future frames.
- #define AVFMT_FLAG_IGNIDX 0x0002 ///< Ignore index.
- #define AVFMT_FLAG_NONBLOCK 0x0004 ///< Do not block when reading packets from input.
- #define AVFMT_FLAG_IGNDTS 0x0008 ///< Ignore DTS on frames that contain both DTS & PTS
- #define AVFMT_FLAG_NOFILLIN 0x0010 ///< Do not infer any values from other values, just return what is stored in the container
- #define AVFMT_FLAG_NOPARSE 0x0020 ///< Do not use AVParsers, you also must set AVFMT_FLAG_NOFILLIN as the fillin code works on frames and no parsing -> no frames. Also seeking to frames can not work if parsing to find frame boundaries has been disabled
- #define AVFMT_FLAG_RTP_HINT 0x0040 ///< Add RTP hinting to the output file
-
- int loop_input;
- /** decoding: size of data to probe; encoding: unused. */
- unsigned int probesize;
-
- /**
- * Maximum time (in AV_TIME_BASE units) during which the input should
- * be analyzed in avformat_find_stream_info().
- */
- int max_analyze_duration;
-
- const uint8_t *key;
- int keylen;
-
- unsigned int nb_programs;
- AVProgram **programs;
-
- /**
- * Forced video codec_id.
- * Demuxing: Set by user.
- */
- enum CodecID video_codec_id;
- /**
- * Forced audio codec_id.
- * Demuxing: Set by user.
- */
- enum CodecID audio_codec_id;
- /**
- * Forced subtitle codec_id.
- * Demuxing: Set by user.
- */
- enum CodecID subtitle_codec_id;
-
- /**
- * Maximum amount of memory in bytes to use for the index of each stream.
- * If the index exceeds this size, entries will be discarded as
- * needed to maintain a smaller size. This can lead to slower or less
- * accurate seeking (depends on demuxer).
- * Demuxers for which a full in-memory index is mandatory will ignore
- * this.
- * muxing : unused
- * demuxing: set by user
- */
- unsigned int max_index_size;
-
- /**
- * Maximum amount of memory in bytes to use for buffering frames
- * obtained from realtime capture devices.
- */
- unsigned int max_picture_buffer;
-
- unsigned int nb_chapters;
- AVChapter **chapters;
-
- /**
- * Flags to enable debugging.
- */
- int debug;
- #define FF_FDEBUG_TS 0x0001
-
- /**
- * Raw packets from the demuxer, prior to parsing and decoding.
- * This buffer is used for buffering packets until the codec can
- * be identified, as parsing cannot be done without knowing the
- * codec.
- */
- struct AVPacketList *raw_packet_buffer;
- struct AVPacketList *raw_packet_buffer_end;
-
- struct AVPacketList *packet_buffer_end;
-
- AVMetadata *metadata;
-
- /**
- * Remaining size available for raw_packet_buffer, in bytes.
- * NOT PART OF PUBLIC API
- */
- #define RAW_PACKET_BUFFER_SIZE 2500000
- int raw_packet_buffer_remaining_size;
-
- /**
- * Start time of the stream in real world time, in microseconds
- * since the unix epoch (00:00 1st January 1970). That is, pts=0
- * in the stream was captured at this real world time.
- * - encoding: Set by user.
- * - decoding: Unused.
- */
- int64_t start_time_realtime;
- } AVFormatContext;
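This is the most fundamental structure in FFmpeg: the root of all the other structures and the basic abstraction of a multimedia file or stream. In it:
nb_streams and the streams array of AVStream pointers describe all the embedded media streams;
iformat and oformat point to the corresponding demuxer and muxer;
pb points to a ByteIOContext structure that controls low-level data reads and writes;
start_time and duration are the start time and length of the multimedia file, deduced from the AVStream entries in the streams array, in microseconds.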
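How a container-level start time and duration might be deduced from per-stream values can be sketched as follows. This is a simplified model under stated assumptions, not FFmpeg's actual code: the container start is taken as the earliest stream start, and the duration as the span up to the latest stream end, all converted to microseconds. `DemoStream`, `DemoTimings`, `DEMO_TIME_BASE`, and `DEMO_NOPTS` are hypothetical names that only mirror the roles of AVStream, AV_TIME_BASE, and AV_NOPTS_VALUE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TIME_BASE 1000000   /* microseconds per second */
#define DEMO_NOPTS     INT64_MIN /* marks an unknown timestamp */

/* Hypothetical per-stream timing info, in the stream's own time base. */
typedef struct {
    int tb_num, tb_den;   /* stream time base as the fraction num/den */
    int64_t start_time;   /* or DEMO_NOPTS if unknown */
    int64_t duration;     /* or DEMO_NOPTS if unknown */
} DemoStream;

typedef struct {
    int64_t start_time;   /* microseconds, or DEMO_NOPTS */
    int64_t duration;     /* microseconds, or DEMO_NOPTS */
} DemoTimings;

/* Convert a stream timestamp to microseconds (truncating). */
static int64_t demo_to_us(int64_t ts, const DemoStream *s)
{
    assert(s->tb_num > 0 && s->tb_den > 0);
    return ts * s->tb_num * DEMO_TIME_BASE / s->tb_den;
}

/* Earliest start and latest end over all streams with known timing. */
static DemoTimings demo_container_timings(const DemoStream *streams, size_t n)
{
    int64_t start = 0, end = 0;
    int have_start = 0, have_end = 0;
    for (size_t i = 0; i < n; i++) {
        const DemoStream *s = &streams[i];
        if (s->start_time == DEMO_NOPTS)
            continue;
        int64_t s_start = demo_to_us(s->start_time, s);
        if (!have_start || s_start < start) {
            start = s_start;
            have_start = 1;
        }
        if (s->duration != DEMO_NOPTS) {
            int64_t s_end = s_start + demo_to_us(s->duration, s);
            if (!have_end || s_end > end) {
                end = s_end;
                have_end = 1;
            }
        }
    }
    DemoTimings t = { DEMO_NOPTS, DEMO_NOPTS };
    if (have_start) {
        t.start_time = start;
        if (have_end)
            t.duration = end - start;
    }
    return t;
}
```

The point of the sketch is the unit change: stream values live in each stream's own time base, while the container-level values are expressed in a single fixed microsecond scale.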
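Normally this structure is created internally by avformat_open_input, which initializes some members to default values. If the caller wants to create the structure itself, it must explicitly set defaults for some members; without them, later operations may fail. The following members need attention: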
probesize
mux_rate
packet_size
flags
max_analyze_duration
key
max_index_size
max_picture_buffer
max_delay
AVPacket is defined in avcodec.h as follows:
- typedef struct AVPacket {
- /**
- * Presentation timestamp in AVStream->time_base units; the time at which
- * the decompressed packet will be presented to the user.
- * Can be AV_NOPTS_VALUE if it is not stored in the file.
- * pts MUST be larger or equal to dts as presentation cannot happen before
- * decompression, unless one wants to view hex dumps. Some formats misuse
- * the terms dts and pts/cts to mean something different. Such timestamps
- * must be converted to true pts/dts before they are stored in AVPacket.
- */
- int64_t pts;
- /**
- * Decompression timestamp in AVStream->time_base units; the time at which
- * the packet is decompressed.
- * Can be AV_NOPTS_VALUE if it is not stored in the file.
- */
- int64_t dts;
- uint8_t *data;
- int size;
- int stream_index;
- int flags;
- /**
- * Duration of this packet in AVStream->time_base units, 0 if unknown.
- * Equals next_pts - this_pts in presentation order.
- */
- int duration;
- void (*destruct)(struct AVPacket *);
- void *priv;
- int64_t pos; ///< byte position in stream, -1 if unknown
-
- /**
- * Time difference in AVStream->time_base units from the pts of this
- * packet to the point at which the output from the decoder has converged
- * independent from the availability of previous frames. That is, the
- * frames are virtually identical no matter if decoding started from
- * the very first frame or from this keyframe.
- * Is AV_NOPTS_VALUE if unknown.
- * This field is not the display duration of the current packet.
- *
- * The purpose of this field is to allow seeking in streams that have no
- * keyframes in the conventional sense. It corresponds to the
- * recovery point SEI in H.264 and match_time_delta in NUT. It is also
- * essential for some types of subtitle streams to ensure that all
- * subtitles are correctly displayed after seeking.
- */
- int64_t convergence_duration;
- } AVPacket;
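FFmpeg uses AVPacket to hold media data after demultiplexing and before decoding (an audio/video frame, a subtitle packet, and so on) together with its side information (decoding timestamp, presentation timestamp, duration, and so on). In it:
dts is the decoding timestamp and pts is the presentation timestamp, both in units of the owning stream's time base;
stream_index is the index of the owning media stream;
data points to the data buffer and size is its length;
duration is the duration of the data, also in units of the owning stream's time base;
pos is the byte offset of the data within the media stream;
destruct is a function pointer used to release the data buffer;
flags is a flag field; its lowest bit, when set to 1, marks the data as a keyframe.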
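The field descriptions above can be exercised with a small self-contained sketch. `DemoTimedPacket` and the `demo_` helpers are hypothetical and only mirror a few AVPacket fields; `DEMO_PKT_FLAG_KEY` is assumed to be the lowest bit of flags, as the text describes.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PKT_FLAG_KEY 0x0001  /* lowest bit of flags: keyframe */

/* Hypothetical subset of the packet fields discussed above. */
typedef struct {
    int64_t pts;        /* presentation timestamp, stream time base */
    int64_t dts;        /* decoding timestamp, stream time base */
    int stream_index;   /* index of the owning stream */
    int flags;          /* bit field; see DEMO_PKT_FLAG_KEY */
    int duration;       /* in stream time base, 0 if unknown */
} DemoTimedPacket;

/* Nonzero if the keyframe bit is set. */
static int demo_is_keyframe(const DemoTimedPacket *pkt)
{
    return (pkt->flags & DEMO_PKT_FLAG_KEY) != 0;
}

/* A frame cannot be presented before it is decoded, so pts >= dts. */
static int demo_timestamps_valid(const DemoTimedPacket *pkt)
{
    return pkt->pts >= pkt->dts;
}

/* Presentation time at which this packet's content ends. */
static int64_t demo_end_pts(const DemoTimedPacket *pkt)
{
    return pkt->pts + pkt->duration;
}
```

A player would typically test the keyframe bit when choosing a position to resume decoding after a seek.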
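The AVPacket structure itself is only a container; its data member references the actual data buffer. This buffer is usually created by av_new_packet, but may also be created by other FFmpeg APIs (such as av_read_frame). When the data buffer of an AVPacket is no longer in use, it must be released by calling av_free_packet. av_free_packet calls the structure's own destruct function, whose value falls into two cases: 1) av_destruct_packet_nofree or 0; 2) av_destruct_packet. Case 1) merely resets data and size to 0, whereas case 2) actually frees the buffer.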
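Internally, FFmpeg uses AVPacket structures to set up buffers that carry data, and supplies the destruct function along with them. If FFmpeg intends to manage the buffer itself, it sets destruct to av_destruct_packet_nofree, so a user call to av_free_packet does not free the buffer. If FFmpeg intends to hand the buffer over to the caller entirely, it sets destruct to av_destruct_packet, meaning the buffer can be freed. To be safe, a user who wants to use an FFmpeg-created AVPacket freely should call av_dup_packet to clone the buffer, turning it into an AVPacket whose buffer can be freed, so that improper holding of the buffer does not cause errors. For an AVPacket whose destruct pointer is av_destruct_packet_nofree, av_dup_packet allocates a new buffer, copies the data from the original buffer into it, sets data to the address of the new buffer, and sets destruct to av_destruct_packet.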
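The ownership scheme described above can be modeled in a few lines of plain C. This is a sketch of the idea only, not FFmpeg's implementation: `DemoBufPacket` and every `demo_` function are hypothetical stand-ins for AVPacket, av_destruct_packet_nofree, av_destruct_packet, av_free_packet, and av_dup_packet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet with a destruct callback, modeled on the text. */
typedef struct DemoBufPacket {
    uint8_t *data;
    int size;
    void (*destruct)(struct DemoBufPacket *);
} DemoBufPacket;

/* Case 1: the buffer is owned elsewhere; only drop the reference. */
static void demo_destruct_nofree(DemoBufPacket *pkt)
{
    pkt->data = NULL;
    pkt->size = 0;
}

/* Case 2: the packet owns the buffer; really free it. */
static void demo_destruct_free(DemoBufPacket *pkt)
{
    free(pkt->data);
    pkt->data = NULL;
    pkt->size = 0;
}

/* Release a packet through its own destruct function; a null
 * destruct is treated like the no-free case. */
static void demo_free_packet(DemoBufPacket *pkt)
{
    if (!pkt)
        return;
    if (pkt->destruct)
        pkt->destruct(pkt);
    else
        demo_destruct_nofree(pkt);
}

/* Turn a borrowed buffer into an owned copy. Returns 0 on success,
 * -1 if the allocation fails. Packets that already own their buffer,
 * or that carry no data, are left unchanged. */
static int demo_dup_packet(DemoBufPacket *pkt)
{
    if (pkt->destruct == demo_destruct_free || !pkt->data || pkt->size <= 0)
        return 0;
    uint8_t *copy = (uint8_t *)malloc((size_t)pkt->size);
    if (!copy)
        return -1;
    memcpy(copy, pkt->data, (size_t)pkt->size);
    pkt->data = copy;
    pkt->destruct = demo_destruct_free;
    return 0;
}
```

After the duplicate step the packet no longer depends on the library-owned buffer, so later changes to that buffer do not affect it, and releasing the packet frees the copy.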