Mobile Audio and Video VOL.2 - From AVFormatContext to Reading Streams

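A few words up front

This is the second installment of the mobile audio and video series, a full month after the first. I have not been idle in the meantime: after finishing a requirement at work that amounted to "presenting data in a complicated view, in two different ways, on top of a messy codebase", I started reading up on reading audio/video streams and fetching audio/video packets, which is why this post is only being written now. (I also did other important things, such as buying a grill so I could barbecue while drinking.)

This post starts with creating a few FFmpeg data structures and goes as far as reading the audio and video streams.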

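AVFormatContext

Why start with AVFormatContext? Because it runs through FFmpeg development from beginning to end.
For reference, see the AVFormatContext page in the official FFmpeg documentation.

In short, AVFormatContext is used to open a media file or media stream, and the information you want, such as duration, buffers, and file name, can be read from this struct.

First, FFmpeg needs to be initialized. (av_register_all() is required by the FFmpeg 3.x releases used here; it was deprecated in FFmpeg 4.0, where explicit registration is no longer needed.)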

av_register_all();

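Next, we need a video file.
The video (a resource file) lives in the project directory and has already been added under Project - Build Phases - Copy Bundle Resources.

// ofType: takes the file extension without a leading dot
NSString* contentPath = [[NSBundle mainBundle] pathForResource:@"sm25392237" ofType:@"mp4"];
// UTF8String yields a C string; kCFStringEncodingUTF8 is a CFStringEncoding, not an NSStringEncoding
const char* clangContentPath = [contentPath UTF8String];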

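Let's build up the stream-reading flow step by step.
The next thing we need is AVFormatContext. The code below imports it and declares a pointer for it (still NULL at this point).

#import <libavformat/avformat.h>

// AVFormatContext
AVFormatContext* formatCtx = NULL;

// Open the file; this allocates and fills in the AVFormatContext
if(0 != avformat_open_input(&formatCtx, clangContentPath, NULL, NULL)){
    // On failure avformat_open_input() has already freed the context and set
    // formatCtx to NULL, so this call is a harmless no-op kept as a safeguard
    avformat_close_input(&formatCtx);
    
    NSLog(@"open file failed.");
    return;
}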

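The relationship between avformat_open_input and avformat_close_input is explained well in this article: FFmpeg源代码简单分析：avformat_close_input() (A Simple Analysis of the FFmpeg Source Code: avformat_close_input()).

Next, we set the probing limits on the AVFormatContext.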

formatCtx->probesize = 512 * 1024;
formatCtx->max_analyze_duration = 5 * AV_TIME_BASE;

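These limits are used right away by avformat_find_stream_info, so let's look at its definition first.

On line 27 of that definition, the default max_analyze_duration is set to "5*AV_TIME_BASE". AV_TIME_BASE is FFmpeg's internal time base, 1,000,000 units per second, so "5*AV_TIME_BASE" simply means 5 seconds.

As for probesize, the official documentation describes it as follows: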

Maximum size of the data read from input for determining the input container format.

Demuxing only, set by the caller before avformat_open_input().

Definition at line 1292 of file avformat.h.

Referenced by avformat_find_stream_info(), lavfi_read_header(), and mpegts_read_header().

Source: AVFormatContext Struct Reference

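It is measured in bytes. So the search is bounded by 5 s and by 512 x 1024 = 524,288 bytes, i.e. 512 KB. Since probesize is assigned here after avformat_open_input() has already returned, it should only bound the avformat_find_stream_info() call that follows.

Which of the two values decides? One guide puts it this way: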

which will cause ffmpeg to search until the first of those limits is reached. Note that both of these options must appear on the command line before the specification of the input via -i. For example:

ffmpeg -probesize 50M -analyzeduration 100M -i vts.vob

will search through vts.vob for all streams until it has read 50 MB of data or 100 seconds of video, whichever comes first.

Source: FFMPEG An Intermediate Guide/subtitle options

In other words, probing stops as soon as 5 s have been analyzed or 512 KB have been read, whichever comes first.
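To make the "whichever comes first" rule concrete, here is a minimal, self-contained C sketch. It is not FFmpeg's actual probing code: ProbeLimits, probe_should_stop and article_limits are hypothetical names invented for illustration, and SKETCH_TIME_BASE only mirrors the value of AV_TIME_BASE (1,000,000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the value of FFmpeg's AV_TIME_BASE: time counted in microseconds. */
#define SKETCH_TIME_BASE 1000000

/* Hypothetical holder for the two limits set on the AVFormatContext. */
typedef struct ProbeLimits {
    int64_t max_bytes;        /* plays the role of probesize, in bytes */
    int64_t max_duration_us;  /* plays the role of max_analyze_duration */
} ProbeLimits;

/* Returns true once either limit has been reached, whichever comes first. */
bool probe_should_stop(const ProbeLimits *limits,
                       int64_t bytes_read, int64_t analyzed_us)
{
    assert(limits != NULL);
    return bytes_read >= limits->max_bytes ||
           analyzed_us >= limits->max_duration_us;
}

/* The values used in this article: 512 KB and 5 seconds. */
ProbeLimits article_limits(void)
{
    ProbeLimits limits = { 512 * 1024, 5 * (int64_t)SKETCH_TIME_BASE };
    return limits;
}
```

With these limits, reading 524,288 bytes stops the probe even if only one second has been analyzed, and analyzing 5,000,000 microseconds stops it even if only a few bytes were read.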

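AVStream & AVCodecContext

With the probing limits set, we can iterate over all the streams. They are stored as AVStream structures in the streams field of the AVFormatContext, nb_streams of them in total.
The code first calls avformat_find_stream_info, as announced above, and then loops over the streams:

if(avformat_find_stream_info(formatCtx, NULL) < 0){
    NSLog(@"find stream info failed.");
    avformat_close_input(&formatCtx);
    return;
}

int streamCount = formatCtx->nb_streams;
for(int i = 0; i < streamCount; i++){
    AVStream* stream = formatCtx->streams[i];
    ...
}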

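Each AVStream contains a codecpar field of type AVCodecParameters. Let's look at part of that structure:

AVCodecParameters carries some basic information:

enum AVMediaType codec_type: the type of the stream;
enum AVCodecID codec_id: the codec ID, an identifier;
int format: the pixel format (AVPixelFormat) for video streams, the sample format (AVSampleFormat) for audio streams;
int64_t bit_rate: the bit rate.
...

Next, this codecpar has to be converted into a regular AVCodecContext.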

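[!] AVStream used to contain a codec field of type AVCodecContext that could be used directly, but it is deprecated as of FFmpeg 3.3. From what I could find, the reason seems to be better decoupling: using codec made for concise code but tight coupling, and handling decoding and muxing on separate threads raised mutual-exclusion issues, whereas with codecpar a new AVCodecContext can be created independently. So far, though, I have only found one article that mentions this: ffmpeg3.3新版本AVStream的封装流参数由codec替换codecpar（解码）.

How do we turn AVCodecParameters into an AVCodecContext? FFmpeg already provides a function that copies the parameters across. It fills an existing context, so one has to be allocated first:

// Allocate an empty context for avcodec_parameters_to_context() to fill
AVCodecContext* codecCtx = avcodec_alloc_context3(NULL);
if(codecCtx == NULL){
    NSLog(@"AVCodecContext allocation failed.");
    continue; // skip this stream
}

int ret = avcodec_parameters_to_context(codecCtx, stream->codecpar);
if(ret < 0){
    NSLog(@"AVCodec parameters to context failed.");
    avcodec_free_context(&codecCtx);
    continue; // skip this stream
}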

At this point we have an AVCodecContext instance filled with information.

One more step remains:

av_codec_set_pkt_timebase(codecCtx, stream->time_base);

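At first I did not see why a time base has to be set on the AVCodecContext again. Note that this call sets pkt_timebase rather than time_base: as far as I can tell from the field's documentation, pkt_timebase tells the decoder which time base the packet timestamps are expressed in, and since AVCodecParameters carries no time base, it has to be copied over from the stream by hand. (In newer FFmpeg releases the accessor is gone and the field is assigned directly: codecCtx->pkt_timebase = stream->time_base.)
For more on av_codec_set_pkt_timebase:

source code;
official documentation.

If any reader can explain this more authoritatively, I would be glad to learn from you.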
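Why the time base matters is easier to see with numbers. The sketch below is self-contained C, not FFmpeg code: Rational stands in for AVRational, rational_to_double imitates what av_q2d does, and ticks_to_seconds is a hypothetical helper invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for FFmpeg's AVRational: a fraction num/den. */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Same idea as av_q2d(): convert the fraction to a double. */
double rational_to_double(Rational r)
{
    assert(r.den != 0);
    return (double)r.num / (double)r.den;
}

/* Convert a timestamp counted in time-base ticks into seconds. */
double ticks_to_seconds(int64_t ticks, Rational time_base)
{
    return (double)ticks * rational_to_double(time_base);
}
```

The same timestamp value of 90000 means about 1 second with a time base of 1/90000 but about 90 seconds with a time base of 1/1000. A raw timestamp is therefore meaningless until the consumer knows which time base it was written in.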

Now we can happily read the information in each stream.

enum AVMediaType mediaType = codecCtx->codec_type;
switch (mediaType) {
    case AVMEDIA_TYPE_AUDIO:{
        int channelCount = codecCtx->channels;
        // stream->duration is counted in time_base units; convert it to seconds
        int duration = (int)(stream->duration * av_q2d(stream->time_base));
        int sampleRate = codecCtx->sample_rate;
        int64_t bitRate = codecCtx->bit_rate;
        
        enum AVCodecID codecID = codecCtx->codec_id;
        const char* codecDesc = avcodec_get_name(codecID);
        
        enum AVSampleFormat sampleFormat = codecCtx->sample_fmt;
        const char* sampleFormatDesc = av_get_sample_fmt_name(sampleFormat);
        
        NSLog(@"%d - audio stream: channels = %d, duration = %d s, sample rate = %.1f kHz, bit rate = %d Kbps, codec = %s, sample format = %s.", i, channelCount, duration, sampleRate / 1000.0, (int)(bitRate / 1000.0), codecDesc, sampleFormatDesc);
    }
        
        break;
    case AVMEDIA_TYPE_VIDEO:{
        int width = codecCtx->width;
        int height = codecCtx->height;
        int duration = (int)(stream->duration * av_q2d(stream->time_base));
        int64_t bitRate = codecCtx->bit_rate;
        
        enum AVCodecID codecID = codecCtx->codec_id;
        const char* codecDesc = avcodec_get_name(codecID);
        
        // av_get_pix_fmt_name() is declared in <libavutil/pixdesc.h>
        enum AVPixelFormat pixelFormat = codecCtx->pix_fmt;
        const char* pixelFormatDesc = av_get_pix_fmt_name(pixelFormat);
        
        // avg_frame_rate, r_frame_rate and time_base are all AVRational values (numerator/denominator)
        // Default to 25 fps when no usable rate is available
        double fps = 25.0;
        if(stream->avg_frame_rate.den && stream->avg_frame_rate.num){
            fps = av_q2d(stream->avg_frame_rate);
        }
        else if(stream->r_frame_rate.den && stream->r_frame_rate.num){
            fps = av_q2d(stream->r_frame_rate);
        }
        else if(stream->time_base.den && stream->time_base.num){
            // Rough last resort: only meaningful when one tick equals one frame
            fps = 1.0 / av_q2d(stream->time_base);
        }
        
        NSLog(@"%d - video stream: width = %d, height = %d, duration = %d s, bit rate = %d Kbps, codec = %s, pixel format = %s, fps = %.3f.", i, width, height, duration, (int)(bitRate / 1000.0), codecDesc, pixelFormatDesc, fps);
        
    }
        
        break;
    
    case AVMEDIA_TYPE_ATTACHMENT:{
        NSLog(@"%d - attachment stream.",i);
    }
        
        break;
        
    default:{
        NSLog(@"%d - other stream.",i);
    }
        break;
}
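One caveat about the duration arithmetic in the listing above: it assumes stream->duration holds a real value, while FFmpeg can leave the field at AV_NOPTS_VALUE when the container does not report a duration. The self-contained C sketch below shows one way to guard against that. SKETCH_NOPTS_VALUE only mirrors the value of AV_NOPTS_VALUE, and stream_duration_seconds, along with its convention of returning -1.0 for "unknown", is a hypothetical helper made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the value of FFmpeg's AV_NOPTS_VALUE, which marks "unknown". */
#define SKETCH_NOPTS_VALUE INT64_MIN

/*
 * Convert a duration counted in time-base ticks (tb_num/tb_den seconds per
 * tick) into seconds. Returns -1.0 when the duration is unknown.
 */
double stream_duration_seconds(int64_t duration_ticks, int tb_num, int tb_den)
{
    assert(tb_den != 0);
    if (duration_ticks == SKETCH_NOPTS_VALUE)
        return -1.0;
    return (double)duration_ticks * tb_num / tb_den;
}
```

Without such a check, an unknown duration would be multiplied through like any other number and logged as a huge negative value.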

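A few things to note:

r_frame_rate here is the base frame rate; it is only a guess and has lower priority. avg_frame_rate is the average frame rate, computed over the whole stream, and has higher priority.
For details, see FFmpeg之ffprobe;
sample_fmt and pix_fmt are separate fields here, unlike AVCodecParameters, which has only a single format field.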
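The priority order between the two frame rates can be isolated into a small function. This is a self-contained C sketch rather than FFmpeg code: Rational stands in for AVRational, and rational_is_valid and pick_frame_rate are hypothetical names. The caller-supplied fallback (25 fps in the usage below) is an assumption of this example, not something FFmpeg prescribes.

```c
#include <assert.h>

/* Stand-in for FFmpeg's AVRational: a fraction num/den. */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* A rate is usable only if both parts are positive. */
int rational_is_valid(Rational r)
{
    return r.num > 0 && r.den > 0;
}

/*
 * Prefer the average frame rate, then the guessed base frame rate,
 * and finally a fallback chosen by the caller.
 */
double pick_frame_rate(Rational avg_frame_rate, Rational r_frame_rate,
                       double fallback_fps)
{
    assert(fallback_fps > 0.0);
    if (rational_is_valid(avg_frame_rate))
        return (double)avg_frame_rate.num / avg_frame_rate.den;
    if (rational_is_valid(r_frame_rate))
        return (double)r_frame_rate.num / r_frame_rate.den;
    return fallback_fps;
}
```

For example, an average rate of 30000/1001 yields roughly 29.97 fps regardless of the base rate, while an unset average rate (0/0) with a base rate of 25/1 yields 25 fps.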

That completes the analysis of the stream information.
But don't forget to clean up at the end:

        ...
        // Free the AVCodecContext instance once each stream is done with it
        avcodec_free_context(&codecCtx);
    }

    ...
    // Finally, close the file opened through the AVFormatContext
    avformat_close_input(&formatCtx);
}

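The completed test case is here:

Wrapping up

That covers what I wanted to outline this time: the basic steps for reading audio/video stream information with FFmpeg.
Next time we will read AVPackets using multiple threads.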
