Audio & Video Learning – H.264 Encoding
July 8, 2014
Building on the earlier posts *Audio & Video Learning – Basic Concepts* and *Audio & Video Learning – H.264 Structure and Bitstream Parsing*, this article finally gets to code. The AVFoundation capture pipeline covered before is not repeated here; we start encoding video frames directly inside the capture delegate method **captureOutput:didOutputSampleBuffer:fromConnection:**. The overall flow breaks down into four steps (a sketch of the class skeleton the snippets assume follows this list):
- Prepare the encoder: create the session with VTCompressionSessionCreate and set the encoder properties;
- Start encoding: VTCompressionSessionEncodeFrame
- Process the data in the encoding-complete callback: add the start code **"\x00\x00\x00\x01"**, add the **SPS/PPS**, and so on.
- Finish encoding, clean up, and release resources.
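The code below refers to a few member variables (cEncodeQueue, cEncodeingSession, frameID, fileHandele) whose declarations the original post never shows. Here is a minimal sketch of the assumed class skeleton; the names simply mirror the ones used in the snippets:

```objectivec
// Minimal sketch of the assumed class skeleton (not shown in the original post).
#import <UIKit/UIKit.h>
#import <AVFoundation/AVFoundation.h>
#import <VideoToolbox/VideoToolbox.h>

@interface ViewController : UIViewController <AVCaptureVideoDataOutputSampleBufferDelegate>
@end

@implementation ViewController
{
    dispatch_queue_t cEncodeQueue;              // serial queue all encoding work runs on
    VTCompressionSessionRef cEncodeingSession;  // the VideoToolbox compression session
    int frameID;                                // per-frame counter used as a timestamp
    NSFileHandle *fileHandele;                  // handle of the .h264 output file
}
// ... capture setup, initVideoToolBox, encode:, endVideoToolBox as shown below ...
@end
```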
Preparing the encoder
- Create the session: VTCompressionSessionCreate
- Set properties with VTSessionSetProperty: real-time output, whether B-frames are produced, keyframe interval, expected frame rate, average bit rate, data-rate limits, and so on
- Prepare to start encoding: VTCompressionSessionPrepareToEncodeFrames
```objectivec
- (void)initVideoToolBox
{
    // cEncodeQueue is a serial queue
    dispatch_sync(cEncodeQueue, ^{
        frameID = 0;
        int width = 480, height = 640;

        // Create the compression session
        OSStatus status = VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, didCompressH264, (__bridge void *)(self), &cEncodeingSession);
        NSLog(@"H264:VTCompressionSessionCreate:%d", (int)status);

        if (status != 0) {
            NSLog(@"H264:Unable to create a H264 session");
            return;
        }

        // Real-time encoding output (avoids latency)
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Baseline_AutoLevel);

        // No B-frames (B-frames are not required for decoding and can be dropped)
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanFalse);

        // Keyframe interval (GOP size); if the GOP is too small the image gets blurry
        int frameInterval = 10;
        CFNumberRef frameIntervalRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &frameInterval);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, frameIntervalRef);

        // Expected frame rate (not the actual frame rate)
        int fps = 10;
        CFNumberRef fpsRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &fps);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_ExpectedFrameRate, fpsRef);

        // Bit rate: a high bit rate gives a very clear image but also a large file;
        // a low bit rate sometimes looks blurry but is still watchable.
        // (bit-rate formula: see my notes)
        // Average bit rate, in bits per second
        int bitRate = width * height * 3 * 4 * 8;
        CFNumberRef bitRateRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitRate);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_AverageBitRate, bitRateRef);

        // Hard data-rate limit, in bytes
        int bitRateLimit = width * height * 3 * 4;
        CFNumberRef bitRateLimitRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitRateLimit);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_DataRateLimits, bitRateLimitRef);

        // Prepare to start encoding
        VTCompressionSessionPrepareToEncodeFrames(cEncodeingSession);
    });
}
```
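One caveat: Apple's documentation describes kVTCompressionPropertyKey_DataRateLimits as a CFArray of CFNumber pairs, each pair being a byte count followed by a duration in seconds, whereas the code above passes a single CFNumber. A hedged sketch of the array form, which could stand in for that line inside initVideoToolBox (the byte budget is illustrative, not from the original post):

```objectivec
// DataRateLimits as documented: pairs of [byte count, duration in seconds].
int bytesPerSecondLimit = width * height * 3 * 4; // illustrative budget, same as above
int oneSecond = 1;
CFNumberRef bytesRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bytesPerSecondLimit);
CFNumberRef secondsRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &oneSecond);
const void *limitValues[] = { bytesRef, secondsRef };
CFArrayRef dataRateLimits = CFArrayCreate(kCFAllocatorDefault, limitValues, 2, &kCFTypeArrayCallBacks);
VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_DataRateLimits, dataRateLimits);
CFRelease(bytesRef);
CFRelease(secondsRef);
CFRelease(dataRateLimits);
```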
VTCompressionSessionCreate parameters explained:

- allocator: the session allocator; pass NULL for the default allocator
- width: frame width in pixels
- height: frame height in pixels
- codecType: the codec type, e.g. kCMVideoCodecType_H264
- encoderSpecification: encoder specification; pass NULL to let VideoToolbox choose the encoder itself
- sourceImageBufferAttributes: attributes of the source pixel buffers; pass NULL if you do not want VideoToolbox to create a pixel buffer pool and prefer to manage buffers yourself (see the sketch after this list)
- compressedDataAllocator: allocator for the compressed data; pass NULL for the default allocator
- outputCallback: the encoding callback, invoked asynchronously after VTCompressionSessionEncodeFrame compresses a frame. Here it is the C function didCompressH264
- outputCallbackRefCon: a client-defined reference value handed to the callback. We pass self here because the C callback cannot access self directly and we need to call methods on it
- compressionSessionOut: receives the newly created compression session
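If you do want VideoToolbox to manage a pixel buffer pool for you, a pixel-buffer attributes dictionary can be passed instead of NULL for sourceImageBufferAttributes. A minimal sketch (not from the original post; dimensions and pixel format are illustrative):

```objectivec
// Hypothetical variant: let VideoToolbox create a pixel buffer pool by passing
// source pixel buffer attributes. Keys are standard Core Video constants.
NSDictionary *sourceAttributes = @{
    (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange),
    (id)kCVPixelBufferWidthKey           : @(480),
    (id)kCVPixelBufferHeightKey          : @(640),
};
OSStatus status = VTCompressionSessionCreate(NULL, 480, 640, kCMVideoCodecType_H264,
                                             NULL,
                                             (__bridge CFDictionaryRef)sourceAttributes,
                                             NULL, didCompressH264,
                                             (__bridge void *)(self), &cEncodeingSession);
```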
Starting the encode
- Get the raw (unencoded) video frame: CVImageBufferRef imageBuffer = (CVImageBufferRef)CMSampleBufferGetImageBuffer(sampleBuffer);
- Set the frame time: CMTime presentationTimeStamp = CMTimeMake(frameID++, 1000);
- Start encoding: call VTCompressionSessionEncodeFrame
```objectivec
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    // Video capture is running; hand each camera frame to the encode method
    dispatch_sync(cEncodeQueue, ^{
        [self encode:sampleBuffer];
    });
}
```
```objectivec
- (void)encode:(CMSampleBufferRef)sampleBuffer
{
    // Get the unencoded frame data
    CVImageBufferRef imageBuffer = (CVImageBufferRef)CMSampleBufferGetImageBuffer(sampleBuffer);

    // Set the frame time
    CMTime presentationTimeStamp = CMTimeMake(frameID++, 1000);

    // Start encoding
    VTEncodeInfoFlags flags;
    OSStatus statusCode = VTCompressionSessionEncodeFrame(cEncodeingSession, imageBuffer, presentationTimeStamp, kCMTimeInvalid, NULL, NULL, &flags);

    if (statusCode != noErr) {
        // Encoding failed: log and release resources
        NSLog(@"H.264:VTCompressionSessionEncodeFrame failed with %d", (int)statusCode);
        VTCompressionSessionInvalidate(cEncodeingSession);
        CFRelease(cEncodeingSession);
        cEncodeingSession = NULL;
        return;
    }
}
```
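A note on the timestamp: CMTimeMake(frameID++, 1000) fabricates a presentation time from a frame counter. An alternative, sketched below and not part of the original post, is to reuse the capture timestamp carried by the sample buffer itself:

```objectivec
// Alternative sketch: take timing straight from the captured sample buffer.
CMTime presentationTimeStamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
CMTime duration = CMSampleBufferGetDuration(sampleBuffer); // may be kCMTimeInvalid
VTEncodeInfoFlags flags;
OSStatus statusCode = VTCompressionSessionEncodeFrame(cEncodeingSession,
                                                      CMSampleBufferGetImageBuffer(sampleBuffer),
                                                      presentationTimeStamp,
                                                      duration,
                                                      NULL, NULL, &flags);
```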
VTCompressionSessionEncodeFrame parameters explained:

- session: the compression session
- imageBuffer: the unencoded frame data
- presentationTimeStamp: the presentation timestamp of this sample buffer. Every timestamp passed to the session must be greater than the previous one
- duration: the presentation duration of this frame; if you have no timing information, pass kCMTimeInvalid
- frameProperties: properties of this frame; changes here can affect subsequent encoded frames
- sourceFrameRefcon: a per-frame reference value that is passed back to the callback
- infoFlagsOut: points to a VTEncodeInfoFlags value that receives information about the encode operation: kVTEncodeInfo_Asynchronous is set if the encode runs asynchronously, and kVTEncodeInfo_FrameDropped is set if the frame was dropped. Pass NULL if you do not want this information (see the check sketched below)
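As a small illustration (not in the original post), the flags can be inspected right after the encode call in the encode: method above:

```objectivec
// Hypothetical follow-up to the encode call: inspect the returned flags.
if (flags & kVTEncodeInfo_FrameDropped) {
    NSLog(@"H264: frame was dropped by the encoder");
}
if (flags & kVTEncodeInfo_Asynchronous) {
    NSLog(@"H264: frame is being encoded asynchronously");
}
```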
Handling the encoded data
- Check whether the frame is a keyframe: if it is, use CMVideoFormatDescriptionGetH264ParameterSetAtIndex to get the SPS and PPS and write them to the file (or send them over the network) as binary data
- Assemble the NALU data: get the encoded H.264 stream with CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer), then walk the buffer via the data pointer using the start address, the per-NALU length, and the total length: OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);. Mind the byte order when reading the length fields: network transmission generally uses big-endian
```objectivec
/*
 1. When H.264 hardware encoding finishes, VTCompressionOutputCallback is invoked.
 2. Convert the successfully encoded CMSampleBuffer into an H.264 byte stream that can be sent over the network.
 3. Extract the parameter sets SPS & PPS and prepend start codes to form NALUs. Extract the video data,
    replace the length prefixes with start codes to form NALUs, and send the NALUs out.
 */
void didCompressH264(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer)
{
    NSLog(@"didCompressH264 called with status %d infoFlags %d", (int)status, (int)infoFlags);

    // Error status
    if (status != 0) {
        return;
    }

    // Data not ready
    if (!CMSampleBufferDataIsReady(sampleBuffer)) {
        NSLog(@"didCompressH264 data is not ready");
        return;
    }

    ViewController *encoder = (__bridge ViewController *)outputCallbackRefCon;

    // Check whether the current frame is a keyframe
    CFArrayRef array = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true);
    CFDictionaryRef dic = CFArrayGetValueAtIndex(array, 0);
    bool keyFrame = !CFDictionaryContainsKey(dic, kCMSampleAttachmentKey_NotSync);

    // Get the SPS & PPS only once and store them at the head of the .h264 file
    // SPS: Sequence Parameter Set, PPS: Picture Parameter Set
    if (keyFrame) {
        // Format description: image layout, codec, and so on
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);

        // SPS
        size_t sparameterSetSize, sparameterSetCount;
        const uint8_t *sparameterSet;
        OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, 0);

        if (statusCode == noErr) {
            // PPS (taken from the first keyframe)
            size_t pparameterSetSize, pparameterSetCount;
            const uint8_t *pparameterSet;
            OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, 0);

            if (statusCode == noErr) {
                NSData *sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
                NSData *pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];

                if (encoder) {
                    [encoder gotSpsPps:sps pps:pps];
                }
            }
        }
    }

    CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length, totalLength;
    char *dataPointer;
    OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);

    if (statusCodeRet == noErr) {
        size_t bufferOffset = 0;
        // The first 4 bytes of each NALU are not the 0x00000001 start code but a big-endian length field
        static const int AVCCHeaderLength = 4;

        // Loop over the NALUs (a single callback may contain several)
        while (bufferOffset < totalLength - AVCCHeaderLength) {
            uint32_t NALUnitLength = 0;

            // Read the length of one NALU
            memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);

            // Convert from big-endian to host byte order
            NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);

            // Get the NALU payload
            NSData *data = [[NSData alloc] initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];

            // Write the NALU to the file
            [encoder gotEncodedData:data isKeyFrame:keyFrame];

            // Move to the next NAL unit in the block buffer
            bufferOffset += AVCCHeaderLength + NALUnitLength;
        }
    }
}

// Write the SPS & PPS at the start of the file
- (void)gotSpsPps:(NSData *)sps pps:(NSData *)pps
{
    const char bytes[] = "\x00\x00\x00\x01";
    size_t length = (sizeof bytes) - 1; // exclude the trailing \0 terminator
    NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];

    [fileHandele writeData:ByteHeader];
    [fileHandele writeData:sps];
    [fileHandele writeData:ByteHeader];
    [fileHandele writeData:pps];
}

- (void)gotEncodedData:(NSData *)data isKeyFrame:(BOOL)isKeyFrame
{
    if (fileHandele != NULL) {
        // Prepend the 4-byte H.264 start code 0x00000001 as the NALU delimiter.
        // The encoder's first output is usually the SPS & PPS.
        // A decoder scans the stream for start codes to find NALU boundaries.
        const char bytes[] = "\x00\x00\x00\x01";
        size_t length = (sizeof bytes) - 1;
        NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];

        // Write the start code
        [fileHandele writeData:ByteHeader];
        // Write the H.264 data
        [fileHandele writeData:data];
    }
}
```
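The file handle fileHandele is never created in the snippets above. A minimal sketch of how it might be set up (the method name, file name, and location are assumptions, not from the original post):

```objectivec
// Hypothetical setup for the fileHandele used above; path and name are illustrative.
- (void)setupFileHandle
{
    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"test.h264"];

    // Remove any stale file, then create an empty one to write into
    [[NSFileManager defaultManager] removeItemAtPath:filePath error:nil];
    [[NSFileManager defaultManager] createFileAtPath:filePath contents:nil attributes:nil];

    fileHandele = [NSFileHandle fileHandleForWritingAtPath:filePath];
}
```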
Finishing the encode
```objectivec
- (void)endVideoToolBox
{
    VTCompressionSessionCompleteFrames(cEncodeingSession, kCMTimeInvalid);
    VTCompressionSessionInvalidate(cEncodeingSession);
    CFRelease(cEncodeingSession);
    cEncodeingSession = NULL;
}
```
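For completeness, a sketch of how the teardown might be wired up when capture stops. The stopCapture method, the captureSession property, and the closeFile call are assumptions, not shown in the original post:

```objectivec
// Hypothetical teardown: stop capture, flush the encoder, and close the output file.
- (void)stopCapture
{
    [self.captureSession stopRunning]; // assumes an AVCaptureSession property named captureSession
    dispatch_sync(cEncodeQueue, ^{
        [self endVideoToolBox];   // flush pending frames and invalidate the session
        [fileHandele closeFile];  // close the .h264 output file
        fileHandele = NULL;
    });
}
```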