您当前位置：首页 > php开源 > 综合技术 > 详解如何使用代码进行音频合成

详解如何使用代码进行音频合成

来源：程序员人生发布时间：2016-07-26 13:18:42 阅读次数：2992次

作者：郑童宇
GitHub：https://github.com/CrazyZty

1.前言

　　音频合成在现实生活中利用广泛，在网上可以搜索到很多相干的讲授和代码实现，但个人感觉在网上搜索到的音频合成相干文章的讲授都并不是10分透彻，故而写下本篇博文，计划通过讲授如何使用代码实现音频合成功能从而将本人对音频合成的理解论述给各位，力图读完的各位可以对音频合成整体进程有1个清晰的了解。
　　本篇博文以Java为示例语言，以Android为示例平台。
　　本篇博文着力于讲授音频合成实现原理与进程中的细节和潜伏问题，目的是让各位不被编码语言所限制，在本质上理解如何实现音频合成的功能。

2.音频合成

2.1.功能简介

　　本次实现的音频合成功能参考"唱吧"的音频合成，功能流程是：录音生成PCM文件，接着根据录音时长对背景音乐文件进行解码加裁剪，同时将解码后的音频调制到与录音文件相同的采样率，采样点字节数，声道数，接着根据指定系数对两个音频文件进行音量调理并合成为PCM文件，最落后行紧缩编码生成MP3文件。

2.2.功能实现

2.2.1.录音

　　录音功能生成的目标音频格式是PCM格式，对PCM的定义，维基百科上是这么写到的："Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, Compact Discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps."，大致意思是PCM是用来采样摹拟信号的1种方法，是现在数字音频利用中数字音频的标准格式，而PCM采样的原理，是均匀间隔的将摹拟信号的振幅量化成指定数据范围内最贴近的数值。
　　PCM文件存储的数据是不经紧缩的纯音频数据，固然只是这么说可能有些抽象，我们拉上大家熟知的MP3文件进行对照，MP3文件存储的是紧缩后的音频，PCM与MP3二者之间的关系简单说就是：PCM文件经过MP3紧缩算法处理后生成的文件就是MP3文件。我们简单比较1下双方存储所消耗的空间，1分钟的每采样点16位的双声道的44.1kHz采样率PCM文件大小为：1*60*16/8*2*44.1*1000/1024=10335.9375KB，约为10MB，而对应的128kps的MP3文件大小仅为1MB左右，既然PCM文件占用存储空间这么大，我们是否是应当放弃使用PCM格式存储录音，恰恰相反，注意第1句话:"PCM文件存储的数据是不经紧缩的纯音频数据"，这意味只有PCM格式的音频数据是可以用来直接进行声音处理，例如进行音量调理，声音滤镜等操作，相对的其他的音频编码格式都是必须解码后才能进行处理（PCM编码的WAV文件也得先读取文件头），固然这不代表PCM文件就好用，由于没有文件头，所以进行处理或播放之前我们必须事前知道PCM文件的声道数，采样点字节数，采样率，编码大小端，这在大多数情况下都是不可能的，事实上就我所知没有播放器是直接支持PCM文件的播放。不过现在录音的各项系数都是我们定义的，所以我们就不用担心这个问题。
　　背景知识了解这些就足够了，下面我给出实现代码，综合代码讲授实现进程。

if (recordVoice) { audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, Constant.RecordSampleRate, AudioFormat.CHANNEL_IN_MONO, pcmFormat.getAudioFormat(), audioRecordBufferSize); try { audioRecord.startRecording(); } catch (Exception e) { NoRecordPermission(); continue; } BufferedOutputStream bufferedOutputStream = FileFunction .GetBufferedOutputStreamFromFile(recordFileUrl); while (recordVoice) { int audioRecordReadDataSize = audioRecord.read(audioRecordBuffer, 0, audioRecordBufferSize); if (audioRecordReadDataSize > 0) { calculateRealVolume(audioRecordBuffer, audioRecordReadDataSize); if (bufferedOutputStream != null) { try { byte[] outputByteArray = CommonFunction .GetByteBuffer(audioRecordBuffer, audioRecordReadDataSize, Variable.isBigEnding); bufferedOutputStream.write(outputByteArray); } catch (IOException e) { e.printStackTrace(); } } } else { NoRecordPermission(); continue; } } if (bufferedOutputStream != null) { try { bufferedOutputStream.close(); } catch (Exception e) { LogFunction.error("关闭录音输出数据流异常", e); } } audioRecord.stop(); audioRecord.release(); audioRecord = null; }

　　录音的实际实现和控制代码较多，在此仅抽出核心的录音代码进行讲授。在此为获得录音的原始数据，我使用了Android原生的AudioRecord，其他的平台基本也会提供类似的工具类。这段代码实现的功能是当录音开始后，利用会根据设定的采样率和声道数和采样字节数来不断从MIC中获得原始的音频数据，然后将获得的音频数据写入到指定文件中，直至录音结束。这段代码逻辑比较清晰的，我就不过量讲授了。
　　潜伏问题的话，手机平台上是需要申请录音权限的，如果没有录音权限就没法生成正确的录音文件。

2.2.2.解码与裁剪背景音乐

　　如前文所说，除PCM格式之外的所有音频编码格式的音频都必须解码后才可以处理，因此要让背景音乐参与合成必须事前对背景音乐进行解码，同时为减少合成的MP3文件的大小，需要根据录音时长对解码的音频文件进行裁剪。本节不会详细解释解码算法，由于每一个平台都会有对应封装的工具类，直接使用便可。
　　背景知识先讲这些，本次功能实现进程中的潜伏问题较多，下面我给出实现代码，综合代码讲授实现进程。

private boolean decodeMusicFile(String musicFileUrl, String decodeFileUrl, int startSecond, int endSecond, Handler handler, DecodeOperateInterface decodeOperateInterface) { int sampleRate = 0; int channelCount = 0; long duration = 0; String mime = null; MediaExtractor mediaExtractor = new MediaExtractor(); MediaFormat mediaFormat = null; MediaCodec mediaCodec = null; try { mediaExtractor.setDataSource(musicFileUrl); } catch (Exception e) { LogFunction.error("设置解码音频文件路径毛病", e); return false; } mediaFormat = mediaExtractor.getTrackFormat(0); sampleRate = mediaFormat.containsKey(MediaFormat.KEY_SAMPLE_RATE) ? mediaFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE) : 44100; channelCount = mediaFormat.containsKey(MediaFormat.KEY_CHANNEL_COUNT) ? mediaFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT) : 1; duration = mediaFormat.containsKey(MediaFormat.KEY_DURATION) ? mediaFormat.getLong(MediaFormat.KEY_DURATION) : 0; mime = mediaFormat.containsKey(MediaFormat.KEY_MIME) ? mediaFormat.getString(MediaFormat.KEY_MIME) : ""; LogFunction.log("歌曲信息", "Track info: mime:" + mime + " 采样率sampleRate:" + sampleRate + " channels:" + channelCount + " duration:" + duration); if (CommonFunction.isEmpty(mime) || !mime.startsWith("audio/")) { LogFunction.error("解码文件不是音频文件", "mime:" + mime); return false; } if (mime.equals("audio/ffmpeg")) { mime = "audio/mpeg"; mediaFormat.setString(MediaFormat.KEY_MIME, mime); } try { mediaCodec = MediaCodec.createDecoderByType(mime); mediaCodec.configure(mediaFormat, null, null, 0); } catch (Exception e) { LogFunction.error("解码器configure出错", e); return false; } getDecodeData(mediaExtractor, mediaCodec, decodeFileUrl, sampleRate, channelCount, startSecond, endSecond, handler, decodeOperateInterface); return true; }

　　decodeMusicFile方法的代码主要功能是获得背景音乐信息，初始化解码器，最后调用getDecodeData方法正式开始对背景音乐进行处理。
　　代码中使用了Android原生工具类作为解码器，事实上作为原生的解码器，我也遇到过兼容性问题不能不做了1些相应的处理，不能不抱怨1句不同的Android定制系统实在是致使了太多的兼容性问题。

private void getDecodeData(MediaExtractor mediaExtractor, MediaCodec mediaCodec, String decodeFileUrl, int sampleRate, int channelCount, int startSecond, int endSecond, Handler handler, final DecodeOperateInterface decodeOperateInterface) { boolean decodeInputEnd = false; boolean decodeOutputEnd = false; int sampleDataSize; int inputBufferIndex; int outputBufferIndex; int byteNumber; long decodeNoticeTime = System.currentTimeMillis(); long decodeTime; long presentationTimeUs = 0; final long timeOutUs = 100; final long startMicroseconds = startSecond * 1000 * 1000; final long endMicroseconds = endSecond * 1000 * 1000; ByteBuffer[] inputBuffers; ByteBuffer[] outputBuffers; ByteBuffer sourceBuffer; ByteBuffer targetBuffer; MediaFormat outputFormat = mediaCodec.getOutputFormat(); MediaCodec.BufferInfo bufferInfo; byteNumber = (outputFormat.containsKey("bit-width") ? outputFormat.getInteger("bit-width") : 0) / 8; mediaCodec.start(); inputBuffers = mediaCodec.getInputBuffers(); outputBuffers = mediaCodec.getOutputBuffers(); mediaExtractor.selectTrack(0); bufferInfo = new MediaCodec.BufferInfo(); BufferedOutputStream bufferedOutputStream = FileFunction .GetBufferedOutputStreamFromFile(decodeFileUrl); while (!decodeOutputEnd) { if (decodeInputEnd) { return; } decodeTime = System.currentTimeMillis(); if (decodeTime - decodeNoticeTime > Constant.OneSecond) { final int decodeProgress = (int) ((presentationTimeUs - startMicroseconds) * Constant.NormalMaxProgress / endMicroseconds); if (decodeProgress > 0) { handler.post(new Runnable() { @Override public void run() { decodeOperateInterface.updateDecodeProgress(decodeProgress); } }); } decodeNoticeTime = decodeTime; } try { inputBufferIndex = mediaCodec.dequeueInputBuffer(timeOutUs); if (inputBufferIndex >= 0) { sourceBuffer = inputBuffers[inputBufferIndex]; sampleDataSize = mediaExtractor.readSampleData(sourceBuffer, 0); if (sampleDataSize < 0) { decodeInputEnd = true; sampleDataSize = 0; } else { presentationTimeUs = mediaExtractor.getSampleTime(); } mediaCodec.queueInputBuffer(inputBufferIndex, 0, sampleDataSize, presentationTimeUs, decodeInputEnd ? MediaCodec.BUFFER_FLAG_END_OF_STREAM : 0); if (!decodeInputEnd) { mediaExtractor.advance(); } } else { LogFunction.error("inputBufferIndex", "" + inputBufferIndex); } // decode to PCM and push it to the AudioTrack player outputBufferIndex = mediaCodec.dequeueOutputBuffer(bufferInfo, timeOutUs); if (outputBufferIndex < 0) { switch (outputBufferIndex) { case MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED: outputBuffers = mediaCodec.getOutputBuffers(); LogFunction.error("MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED", "[AudioDecoder]output buffers have changed."); break; case MediaCodec.INFO_OUTPUT_FORMAT_CHANGED: outputFormat = mediaCodec.getOutputFormat(); sampleRate = outputFormat.containsKey(MediaFormat.KEY_SAMPLE_RATE) ? outputFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE) : sampleRate; channelCount = outputFormat.containsKey(MediaFormat.KEY_CHANNEL_COUNT) ? outputFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT) : channelCount; byteNumber = (outputFormat.containsKey("bit-width") ? outputFormat.getInteger("bit-width") : 0) / 8; LogFunction.error("MediaCodec.INFO_OUTPUT_FORMAT_CHANGED", "[AudioDecoder]output format has changed to " + mediaCodec.getOutputFormat()); break; default: LogFunction.error("error", "[AudioDecoder] dequeueOutputBuffer returned " + outputBufferIndex); break; } continue; } targetBuffer = outputBuffers[outputBufferIndex]; byte[] sourceByteArray = new byte[bufferInfo.size]; targetBuffer.get(sourceByteArray); targetBuffer.clear(); mediaCodec.releaseOutputBuffer(outputBufferIndex, false); if ((bufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) { decodeOutputEnd = true; } if (sourceByteArray.length > 0 && bufferedOutputStream != null) { if (presentationTimeUs < startMicroseconds) { continue; } byte[] convertByteNumberByteArray = ConvertByteNumber(byteNumber, Constant.RecordByteNumber, sourceByteArray); byte[] resultByteArray = ConvertChannelNumber(channelCount, Constant.RecordChannelNumber, Constant.RecordByteNumber, convertByteNumberByteArray); try { bufferedOutputStream.write(resultByteArray); } catch (Exception e) { LogFunction.error("输出解压音频数据异常", e); } } if (presentationTimeUs > endMicroseconds) { break; } } catch (Exception e) { LogFunction.error("getDecodeData异常", e); } } if (bufferedOutputStream != null) { try { bufferedOutputStream.close(); } catch (IOException e) { LogFunction.error("关闭bufferedOutputStream异常", e); } } if (sampleRate != Constant.RecordSampleRate) { Resample(sampleRate, decodeFileUrl); } if (mediaCodec != null) { mediaCodec.stop(); mediaCodec.release(); } if (mediaExtractor != null) { mediaExtractor.release(); } }

　　getDecodeData方法是此次的进行解码和裁剪的核心，方法的传入参数中mediaExtractor，mediaCodec用以实际控制处理背景音乐的音频数据，decodeFileUrl用以指明解码和裁剪后的PCM文件的存储地址，sampleRate，channelCount分别用以指明背景音乐的采样率，声道数，startSecond用以指明裁剪背景音乐的开始时间，目前功能中默许为0，endSecond用以指明裁剪背景音乐的结束时间，数值大小由录音时长直接决定。
　　getDecodeData方法中通过不断通过mediaCodec读入背景音乐原始数据进行处理，然后解码输出到buffer从而获得解码后的数据，由于mediaCodec的读取解码方法和平台相干就不过量描写，在解码进程中通过startSecond与endSecond来控制解码后音频数据输出的开始与结束。
　　解码和裁剪根据上文的描写是比较简单的，通过平台提供的工具类解码背景音乐数据，然后通过变量裁剪出指定长度的解码后音频数据输出到外文件，这1个流程结束功能就实现了，但在进程中存在几个潜伏问题点。
　　首先，要进行合成处理的话，我们必须要保证录音文件和解码后文件的采样率，采样点字节数，和声道数相同，由于录音文件的这3项系数已固定，所以我们必须对解码的音频数据进行处理以保证终究生成的解码文件3项系数和录音文件1致。在http://blog.csdn.net/ownwell/article/details/8114121/，我们可以了解PCM文件常见的4种存储格式。
　　格式字节1 字节2 字节3 字节4
　　8位单声道 0声道 0声道 0声道 0声道
　　8位双声道 0声道(左) 1声道(右) 0声道(左) 1声道(右)
　　16位单声道 0声道(低) 0声道(高) 0声道(低) 0声道(高)
　　16位双声道 0声道(左，低字节) 0声道(左，高字节) 1声道(右，低字节) 1声道(右，高字节)
　　了解这些知识后，我们就能够知道如何编码以将已知格式的音频数据转化到另外一采样点字节数和声道数。
　　getDecodeData方法中146行调用的ConvertByteNumber方法是通过处理音频数据以保证解码后音频文件和录音文件采样点字节数相同。

private static byte[] ConvertByteNumber(int sourceByteNumber, int outputByteNumber, byte[] sourceByteArray) { if (sourceByteNumber == outputByteNumber) { return sourceByteArray; } int sourceByteArrayLength = sourceByteArray.length; byte[] byteArray; switch (sourceByteNumber) { case 1: switch (outputByteNumber) { case 2: byteArray = new byte[sourceByteArrayLength * 2]; byte resultByte[]; for (int index = 0; index < sourceByteArrayLength; index += 1) { resultByte = CommonFunction.GetBytes((short) (sourceByteArray[index] * 256), Variable.isBigEnding); byteArray[2 * index] = resultByte[0]; byteArray[2 * index + 1] = resultByte[1]; } return byteArray; } break; case 2: switch (outputByteNumber) { case 1: int outputByteArrayLength = sourceByteArrayLength / 2; byteArray = new byte[outputByteArrayLength]; for (int index = 0; index < outputByteArrayLength; index += 1) { byteArray[index] = (byte) (CommonFunction.GetShort(sourceByteArray[2 * index], sourceByteArray[2 * index + 1], Variable.isBigEnding) / 256); } return byteArray; } break; } return sourceByteArray; }

　　ConvertByteNumber方法的参数中sourceByteNumber代表背景音乐文件采样点字节数，outputByteNumber代表录音文件采样点字节数，二者如果相同就不处理，不相同则根据背景音乐文件采样点字节数进行不同的处理，本方法只对单字节存储和双字节存储进行了处理，欢迎在各位Github上填充其他采样点字节数的处理方法，
　　getDecodeData方法中149行调用的ConvertChannelNumber方法是通过处理音频数据以保证解码后音频文件和录音文件声道数相同。

private static byte[] ConvertChannelNumber(int sourceChannelCount, int outputChannelCount, int byteNumber, byte[] sourceByteArray) { if (sourceChannelCount == outputChannelCount) { return sourceByteArray; } switch (byteNumber) { case 1: case 2: break; default: return sourceByteArray; } int sourceByteArrayLength = sourceByteArray.length; byte[] byteArray; switch (sourceChannelCount) { case 1: switch (outputChannelCount) { case 2: byteArray = new byte[sourceByteArrayLength * 2]; byte firstByte; byte secondByte; switch (byteNumber) { case 1: for (int index = 0; index < sourceByteArrayLength; index += 1) { firstByte = sourceByteArray[index]; byteArray[2 * index] = firstByte; byteArray[2 * index + 1] = firstByte; } break; case 2: for (int index = 0; index < sourceByteArrayLength; index += 2) { firstByte = sourceByteArray[index]; secondByte = sourceByteArray[index + 1]; byteArray[2 * index] = firstByte; byteArray[2 * index + 1] = secondByte; byteArray[2 * index + 2] = firstByte; byteArray[2 * index + 3] = secondByte; } break; } return byteArray; } break; case 2: switch (outputChannelCount) { case 1: int outputByteArrayLength = sourceByteArrayLength / 2; byteArray = new byte[outputByteArrayLength]; switch (byteNumber) { case 1: for (int index = 0; index < outputByteArrayLength; index += 2) { short averageNumber = (short) ((short) sourceByteArray[2 * index] + (short) sourceByteArray[2 * index + 1]); byteArray[index] = (byte) (averageNumber >> 1); } break; case 2: for (int index = 0; index < outputByteArrayLength; index += 2) { byte resultByte[] = CommonFunction.AverageShortByteArray(sourceByteArray[2 * index], sourceByteArray[2 * index + 1], sourceByteArray[2 * index + 2], sourceByteArray[2 * index + 3], Variable.isBigEnding); byteArray[index] = resultByte[0]; byteArray[index + 1] = resultByte[1]; } break; } return byteArray; } break; } return sourceByteArray; }

　　ConvertChannelNumber方法的参数中sourceChannelNumber代表背景音乐文件声道数，outputChannelNumber代表录音文件声道数，二者如果相同就不处理，不相同则根据声道数和采样点字节数进行不同的处理，本方法只对单双通道进行了处理，欢迎在Github上填充立体声等声道的处理方法。
　　getDecodeData方法中176行调用的Resample方法是用以处理音频数据以保证解码后音频文件和录音文件采样率相同。

private static void Resample(int sampleRate, String decodeFileUrl) { String newDecodeFileUrl = decodeFileUrl + "new"; try { FileInputStream fileInputStream = new FileInputStream(new File(decodeFileUrl)); FileOutputStream fileOutputStream = new FileOutputStream(new File(newDecodeFileUrl)); new SSRC(fileInputStream, fileOutputStream, sampleRate, Constant.RecordSampleRate, Constant.RecordByteNumber, Constant.RecordByteNumber, 1, Integer.MAX_VALUE, 0, 0, true); fileInputStream.close(); fileOutputStream.close(); FileFunction.RenameFile(newDecodeFileUrl, decodeFileUrl); } catch (IOException e) { LogFunction.error("关闭bufferedOutputStream异常", e); } }

　　为了修改采样率，在此使用了SSRC在Java真个实现，在网上可以搜到1份关于SSRC的介绍："SSRC = Synchronous Sample Rate Converter，同步采样率转换，直白地说就是只能做整数倍频，不支持任意频率之间的转换，比如44.1KHz<->48KHz。"，但不同的SSRC实现原理有所不同，我是用的是来自https://github.com/shibatch/SSRC在Java真个实现，简单读了此SSRC在Java端实现的源码，其代码实现中通过辨别重采样前后采样率的最大公约数是不是满足设定条件作为是不是可重采样的根据，可以支持常见的非整数倍频率的采样率转化，如44.1khz<->48khz，但如果目标采样率是比较特殊的采样率如某1较大的质数，那就没法支稳重采样。
　　至此，Resample，ConvertByteNumber，ConvertChannelNumber3个方法的处理保证了解码后文件和录音文件的采样率，采样点字节数，和声道数相同。
　　接着，此处潜伏的第2个问题就是大小端存储。对计算机体系结构有所了解的同学肯定了解"大小端"这个概念，大小端分别代表了多字节数据在内存中组织的两种不同顺序，如果对"大小端"不是太了解，可以阅读http://blog.jobbole.com/102432/的论述，在处理音频数据的方法中，我们可以看到"Variable.isBigEnding"这个参数，这个参数的含义就是当前平台是不是使用大端编码，这里大家肯定会有疑问，内存中多字节数据的组织顺序为何会影响我们对音频数据的处理，举个例子，如果我们在将采样点8位的音频数据转化为采样点16位，目前的做法是将原始数据乘以256，相当于每个byte转化为short，同时short的高字节为原byte的内容，低字节为0，那现在问题来了，那就是高字节放到高地址还是低地址，这就和平台采取的大小端存储格式息息相干了，固然如果我们输出的数据类型是short那就不用关心，Java会帮我们处理掉，但我们输出的是byte数组，这就需要我们自己对数据进行处理了。
　　这是1个很容易忽视的问题，由于正常情况下的软件开发进程中我们基本是不用关心大小真个问题的，但在这里必须对大小真个情况进行处理，不然会出现在某些平台合成的音频没法播放的情况。

2.2.3.合成与输出

　　录音和对背景音乐的处理结束了，接下来就是最后的合成了，对合成我们脑海中显现最多的会是甚么？相加，对没错，音频合成其实不神秘，音频合成的本质就是相同系数的音频文件之间数据的加和，固然现实中的合成常常并不是如此简单，在网上搜索"混音算法"，我们可以看到大量精深的音频合成算法，但就目前而言，我们没必要实现复杂的混音算法，只要让两个音频文件的原始音频数据相加便可，不过为了让我们的合成看上去略微有1些技术含量，此次提供的音频合成方法中允许任意音频文件相对另外一音频文件进行时间上的偏移，并可以通过两个权重数据进行音量调理。下面我就给出具体代码吧，讲授如何实现。

public static void ComposeAudio(String firstAudioFilePath, String secondAudioFilePath, String composeAudioFilePath, boolean deleteSource, float firstAudioWeight, float secondAudioWeight, int audioOffset, final ComposeAudioInterface composeAudioInterface) { boolean firstAudioFinish = false; boolean secondAudioFinish = false; byte[] firstAudioByteBuffer; byte[] secondAudioByteBuffer; byte[] mp3Buffer; short resultShort; short[] outputShortArray; int index; int firstAudioReadNumber; int secondAudioReadNumber; int outputShortArrayLength; final int byteBufferSize = 1024; firstAudioByteBuffer = new byte[byteBufferSize]; secondAudioByteBuffer = new byte[byteBufferSize]; mp3Buffer = new byte[(int) (7200 + (byteBufferSize * 1.25))]; outputShortArray = new short[byteBufferSize / 2]; Handler handler = new Handler(Looper.getMainLooper()); FileInputStream firstAudioInputStream = FileFunction.GetFileInputStreamFromFile(firstAudioFilePath); FileInputStream secondAudioInputStream = FileFunction.GetFileInputStreamFromFile(secondAudioFilePath); FileOutputStream composeAudioOutputStream = FileFunction.GetFileOutputStreamFromFile(composeAudioFilePath); LameUtil.init(Constant.RecordSampleRate, Constant.LameBehaviorChannelNumber, Constant.BehaviorSampleRate, Constant.LameBehaviorBitRate, Constant.LameMp3Quality); try { while (!firstAudioFinish && !secondAudioFinish) { index = 0; if (audioOffset < 0) { secondAudioReadNumber = secondAudioInputStream.read(secondAudioByteBuffer); outputShortArrayLength = secondAudioReadNumber / 2; for (; index < outputShortArrayLength; index++) { resultShort = CommonFunction.GetShort(secondAudioByteBuffer[index * 2], secondAudioByteBuffer[index * 2 + 1], Variable.isBigEnding); outputShortArray[index] = (short) (resultShort * secondAudioWeight); } audioOffset += secondAudioReadNumber; if (secondAudioReadNumber < 0) { secondAudioFinish = true; break; } if (audioOffset >= 0) { break; } } else { firstAudioReadNumber = firstAudioInputStream.read(firstAudioByteBuffer); outputShortArrayLength = firstAudioReadNumber / 2; for (; index < outputShortArrayLength; index++) { resultShort = CommonFunction.GetShort(firstAudioByteBuffer[index * 2], firstAudioByteBuffer[index * 2 + 1], Variable.isBigEnding); outputShortArray[index] = (short) (resultShort * firstAudioWeight); } audioOffset -= firstAudioReadNumber; if (firstAudioReadNumber < 0) { firstAudioFinish = true; break; } if (audioOffset <= 0) { break; } } if (outputShortArrayLength > 0) { int encodedSize = LameUtil.encode(outputShortArray, outputShortArray, outputShortArrayLength, mp3Buffer); if (encodedSize > 0) { composeAudioOutputStream.write(mp3Buffer, 0, encodedSize); } } } handler.post(new Runnable() { @Override public void run() { if (composeAudioInterface != null) { composeAudioInterface.updateComposeProgress(20); } } }); while (!firstAudioFinish || !secondAudioFinish) { index = 0; firstAudioReadNumber = firstAudioInputStream.read(firstAudioByteBuffer); secondAudioReadNumber = secondAudioInputStream.read(secondAudioByteBuffer); int minAudioReadNumber = Math.min(firstAudioReadNumber, secondAudioReadNumber); int maxAudioReadNumber = Math.max(firstAudioReadNumber, secondAudioReadNumber); if (firstAudioReadNumber < 0) { firstAudioFinish = true; } if (secondAudioReadNumber < 0) { secondAudioFinish = true; } int halfMinAudioReadNumber = minAudioReadNumber / 2; outputShortArrayLength = maxAudioReadNumber / 2; for (; index < halfMinAudioReadNumber; index++) { resultShort = CommonFunction.WeightShort(firstAudioByteBuffer[index * 2], firstAudioByteBuffer[index * 2 + 1], secondAudioByteBuffer[index * 2], secondAudioByteBuffer[index * 2 + 1], firstAudioWeight, secondAudioWeight, Variable.isBigEnding); outputShortArray[index] = resultShort; } if (firstAudioReadNumber != secondAudioReadNumber) { if (firstAudioReadNumber > secondAudioReadNumber) { for (; index < outputShortArrayLength; index++) { resultShort = CommonFunction.GetShort(firstAudioByteBuffer[index * 2], firstAudioByteBuffer[index * 2 + 1], Variable.isBigEnding); outputShortArray[index] = (short) (resultShort * firstAudioWeight); } } else { for (; index < outputShortArrayLength; index++) { resultShort = CommonFunction.GetShort(secondAudioByteBuffer[index * 2], secondAudioByteBuffer[index * 2 + 1], Variable.isBigEnding); outputShortArray[index] = (short) (resultShort * secondAudioWeight); } } } if (outputShortArrayLength > 0) { int encodedSize = LameUtil.encode(outputShortArray, outputShortArray, outputShortArrayLength, mp3Buffer); if (encodedSize > 0) { composeAudioOutputStream.write(mp3Buffer, 0, encodedSize); } } } } catch (Exception e) { LogFunction.error("ComposeAudio异常", e); handler.post(new Runnable() { @Override public void run() { if (composeAudioInterface != null) { composeAudioInterface.composeFail(); } } }); return; } handler.post(new Runnable() { @Override public void run() { if (composeAudioInterface != null) { composeAudioInterface.updateComposeProgress(50); } } }); try { final int flushResult = LameUtil.flush(mp3Buffer); if (flushResult > 0) { composeAudioOutputStream.write(mp3Buffer, 0, flushResult); } } catch (Exception e) { LogFunction.error("释放ComposeAudio LameUtil异常", e); } finally { try { composeAudioOutputStream.close(); } catch (Exception e) { LogFunction.error("关闭合成输出音频流异常", e); } LameUtil.close(); } if (deleteSource) { FileFunction.DeleteFile(firstAudioFilePath); FileFunction.DeleteFile(secondAudioFilePath); } try { firstAudioInputStream.close(); secondAudioInputStream.close(); } catch (IOException e) { LogFunction.error("关闭合成输入音频流异常", e); } handler.post(new Runnable() { @Override public void run() { if (composeAudioInterface != null) { composeAudioInterface.composeSuccess(); } } }); }

　　ComposeAudio方法是此次的进行合成的具体代码实现，方法的传入参数中firstAudioFilePath, secondAudioFilePath是用以合成的音频文件地址，composeAudioFilePath用以指明合成后输出的MP3文件的存储地址，firstAudioWeight，secondAudioWeight分别用以指明合成的两个音频文件在合成进程中的音量权重，audioOffset用以指明第1个音频文件相对第2个音频文件合成进程中的数据偏移，如为负数，则合成进程中先输出audioOffset个字节长度的第2个音频文件数据，如为正数，则合成进程中先输出audioOffset个字节长度的第1个音频文件数据，audioOffset在另外一程度上也代表着时间的偏移，目前我们合成的两个音频文件参数为16位单通道44.1khz采样率，那末audioOffset如果为1*16/8*1*44100=88200字节，那末终究合成出的MP3文件中会先播放1s的第1个音频文件的音频接着再播放两个音频文件加和的音频。
　　整体合成代码是很清晰的，由于加入了时间偏移，所以合成进程中是有可能有1个文件先输出完的，在代码中针对性的进行处理便可，固然即便没有时间偏移也是可能出现类似情况的，比如音乐时长2分钟，录音3分钟，音乐输出结束后那就只应当输出录音音频了，另外在代码中将PCM数据编码为MP3文件使用了LAME的MP3编码库，除此之外代码中就没有比较复杂的模块了。