Session-Level Sample Rate Conversion

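I. Background

In the previous article, the system finished integrating G722Codec and, building on java-media-codec, gave the RTP layer working G722 encoding and decoding. With that in place, the project is no longer limited on the codec side to the two narrowband G.711 variants, PCMU and PCMA; once SDP negotiation succeeds, it can enter a 16k wideband path.

Supporting G722 encoding and decoding alone is not enough, however. Across the whole system, RTP, the telephone session, and the Realtime model do not naturally share one sample rate.

At least three sample-rate meanings coexist in the current system:

1. Session-side sample rate

This is determined by the SIP/SDP negotiation result.

When PCMU / PCMA is negotiated, the session PCM sample rate is 8000
When G722 is negotiated, the session PCM sample rate is 16000

In other words, the session-side sample rate is dynamic, not fixed.

2. Model input sample rate

The Realtime model usually requires a fixed input format:

16k PCM

So when the telephone session runs at 8k, the audio must be upsampled first; when the session is already at 16k, no conversion is needed.

3. Model output sample rate

The Realtime model output is usually fixed at:

24k PCM

So whether the telephone session is at 8k or 16k, the model output must be downsampled once before it reaches the telephone side.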
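The three rates above can be captured in a few lines. The following is only a minimal sketch: the class name `RatePlan`, its constants, and its helper methods are hypothetical and do not come from the project; they simply encode the rules stated in this section.

```java
import java.util.Locale;

/** Sketch: which conversions a session needs, given its negotiated codec. */
final class RatePlan {
  // Assumed constants, taken from the rates described in this section.
  static final int MODEL_INPUT_RATE = 16000;
  static final int MODEL_OUTPUT_RATE = 24000;

  /** Session PCM rate implied by the negotiated codec name (hypothetical helper). */
  static int sessionRateFor(String codecName) {
    switch (codecName.toUpperCase(Locale.ROOT)) {
      case "PCMU":
      case "PCMA":
        return 8000;
      case "G722":
        return 16000;
      default:
        throw new IllegalArgumentException("unsupported codec: " + codecName);
    }
  }

  /** Uplink needs resampling only when the session rate differs from the model input rate. */
  static boolean needsUplinkResample(int sessionRate) {
    return sessionRate != MODEL_INPUT_RATE;
  }

  /** Downlink needs resampling whenever the session rate differs from the model output rate. */
  static boolean needsDownlinkResample(int sessionRate) {
    return sessionRate != MODEL_OUTPUT_RATE;
  }
}
```

With these rules, a G722 session skips the uplink conversion, while both 8k and 16k sessions need the downlink one.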

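Without a clear sample rate conversion strategy, the system easily runs into problems such as:

Many if (sampleRate != ...) checks scattered across modules
Resamplers created repeatedly in several places
A resampler created and destroyed for every frame of a continuous audio stream
No unified lifecycle management for native resampler resources
Duplicate conversions, muddled responsibilities, or state shared across streams at runtime

So once G722 codec support is in place, the next problem to solve is:

How to organize a clear, reusable, and manageable resampling mechanism around "session sample rate ↔ model sample rate ↔ RTP encoding sample rate".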

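II. Goals

The goal of this design is not simply to "provide a utility class that resamples", but to build a session-level sample rate conversion mechanism that actually fits real-time calls.

The overall goals are as follows.

1. Unify the responsibility boundaries of sample rate conversion

Split sample rate conversion in the system into three explicit paths:

Uplink: session input → model input
Downlink: model output → session output
RTP egress fallback: handles the case where the sample rate returned by MediaProcessor differs from the codec's target sample rate

Each path should have a clear owner instead of being spread arbitrarily across classes.

2. Reuse resamplers per session

A high-quality resampler is not a purely functional utility; it is a stateful runtime resource. Internally it usually holds:

Filter history samples
Fractional phase
Output delay buffer

For a continuous audio stream, it therefore makes more sense to:

Bind a resampler instance to the session (one per conversion path)
Reuse it throughout the session lifecycle
Release it when the call ends

rather than creating and destroying one for every audio frame.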
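To make the role of internal state concrete, here is a toy 2x upsampler based on linear interpolation. It is not the native resampler used in this project; the class `StreamingUpsampler2x` is a hypothetical illustration whose only state is the last input sample of the previous frame.

```java
import java.util.Arrays;

/** Toy 2x linear-interpolation upsampler; a stand-in to show why state matters. */
final class StreamingUpsampler2x {
  // Last input sample seen so far: the minimal "filter history".
  private int prev;

  /** Emits two output samples per input sample: an interpolated midpoint, then the sample itself. */
  short[] process(short[] in) {
    short[] out = new short[in.length * 2];
    for (int i = 0; i < in.length; i++) {
      out[2 * i] = (short) ((prev + in[i]) / 2); // midpoint between previous and current sample
      out[2 * i + 1] = in[i];
      prev = in[i];
    }
    return out;
  }

  /** Helper that joins two frames into one array. */
  static short[] concat(short[] a, short[] b) {
    short[] out = Arrays.copyOf(a, a.length + b.length);
    System.arraycopy(b, 0, out, a.length, b.length);
    return out;
  }
}
```

Feeding the frames {100, 200} and {300, 400} through one reused instance yields 250 as the first sample of the second frame, the midpoint of 200 and 300. Creating a fresh instance for the second frame yields 150 instead, because the history restarts at zero. That jump at the frame boundary is the kind of discontinuity a per-frame resampler can introduce.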

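3. Keep the responsibilities of the RTP layer, MediaProcessor, and Realtime clearly separated

This design keeps the division of responsibilities from the previous article:

The RTP layer handles codec decode / encode and RTP send/receive
RealtimeMediaProcessor converts session audio into the model input format
SipRealtimeSession converts model downlink audio into the session format and buffers it
RtpUdpHandler keeps only the fallback resampling right before final encoding

This way, although sample rate conversion brings in more implementation detail, the responsibility boundaries actually become clearer.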

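III. Why Static Utility Methods Are Not Enough

In the initial version, resampling was typically written like this:

short[] out = AudioResampler.resample(input, srcRate, dstRate);

This looks straightforward and suits demos or offline processing, but it does not suit the main path of a real-time call.

The reasons are:

1. The resampler is stateful internally

High-quality resampling is not a simple array transform. Internally it usually maintains:

Filter history
Phase position
Output buffer

If a new resampler is created for every frame, that state is lost at each frame boundary, so a continuous audio stream can show discontinuities, distortion, or extra noise there.

2. JNI / native objects are not suited to high-frequency creation

The current AudioResampler wraps a native resampler through JNI. If a continuous real-time audio path frequently calls:

createResampler
resampleDirect
destroyResampler

it incurs extra allocation and release cost, which hurts both performance and resource management.

3. Resources are hard to reclaim uniformly when the session ends

If every call site just makes temporary static calls, it is convenient in the short term but makes it hard to manage the native resource lifecycle in one place over time. Once this changes to "the session holds the resampler", releasing everything in CallSession.release() becomes natural.

For these reasons, this design no longer treats AudioResampler as just a utility class; it brings it under session-level media resource management.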

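IV. Design

1. `AudioResampler` keeps both usage styles

For compatibility with existing code, AudioResampler still supports two styles:

Static one-shot call

Suitable for occasional, temporary resampling off the main path:

short[] out = AudioResampler.resample(input, srcRate, dstRate);

Reusable instance call

Suitable for continuous streaming audio:

// Arguments: channels = 1, srcRate, dstRate, quality = 5, options = 0
AudioResampler resampler = new AudioResampler(1, srcRate, dstRate, 5, 0);
short[] out = resampler.resample(input);

The main path should prefer the second, reusable-instance style.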

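2. Bind resamplers to `CallSession` as session-level media resources

In the current implementation, CallSession is no longer just a lightweight SIP / RTP session object; it also serves as the container for session-level media resources.

Besides the codec, this change adds three session-level resampler fields:

private AudioResampler inputResampler;
private AudioResampler outputResampler;
private AudioResampler rtpResampler;

They serve three different purposes:

`inputResampler`

Converts the session's input audio to the model input sample rate. Typical cases:

8k -> 16k
16k -> 16k (no resampler is created)

`outputResampler`

Converts model output audio to the session sample rate. Typical cases:

24k -> 8k
24k -> 16k

`rtpResampler`

Performs the final fallback conversion before RTP packets are sent. It is not a main-path resampler but a defensive measure for the case where the AudioFrame.sampleRate returned by MediaProcessor differs from the current codec's target sample rate.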

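3. `CallSession` creates, reuses, and rebuilds resamplers

So that external modules do not manage resampler lifecycles directly, CallSession provides:

getOrCreateInputResampler(int srcRate, int dstRate)
getOrCreateOutputResampler(int srcRate, int dstRate)
getOrCreateRtpResampler(int srcRate, int dstRate)

All three behave the same way:

If the source and target sample rates are equal (or either is not positive), return null
If a resampler already exists with matching parameters, reuse it
If a resampler exists but its parameters differ, close the old instance and create a new one
Otherwise create a new session-level resampler and return it

External modules therefore do not need to care about:

When to create one
When to close one
What to do when parameters change

They only need to call getOrCreate...() at the point of use.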
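The shared contract of the three getOrCreate methods can be sketched without JNI. In the code below, `ResamplerSlot` and `FakeResampler` are hypothetical names introduced only for illustration; `FakeResampler` stands in for the native-backed AudioResampler so the example runs on its own.

```java
import java.util.function.BiFunction;

/** Stand-in for the native-backed resampler so the sketch runs without JNI. */
final class FakeResampler implements AutoCloseable {
  boolean closed;

  @Override
  public void close() {
    closed = true;
  }
}

/** One slot holds at most one resampler and applies the reuse / rebuild rules. */
final class ResamplerSlot<R extends AutoCloseable> {
  private final BiFunction<Integer, Integer, R> factory;
  private R current;
  private int srcRate;
  private int dstRate;

  ResamplerSlot(BiFunction<Integer, Integer, R> factory) {
    this.factory = factory;
  }

  synchronized R getOrCreate(int src, int dst) {
    if (src <= 0 || dst <= 0 || src == dst) {
      return null; // invalid rates or no conversion needed
    }
    if (current != null && src == srcRate && dst == dstRate) {
      return current; // same parameters: reuse and keep internal state
    }
    release(); // parameters changed: close the old instance first
    current = factory.apply(src, dst);
    srcRate = src;
    dstRate = dst;
    return current;
  }

  synchronized void release() {
    if (current != null) {
      try {
        current.close();
      } catch (Exception ignore) {
        // best-effort cleanup, mirroring CallSession.release()
      }
      current = null;
    }
  }
}
```

A session could hold three such slots, one per conversion path, which would express the same behavior as the three separate methods with less repetition. Whether that refactoring is worthwhile is a design choice; the article's implementation keeps three explicit methods.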

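4. `CallSession.release()` releases media resources in one place

Like AudioCodec, the resamplers are native resources. So when the session ends, the following must all be released together:

audioCodec
inputResampler
outputResampler
rtpResampler

This is also why the implementation binds resamplers to CallSession instead of leaving them scattered across processors, each maintaining its own.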

V. The Three Sample Rate Conversion Paths

1. Uplink: `RealtimeMediaProcessor` uses `inputResampler`

RealtimeMediaProcessor is responsible for converting session audio into the model input format.

The original code typically looked like this:

short[] modelInputSamples = inputSamples;
if (inputSampleRate != MODEL_INPUT_SAMPLE_RATE) {
  modelInputSamples = AudioResampler.resample(inputSamples, inputSampleRate, MODEL_INPUT_SAMPLE_RATE);
}

After the change, the main path becomes:

short[] modelInputSamples = inputSamples;
if (inputSampleRate != MODEL_INPUT_SAMPLE_RATE) {
  AudioResampler resampler = session.getOrCreateInputResampler(inputSampleRate, MODEL_INPUT_SAMPLE_RATE);
  modelInputSamples = resampler.resample(inputSamples);
}

This has several benefits:

The continuous uplink voice stream reuses the same resampler
Internal state is preserved, which reduces distortion at frame boundaries
No native object is created per frame
The lifecycle is managed by CallSession

In short:

RealtimeMediaProcessor is the right place to use inputResampler.

2. Downlink: `SipRealtimeSession` uses `outputResampler`

SipRealtimeSession is responsible for receiving model output audio, converting it to the current session format, and enqueuing it.

The original code typically looked like this:

short[] pcmSessionRate = pcm24k;
if (MODEL_OUTPUT_SAMPLE_RATE != sessionSampleRate) {
  pcmSessionRate = AudioResampler.resample(pcm24k, MODEL_OUTPUT_SAMPLE_RATE, sessionSampleRate);
}

After the change it becomes:

short[] pcmSessionRate = pcm24k;
if (MODEL_OUTPUT_SAMPLE_RATE != sessionSampleRate) {
  AudioResampler resampler = session.getOrCreateOutputResampler(MODEL_OUTPUT_SAMPLE_RATE, sessionSampleRate);
  pcmSessionRate = resampler.resample(pcm24k);
}

As a result:

Model downlink audio keeps reusing the same resampler
The output queue always holds PCM in the current session format
RealtimeMediaProcessor no longer has to care about the model output sample rate when it later pulls frames

In short:

SipRealtimeSession is the right place to use outputResampler.

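3. RTP egress fallback: `RtpUdpHandler` uses `rtpResampler`

In principle, the AudioFrame returned by MediaProcessor should already be in the current session format. For robustness, though, the RTP layer still keeps a last line of defense.

When outputFrame.getSampleRate() differs from the current codec's target sample rate, RtpUdpHandler can handle it like this:

if (outputSampleRate != targetSampleRate) {
  AudioResampler rtpResampler = session.getOrCreateRtpResampler(outputSampleRate, targetSampleRate);
  outputSamples = rtpResampler.resample(outputSamples);
}

Note that:

rtpResampler is not a main-path resampler
It is only a fallback right before final encoding
The main sample rate conversions should still happen in RealtimeMediaProcessor and SipRealtimeSession wherever possible

In other words:

rtpResampler is the "last safety net", not the system's primary resampling path.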

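VI. Why the Three Resamplers Are Kept Separate

On the surface, all three just "convert sample rates", so a single shared audioResampler field might seem enough. In practice they belong to three different media paths with distinct responsibilities.

1. Different conversion directions

inputResampler: session input → model input
outputResampler: model output → session output
rtpResampler: MediaProcessor output → codec target sample rate

2. Different points of use

inputResampler is used frequently on the model uplink path
outputResampler is used frequently on the model downlink path
rtpResampler is used only occasionally, as a fallback before final packet sending

3. Different sources of parameter changes

inputResampler depends on the current input frame sample rate and the model input sample rate
outputResampler depends on the model output sample rate and the session sample rate
rtpResampler depends on the MediaProcessor output sample rate and the current codec's target sample rate

Splitting them and naming them clearly is therefore easier to maintain than a single generic resampler field. Because each resampler is stateful, the split also keeps the filter state of one stream from leaking into another.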

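VII. Final Result

After this change, sample rate conversion in the system has the following structure:

When the session negotiates G711 (8k)

Uplink:

8k -> 16k -> Realtime Model

Downlink:

24k -> 8k -> RTP encode

When the session negotiates G722 (16k)

Uplink:

16k -> Realtime Model

Downlink:

24k -> 16k -> RTP encode

Throughout the process:

Uplink conversion is handled by inputResampler
Downlink conversion is handled by outputResampler
The final fallback before RTP packets are sent is handled by rtpResampler

Sample rate conversion is thus no longer a set of scattered utility calls; it becomes part of session-level media resource management.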
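To put numbers on these paths, the sketch below computes frame sizes for a 20 ms packet time, which is the default ptime in CallSession. The class `FrameMath` and its methods are hypothetical helpers, not part of the project.

```java
/** Sketch: per-frame sample counts for the rates used in this article. */
final class FrameMath {
  /** Number of PCM samples (per channel) in one frame of the given duration. */
  static int samplesPerFrame(int sampleRate, int ptimeMs) {
    return sampleRate * ptimeMs / 1000;
  }

  /**
   * Nominal output length after resampling. This is only the ideal ratio;
   * a real streaming resampler may return a slightly different count per
   * call because of its internal filter delay.
   */
  static int nominalResampledLength(int inputSamples, int srcRate, int dstRate) {
    return (int) ((long) inputSamples * dstRate / srcRate);
  }
}
```

For 20 ms frames this gives 160 samples at 8k, 320 at 16k, and 480 at 24k. A 480-sample model output frame therefore maps nominally to 160 samples for a G711 session and 320 samples for a G722 session.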

VIII. Code Implementation

AudioResampler

package nexus.io.sip.rtp.codec;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Objects;

import nexus.io.media.MediaCodec;

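/**
 * JNI audio resampler wrapper.
 *
 * Supports two usage styles:
 * 1. Static one-shot call: compatible with existing code
 * 2. Reusable instance call: suited to continuous streaming audio
 */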
public final class AudioResampler implements AutoCloseable {

  private static final int DEFAULT_CHANNELS = 1;
  private static final int DEFAULT_QUALITY = 5;
  private static final int DEFAULT_OPTIONS = 0;

  private final int channels;
  private final int srcRate;
  private final int dstRate;
  @SuppressWarnings("unused")
  private final int quality;
  @SuppressWarnings("unused")
  private final int options;

  private final Object lock = new Object();

  private long handle;
  private ByteBuffer inputBuffer;
  private ByteBuffer outputBuffer;

  public AudioResampler(int channels, int srcRate, int dstRate) {
    this(channels, srcRate, dstRate, DEFAULT_QUALITY, DEFAULT_OPTIONS);
  }

  public AudioResampler(int channels, int srcRate, int dstRate, int quality, int options) {
    if (channels <= 0) {
      throw new IllegalArgumentException("channels must be > 0");
    }
    if (srcRate <= 0) {
      throw new IllegalArgumentException("srcRate must be > 0");
    }
    if (dstRate <= 0) {
      throw new IllegalArgumentException("dstRate must be > 0");
    }
    if (quality < 0 || quality > 10) {
      throw new IllegalArgumentException("quality must be between 0 and 10");
    }

    this.channels = channels;
    this.srcRate = srcRate;
    this.dstRate = dstRate;
    this.quality = quality;
    this.options = options;

    this.handle = MediaCodec.createResampler(channels, srcRate, dstRate, quality, options);
    if (this.handle == 0) {
      throw new IllegalStateException("Failed to create resampler, channels=" + channels + ", srcRate=" + srcRate
          + ", dstRate=" + dstRate + ", quality=" + quality);
    }
  }

  /**
   * Legacy-compatible: mono by default, one-shot resampling.
   */
  public static short[] resample(short[] input, int srcRate, int dstRate) {
    return resample(input, DEFAULT_CHANNELS, srcRate, dstRate, DEFAULT_QUALITY, DEFAULT_OPTIONS);
  }

  /**
   * Legacy-compatible: one-shot resampling with all parameters specified.
   */
  public static short[] resample(short[] input, int channels, int srcRate, int dstRate, int quality, int options) {
    try (AudioResampler resampler = new AudioResampler(channels, srcRate, dstRate, quality, options)) {
      return resampler.resample(input);
    }
  }

  /**
   * Reusable instance style: suited to continuous streaming audio.
   */
  public short[] resample(short[] input) {
    Objects.requireNonNull(input, "input");

    if (input.length == 0) {
      return new short[0];
    }
    if (srcRate == dstRate) {
      return input.clone();
    }
    if (input.length % channels != 0) {
      throw new IllegalArgumentException(
          "input length must be divisible by channels, length=" + input.length + ", channels=" + channels);
    }

    synchronized (lock) {
      ensureOpen();

      int inputSamplesPerChannel = input.length / channels;
      int outputSamplesPerChannel = MediaCodec.getResamplerExpectedOutputSamples(handle, inputSamplesPerChannel);
      if (outputSamplesPerChannel < 0) {
        throw new IllegalStateException("getResamplerExpectedOutputSamples failed, code=" + outputSamplesPerChannel);
      }

      int inputBytes = input.length * 2;
      int outputBytes = outputSamplesPerChannel * channels * 2;

      inputBuffer = ensureDirectBuffer(inputBuffer, inputBytes);
      outputBuffer = ensureDirectBuffer(outputBuffer, outputBytes);

      inputBuffer.clear();
      outputBuffer.clear();

      for (int i = 0; i < input.length; i++) {
        inputBuffer.putShort(i * 2, input[i]);
      }

      int actualOutputSamplesPerChannel = MediaCodec.resampleDirect(handle, inputBuffer, inputSamplesPerChannel,
          outputBuffer);
      if (actualOutputSamplesPerChannel < 0) {
        throw new IllegalStateException("resampleDirect failed, code=" + actualOutputSamplesPerChannel);
      }

      int totalOutputSamples = actualOutputSamplesPerChannel * channels;
      short[] out = new short[totalOutputSamples];
      for (int i = 0; i < totalOutputSamples; i++) {
        out[i] = outputBuffer.getShort(i * 2);
      }
      return out;
    }
  }

  public void reset() {
    synchronized (lock) {
      ensureOpen();
      int code = MediaCodec.resetResampler(handle);
      if (code < 0) {
        throw new IllegalStateException("resetResampler failed, code=" + code);
      }
    }
  }

  public int getChannels() {
    return channels;
  }

  public int getSrcRate() {
    return srcRate;
  }

  public int getDstRate() {
    return dstRate;
  }

  private void ensureOpen() {
    if (handle == 0) {
      throw new IllegalStateException("AudioResampler already closed");
    }
  }

  private static ByteBuffer ensureDirectBuffer(ByteBuffer buffer, int capacity) {
    if (buffer != null && buffer.capacity() >= capacity) {
      buffer.clear();
      buffer.order(ByteOrder.LITTLE_ENDIAN);
      return buffer;
    }
    return ByteBuffer.allocateDirect(capacity).order(ByteOrder.LITTLE_ENDIAN);
  }

  @Override
  public void close() {
    synchronized (lock) {
      if (handle != 0) {
        MediaCodec.destroyResampler(handle);
        handle = 0;
      }
      inputBuffer = null;
      outputBuffer = null;
    }
  }
}
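The resample(short[]) method above copies samples into and out of direct buffers one element at a time. As a possible alternative, the standard `ShortBuffer` view supports bulk copies. The sketch below is a minimal illustration: `PcmBuffers` is a hypothetical helper, and it allocates a fresh buffer on each call for simplicity, whereas the class above reuses its buffers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: bulk copies between short[] PCM and little-endian direct buffers. */
final class PcmBuffers {
  /** Copies PCM samples into a new direct buffer in little-endian order. */
  static ByteBuffer toDirect(short[] pcm) {
    ByteBuffer buf = ByteBuffer.allocateDirect(pcm.length * 2).order(ByteOrder.LITTLE_ENDIAN);
    buf.asShortBuffer().put(pcm); // the view inherits the byte order; buf's own position stays 0
    return buf;
  }

  /** Reads sampleCount samples from the start of the buffer without moving its position. */
  static short[] toArray(ByteBuffer buf, int sampleCount) {
    short[] out = new short[sampleCount];
    // duplicate() does not carry over the byte order, so set it explicitly.
    ByteBuffer view = buf.duplicate().order(buf.order());
    view.clear();
    view.asShortBuffer().get(out);
    return out;
  }
}
```

Whether this is measurably faster than the element-wise loop depends on the JVM and frame size, so it is worth benchmarking before adopting it.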

CallSession

package nexus.io.sip.model;

import nexus.io.sip.rtp.RtpUdpServer;
import nexus.io.sip.rtp.codec.AudioCodec;
import nexus.io.sip.rtp.codec.AudioResampler;
import nexus.io.sip.rtp.codec.NegotiatedAudioFormatResolver;
import nexus.io.sip.sdp.CodecSpec;

public class CallSession {

  private int pcmSampleRate;
  private int channels = 1;

  private String callId;
  private String fromTag;
  private String toTag;

  private String transport;

  private String remoteSipIp;
  private int remoteSipPort;

  private String remoteRtpIp;
  private int remoteRtpPort;
  private int localRtpPort;

  private long createdTime;
  private long updatedTime;
  private long ackDeadline;

  private boolean ackReceived;
  private boolean terminated;

  private String last200Ok;
  private RtpUdpServer rtpServer;

  private CodecSpec selectedCodec;

  private AudioResampler inputResampler;
  private AudioResampler outputResampler;
  private AudioResampler rtpResampler;

  private boolean telephoneEventSupported;
  private int remoteTelephoneEventPayloadType = -1;
  private int ptime = 20;

  /**
   * One runtime codec instance per session.
   * For JNI codecs, this avoids multiple sessions sharing the same native state object.
   */
  private AudioCodec audioCodec;

  private long localSsrc = System.nanoTime() & 0xFFFFFFFFL;
  private int sendSequence = 0;
  private long sendTimestamp = 0;
  private boolean rtpInitialized = false;

  public synchronized int nextSendSequence() {
    sendSequence = (sendSequence + 1) & 0xFFFF;
    return sendSequence;
  }

  public synchronized long nextSendTimestamp(int pcmSampleCount) {
    int step = toRtpTimestampStep(pcmSampleCount);
    if (step <= 0) {
      step = pcmSampleCount > 0 ? pcmSampleCount : 160;
    }

    if (!rtpInitialized) {
      rtpInitialized = true;
      sendTimestamp = step & 0xFFFFFFFFL;
      return sendTimestamp;
    }

    sendTimestamp = (sendTimestamp + step) & 0xFFFFFFFFL;
    return sendTimestamp;
  }

  /**
   * Converts a PCM sample count into an RTP timestamp increment.
   * The RTP clock rate can differ from the session PCM sample rate.
   */
  private int toRtpTimestampStep(int pcmSampleCount) {
    if (pcmSampleCount <= 0) {
      return 0;
    }

    CodecSpec codec = this.selectedCodec;
    int clockRate = codec != null && codec.getClockRate() > 0 ? codec.getClockRate() : 8000;
    // Local name chosen to avoid shadowing the pcmSampleRate field.
    int sessionPcmRate = NegotiatedAudioFormatResolver.resolveSessionPcmSampleRate(codec);

    if (sessionPcmRate <= 0) {
      // clockRate is always positive here, so it is a safe fallback.
      sessionPcmRate = clockRate;
    }

    long step = ((long) pcmSampleCount * clockRate) / sessionPcmRate;
    if (step <= 0) {
      step = 1;
    }
    return (int) step;
  }

  public synchronized AudioCodec getAudioCodec() {
    return audioCodec;
  }

  public synchronized void setAudioCodec(AudioCodec audioCodec) {
    this.audioCodec = audioCodec;
  }

  public synchronized void release() {
    if (audioCodec instanceof AutoCloseable) {
      try {
        ((AutoCloseable) audioCodec).close();
      } catch (Exception e) {
        // ignore
      }
    }
    audioCodec = null;

    if (inputResampler != null) {
      try {
        inputResampler.close();
      } catch (Exception ignore) {
      }
      inputResampler = null;
    }

    if (outputResampler != null) {
      try {
        outputResampler.close();
      } catch (Exception ignore) {
      }
      outputResampler = null;
    }

    if (rtpResampler != null) {
      try {
        rtpResampler.close();
      } catch (Exception ignore) {
      }
      rtpResampler = null;
    }
  }

  public long getLocalSsrc() {
    return localSsrc;
  }

  public void setLocalSsrc(long localSsrc) {
    this.localSsrc = localSsrc;
  }

  public int getSendSequence() {
    return sendSequence;
  }

  public void setSendSequence(int sendSequence) {
    this.sendSequence = sendSequence;
  }

  public long getSendTimestamp() {
    return sendTimestamp;
  }

  public void setSendTimestamp(long sendTimestamp) {
    this.sendTimestamp = sendTimestamp;
  }

  public boolean isRtpInitialized() {
    return rtpInitialized;
  }

  public void setRtpInitialized(boolean rtpInitialized) {
    this.rtpInitialized = rtpInitialized;
  }

  public String getCallId() {
    return callId;
  }

  public void setCallId(String callId) {
    this.callId = callId;
  }

  public String getFromTag() {
    return fromTag;
  }

  public void setFromTag(String fromTag) {
    this.fromTag = fromTag;
  }

  public String getToTag() {
    return toTag;
  }

  public void setToTag(String toTag) {
    this.toTag = toTag;
  }

  public String getTransport() {
    return transport;
  }

  public void setTransport(String transport) {
    this.transport = transport;
  }

  public String getRemoteSipIp() {
    return remoteSipIp;
  }

  public void setRemoteSipIp(String remoteSipIp) {
    this.remoteSipIp = remoteSipIp;
  }

  public int getRemoteSipPort() {
    return remoteSipPort;
  }

  public void setRemoteSipPort(int remoteSipPort) {
    this.remoteSipPort = remoteSipPort;
  }

  public String getRemoteRtpIp() {
    return remoteRtpIp;
  }

  public void setRemoteRtpIp(String remoteRtpIp) {
    this.remoteRtpIp = remoteRtpIp;
  }

  public int getRemoteRtpPort() {
    return remoteRtpPort;
  }

  public void setRemoteRtpPort(int remoteRtpPort) {
    this.remoteRtpPort = remoteRtpPort;
  }

  public int getLocalRtpPort() {
    return localRtpPort;
  }

  public void setLocalRtpPort(int localRtpPort) {
    this.localRtpPort = localRtpPort;
  }

  public long getCreatedTime() {
    return createdTime;
  }

  public void setCreatedTime(long createdTime) {
    this.createdTime = createdTime;
  }

  public long getUpdatedTime() {
    return updatedTime;
  }

  public void setUpdatedTime(long updatedTime) {
    this.updatedTime = updatedTime;
  }

  public long getAckDeadline() {
    return ackDeadline;
  }

  public void setAckDeadline(long ackDeadline) {
    this.ackDeadline = ackDeadline;
  }

  public boolean isAckReceived() {
    return ackReceived;
  }

  public void setAckReceived(boolean ackReceived) {
    this.ackReceived = ackReceived;
  }

  public boolean isTerminated() {
    return terminated;
  }

  public void setTerminated(boolean terminated) {
    this.terminated = terminated;
  }

  public String getLast200Ok() {
    return last200Ok;
  }

  public void setLast200Ok(String last200Ok) {
    this.last200Ok = last200Ok;
  }

  public RtpUdpServer getRtpServer() {
    return rtpServer;
  }

  public void setRtpServer(RtpUdpServer rtpServer) {
    this.rtpServer = rtpServer;
  }

  public CodecSpec getSelectedCodec() {
    return selectedCodec;
  }

  public void setSelectedCodec(CodecSpec selectedCodec) {
    this.selectedCodec = selectedCodec;
  }

  public boolean isTelephoneEventSupported() {
    return telephoneEventSupported;
  }

  public void setTelephoneEventSupported(boolean telephoneEventSupported) {
    this.telephoneEventSupported = telephoneEventSupported;
  }

  public int getRemoteTelephoneEventPayloadType() {
    return remoteTelephoneEventPayloadType;
  }

  public void setRemoteTelephoneEventPayloadType(int remoteTelephoneEventPayloadType) {
    this.remoteTelephoneEventPayloadType = remoteTelephoneEventPayloadType;
  }

  public int getPtime() {
    return ptime;
  }

  public void setPtime(int ptime) {
    this.ptime = ptime;
  }

  public int getPcmSampleRate() {
    return pcmSampleRate;
  }

  public void setPcmSampleRate(int pcmSampleRate) {
    this.pcmSampleRate = pcmSampleRate;
  }

  public int getChannels() {
    return channels;
  }

  public void setChannels(int channels) {
    this.channels = channels;
  }

  public AudioResampler getInputResampler() {
    return inputResampler;
  }

  public void setInputResampler(AudioResampler inputResampler) {
    this.inputResampler = inputResampler;
  }

  public AudioResampler getOutputResampler() {
    return outputResampler;
  }

  public void setOutputResampler(AudioResampler outputResampler) {
    this.outputResampler = outputResampler;
  }

  public AudioResampler getRtpResampler() {
    return rtpResampler;
  }

  public void setRtpResampler(AudioResampler rtpResampler) {
    this.rtpResampler = rtpResampler;
  }

  public synchronized AudioResampler getOrCreateInputResampler(int srcRate, int dstRate) {
    if (srcRate <= 0 || dstRate <= 0 || srcRate == dstRate) {
      return null;
    }

    if (inputResampler != null) {
      if (inputResampler.getSrcRate() == srcRate && inputResampler.getDstRate() == dstRate) {
        return inputResampler;
      }
      try {
        inputResampler.close();
      } catch (Exception ignore) {
      }
      inputResampler = null;
    }

    inputResampler = new AudioResampler(1, srcRate, dstRate, 5, 0);
    return inputResampler;
  }

  public synchronized AudioResampler getOrCreateOutputResampler(int srcRate, int dstRate) {
    if (srcRate <= 0 || dstRate <= 0 || srcRate == dstRate) {
      return null;
    }

    if (outputResampler != null) {
      if (outputResampler.getSrcRate() == srcRate && outputResampler.getDstRate() == dstRate) {
        return outputResampler;
      }
      try {
        outputResampler.close();
      } catch (Exception ignore) {
      }
      outputResampler = null;
    }

    outputResampler = new AudioResampler(1, srcRate, dstRate, 5, 0);
    return outputResampler;
  }

  public synchronized AudioResampler getOrCreateRtpResampler(int srcRate, int dstRate) {
    if (srcRate <= 0 || dstRate <= 0 || srcRate == dstRate) {
      return null;
    }

    if (rtpResampler != null) {
      if (rtpResampler.getSrcRate() == srcRate && rtpResampler.getDstRate() == dstRate) {
        return rtpResampler;
      }
      try {
        rtpResampler.close();
      } catch (Exception ignore) {
      }
      rtpResampler = null;
    }

    rtpResampler = new AudioResampler(1, srcRate, dstRate, 5, 0);
    return rtpResampler;
  }
}
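The timestamp step in CallSession scales the PCM sample count by the ratio of RTP clock rate to PCM sample rate. G722 is the case where the two differ: RFC 3551 keeps its RTP clock at 8000 Hz even though the audio is sampled at 16000 Hz. The standalone sketch below restates that arithmetic; `RtpTimestampStep` is a hypothetical class that takes the rates as plain parameters instead of reading them from CodecSpec.

```java
/** Sketch: RTP timestamp increment for one frame of PCM samples. */
final class RtpTimestampStep {
  /**
   * Scales a PCM sample count to RTP clock ticks.
   * Returns 0 for invalid input, and at least 1 for any valid frame.
   */
  static int step(int pcmSampleCount, int clockRate, int pcmSampleRate) {
    if (pcmSampleCount <= 0 || clockRate <= 0 || pcmSampleRate <= 0) {
      return 0;
    }
    long ticks = (long) pcmSampleCount * clockRate / pcmSampleRate;
    return (int) Math.max(ticks, 1);
  }
}
```

A 20 ms G711 frame (160 samples at 8000 Hz, clock 8000) advances the timestamp by 160. A 20 ms G722 frame (320 samples at 16000 Hz, clock 8000) also advances it by 160.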

RealtimeMediaProcessor

inputResampler should be used here.

The existing code:

short[] modelInputSamples = inputSamples;
if (inputSampleRate != MODEL_INPUT_SAMPLE_RATE) {
  modelInputSamples = AudioResampler.resample(inputSamples, inputSampleRate, MODEL_INPUT_SAMPLE_RATE);
}

becomes:

short[] modelInputSamples = inputSamples;
if (inputSampleRate != MODEL_INPUT_SAMPLE_RATE) {
  AudioResampler resampler = session.getOrCreateInputResampler(inputSampleRate, MODEL_INPUT_SAMPLE_RATE);
  modelInputSamples = resampler.resample(inputSamples);
}

This is the right place to use inputResampler.

SipRealtimeSession

outputResampler should be used here.

The existing code:

short[] pcmSessionRate = pcm24k;
if (MODEL_OUTPUT_SAMPLE_RATE != sessionSampleRate) {
  pcmSessionRate = AudioResampler.resample(pcm24k, MODEL_OUTPUT_SAMPLE_RATE, sessionSampleRate);
}

becomes:

short[] pcmSessionRate = pcm24k;
if (MODEL_OUTPUT_SAMPLE_RATE != sessionSampleRate) {
  AudioResampler resampler = session.getOrCreateOutputResampler(MODEL_OUTPUT_SAMPLE_RATE, sessionSampleRate);
  pcmSessionRate = resampler.resample(pcm24k);
}

RtpUdpHandler

if (outputSampleRate != targetSampleRate) {
  AudioResampler rtpResampler = session.getOrCreateRtpResampler(outputSampleRate, targetSampleRate);
  outputSamples = rtpResampler.resample(outputSamples);
}

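IX. Summary

With G722 wideband voice supported, sample rate conversion matters more. The session side is no longer fixed at 8k; depending on the negotiation result, it can be either 8k or 16k.

Under those conditions, continuing to treat resampling as a simple static utility method makes it hard to achieve:

Stability for continuous streaming audio
Unified management of native resources
A clear division of responsibilities across multiple media paths
Maintainability as the system grows

This design therefore adopts "session-level resamplers":

CallSession holds inputResampler / outputResampler / rtpResampler
RealtimeMediaProcessor handles uplink resampling
SipRealtimeSession handles downlink resampling
RtpUdpHandler handles the final fallback resampling
All media resources are released together when the session ends

The core design of this article can be summed up in one sentence:

Resampling is not a scattered utility but a session-level media resource; the uplink, the downlink, and the RTP egress each use a resampler with its own responsibility, which is what keeps the whole voice path clear, stable, and extensible.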