Article Summary (Model: gpt-5-mini-2025-08-07)
Subject: PersonaPlex 7B on Apple Silicon
The Gist: A native Swift/MLX port runs NVIDIA’s PersonaPlex 7B as a single, full‑duplex speech‑to‑speech model on Apple Silicon. The project converts the original 16.7 GB checkpoint into 4‑bit MLX safetensors (~5.3 GB), reuses the Mimi codec, implements a Depformer with per‑step weight slicing, and achieves faster‑than‑real‑time generation (~68 ms/step) with streaming support and round‑trip ASR verification.
Key Claims/Facts:
- Model architecture: PersonaPlex processes 17 parallel token streams (user/agent audio + text) through a temporal transformer and a Depformer that emits audio codebooks decoded by the Mimi codec.
- Quantization & size: The PyTorch checkpoint was converted to MLX safetensors and quantized to 4‑bit, shrinking the footprint to ~5.3 GB while preserving quality for tested ASR round‑trips.
- Performance & API: The Swift library exposes streaming (respondStream) and offline (respond) paths, includes Metal/MLX optimizations (compile, prefill batching, eval consolidation), and reports RTF ≈ 0.87 on an M2 Max.
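The 17‑stream layout can be sketched as below. The exact split is an assumption (not stated in the summary): one shared text stream plus 8 Mimi codebook streams for each of the two full‑duplex speakers, a layout common to Mimi‑based models.

```python
# Hypothetical sketch of the 17 parallel token streams per time step.
# Assumed split (NOT confirmed by the article): 1 text stream plus
# 8 Mimi codebooks for each of the two speakers (1 + 8 + 8 = 17).

NUM_TEXT_STREAMS = 1
CODEBOOKS_PER_SPEAKER = 8
NUM_SPEAKERS = 2  # user and agent, running in full duplex

NUM_STREAMS = NUM_TEXT_STREAMS + CODEBOOKS_PER_SPEAKER * NUM_SPEAKERS  # 17

def step_frame(text_token, agent_codes, user_codes):
    """Pack one time step as a flat list of 17 token ids for the
    temporal transformer; the Depformer then emits the next step's
    audio codebooks one at a time for the Mimi decoder."""
    assert len(agent_codes) == CODEBOOKS_PER_SPEAKER
    assert len(user_codes) == CODEBOOKS_PER_SPEAKER
    return [text_token, *agent_codes, *user_codes]

frame = step_frame(0, list(range(8)), list(range(8)))
```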
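The ~5.3 GB figure is consistent with a back‑of‑envelope estimate, under two assumptions not taken from the article: the 16.7 GB checkpoint stores weights in a 2‑byte format (bf16), and 4‑bit quantization carries a fp16 scale and bias per group of 64 weights (MLX's default grouping scheme).

```python
# Back-of-envelope size estimate for the 4-bit quantized checkpoint.
# ASSUMPTIONS (not from the article): bf16 source weights (2 bytes
# each); 4-bit groups of 64 with one fp16 scale + one fp16 bias each.

CHECKPOINT_GB = 16.7
BYTES_PER_BF16 = 2
GROUP_SIZE = 64

num_params = CHECKPOINT_GB * 1e9 / BYTES_PER_BF16  # ~8.35e9 weights

# Effective bits per weight: 4 payload bits + amortized scale/bias.
bits_per_weight = 4 + (16 + 16) / GROUP_SIZE  # 4.5 bits
quantized_gb = num_params * bits_per_weight / 8 / 1e9  # ~4.7 GB

# The remaining gap to the reported ~5.3 GB would plausibly come from
# tensors kept at higher precision (e.g. embeddings, norms).
```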
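The reported RTF can be sanity‑checked from the per‑step latency: real‑time factor is generation time divided by the duration of audio produced. The one assumption here is that each generation step yields one Mimi frame at the codec's 12.5 Hz frame rate, i.e. 80 ms of audio per step.

```python
# RTF sanity check: time to generate / audio duration generated.
# ASSUMPTION (not stated in the summary): one Mimi frame per step
# at 12.5 Hz, i.e. 80 ms of audio per generation step.

STEP_MS = 68.0            # reported per-step generation time
FRAME_MS = 1000 / 12.5    # 80 ms of audio per step (assumed)

rtf = STEP_MS / FRAME_MS  # 0.85, in the ballpark of the reported ~0.87
```

An RTF below 1.0 means audio is produced faster than it plays back, which is what makes live streaming viable.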
Discussion Summary (Model: gpt-5-mini-2025-08-07)
Consensus: Cautiously Optimistic: readers are impressed by the native, faster‑than‑real‑time speech‑to‑speech pipeline but flag practical limits and risks.