Structured Masked Diffusion for Joint Multiuser Decoding
Abstract
In joint multiuser decoding, a receiver recovers a set of messages from a single noisy aggregate of many simultaneous transmissions. Classical decoders rely on rule-based mechanisms such as successive interference cancellation, joint belief propagation, or list recovery, all of which become brittle or expensive as ambiguity increases. We propose CIDER, a learned multiuser decoder with masked-diffusion refinement steps. CIDER uses demixing to prevent duplicate-row collapse and uses parity-aware propagation to provide soft guidance from the code constraints. In higher-load regimes, we further improve reliability via a lightweight quality-guided remasking step that selectively re-decodes low-confidence sequences. On commonly used error-correcting codes, CIDER matches or improves on FFT-accelerated joint belief propagation-style decoding in symbol error rate while running more than $6\times$ to over $100\times$ faster, with the speedup widening as the blocklength grows. Code is available at https://github.com/jiyunyoung/CIDER.
Relevance assessment
High. Directly addresses joint multiuser decoding for unsourced random access, with LDPC codes, parity-aware decoding, and communications performance/complexity results; squarely in cs.IT/eess.SP.
High relevance to cs.IT/eess.SP multiuser decoding and unsourced random access, with concrete mechanisms beyond generic learned decoding: masked diffusion, demixing, LDPC parity-aware propagation, and remasking. Structure analysis reports strong empirical claims against FFT-BP-style decoding, including comparable or better SER and large runtime gains as blocklength grows. Problem setting and assumptions are clearly identified, and the claimed contribution targets a real bottleneck in high-load joint decoding.
Core problem and main method
Core problem
Recover an unordered set of transmitted messages from one noisy superposition under joint multiuser decoding constraints.
Setting: unsourced random access / shared-codebook uplink with two-stage decoding: fixed symbol-wise evidence S, followed by multiuser decoding over K codewords.
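Because the target is an unordered set, any error metric has to be minimized over row assignments before it is meaningful. The sketch below illustrates this with a permutation-matched symbol error rate. It is not the paper's evaluation code: the function name `matched_symbol_error_rate` and the toy data are hypothetical, and it brute-forces all permutations, which gives the same optimum as Hungarian matching but is practical only for small K.

```python
from itertools import permutations


def matched_symbol_error_rate(sent, decoded):
    """Symbol error rate under the best one-to-one row assignment.

    sent, decoded: lists of K rows, each a list of L symbols.
    Brute force over K! assignments (a stand-in for Hungarian matching).
    """
    k = len(sent)
    length = len(sent[0])
    # cost[i][j] = symbol mismatches between sent row i and decoded row j
    cost = [
        [sum(a != b for a, b in zip(sent[i], decoded[j])) for j in range(k)]
        for i in range(k)
    ]
    best = min(
        sum(cost[i][perm[i]] for i in range(k))
        for perm in permutations(range(k))
    )
    return best / (k * length)


# Toy example: the decoder outputs the right rows in a different order,
# with one wrong symbol in one row.
sent = [[0, 1, 2, 3], [3, 2, 1, 0], [1, 1, 1, 1]]
decoded = [[1, 1, 1, 0], [0, 1, 2, 3], [3, 2, 1, 0]]
ser = matched_symbol_error_rate(sent, decoded)
print(ser)
```

Comparing rows in the listed order would count 10 of 12 symbols as wrong here; the matched metric counts 1 of 12, which reflects what the decoder actually recovered.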
Main method
Masked discrete diffusion initializes an all-mask K x L grid and repeatedly reveals high-confidence symbol predictions using a cosine reveal schedule. Row-wise demixing computes soft responsibilities over rows for each slot-symbol candidate, creating competition among user hypotheses and reducing duplicate-row collapse. Parity-aware propagation injects sparse LDPC constraints through Tanner-graph messages and finite-field coefficient actions, guiding each row toward code-consistent assemblies. Quality-guided remasking attaches a lightweight confidence head at higher loads, remasks low-confidence rows, and re-decodes them while clamping high-confidence rows. Evaluation uses Hungarian row matching to respect the unordered set-recovery target when computing SER and CER.
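The reveal loop described above can be sketched as follows. This is a minimal illustration, not CIDER's implementation: the summary does not give the exact schedule, so the code assumes a common cosine form in which the masked fraction after step t of T is cos(pi t / 2T). `MASK`, `reveal_decode`, and `toy_predict` are hypothetical names, and `toy_predict` is a fixed stand-in for the learned denoiser.

```python
import math

MASK = -1  # hypothetical sentinel for a masked grid cell


def cosine_masked_count(step, num_steps, total):
    """Cells that should remain masked after `step` of `num_steps` steps."""
    frac = math.cos(0.5 * math.pi * step / num_steps)
    return int(math.floor(total * frac))


def reveal_decode(predict, k, length, num_steps):
    """Start from an all-mask K x L grid and reveal the most confident cells."""
    grid = [[MASK] * length for _ in range(k)]
    total = k * length
    for step in range(1, num_steps + 1):
        masked = [(r, c) for r in range(k) for c in range(length)
                  if grid[r][c] == MASK]
        if not masked:
            break
        target = cosine_masked_count(step, num_steps, total)
        num_reveal = max(1, len(masked) - target)
        # predict maps each masked cell to a (symbol, confidence) pair
        scored = predict(grid, masked)
        ranked = sorted(masked, key=lambda rc: scored[rc][1], reverse=True)
        for r, c in ranked[:num_reveal]:
            grid[r][c] = scored[(r, c)][0]
    return grid


def toy_predict(grid, masked):
    """Stand-in for the learned model: fixed symbols and confidences."""
    return {(r, c): ((r + c) % 4, 1.0 / (1 + r + c)) for (r, c) in masked}


result = reveal_decode(toy_predict, k=2, length=4, num_steps=5)
print(result)
```

In this form the schedule reveals few cells in the early steps and many in the later ones, and the final step always clears whatever is still masked, so the number of refinement steps is fixed in advance.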
Key contributions and further reading
Key contributions
Introduces masked diffusion as a learned mechanism for shared-codebook joint multiuser decoding in URA, replacing rule-based stitching/SIC/BP-style assembly with fixed-step iterative refinement. Formulates the decoder around the fixed two-stage receiver interface S -> X, isolating the multiuser decoding problem from the AMP-MMSE symbol detector. Designs a demixing module that addresses the permutation-invariant duplicate-row collapse failure mode by making K hypothesis rows compete for shared slot-wise evidence. Designs parity-aware propagation over the LDPC Tanner graph so diffusion refinement can use global code constraints without relying on classical non-binary BP iterations as the decoder itself. Adds an inference-time PRISM-style quality-guided remasking mechanism for higher-load regimes. Provides empirical comparisons against classical decoders, one-shot neural decoders, generic masked diffusion, and ablations across LDPC lengths, per-bin loads, and additional sparse-graph code families.
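To make the row-competition idea in the demixing contribution concrete, the sketch below normalizes scores across rows for each candidate symbol in one slot, so the rows split a shared piece of evidence rather than each claiming it in full. It shows only the direction of the normalization. The actual module in CIDER is learned, and `row_responsibilities` and the score values are hypothetical.

```python
import math


def row_responsibilities(scores):
    """Softmax over rows, computed separately for each candidate.

    scores[i][j]: affinity of hypothesis row i for candidate symbol j
    in a single slot. Returns resp[i][j], where each column sums to 1.
    """
    k = len(scores)
    num_candidates = len(scores[0])
    resp = [[0.0] * num_candidates for _ in range(k)]
    for j in range(num_candidates):
        column = [scores[i][j] for i in range(k)]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        norm = sum(exps)
        for i in range(k):
            resp[i][j] = exps[i] / norm
    return resp


# Both rows score candidate 0 highest, which is the duplicate-row risk.
scores = [[2.0, 0.0],
          [1.0, 0.5]]
resp = row_responsibilities(scores)
for row in resp:
    print([round(v, 3) for v in row])
```

After normalizing over rows, row 0 holds most of the responsibility for candidate 0 while row 1 holds most of it for candidate 1, even though row 1's raw score also favored candidate 0.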
Research directions
How robust are the gains under richer fading, imperfect synchronization, channel-estimation error, or non-synthetic traffic models beyond the controlled URA abstraction? How sensitive is CIDER to load-estimation errors, since the reported setup assumes known per-bin K or uses K-specific decoder banks? Would optimized batched GPU implementations of FFT-BP or SIC-BP reduce the reported wall-clock speedup, especially in regimes where latency rather than throughput is the bottleneck? Can one model generalize across K without a bank of K-specific decoders, and what accuracy/runtime tradeoff would that introduce?
Limitations and uncertainty
Evidence is still from structure analysis only, not independent full-paper verification. Evaluation appears synthetic and depends on fixed AMP-MMSE evidence, known per-bin load, and K-specific decoders, which may limit protocol-level impact. Runtime claims may be sensitive to implementation and hardware.