Figure 1: (a) Existing point cloud joint compression methods rely on recoloring to align geometry and attributes. However, manual bit allocation leads to suboptimal reconstruction. (b) The proposed MEGA-PCC uses a shared latent space and loss-based bit allocation, enabling end-to-end compression without recoloring or post-hoc model matching. (c) Traditional model matching involves an exhaustive search for the optimal pairing (e.g., pairing 5 rate points from PCGCv2 with ANF-PCAC) and selects the top results based on PCQM scores, incurring high computational cost.
Joint compression of point cloud geometry and attributes is essential for efficient 3D data representation. Existing methods often rely on post-hoc recoloring procedures and manually tuned bitrate allocation between the geometry and attribute bitstreams at inference time, which hinders end-to-end optimization and increases system complexity.
To overcome these limitations, we propose MEGA-PCC, a fully end-to-end, learning-based framework featuring two specialized models for joint compression. The main compression model employs a shared encoder that encodes both geometry and attribute information into a unified latent representation, followed by dual decoders that sequentially reconstruct geometry and then attributes. Complementing this, the Mamba-based Entropy Model (MEM) enhances entropy coding by capturing spatial and channel-wise correlations to improve probability estimation. Both models are built on the Mamba architecture to effectively model long-range dependencies and rich contextual features.
By eliminating the need for recoloring and heuristic bitrate tuning, MEGA-PCC enables data-driven bitrate allocation during training and simplifies the overall pipeline. Extensive experiments demonstrate that MEGA-PCC achieves superior rate-distortion performance and runtime efficiency compared to both traditional and learning-based baselines, offering a powerful solution for AI-driven point cloud compression.
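To make the joint pipeline concrete, the sketch below traces the data flow described above: one shared latent carries both geometry and attributes, and decoding recovers geometry first, then attributes conditioned on it. All function names are illustrative stand-ins, not the authors' actual API, and trivial list operations replace the Mamba-based networks; rate allocation is only noted in comments since it is learned from the training loss.

```python
# Hypothetical sketch of MEGA-PCC's joint pipeline (illustrative names, not
# the authors' API). A single shared latent encodes geometry + attributes;
# dual decoders reconstruct geometry first, then attributes conditioned on it.
# Bit allocation between the two streams is learned via the loss, not tuned.

def shared_encode(points, colors):
    # Stand-in for the Mamba-based shared encoder: fuse per-point geometry
    # and attribute features into one unified latent sequence.
    return [(p, c) for p, c in zip(points, colors)]

def decode_geometry(latent):
    # Geometry decoder: recover point positions from the shared latent.
    return [p for p, _ in latent]

def decode_attributes(latent, geometry):
    # Attribute decoder: recover colors, conditioned on decoded geometry.
    return [c for _, c in latent]

points = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]

latent = shared_encode(points, colors)
geo_hat = decode_geometry(latent)
attr_hat = decode_attributes(latent, geo_hat)
```

In this lossless toy setting both decoders recover their inputs exactly; the point is only the ordering of the two decoding stages and the single shared latent, which is what removes the need for recoloring.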
Figure 3: (a) Serialization of point clouds into a sequence that preserves spatial proximity between consecutive elements. (b) Tri-Mamba, used in both the encoder and decoder for feature extraction, combines forward, backward, and feature-channel scanning to capture spatial structure comprehensively and leverage channel-wise information to enrich the feature representation.
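A proximity-preserving serialization like the one in Figure 3(a) is commonly implemented with a space-filling curve. As an assumption (the paper's exact ordering may differ), here is a minimal Morton/Z-order sketch: voxel coordinates are interleaved bit by bit, and sorting by the resulting code places spatially close points near each other in the sequence.

```python
# Minimal Morton (Z-order) serialization sketch -- an assumed stand-in for
# the proximity-preserving ordering in Figure 3(a), not the paper's code.

def morton3d(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into a single Z-order code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)       # x bit -> position 3i
        code |= ((y >> i) & 1) << (3 * i + 1)   # y bit -> position 3i+1
        code |= ((z >> i) & 1) << (3 * i + 2)   # z bit -> position 3i+2
    return code

points = [(7, 7, 7), (0, 0, 1), (0, 0, 0), (1, 1, 1)]

# Sorting by Morton code yields a 1D sequence in which consecutive
# elements tend to be spatial neighbors.
serialized = sorted(points, key=lambda p: morton3d(*p))
```

Nearby voxels share high-order bits, so they receive similar codes and land close together after sorting, which is exactly the property the sequence models downstream rely on.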
Figure 4: The core component of MEM is the Bi-Mamba module, which performs two causal scans. Forward SSM processes the token sequence in a forward direction to model long-range spatial relationships, while Channel Flip SSM scans across feature channels to capture inter-channel correlations for each token. Both scanning operations are causal, ensuring that only past information is used when predicting the current token’s distribution.
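The causality property in Figure 4 can be illustrated with a toy recurrence. A real Mamba block uses learned, input-dependent state-space parameters; as an assumption, a fixed exponential-decay scan stands in for the SSM here, applied once along the token (spatial) axis and once along the channel axis of each token.

```python
# Toy illustration of MEM's two causal scans (assumed stand-in: a fixed
# exponential-decay recurrence, not the paper's learned Mamba SSM).

def causal_scan(seq, decay=0.9):
    """Left-to-right recurrence: output t depends only on inputs 0..t."""
    state, out = 0.0, []
    for x in seq:
        state = decay * state + x   # state summarizes past + current only
        out.append(state)
    return out

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 tokens x 2 channels

# Forward SSM: scan along the token axis, independently per channel.
spatial = [causal_scan([t[c] for t in tokens]) for c in range(2)]

# Channel Flip SSM: scan across the channels within each token.
channel = [causal_scan(t) for t in tokens]

# Causality check: perturbing a *future* token must not change earlier
# outputs, so the entropy model never peeks ahead of the current token.
tokens2 = [tokens[0], tokens[1], [99.0, 99.0]]
spatial2 = [causal_scan([t[c] for t in tokens2]) for c in range(2)]
```

The final comparison makes the point of Figure 4 explicit: because both scans are strictly causal, the probability estimate for a token is a function of already-decoded information only.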
Figure 5: R-D performance of the proposed scheme in terms of 1-PCQM.
Table 1: BD-Rate (%) comparison for geometry and attribute distortion relative to G-PCCv23.
@article{hsieh2025MEGAPCC,
  title         = {MEGA-PCC: A Mamba-based Efficient Approach for Joint Geometry and Attribute Point Cloud Compression},
  author        = {Kai-Hsiang Hsieh and Monyneath Yim and Wen-Hsiao Peng and Jui-Chiu Chiang},
  year          = {2025},
  eprint        = {2512.22463},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/pdf/2512.22463}
}