MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild

Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Jagdishbhai Patel, Hongfei Xue, Ahmed Helmy, Srijan Das, Pu Wang

University of North Carolina at Charlotte (UNCC)
arXiv

For any inquiries, please email: msaleem2@charlotte.edu

Our demo videos showcase MaskHand's ability to reconstruct highly accurate and realistic 3D hand meshes from single RGB images, overcoming challenges like complex articulations, self-occlusions, and depth ambiguities.

MaskHand Training

MaskHand Training Phase. MaskHand consists of two key components: (1) VQ-MANO, which encodes 3D hand poses as a sequence of discrete tokens in a latent space, and (2) a Context-Guided Masked Transformer that models the probability distribution of these tokens, conditioned on the input image, 2D pose cues, and a partially masked token sequence.
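The VQ step described above can be illustrated with a minimal sketch: continuous pose latents are snapped to their nearest codebook entries, producing the discrete token sequence the transformer operates on. The codebook size, latent dimension, and variable names below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))      # hypothetical: 512 codes, 64-dim latents
pose_latents = rng.normal(size=(16, 64))   # hypothetical: 16 latent vectors per hand pose

# Nearest-neighbor lookup: squared L2 distance from each latent to every code.
d = ((pose_latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = d.argmin(axis=1)                  # discrete token indices
quantized = codebook[tokens]               # quantized latents fed to the decoder
```

The discrete `tokens` sequence is what the masked transformer models; `quantized` is what a MANO decoder would map back to joint rotations.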

MMHMR Overview Image

Text-to-Mesh Generation

MMHMR Inference Image

Confidence-Aware Unconditional Mesh Generation

MMHMR Inference Image
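One way confidence-aware generation of a masked token model is commonly realized is iterative decoding: start from a fully masked sequence, predict all tokens, commit only the most confident predictions, and re-mask the rest for the next round. The sketch below follows that generic scheme with a random stand-in for the transformer; the schedule, sequence length, and function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
L, V = 16, 512            # assumed token sequence length and codebook size
MASK = -1
tokens = np.full(L, MASK)

def predict(tok):
    """Stand-in for the masked transformer: returns per-position token
    probabilities. The real model would condition on image features."""
    logits = rng.normal(size=(L, V))
    e = np.exp(logits - logits.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

steps = 4
for step in range(steps):
    probs = predict(tokens)
    pred = probs.argmax(-1)           # most likely token per position
    conf = probs.max(-1)              # its probability = confidence
    conf[tokens != MASK] = np.inf     # never re-mask committed tokens
    # Cosine-style schedule: commit progressively more tokens each step.
    keep = int(np.ceil(L * np.sin((step + 1) / steps * np.pi / 2)))
    order = np.argsort(-conf)         # highest confidence first
    new = np.full(L, MASK)
    idx = order[:keep]
    new[idx] = np.where(tokens[idx] != MASK, tokens[idx], pred[idx])
    tokens = new
```

After the final step every position holds a committed token; the per-position confidences also provide an uncertainty signal for the reconstructed mesh.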