MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild

Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Jagdishbhai Patel, Hongfei Xue, Ahmed Helmy, Srijan Das, Pu Wang

University of North Carolina at Charlotte (UNCC)
arXiv

For any inquiries, please email: msaleem2@charlotte.edu

Our demo videos showcase MaskHand's ability to reconstruct highly accurate and realistic 3D hand meshes from single RGB images, overcoming challenges like complex articulations, self-occlusions, and depth ambiguities.

MaskHand Training

MaskHand Training Phase. MaskHand consists of two key components: (1) VQ-MANO, which encodes 3D hand poses as a sequence of discrete tokens in a latent space, and (2) a Context-Guided Masked Transformer that models the probability distribution of these tokens, conditioned on the input image, 2D pose cues, and a partially masked token sequence.
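The VQ step described above can be illustrated with a minimal sketch: continuous pose latents are snapped to their nearest codebook entries, producing the discrete token sequence the transformer operates on. The codebook size, latent dimension, and variable names below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))      # hypothetical: 512 codes, 64-dim latents
pose_latents = rng.normal(size=(16, 64))   # hypothetical: 16 latent vectors per hand pose

# Nearest-neighbor lookup: squared L2 distance from each latent to every code.
d = ((pose_latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = d.argmin(axis=1)                  # discrete token indices
quantized = codebook[tokens]               # quantized latents fed to the decoder
```

The discrete `tokens` sequence is what the masked transformer models; `quantized` is what a MANO decoder would map back to joint rotations.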

MMHMR Overview Image

Text-to-Mesh Generation

MMHMR Inference Image

Confidence-Aware Unconditional Mesh Generation

MMHMR Inference Image
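One way confidence-aware generation of a masked token model is commonly realized is iterative decoding: start from a fully masked sequence, predict all tokens, commit only the most confident predictions, and re-mask the rest for the next round. The sketch below follows that generic scheme with a random stand-in for the transformer; the schedule, sequence length, and function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
L, V = 16, 512            # assumed token sequence length and codebook size
MASK = -1
tokens = np.full(L, MASK)

def predict(tok):
    """Stand-in for the masked transformer: returns per-position token
    probabilities. The real model would condition on image features."""
    logits = rng.normal(size=(L, V))
    e = np.exp(logits - logits.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

steps = 4
for step in range(steps):
    probs = predict(tokens)
    pred = probs.argmax(-1)           # most likely token per position
    conf = probs.max(-1)              # its probability = confidence
    conf[tokens != MASK] = np.inf     # never re-mask committed tokens
    # Cosine-style schedule: commit progressively more tokens each step.
    keep = int(np.ceil(L * np.sin((step + 1) / steps * np.pi / 2)))
    order = np.argsort(-conf)         # highest confidence first
    new = np.full(L, MASK)
    idx = order[:keep]
    new[idx] = np.where(tokens[idx] != MASK, tokens[idx], pred[idx])
    tokens = new
```

After the final step every position holds a committed token; the per-position confidences also provide an uncertainty signal for the reconstructed mesh.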