Abstract
Reconstructing a 3D hand mesh from a single RGB image is challenging due to complex articulations, self-occlusions, and depth ambiguities. Traditional discriminative methods, which learn a deterministic mapping from a 2D image to a single 3D mesh, often struggle with the inherent ambiguities in 2D-to-3D mapping. To address this challenge, we propose MMHMR, a novel generative masked model for hand mesh recovery that synthesizes plausible 3D hand meshes by learning and sampling from the probabilistic distribution of the ambiguous 2D-to-3D mapping process. MMHMR consists of two key components: VQ-MANO, which encodes 3D hand articulations as discrete pose tokens in a latent space, and a Context-Guided Masked Transformer that randomly masks out pose tokens and learns their joint distribution, conditioned on the corrupted token sequence, image context, and 2D pose cues. This learned distribution facilitates confidence-guided sampling during inference, producing mesh reconstructions with low uncertainty and high precision. Extensive evaluations on benchmark and real-world datasets demonstrate that MMHMR achieves state-of-the-art accuracy, robustness, and realism in 3D hand mesh reconstruction.
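To make the confidence-guided sampling idea concrete, below is a minimal, illustrative PyTorch-style sketch of an iterative unmasking loop. The interface (`transformer(tokens, image_context, pose2d_cues)` returning per-position codebook logits), the token-sequence length, codebook size, and the simple linear unmasking schedule are assumptions for illustration, not the exact implementation used in MMHMR.

```python
import math
import torch

@torch.no_grad()
def confidence_guided_sampling(transformer, image_context, pose2d_cues,
                               num_tokens=16, codebook_size=512, num_iters=4):
    """Fill a fully masked pose-token sequence by repeatedly committing the
    most confident predictions; the resulting tokens are then decoded by the
    VQ-MANO decoder into a MANO mesh."""
    MASK_ID = codebook_size  # extra index reserved for the [MASK] token
    B = image_context.shape[0]
    tokens = torch.full((B, num_tokens), MASK_ID, dtype=torch.long,
                        device=image_context.device)

    # Simple linear unmasking schedule (assumes num_tokens divides evenly).
    per_iter = math.ceil(num_tokens / num_iters)
    for _ in range(num_iters):
        logits = transformer(tokens, image_context, pose2d_cues)   # (B, L, codebook_size)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)            # per-token confidence

        # Positions that are already committed should not be selected again.
        conf = conf.masked_fill(tokens != MASK_ID, float("-inf"))

        # Commit the most confident predictions among the still-masked positions.
        topk = conf.topk(per_iter, dim=-1).indices
        tokens.scatter_(1, topk, pred.gather(1, topk))

    return tokens  # discrete pose tokens; decoded to a mesh by VQ-MANO
```

The key point is that low-confidence positions are left masked and re-predicted in later iterations with more committed context, which is what yields reconstructions with low uncertainty.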
Video Demos
Our demo videos showcase MMHMR's ability to reconstruct highly accurate and realistic 3D hand meshes from single RGB images, overcoming challenges like complex articulations, self-occlusions, and depth ambiguities.
Method
MMHMR Training Phase. MMHMR consists of two key components: (1) VQ-MANO, which encodes 3D hand poses into a sequence of discrete tokens within a latent space, and (2) a Context-Guided Masked Transformer that models the probabilistic distributions of these tokens, conditioned on the input image, 2D pose cues, and a partially masked token sequence.
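For intuition, here is a minimal PyTorch-style sketch of one masked-training step under these assumptions: the token length, codebook size, mask-token convention, and the `transformer(corrupted, image_context, pose2d_cues)` signature are hypothetical placeholders rather than the paper's exact code.

```python
import torch
import torch.nn.functional as F

# Illustrative constants; the actual token length and codebook size are design choices.
NUM_POSE_TOKENS = 16       # length of the discrete pose-token sequence from VQ-MANO
CODEBOOK_SIZE = 512        # size of the VQ-MANO codebook
MASK_ID = CODEBOOK_SIZE    # extra index reserved for the [MASK] token

def masked_training_step(transformer, pose_tokens, image_context, pose2d_cues):
    """One training step: corrupt the pose-token sequence and predict the masked tokens.

    pose_tokens   : (B, NUM_POSE_TOKENS) long tensor of VQ-MANO codebook indices
    image_context : (B, N_img, D) image feature tokens from the backbone
    pose2d_cues   : (B, N_2d, D) embedded 2D pose cues
    """
    B, L = pose_tokens.shape

    # Draw a masking ratio per sample and randomly mask that fraction of tokens.
    mask_ratio = torch.rand(B, 1, device=pose_tokens.device)
    mask = torch.rand(B, L, device=pose_tokens.device) < mask_ratio
    corrupted = pose_tokens.masked_fill(mask, MASK_ID)

    # Predict a distribution over the codebook at every position, conditioned on
    # the corrupted sequence, the image context, and the 2D pose cues.
    logits = transformer(corrupted, image_context, pose2d_cues)  # (B, L, CODEBOOK_SIZE)

    # Supervise only the masked positions with the ground-truth token indices.
    loss = F.cross_entropy(logits[mask], pose_tokens[mask])
    return loss
```

Because the loss is applied only at masked positions while the remaining tokens, image context, and 2D pose cues stay visible, the transformer learns the conditional joint distribution of the pose tokens that is later sampled at inference.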