Muhammad Usama Saleem

Ph.D. Student

University of North Carolina at Charlotte

About Me

I am a Ph.D. candidate in Computer Science at the University of North Carolina at Charlotte, advised by Dr. Pu Wang in the GENIUS Lab. In industry, I work as a researcher with the Computer Vision teams at Amazon and Lowe’s, where I develop large-scale multimodal language models (MLLMs) to improve operational efficiency and customer experience in complex, real-world environments.

Research Interests

My research interests lie at the intersection of computer vision and generative AI, with a focus on 3D human modeling. Specifically, I work on 3D human pose estimation and mesh reconstruction via generative masked modeling. I am also interested in multimodal motion synthesis frameworks that produce controllable, high-fidelity 3D human animations for real-time applications.

If you have any research opportunities, please feel free to reach out at msaleem2@charlotte.edu.

News

  • June 2025: Joined Amazon as an Applied Scientist II Intern.
  • June 2025: “MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild” accepted to ICCV 2025!
  • June 2025: “ControlMM: Controllable Masked Motion Generation” accepted to ICCV 2025!
  • April 2025: “DanceMosaic: High-Fidelity Dance Generation with Multimodal Editability” is now available on arXiv.
  • Dec 2024: “GenHMR: Generative Human Mesh Recovery” accepted to AAAI 2025; presented in Philadelphia and received a travel award.
  • Oct 2024: “BioPose: Biomechanically-Accurate 3D Pose Estimation from Monocular Videos” accepted to WACV 2025!
  • July 2024: “BAMM: Bidirectional Autoregressive Motion Model” accepted to ECCV 2024!
  • Sept 2023: Joined Lowe’s as Research Lead of the UNCC Computer Vision team.
  • June 2023: “Private Data Synthesis from Decentralized Non-IID Data” accepted to IJCNN 2023; presented in Queensland, Australia, with a $5,500 travel grant!
  • April 2023: Presented at the SIAM International Conference on Data Mining (SDM’23) Doctoral Forum; awarded a $1,400 NSF travel grant.
  • July 2022: “Privacy Enhancement for Cloud-Based Few-Shot Learning” accepted to IJCNN 2022!
  • Jan 2022: “DP-Shield: Face Obfuscation with Differential Privacy” accepted to EDBT 2022!

Recent Publications

DanceMosaic: High-Fidelity Dance Generation with Multimodal Editability

arXiv 2025
DanceMosaic is a multimodal masked motion framework that fuses text, music, and pose adapters through progressive generative masking, with inference-time optimization for precise, editable dance generation.
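
The two moving parts named above can be sketched compactly: additive fusion of per-modality adapter outputs, and a progressive (cosine) masking schedule. This is a minimal, hedged sketch; the function names, additive fusion, and cosine curve are illustrative assumptions, not DanceMosaic's actual implementation.

```python
import torch

def fuse_conditions(text_emb, music_emb, pose_emb,
                    weights=(1.0, 1.0, 1.0)):
    # Additively combine per-modality adapter outputs into one
    # conditioning signal for the masked motion generator (assumed form).
    w_t, w_m, w_p = weights
    return w_t * text_emb + w_m * music_emb + w_p * pose_emb

def progressive_mask_schedule(num_tokens: int, num_steps: int):
    # Cosine schedule: how many motion tokens remain masked at each
    # generation step, shrinking from nearly all to zero.
    t = torch.arange(1, num_steps + 1, dtype=torch.float32) / num_steps
    return (num_tokens * torch.cos(t * torch.pi / 2)).long()
```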

MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild

ICCV 2025
MaskHand is a probabilistic masked modeling framework that tokenizes hand articulations with VQ-MANO and uses a context-aware masked transformer to fuse multi-scale image features with 2D cues for iterative, confidence-guided sampling.
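
For readers unfamiliar with generative masked decoding, here is a minimal MaskGIT-style sketch of confidence-guided sampling. The `model` callable, `MASK_ID` sentinel, and linear commit schedule are illustrative assumptions rather than MaskHand's exact procedure.

```python
import torch

MASK_ID = -1  # sentinel for "not yet generated" (illustrative)

@torch.no_grad()
def confidence_guided_decode(model, cond, num_tokens, steps=8):
    # `model(tokens, cond)` is assumed to return logits of shape
    # (num_tokens, codebook_size) over the articulation codebook.
    tokens = torch.full((num_tokens,), MASK_ID)
    for step in range(1, steps + 1):
        probs = model(tokens, cond).softmax(-1)
        sampled = torch.multinomial(probs, 1).squeeze(-1)
        conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
        conf[tokens != MASK_ID] = float("inf")   # never drop committed tokens
        keep = max(1, (num_tokens * step) // steps)  # commit more each step
        idx = conf.topk(keep).indices
        new_tokens = torch.full_like(tokens, MASK_ID)  # remask the rest
        new_tokens[idx] = torch.where(tokens[idx] == MASK_ID,
                                      sampled[idx], tokens[idx])
        tokens = new_tokens
    return tokens
```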

ControlMM: Controllable Masked Motion Generation

ICCV 2025
ControlMM is a masked generative motion model that combines masked consistency training with inference-time logit editing in a parallel decoder, enabling fast, high-fidelity, spatially precise motion generation.
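
A minimal sketch of what inference-time logit editing can look like: bias each token's logits toward codebook entries whose decoded positions satisfy a spatial target, steering sampling without retraining. The decoded-position lookup and penalty form are illustrative assumptions, not ControlMM's exact control signal.

```python
import torch

def edit_logits_for_control(logits, code_positions, target_xyz, strength=5.0):
    # logits:         (codebook_size,) raw scores for one motion token
    # code_positions: (codebook_size, 3) decoded 3D position per code
    # target_xyz:     (3,) desired joint/root position for this frame
    dist = (code_positions - target_xyz).norm(dim=-1)  # distance per code
    return logits - strength * dist                    # penalize far codes
```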

GenHMR: Generative Human Mesh Recovery

AAAI 2025
GenHMR reframes monocular human mesh recovery as an image-conditioned generative task: a VQ-VAE pose tokenizer and a masked transformer model the 2D-to-3D uncertainty, iteratively sampling high-confidence tokens and refining them with 2D cues for accurate mesh recovery.
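
The tokenizer side reduces to nearest-neighbor vector quantization, the core operation of any VQ-VAE. A minimal sketch follows, with shapes and names assumed for illustration:

```python
import torch

def quantize_pose(pose_feats, codebook):
    # pose_feats: (T, D) continuous pose features from the encoder
    # codebook:   (K, D) learned code vectors
    # Nearest-neighbor assignment yields discrete pose tokens that the
    # masked transformer can then model and resample.
    dists = torch.cdist(pose_feats, codebook)   # (T, K) pairwise distances
    ids = dists.argmin(dim=-1)                  # (T,) token ids
    return ids, codebook[ids]                   # ids and quantized features
```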

BioPose: Biomechanically-Accurate 3D Pose Estimation from Monocular Videos

WACV 2025
BioPose is a biomechanics-guided 3D pose estimation framework that combines a multi-query deformable transformer for precise mesh recovery, a neural inverse-kinematics (IK) network that enforces anatomical constraints, and 2D-informed iterative pose refinement.
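
A toy sketch of the neural-IK idea: regress per-joint rotations from 3D joint positions and penalize rotations outside anatomical limits. The network shape and the quadratic limit penalty are illustrative placeholders, not BioPose's actual biomechanical model.

```python
import torch
import torch.nn as nn

class NeuralIK(nn.Module):
    # Regress axis-angle rotations per joint from 3D joint positions.
    def __init__(self, num_joints=24, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, num_joints * 3),
        )

    def forward(self, joints_3d):          # (B, num_joints, 3)
        return self.net(joints_3d.flatten(1))

def anatomical_penalty(angles, lower, upper):
    # Quadratic penalty for rotations outside per-joint limits.
    return ((angles - upper).clamp(min=0) ** 2 +
            (lower - angles).clamp(min=0) ** 2).mean()
```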

BAMM: Bidirectional Autoregressive Motion Model

ECCV 2024
BAMM is a text-to-motion framework built on a hybrid-masked self-attention transformer that merges generative masking with autoregression, handling dynamic sequence lengths and enabling editable, high-quality motion.
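
One way to picture a hybrid mask that merges bidirectional generation with autoregression: positions being generated attend bidirectionally while the rest stay causal. This is a plausible reading of the general idea, not BAMM's exact masking rule.

```python
import torch

def hybrid_attention_mask(seq_len, masked_positions):
    # Start from a causal (autoregressive) mask, then let the positions
    # being generated attend bidirectionally. True = may attend.
    allow = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    allow[masked_positions, :] = True   # generated tokens see everything
    allow[:, masked_positions] = True   # and everything sees them
    return allow
```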

Private Data Synthesis from Decentralized Non-IID Data

IJCNN 2023
DPFedProxGAN is a federated, differentially private GAN that generates realistic synthetic images from non-IID distributed data using local differential privacy and FedProx optimization.
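
The local update combines two standard ingredients: a FedProx proximal pull toward the global model, and gradient clipping plus noise for local differential privacy. A hedged sketch follows; the learning rate, clip bound, and noise scale are placeholders, and the paper calibrates these formally.

```python
import torch

def local_dp_fedprox_step(params, global_params, grads,
                          lr=0.01, mu=0.1, clip=1.0, noise_std=0.5):
    updated = []
    for p, gp, g in zip(params, global_params, grads):
        g = g + mu * (p - gp)                           # FedProx proximal pull
        g = g / max(1.0, (g.norm() / clip).item())      # clip gradient norm
        g = g + noise_std * clip * torch.randn_like(g)  # local DP noise
        updated.append(p - lr * g)
    return updated
```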

Privacy Enhancement for Cloud-Based Few-Shot Learning

IJCNN 2022
This few-shot learning framework uses a joint privacy–classification loss to learn embeddings that protect image data while maintaining high few-shot accuracy in cloud-based vision services.
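
The joint objective can be read as a weighted sum of a classification term and a privacy term that rewards embeddings an attacker cannot invert. The adversarial-reconstruction form below is one plausible instantiation, assumed for illustration; the paper's exact privacy loss may differ.

```python
import torch.nn.functional as F

def joint_privacy_classification_loss(logits, labels,
                                      reconstructions, images, alpha=0.5):
    # Classification term: keep few-shot accuracy high.
    cls_loss = F.cross_entropy(logits, labels)
    # Privacy term: reward embeddings from which an attacker's decoder
    # fails to rebuild the input image (hence the negated MSE).
    privacy_loss = -F.mse_loss(reconstructions, images)
    return cls_loss + alpha * privacy_loss
```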

DP-Shield: Face Obfuscation with Differential Privacy

EDBT 2022
DP-Shield safeguards against unauthorized face recognition by applying differential privacy-based obfuscation and providing image-quality and recognition-risk metrics.
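
A standard recipe for differentially private image obfuscation, of the kind DP-Shield builds on, is pixelization with Laplace noise: average each block, then add noise scaled to one pixel's influence on the block mean. A minimal NumPy sketch; the block size and privacy budget are placeholders, and DP-Shield's exact mechanism may differ.

```python
import numpy as np

def dp_pixelate(img, block=8, eps=0.5):
    # img: (H, W) grayscale array with values in [0, 255].
    # One pixel changes a block mean by at most 255 / block**2, so
    # Laplace noise with scale sensitivity/eps gives eps-DP per block.
    out = img.astype(float).copy()
    sensitivity = 255.0 / (block * block)
    h, w = img.shape
    for i in range(0, h, block):
        for j in range(0, w, block):
            mean = out[i:i + block, j:j + block].mean()
            out[i:i + block, j:j + block] = mean + np.random.laplace(
                scale=sensitivity / eps)
    return np.clip(out, 0, 255).astype(np.uint8)
```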