Muhammad Usama Saleem

Ph.D. Student

University of North Carolina at Charlotte

About Me

I am a Ph.D. candidate in Computer Science at the University of North Carolina at Charlotte, advised by Dr. Pu Wang in the GENIUS Lab. In industry, I work as a researcher with the Computer Vision teams at Amazon and Lowe’s, where I develop large-scale multimodal language models (MLLMs) to improve operational efficiency and customer experience in complex, real-world environments.

Research Interests

My research interests lie at the intersection of computer vision and generative AI, with a focus on 3D human modeling. Specifically, I work on 3D human pose estimation and mesh reconstruction via generative masked modeling. I am also interested in multimodal motion synthesis frameworks that produce controllable, high-fidelity 3D human animations for real-time applications.

If you have any research opportunities, please feel free to reach out at msaleem2@charlotte.edu.

News

  • June 2025: Joined Amazon as an Applied Scientist II Intern.
  • June 2025: “MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild” accepted to ICCV 2025!
  • June 2025: “ControlMM: Controllable Masked Motion Generation” accepted to ICCV 2025!
  • April 2025: “DanceMosaic: High-Fidelity Dance Generation with Multimodal Editability” is now available on arXiv.
  • Dec 2024: “GenHMR: Generative Human Mesh Recovery” accepted to AAAI 2025; presented in Philadelphia and received a travel award.
  • Oct 2024: “BioPose: Biomechanically-Accurate 3D Pose Estimation from Monocular Videos” accepted to WACV 2025!
  • July 2024: “BAMM: Bidirectional Autoregressive Motion Model” accepted to ECCV 2024!
  • Sept 2023: Joined Lowe’s as Research Lead of the UNCC Computer Vision team.
  • June 2023: “Private Data Synthesis from Decentralized Non-IID Data” accepted to IJCNN 2023; presented in Queensland, Australia, with a $5,500 travel grant!
  • April 2023: Presented at the SIAM International Conference on Data Mining (SDM’23) Doctoral Forum; awarded a $1,400 NSF travel grant.
  • July 2022: “Privacy Enhancement for Cloud-Based Few-Shot Learning” accepted to IJCNN 2022!
  • Jan 2022: “DP-Shield: Face Obfuscation with Differential Privacy” accepted to EDBT 2022!

Recent Publications

DanceMosaic: High-Fidelity Dance Generation with Multimodal Editability

arXiv 2025
DanceMosaic is a multimodal masked motion framework that fuses text, music, and pose adapters through progressive generative masking, with inference-time optimization for precise, editable dance generation.
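
The two moving parts named above can be sketched compactly: additive fusion of per-modality adapter outputs, and a progressive (cosine) masking schedule. This is a minimal, hedged sketch; the function names, additive fusion, and cosine curve are illustrative assumptions, not DanceMosaic's actual implementation.

```python
import torch

def fuse_conditions(text_emb, music_emb, pose_emb,
                    weights=(1.0, 1.0, 1.0)):
    # Additively combine per-modality adapter outputs into one
    # conditioning signal for the masked motion generator (assumed form).
    w_t, w_m, w_p = weights
    return w_t * text_emb + w_m * music_emb + w_p * pose_emb

def progressive_mask_schedule(num_tokens: int, num_steps: int):
    # Cosine schedule: how many motion tokens remain masked at each
    # generation step, shrinking from nearly all to zero.
    t = torch.arange(1, num_steps + 1, dtype=torch.float32) / num_steps
    return (num_tokens * torch.cos(t * torch.pi / 2)).long()
```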

MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild

ICCV 2025
MaskHand is a probabilistic masked modeling framework that tokenizes hand articulations with VQ-MANO and uses a context-aware masked transformer to fuse multi-scale image features with 2D cues for iterative, confidence-guided sampling.
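
For readers unfamiliar with generative masked decoding, here is a minimal MaskGIT-style sketch of confidence-guided sampling. The `model` callable, `MASK_ID` sentinel, and linear commit schedule are illustrative assumptions rather than MaskHand's exact procedure.

```python
import torch

MASK_ID = -1  # sentinel for "not yet generated" (illustrative)

@torch.no_grad()
def confidence_guided_decode(model, cond, num_tokens, steps=8):
    # `model(tokens, cond)` is assumed to return logits of shape
    # (num_tokens, codebook_size) over the articulation codebook.
    tokens = torch.full((num_tokens,), MASK_ID)
    for step in range(1, steps + 1):
        probs = model(tokens, cond).softmax(-1)
        sampled = torch.multinomial(probs, 1).squeeze(-1)
        conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
        conf[tokens != MASK_ID] = float("inf")   # never drop committed tokens
        keep = max(1, (num_tokens * step) // steps)  # commit more each step
        idx = conf.topk(keep).indices
        new_tokens = torch.full_like(tokens, MASK_ID)  # remask the rest
        new_tokens[idx] = torch.where(tokens[idx] == MASK_ID,
                                      sampled[idx], tokens[idx])
        tokens = new_tokens
    return tokens
```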

ControlMM: Controllable Masked Motion Generation

ICCV 2025
ControlMM is a masked generative motion model that combines masked consistency training with inference-time logit editing in a parallel decoder, enabling fast, high-fidelity, spatially precise motion generation.
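
A minimal sketch of what inference-time logit editing can look like: bias each token's logits toward codebook entries whose decoded positions satisfy a spatial target, steering sampling without retraining. The decoded-position lookup and penalty form are illustrative assumptions, not ControlMM's exact control signal.

```python
import torch

def edit_logits_for_control(logits, code_positions, target_xyz, strength=5.0):
    # logits:         (codebook_size,) raw scores for one motion token
    # code_positions: (codebook_size, 3) decoded 3D position per code
    # target_xyz:     (3,) desired joint/root position for this frame
    dist = (code_positions - target_xyz).norm(dim=-1)  # distance per code
    return logits - strength * dist                    # penalize far codes
```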

GenHMR: Generative Human Mesh Recovery

AAAI 2025
GenHMR reframes monocular human mesh recovery as an image-conditioned generative task: a VQ-VAE pose tokenizer and a masked transformer model the 2D-to-3D uncertainty, iteratively sampling high-confidence tokens and refining them with 2D cues for accurate mesh recovery.
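
The tokenizer side reduces to nearest-neighbor vector quantization, the core operation of any VQ-VAE. A minimal sketch follows, with shapes and names assumed for illustration:

```python
import torch

def quantize_pose(pose_feats, codebook):
    # pose_feats: (T, D) continuous pose features from the encoder
    # codebook:   (K, D) learned code vectors
    # Nearest-neighbor assignment yields discrete pose tokens that the
    # masked transformer can then model and resample.
    dists = torch.cdist(pose_feats, codebook)   # (T, K) pairwise distances
    ids = dists.argmin(dim=-1)                  # (T,) token ids
    return ids, codebook[ids]                   # ids and quantized features
```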

BioPose: Biomechanically-Accurate 3D Pose Estimation from Monocular Videos

WACV 2025
BioPose is a biomechanics-guided 3D pose estimation framework that combines a multi-query deformable transformer for precise mesh recovery, a neural inverse-kinematics (IK) network that enforces anatomical constraints, and 2D-informed iterative pose refinement.
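
A toy sketch of the neural-IK idea: regress per-joint rotations from 3D joint positions and penalize rotations outside anatomical limits. The network shape and the quadratic limit penalty are illustrative placeholders, not BioPose's actual biomechanical model.

```python
import torch
import torch.nn as nn

class NeuralIK(nn.Module):
    # Regress axis-angle rotations per joint from 3D joint positions.
    def __init__(self, num_joints=24, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, num_joints * 3),
        )

    def forward(self, joints_3d):          # (B, num_joints, 3)
        return self.net(joints_3d.flatten(1))

def anatomical_penalty(angles, lower, upper):
    # Quadratic penalty for rotations outside per-joint limits.
    return ((angles - upper).clamp(min=0) ** 2 +
            (lower - angles).clamp(min=0) ** 2).mean()
```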

BAMM: Bidirectional Autoregressive Motion Model

ECCV 2024
BAMM is a text-to-motion framework built on a hybrid-masked self-attention transformer that merges generative masking with autoregression, handling dynamic sequence lengths and enabling editable, high-quality motion.
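
One way to picture a hybrid mask that merges bidirectional generation with autoregression: positions being generated attend bidirectionally while the rest stay causal. This is a plausible reading of the general idea, not BAMM's exact masking rule.

```python
import torch

def hybrid_attention_mask(seq_len, masked_positions):
    # Start from a causal (autoregressive) mask, then let the positions
    # being generated attend bidirectionally. True = may attend.
    allow = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    allow[masked_positions, :] = True   # generated tokens see everything
    allow[:, masked_positions] = True   # and everything sees them
    return allow
```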

Private Data Synthesis from Decentralized Non-IID Data

IJCNN 2023
DPFedProxGAN is a federated, differentially private GAN that generates realistic synthetic images from non-IID distributed data using local differential privacy and FedProx optimization.
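
The local update combines two standard ingredients: a FedProx proximal pull toward the global model, and gradient clipping plus noise for local differential privacy. A hedged sketch follows; the learning rate, clip bound, and noise scale are placeholders, and the paper calibrates these formally.

```python
import torch

def local_dp_fedprox_step(params, global_params, grads,
                          lr=0.01, mu=0.1, clip=1.0, noise_std=0.5):
    updated = []
    for p, gp, g in zip(params, global_params, grads):
        g = g + mu * (p - gp)                           # FedProx proximal pull
        g = g / max(1.0, (g.norm() / clip).item())      # clip gradient norm
        g = g + noise_std * clip * torch.randn_like(g)  # local DP noise
        updated.append(p - lr * g)
    return updated
```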

Privacy Enhancement for Cloud-Based Few-Shot Learning

IJCNN 2022
This few-shot learning framework uses a joint privacy–classification loss to learn embeddings that protect image data while maintaining high few-shot accuracy in cloud-based vision services.
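
The joint objective can be read as a weighted sum of a classification term and a privacy term that rewards embeddings an attacker cannot invert. The adversarial-reconstruction form below is one plausible instantiation, assumed for illustration; the paper's exact privacy loss may differ.

```python
import torch.nn.functional as F

def joint_privacy_classification_loss(logits, labels,
                                      reconstructions, images, alpha=0.5):
    # Classification term: keep few-shot accuracy high.
    cls_loss = F.cross_entropy(logits, labels)
    # Privacy term: reward embeddings from which an attacker's decoder
    # fails to rebuild the input image (hence the negated MSE).
    privacy_loss = -F.mse_loss(reconstructions, images)
    return cls_loss + alpha * privacy_loss
```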

DP-Shield: Face Obfuscation with Differential Privacy

EDBT 2022
DP-Shield safeguards against unauthorized face recognition by applying differential privacy-based obfuscation and providing image-quality and recognition-risk metrics.
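
A standard recipe for differentially private image obfuscation, of the kind DP-Shield builds on, is pixelization with Laplace noise: average each block, then add noise scaled to one pixel's influence on the block mean. A minimal NumPy sketch; the block size and privacy budget are placeholders, and DP-Shield's exact mechanism may differ.

```python
import numpy as np

def dp_pixelate(img, block=8, eps=0.5):
    # img: (H, W) grayscale array with values in [0, 255].
    # One pixel changes a block mean by at most 255 / block**2, so
    # Laplace noise with scale sensitivity/eps gives eps-DP per block.
    out = img.astype(float).copy()
    sensitivity = 255.0 / (block * block)
    h, w = img.shape
    for i in range(0, h, block):
        for j in range(0, w, block):
            mean = out[i:i + block, j:j + block].mean()
            out[i:i + block, j:j + block] = mean + np.random.laplace(
                scale=sensitivity / eps)
    return np.clip(out, 0, 255).astype(np.uint8)
```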