Biography

Roni Sengupta

Roni Sengupta, Assistant Professor, Department of Computer Science

I lead the Spatial & Physical Intelligence (SPIN) Lab at UNC Chapel Hill. My research lies at the intersection of Computer Vision and Computer Graphics, mainly centered on 3D Vision and Computational Photography. My lab is particularly interested in developing AI techniques for solving Inverse Graphics & Inverse Physics problems, where the goal is to decompose images and videos into their intrinsic components -- geometry, motion, material reflectance, material properties, and lighting. We solve Inverse Graphics & Physics problems to advance applications in visual content creation and editing, telepresence, AR/VR, robotics, and healthcare.

Prior to this, I was a Postdoctoral Research Associate in Computer Science & Engineering at the University of Washington, working with Prof. Steve Seitz, Prof. Brian Curless, and Prof. Ira Kemelmacher-Shlizerman in the UW Reality Lab and GRAIL (2019-2022). I completed my Ph.D. (2013-2019) at the University of Maryland, College Park (UMD), advised by Prof. David Jacobs, and my undergraduate degree (2009-2013) in Electronics and Tele-Communication Engineering at Jadavpur University, Kolkata, India. I also had the pleasure of working with many amazing researchers from NVIDIA Research, Snapchat Research, the Weizmann Institute of Science (Israel), and TU Dortmund (Germany).

Email: ronisen at cs.unc.edu

Google Scholar
Office: Sitterson Hall 255, University of North Carolina at Chapel Hill

Awards and Honors

  • NIH NIBIB Trailblazer Award for New and Early Stage Investigators - 2024
  • UNC Junior Faculty Development Award - 2024
  • UNC CS Student Association Excellence in Teaching Award - 2023
  • CVPR Best Student Paper Honorable Mention - 2021

Research

Overall

Our research lies at the intersection of Computer Vision, Computer Graphics, and Machine Learning, with a focus on 3D vision and computational image editing. We aim to understand and manipulate the physical and semantic structure of visual data, developing algorithms that enable richer 3D perception, interpretable scene decomposition, and high-quality visual synthesis. Currently, our lab focuses on four synergistic research themes: inverse rendering, 3D perception from endoscopy, inverse physics, and generative facial editing.

Inverse Rendering


Inverse rendering seeks to recover the physical properties of a scene—such as geometry, material reflectance, and lighting—from images or videos. Our work develops both explicit estimation methods using neural networks and implicit understanding frameworks that enable intuitive manipulation of disentangled scene components. We leverage foundation models and generative priors to build robust, generalizable inverse rendering techniques.

Relevant Publications:

3D Perception from Endoscopy


3D perception in endoscopy unlocks critical applications in medical imaging, including automated measurement of organ geometry, enhanced visualization for diagnosis, and automatic guidance for robotic surgery. However, the task is extremely challenging due to complex lighting effects—such as near-field illumination, global light transport, specular highlights, and subsurface scattering. Our research focuses on monocular depth estimation and SLAM methods by explicitly modeling light propagation, in close collaboration with experts in medical imaging, robotics, gastroenterology, and otolaryngology.

Relevant Publications:

Inverse Physics


Inverse physics involves recovering an object’s 3D geometry and physical properties—such as material stiffness or initial force conditions—from sparse or single-view videos. This task is highly ill-posed and requires careful optimization. Our research addresses these challenges by designing effective optimization techniques and learning-based priors that improve inverse estimation under limited observations.

Relevant Publications:

Generative Facial Editing


We explore generative model-based approaches to high-quality facial attribute editing, including aging/de-aging, relighting, harmonization, and identity-preserving modifications for visual effects and creative applications. Our focus is on developing personalized, training-free methods that resolve the long-standing trade-off between inversion accuracy and editability in generative image editing frameworks.

Relevant Publications:


We are grateful for the generous support from our partners and sponsors. Their contributions are vital to the success of our research.

NIH NIBIB Lenovo NIH NIMDS

Publications

Pre-prints


The Aging Multiverse: Generating Condition-Aware Facial Aging Tree via Training-Free Diffusion
Bang Gong*, Luchao Qi*, Jiaye Wu, Zhicheng Fu, Chunbo Song, David Jacobs, John Nicholson, Roni Sengupta
arXiv 2025
[Paper][Project Page][Code (coming soon)]

Aging Multiverse uses a training-free diffusion model to generate diverse, realistic aging paths from a single face, conditioned on health, lifestyle, and environment.

ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation
Daniel Rho, Jun Myeong Choi, Biswadip Dey, Roni Sengupta
arXiv 2025
[Paper][Project Page][Code (coming soon)]

ProJo4D recovers 3D shape and physical behavior of deformable objects from sparse-view inputs using a progressive joint-optimization framework for inverse physics estimation.

TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection
Xinqi (Ana) Xiong*, Prakrut Patel*, Qingyuan Fan*, Amisha Wadhwa*, Sarathy Selvam, Xiao Guo, Luchao Qi, Xiaoming Liu, Roni Sengupta
arXiv 2025
[Paper][Project Page][Dataset & Benchmark (HuggingFace)]

PPS-Ctrl is an image translation framework that fuses Stable Diffusion and ControlNet, guided by a physics-informed Per-Pixel Shading map for realistic and structure-preserving endoscopy image translation.

NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment
Andrea Dunn Beltran*, Daniel Rho*, Marc Niethammer, Roni Sengupta
arXiv 2025
[Paper][Project Page][Code (Coming Soon)] [ Tweet ]

We introduce a novel Bundle Adjustment loss that uses lighting cues (points closer to and facing the camera reflect more light) to improve pose and map estimation for dense visual SLAM algorithms in endoscopy (results on 3D Gaussian SLAMs).

GaNI: Global and Near Field Illumination Aware Neural Inverse Rendering
Jiaye Wu, Saeed Hadadan, Geng Lin, Matthias Zwicker, David Jacobs, Roni Sengupta
arXiv 2024
[Paper][Project Page][Code (Coming Soon)] [ Tweet ]

Global and Near-field Illumination-aware neural inverse rendering technique that can reconstruct geometry, albedo, and roughness parameters from images of a scene captured with co-located light and camera.

Published

MyTimeMachine: Personalized Facial Age Transformation
Luchao Qi, Jiaye Wu, Bang Gong, Annie N. Wang, David Jacobs, Roni Sengupta
To appear in SIGGRAPH 2025 (Journal Track),
ACM Transactions on Graphics

[Paper][Project Page][Code (Coming Soon)] [ Tweet ]

We personalize a pre-trained global aging prior using 50 personal selfies, allowing age regression (de-aging) and age progression (aging) with high fidelity and identity preservation.

ScribbleLight: Single Image Indoor Relighting with Scribbles
Jun Myeong Choi, Annie N. Wang, Pieter Peers, Anand Bhattad, Roni Sengupta
CVPR 2025
[Paper][Project Page][Code] [ Tweet ]

ScribbleLight is a generative model that supports local fine-grained control of lighting effects through scribbles that describe changes in lighting.

My3DGen: A Scalable Personalized 3D Generative Model
Luchao Qi, Jiaye Wu, Annie Wang, Shengze Wang, Roni Sengupta
WACV 2025 (Oral)
[Paper][Project Page][Code][ Tweet ]

We propose a parameter efficient approach for building personalized 3D generative priors by updating only 0.6 million parameters compared to a full finetuning of 31 million parameters. Personalized 3D generative priors can reconstruct any test image and synthesize novel 3D images of an individual without any test-time optimization or finetuning.

Continual Learning of Personalized Generative Face Models with Experience Replay
Annie Wang, Luchao Qi, Roni Sengupta
WACV 2025 [Paper][Project Page]

We introduce a continual learning problem of updating personalized 2D and 3D generative face models without forgetting past representations as new photos are regularly captured.

Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos
Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, Roni Sengupta
ECCV 2024
[Paper][Project Page][Code ] [ Tweet ]

We model near-field lighting, emitted by the endoscope and reflected by the surface, as Per-Pixel Shading (PPS). We use PPS features to perform depth refinement (PPSNet) on clinical endoscopy videos with transfer learning of foundation models with self-supervision.
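
The Per-Pixel Shading cue above (surface points closer to, and facing, the endoscope's light appear brighter) can be sketched with a simplified inverse-square point-light model; the function name and the bare cosine-over-distance-squared falloff below are illustrative assumptions, not the paper's full PPS formulation:

```python
import numpy as np

def per_pixel_shading(points, normals, light_pos):
    """Simplified PPS for a near-field point light co-located with the
    endoscope tip: brightness falls off with the inverse square of the
    distance to the light and with the angle between the surface normal
    and the direction toward the light."""
    to_light = light_pos - points                     # (N, 3) vectors to light
    dist = np.linalg.norm(to_light, axis=-1)          # (N,) distances
    l_dir = to_light / dist[:, None]                  # unit light directions
    cos_term = np.clip(np.sum(normals * l_dir, axis=-1), 0.0, None)
    return cos_term / dist**2                         # per-point shading

# A point twice as far away, with the same orientation, is ~4x dimmer.
pts = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 2.0]])
nrm = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, -1.0]])
pps = per_pixel_shading(pts, nrm, light_pos=np.zeros(3))
```

This depth-dependent falloff is exactly why such shading features carry a monocular depth signal.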

Personalized Video Relighting With an At-Home Light Stage
Jun Myeong Choi, Max Christman, Roni Sengupta
ECCV 2024
[Paper][Project Page][Code (Coming Soon)][ Tweet ]

We show how to build a high-quality face relighting model by recording a person watching YouTube videos on their monitor (an at-home Light Stage), instead of an expensive capture with a physical Light Stage.

NePhi: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
Lin Tian, Hastings Greer, Raúl San José Estépar, Roni Sengupta, Marc Niethammer
ECCV 2024
[Paper][Project Page][Code ]

NePhi produces a neural deformation field for medical image registration with lower memory consumption than existing voxel-based deformations, unlocking the ability to apply image registration approaches to high-resolution images.

Structure-preserving Image Translation for Depth Estimation in Colonoscopy
Shuxian Wang, Akshay Paruchuri, Zhaoxi Zhang, Sarah K McGill, Roni Sengupta
MICCAI 2024 (Oral)
[Paper][Project Page + Dataset]

Monocular depth estimators suffer from the Sim2Real gap. We use a GAN with a structure-preserving loss for sim2real transfer, producing SOTA depth on clinical data.

Building Secure and Engaging Video Communication by Using Monitor Illumination
Jun Myeong Choi, Johnathan Leung, Noah Frahm, Max Christman, Gedas Bertasius, Roni Sengupta
CVPR 2024 Workshop on Multimedia Forensics
[Paper]

We use light reflected from the monitor to detect whether the person in a video call is real/live or a deepfake.

Bringing Telepresence to Every Desk
Shengze Wang, Ziheng Wang, Ryan Schmelzle, Liujie Zheng, Youngjoong Kwon, Roni Sengupta, Henry Fuchs
to appear Transactions on Visualization and Computer Graphics, 2024
[Paper][Project Page]

We introduce a system that renders high-quality novel views from 4 RGBD cameras in a tele-conferencing setup, built on a novel multiview point-cloud rendering algorithm.

Universal Guidance for Diffusion Models
Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Roni Sengupta, Micah Goldblum, Jonas Geiping, Tom Goldstein
ICLR 2024
[Paper][Project Page + Code]

Enables controlling diffusion models by arbitrary guidance modalities without the need to retrain any use-specific components.

Motion Matters: Neural Motion Transfer for Better Camera Physiological Sensing
Akshay Paruchuri, Xin Liu, Yulu Pan, Shwetak Patel, Daniel McDuff*, Roni Sengupta*
WACV 2024 (Oral: 2.5% acceptance rate)
[Paper][Project Page][Code] [ Tweet ]

Neural Motion Transfer serves as an effective data augmentation technique for PPG signal estimation from facial videos. We devise the best strategy to augment publicly available datasets with motion augmentation, improving up to 75% over SOTA techniques on five benchmark datasets.

Joint Depth Prediction and Semantic Segmentation with Multi-View SAM
Mykhailo Shvets, Dongxu Zhao, Marc Niethammer, Roni Sengupta, Alexander C. Berg
WACV 2024
[Paper]

Generalized semantic features from Segment Anything (SAM) model help to build a richer cost volume for MVS. In turn, the depth predicted from the cost volume serves as a rich prompt for improving semantic segmentation.

rPPG-Toolbox: Deep Remote PPG Toolbox
Xin Liu, Akshay Paruchuri, Girish Narayanswamy, Xiaoyu Zhang, Jiankai Tang, Yuzhe Zhang, Yuntao Wang, Roni Sengupta, Shwetak Patel, Daniel McDuff
NeurIPS 2023
Datasets and Benchmarks Track
[PDF][arXiv][Code]

We present a comprehensive toolbox, rPPG-Toolbox, that contains unsupervised and supervised rPPG models with support for public benchmark datasets, data augmentation, and systematic evaluation.

MVPSNet: Fast Generalizable Multi-view Photometric Stereo
Dongxu Zhao, Daniel Lichy, Pierre-Nicolas Perrin, Jan-Michael Frahm, Roni Sengupta
ICCV 2023
[Paper][Project Page][Code] [ Tweet ]

We propose a generalized approach to multi-view photometric stereo that is significantly better than multi-view stereo alone. It produces the same reconstruction quality while being 400x faster than per-scene optimization techniques.

Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation
Jiaye Wu, Sanjoy Chowdhury, Hariharmano Shanmugaraja, David Jacobs, Roni Sengupta
ICCP (International Conference on Computational Photography) 2023
[Paper][Project Page][Code & Dataset(Coming Soon)] [ Tweet ]

The existing in-the-wild benchmark for intrinsic image decomposition (the WHDR metric on IIW) is incomplete, as it relies on pairwise relative human judgements. To evaluate albedo comprehensively, we collect a new dataset, Measured Albedo in the Wild (MAW), and propose three new metrics that complement WHDR: intensity, chromaticity, and texture. We show that SOTA inverse rendering and intrinsic image decomposition algorithms overfit to the WHDR metric, and that our MAW benchmark evaluates these algorithms in a way that matches their visual quality.

A Surface-normal Based Neural Framework for Colonoscopy Reconstruction
Shuxian Wang, Yubo Zhang, Sarah K McGill, Julian G Rosenman, Jan-Michael Frahm, Roni Sengupta, Stephen M Pizer
Information Processing in Medical Imaging (IPMI 2023)
[Paper]

Using SLAM + near-field Photometric Stereo for 3D colon reconstruction from colonoscopy videos.

Towards Unified Keyframe Propagation Models
Patrick Esser, Peter Michael, Roni Sengupta
(CVPRW 2022 - AI for Content Creation Workshop)
[Paper][Project Page][Code]

We present a two-stream approach for video in-painting, where high-frequency features interact locally and low-frequency features interact globally via an attention mechanism.

Real-Time Light-Weight Near-Field Photometric Stereo
Daniel Lichy, Roni Sengupta, David Jacobs
(CVPR 2022)
[Paper][Project Page][Code]

Near-field photometric stereo is useful for 3D imaging of large objects. We capture multiple images of an object by moving a flashlight and reconstruct the 3D mesh. Our method is significantly faster and more memory-efficient than SOTA methods while producing better quality.

Robust High-Resolution Video Matting with Temporal Guidance
Peter Lin, Linjie Yang, Imran Saleemi, Roni Sengupta
(WACV 2022)
[Paper][Project Page][Code]

Background removal, a.k.a. alpha matting, on videos by exploiting temporal information with a recurrent architecture. Does not require a captured background image or manual annotations.

A Light Stage on Every Desk
Roni Sengupta, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz
(ICCV 2021)
[Paper][Project Page]

We learn a personalized relighting model by capturing a person watching YouTube videos. Potential applications include relighting during a Zoom call.

Shape and Material Capture at Home
Daniel Lichy, Jiaye Wu, Roni Sengupta, David Jacobs
(CVPR 2021)
[Paper][Project Page][Code]

High-quality photometric stereo can be achieved with a simple flashlight. Recovers high-resolution geometry and reflectance by progressively refining the predictions at each scale, conditioned on the prediction at the previous scale.

Real-Time High Resolution Background Matting
Peter Lin*, Andrey Ryabtsev*, Roni Sengupta, Brian Curless, Steve Seitz, Ira Kemelmacher-Shlizerman
(CVPR 2021 Oral) (Best Paper Candidate, 32 of 1600+ accepted papers) (Best Student Paper Honorable Mention)
[Paper][Project Page][Code]

Background replacement at 30fps on 4K and 60fps on HD. Alpha matte is first extracted at low-res and then selectively refined with patches.

Lifespan Age Transformation Synthesis
Roy Or-El, Roni Sengupta, Ohad Fried, Eli Shechtman, Ira Kemelmacher-Shlizerman
(ECCV 2020)
[Paper][Project Page][Code]

Age transformation across ages 0-70. Continuous aging is modeled with 10 anchor age classes and interpolation in the latent space between them.
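
As a rough illustration of the anchor-class idea, a continuous age can be mapped to a latent code by blending the two nearest anchors; the specific anchor ages, latent dimension, and plain linear blend below are assumptions for the sketch, not the paper's exact scheme:

```python
import numpy as np

# Hypothetical anchor ages (10 classes spanning 0-70) and their latent codes.
anchor_ages = np.array([0, 3, 7, 15, 25, 35, 45, 55, 65, 70])
latents = np.random.default_rng(1).standard_normal((len(anchor_ages), 512))

def latent_for_age(age):
    """Blend the latent codes of the two anchor classes surrounding `age`."""
    age = np.clip(age, anchor_ages[0], anchor_ages[-1])
    hi = int(np.searchsorted(anchor_ages, age))
    if anchor_ages[hi] == age:                 # exactly on an anchor
        return latents[hi]
    lo = hi - 1
    w = (age - anchor_ages[lo]) / (anchor_ages[hi] - anchor_ages[lo])
    return (1.0 - w) * latents[lo] + w * latents[hi]

z = latent_for_age(5)   # halfway between the age-3 and age-7 anchors
```

Interpolating in latent space, rather than pixel space, is what lets the generator produce smooth in-between ages.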

Background Matting: The World is Your Green Screen
Roni Sengupta, Vivek Jayaram, Brian Curless, Steve Seitz, Ira Kemelmacher-Shlizerman
(CVPR 2020)
[Paper][Project Page][Code][Two Minute Papers Video][Microsoft AI using our code][CEO of Microsoft Satya Nadella talks about our work]

By simply capturing an additional image of the background, the alpha matte can be extracted easily without requiring extensive human annotation in the form of a trimap.
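
Matting inverts the standard compositing equation I = alpha * F + (1 - alpha) * B; a minimal sketch of that forward model (array shapes and names are illustrative) shows why a captured background helps:

```python
import numpy as np

def composite(fg, bg, alpha):
    """Standard alpha-compositing model that matting inverts:
    I = alpha * F + (1 - alpha) * B, applied per pixel."""
    a = alpha[..., None]                 # broadcast alpha over color channels
    return a * fg + (1.0 - a) * bg

# With B captured separately, only alpha and F remain unknown per pixel,
# which is far better posed than estimating all three from I alone.
fg = np.ones((2, 2, 3))                  # hypothetical foreground (white)
bg = np.zeros((2, 2, 3))                 # captured background (black)
img = composite(fg, bg, np.full((2, 2), 0.25))
```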

Neural Inverse Rendering of an Indoor Scene from a Single Image
Roni Sengupta, Jinwei Gu, Kihwan Kim, Guilin Liu, David Jacobs, Jan Kautz
(ICCV 2019)
[Paper][Project Page]

Self-supervision on real data is achieved with a Residual Appearance Renderer network, which can cast shadows and add inter-reflections and near-field lighting, given the normal and albedo of the scene.

SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild
Roni Sengupta, Angjoo Kanazawa, Carlos D. Castillo, David Jacobs.
CVPR 2018 [Spotlight].
[Paper] [Project Page / Code][Download PyTorch Code]

Decomposes an unconstrained human face into surface normal, albedo, and spherical harmonics lighting. Learns from synthetic 3DMM data, followed by self-supervised finetuning on unlabelled real images.

Roni Sengupta, Daniel Lichy, Angjoo Kanazawa, Carlos D. Castillo, David Jacobs.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020.
[Paper]

Introduces SfSMesh that utilizes the surface normal predicted by SfSNet to reconstruct a 3D face mesh.

A New Rank Constraint on Multi-view Fundamental Matrices, and its Application to Camera Location Recovery
Roni Sengupta, Tal Amir, Meirav Galun, Amit Singer, Tom Goldstein, David Jacobs, Ronen Basri.
CVPR 2017 [Spotlight].
[Paper] [Code]

We prove that a matrix formed by stacking the fundamental matrices between pairs of images has rank 6. We then introduce a non-linear optimization algorithm based on ADMM that better estimates the camera parameters using this rank constraint, improving Structure-from-Motion pipelines that require an initial camera estimate before bundle adjustment.

Solving Uncalibrated Photometric Stereo Using Fewer Images by Jointly Optimizing Low-rank Matrix Completion and Integrability
Roni Sengupta, Hao Zhou, Walter Forkel, Ronen Basri, Tom Goldstein, David Jacobs.
Journal of Mathematical Imaging and Vision (JMIV), 2018.
[Paper]

We solve uncalibrated photometric stereo from as few as 4-6 images by formulating it as a rank-constrained non-linear optimization solved with ADMM.

Frontal to Profile Face Verification in the Wild
Roni Sengupta, Jun-Cheng Chen, Carlos D. Castillo, Vishal M. Patel, Rama Chellappa, David Jacobs.
WACV 2016.
[Project Page] , [Paper]

We introduce CFP, a dataset for frontal-vs-profile face verification in the wild. We show that SOTA face verification algorithms degrade by about 10% on frontal-profile verification compared to frontal-frontal. Our dataset has been widely used to improve face verification across poses, and also for face warping and pose synthesis with GANs.

A Frequency Domain Approach to Silhouette Based Gait Recognition
Roni Sengupta, Udit Halder, Rameshwar Panda, Ananda S Chowdhury.
NCVPRIPG 2013.
[Paper]

Lab Pictures

  • Fall 2023
  • CVPR Deadline Day 2024
  • Thanksgiving 2024
  • Spring - End Sem 2025

Teaching

Team

PhD students

  • Luchao Qi (4th year)
  • Jun Myeong Choi (4th year)
  • Daniel Rho (2nd year)
  • Ana Xiong (2nd year)
  • Noah Frahm (2nd year)

MS students

  • Andrea Dunn Beltran
  • Prakrut Patel

Undergraduate students

  • Amisha Wadhwa

Alumni

Former PhD Students

Former MS/BS Students

  • Annie Wang (MS), now at Databricks
  • Bang Gong (BS-MS)
  • Pierre-Nicolas Perrin (BS-MS), now at Capital One
  • Yulu Pan (BS), now MS at UNC
  • Max Christman (BS), now MS at UNC
  • Andrey Ryabtsev (BS-MS)(University of Washington), now at Google
  • Peter Lin (BS-MS)(University of Washington), now at ByteDance
  • Jackson Stokes (BS-MS)(University of Washington), now at Google
  • Peter Michael (BS-MS)(University of Washington), now PhD at Cornell University