Statistical Guarantees of Deep Generative Models Involving Diverse Spaces: Generation Consistency and Robustness
Date
2026-02-04
Publisher
Indian Statistical Institute, Kolkata
Abstract
Generative modeling focuses on producing new data samples that closely
resemble those drawn from an original, unknown distribution. Although long
familiar in statistical estimation theory, the approach has gained substantial
traction in recent years, driven by groundbreaking results in areas such
as image synthesis, natural language generation, and network modeling. The
complexity of modern data domains, and the adaptations that suitable models
must consequently undergo, have presented new challenges. These advances raise
several fundamental questions, the first of which is: when do generative models
accurately approximate the true data distribution? One may also ask: how well
do these models perform under contaminated data? This work explores these
questions through the lens of generative modeling frameworks that, by design,
involve distinct data spaces.
We focus on two major classes of such models that blend optimal transport and
representation learning in their objectives: Wasserstein autoencoders (WAE) and
cycle-consistent cross-domain translators. En route to regeneration, a WAE
learns a latent code, which in turn aids the simulation of new pseudo-random
replicates. By providing statistical characterizations of the latent distribution and
the transforms inducing a dimensionality reduction in the process, we present a
detailed error analysis underlying WAEs. From a non-parametric density estimation
perspective, we establish deterministic bounds on the latent and reconstruction
errors that adapt to the intrinsic dimensions of input data. We also study
the extent of distortion that WAE-generated samples suffer when learned using
contaminated data. Key takeaways for practitioners from our analysis include
specific architectural suggestions that foster near-perfect sampling.
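To make the WAE objective described above concrete, here is a minimal NumPy sketch of a loss combining a reconstruction cost with a penalty matching the encoded latent distribution to a prior. The RBF-kernel MMD penalty, the weight `lam`, and all function names are illustrative assumptions, not the specific divergence or architecture analyzed in the thesis.

```python
import numpy as np

def mmd_rbf(x, y, sigma=1.0):
    # Biased (V-statistic) estimate of squared MMD with an RBF kernel;
    # it is non-negative and exactly zero when the two samples coincide.
    def gram(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def wae_loss(x, x_recon, z_enc, z_prior, lam=10.0):
    # Reconstruction cost (mean squared Euclidean error) plus a penalty
    # pushing the encoded latent sample z_enc toward the prior sample z_prior.
    recon = ((x - x_recon) ** 2).sum(axis=1).mean()
    return recon + lam * mmd_rbf(z_enc, z_prior)
```

With perfect reconstruction and encoded latents drawn exactly from the prior, the loss vanishes; either source of mismatch contributes a non-negative term.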
The framework developed thus far fittingly extends to unpaired cycle-consistent
cross-domain models. We show that the sufficient conditions for successful data
translation under Sobolev and Hölder-smooth distributions resemble those in the
case of WAEs. Our analysis also suggests error upper bounds due to ill-posed
transformations and validates the choice of divergences used in objectives.
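For illustration, the cycle-consistency principle behind these translators can be sketched as a round-trip penalty. The maps `F` and `G` below are hypothetical stand-ins for the forward and backward translators; the quadratic penalty is an assumed choice, not the thesis's divergence.

```python
import numpy as np

def cycle_consistency_loss(x, y, F, G):
    # Penalize failure of the round trips G(F(x)) ≈ x and F(G(y)) ≈ y,
    # averaging squared Euclidean error over the samples in each domain.
    forward = ((G(F(x)) - x) ** 2).sum(axis=1).mean()
    backward = ((F(G(y)) - y) ** 2).sum(axis=1).mean()
    return forward + backward
```

When `G` is (close to) the inverse of `F`, both round trips succeed and the loss is near zero; an ill-posed, non-invertible `F` leaves an irreducible residual.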
Finally, in search of a consolidated solution to the robustification problem, we
present parallel formulations based on the Gromov-Wasserstein (GW) distance.
Owing to the equivalence between Gromov-Monge samplers, which follow from the
GW formulation, and cross-domain translation models, including WAE and GWAE,
this answers the second question. We study the robust recovery guarantees, concentration, and
tractable computational properties of the newly introduced distance measures
under diverse contamination scenarios. We substantiate all our findings on
real-world data across a variety of generative tasks.
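To make the GW idea concrete: the distance compares the internal metric structure of two samples, so a matching can be exact even when the spaces themselves differ. A brute-force Gromov-Monge sketch (hypothetical helper names; feasible only for tiny samples, since it enumerates all permutations) is:

```python
import itertools
import numpy as np

def pairwise_dists(x):
    # n x n matrix of Euclidean distances within one sample.
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def gromov_monge_bruteforce(x, y):
    # Search for the permutation of y that best preserves the pairwise
    # distance structure of x (quadratic distortion, averaged over pairs).
    Dx, Dy = pairwise_dists(x), pairwise_dists(y)
    n = len(x)
    best_cost, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        p = list(perm)
        cost = ((Dx - Dy[np.ix_(p, p)]) ** 2).mean()
        if cost < best_cost:
            best_cost, best_perm = cost, p
    return best_cost, best_perm
```

If `y` is simply a reshuffling of `x`, the optimal permutation undoes the shuffle and the distortion is zero, illustrating that GW matching depends only on intra-space distances, not on any shared ambient coordinates.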
Description
This thesis is under the supervision of Prof. Swagatam Das and Prof. Probal Chaudhuri
Keywords
Deep Generative Models, Robustness, Optimal Transport
Citation
182p.
