A Survey of Multimodal Controllable Diffusion Models
-
Abstract
Diffusion models have recently emerged as powerful generative models, producing high-fidelity samples across domains. Despite this, they face two key challenges: accelerating the time-consuming iterative generation process, and controlling and steering that process. Existing surveys provide broad overviews of diffusion model advancements, but they lack comprehensive coverage centered specifically on techniques for controllable generation. This survey seeks to address this gap by providing a comprehensive and coherent review of controllable generation in diffusion models. We provide a detailed taxonomy defining controlled generation for diffusion models, categorizing controllable generation by formulation, methodology, and evaluation metrics. By enumerating the range of methods researchers have developed for enhanced control, we aim to establish controllable diffusion generation as a distinct subfield warranting dedicated focus. With this survey, we contextualize recent results, provide a dedicated treatment of controllable diffusion model generation, and outline limitations and future directions. To demonstrate applicability, we highlight controllable diffusion techniques for major computer vision tasks. By consolidating methods and applications for controllable diffusion models, we hope to catalyze further innovations in reliable and scalable controllable generation.
-
-