
Figures of the Article
-
Overview of multimodal controllable diffusion models.
-
Diffusion models gradually corrupt the data by adding noise in the forward process, and then generate new data from noise through the reverse process. Each denoising step of the reverse process requires estimating the transition kernel.
-
Architecture of (a) Image Space Diffusion Model (DM), (b) Latent Diffusion Model (LDM), and (c) Cascade Diffusion Model (CDM).
-
Examples of semantic control, spatial control, ID control, and style control.
-
Examples of trade-off control. (a) Fidelity-diversity trade-off. (b) Faithfulness-realism trade-off. (c) Speed-fidelity trade-off.
-
Image restoration results from RePaint[146]. Restoration type: (a) deblur, (b) super-resolution, (c) inpainting, (d) colorization, (e) low-light image enhancement, (f) non-linear enhancement, and (g) multiple-guidance enhancement.
-
Class-to-image results from DiT[57]. Resolution: (a) 512, (b) 256, and (c) 64.
-
Text-to-Image results from [22, 83]. Condition: (a) text only, (b) text and single condition, and (c) text and multiple conditions.
-
Problems with video generation between consecutive frames. (a) ID loss. (b) Temporal inconsistency.
-
Text-to-3D results from Dream3D[215]. Text: (a) an orangutan making a clay bowl on a throwing wheel, (b) a bulldozer clearing away a pile of snow, (c) a corgi taking a selfie, (d) a raccoon astronaut holding his helmet, (e) a table with dim sum on it, and (f) a jay standing on a basket of macarons.
-
Methods for personalization. (a) Textual Inversion[111]. (b) DreamBooth[112]. (c) LoRA[113]. (d) HyperNetwork[164]. W_emb: learnable text encoding. W: model parameters. W_i: LoRA parameters for layer i. W_hyper: hypernetwork parameters.
-
Personalization results for (a) a single-concept object from DreamBooth[112], (b) a single-concept style from Custom Diffusion[25], and (c) multiple concepts from Mix-of-Show[167].
Others
-
External link to attachment
https://rdcu.be/dODHh
-
DOCX format
Chinese Information 37KB
-
PDF format
2024-3-2-3814-Highlights 152KB
-
Compressed file
2024-3-2-3814-Highlights 540KB