Continual Visual Concept Learning with Scene Graph Expansion
Abstract
In this work, we tackle two key challenges in lifelong visual concept learning: catastrophic forgetting and semantic misalignment. Existing methods typically incorporate new visual concepts by updating the parameters of vision-language models (VLMs), but they fail to capture the high-level semantic knowledge (e.g., object layout) associated with those concepts. To address this issue, we introduce a novel framework that simultaneously updates the parameters of the VLM and expands the scene graph when processing images of new concepts. The VLM is built upon a novel Mixture-of-Experts (MoE) architecture designed to alleviate catastrophic forgetting by minimizing unnecessary interference among experts. During inference, the expanded scene graph streamlines prompt design while ensuring that the generated images accurately reflect the scene layout derived from the graph. Extensive experimental results demonstrate that our approach significantly improves both generative diversity and text/image alignment (TA/IA). Our code is available at https://github.com/learninginvision/Lifelong-SG2I.