Multimodal Interactive Network for Sequential Recommendation
Abstract
Building an effective sequential recommendation system remains a challenging task due to the limited interactions among users and items. Recent work has shown the effectiveness of incorporating textual or visual information into sequential recommendation to alleviate the data sparsity problem, which is attracting considerable attention in both industry and academia. However, modeling interactions among modalities in a sequential scenario is an interesting yet challenging task because of multimodal heterogeneity. In this paper, we introduce a novel recommendation approach that considers both textual and visual information, namely the Multimodal Interactive Network (MIN). The advantage of MIN lies in a learning framework that leverages interactions among modalities at both the item level and the sequence level. First, an item-wise interactive layer based on the encoder-decoder mechanism models item-level interactions among modalities to select informative features. Second, a sequence interactive layer based on an attention strategy captures the sequence-level preference of each modality. MIN thus seamlessly incorporates interactions among modalities at both the item level and the sequence level for sequential recommendation. To the best of our knowledge, this is the first time that interactions among modalities have been explicitly discussed and utilized in sequential recommenders. Experimental results on four real-world datasets show that our approach significantly outperforms all baselines on the sequential recommendation task.
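The two-level design described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration only, since the abstract does not give the exact formulation: the function names (`cross_modal_attention`, `sequence_attention`), the use of scaled dot-product attention for both levels, and the concatenation of the two modality preferences are all hypothetical simplifications of the item-wise interactive layer and the sequence interactive layer.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys_values):
    # hypothetical item-level interaction: one modality attends to the
    # other (a simplified stand-in for the encoder-decoder mechanism)
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ keys_values

def sequence_attention(seq, query):
    # hypothetical sequence-level layer: weight the items in a modality
    # sequence by their relevance to a query vector
    scores = seq @ query / np.sqrt(seq.shape[-1])
    return softmax(scores) @ seq

rng = np.random.default_rng(0)
n_items, d = 5, 8                         # toy sequence length and dim
text = rng.normal(size=(n_items, d))      # per-item textual embeddings
visual = rng.normal(size=(n_items, d))    # per-item visual embeddings

# item level: fuse each modality with information from the other
text_fused = cross_modal_attention(text, visual)
visual_fused = cross_modal_attention(visual, text)

# sequence level: summarize each modality into a preference vector;
# the query here (e.g. a candidate-item embedding) is an assumption
query = rng.normal(size=d)
text_pref = sequence_attention(text_fused, query)
visual_pref = sequence_attention(visual_fused, query)

# combine modality preferences into one user representation
user_repr = np.concatenate([text_pref, visual_pref])
print(user_repr.shape)  # (16,)
```

In this sketch the item-level step runs before the sequence-level step, mirroring the order in which the two layers are introduced above; how MIN actually combines the two modality preferences is not specified in the abstract.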