
Align MacridVAE: Multimodal Alignment for Disentangled Recommendations

Authors
Ignacio Avas, Liesbeth Allein, Katrien Laenen, Marie-Francine Moens
Abstract
Explaining why items are recommended to users is challenging, especially when those items are described by multimodal data. Most recommendation systems leverage only one modality, typically textual or tabular data. In this work, we propose Align MacridVAE, a new model that exploits the complementarity of visual and textual item descriptions for item recommendation. The model projects both modalities onto a shared latent space, and a dedicated loss function aligns the text and image of the same item. Item aspects are then jointly disentangled for both modalities at a macro level, to learn interpretable categorical information about items, and at a micro level, to model user preferences for each of those categories. Experiments are conducted on six item recommendation datasets, and recommendation performance is compared against multiple baseline methods. The results show that our model increases recommendation accuracy by 18% in terms of NDCG on average across the studied datasets, and allows us to visualise user preferences by item aspect across modalities as well as the learned concept allocation. (The code implementation is available at https://github.com/igui/Align-MacridVAE.)
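
The dedicated alignment loss mentioned in the abstract is, in spirit, a contrastive objective over paired text and image embeddings in the shared latent space. Below is a minimal sketch, assuming a CLIP-style symmetric InfoNCE formulation in PyTorch; the function and tensor names are illustrative rather than taken from the published code, and the paper's actual loss may differ in detail:

import torch
import torch.nn.functional as F

def alignment_loss(text_emb, image_emb, temperature=0.07):
    """Pull together the text and image embeddings of the same item
    in the shared latent space; push apart mismatched pairs."""
    # L2-normalise so dot products become cosine similarities
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    # Similarity matrix: entry (i, j) compares item i's text with item j's image
    logits = text_emb @ image_emb.T / temperature
    # Matching text/image pairs of the same item lie on the diagonal
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # Symmetric InfoNCE: contrast text-to-image and image-to-text
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

Each row of text_emb and image_emb would be the projection of one item's description and image onto the shared latent space described above, so the loss is computed per batch of items.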
Keywords
Multimodal Recommender System, Disentangled Representation Learning, Contrastive Learning

Citation

Please use this URL to cite or link to this publication:

MLA
Avas, Ignacio, et al. “Align MacridVAE: Multimodal Alignment for Disentangled Recommendations.” ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I, edited by Nazli Goharian et al., vol. 14608, Springer, 2024, pp. 73–89, doi:10.1007/978-3-031-56027-9_5.
APA
Avas, I., Allein, L., Laenen, K., & Moens, M.-F. (2024). Align MacridVAE: Multimodal alignment for disentangled recommendations. In N. Goharian, N. Tonellotto, Y. He, A. Lipani, G. McDonald, C. Macdonald, & I. Ounis (Eds.), ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I (Vol. 14608, pp. 73–89). Springer. https://doi.org/10.1007/978-3-031-56027-9_5
Chicago author-date
Avas, Ignacio, Liesbeth Allein, Katrien Laenen, and Marie-Francine Moens. 2024. “Align MacridVAE: Multimodal Alignment for Disentangled Recommendations.” In ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I, edited by Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, and Iadh Ounis, 14608:73–89. Cham: Springer. https://doi.org/10.1007/978-3-031-56027-9_5.
Chicago author-date (all authors)
Avas, Ignacio, Liesbeth Allein, Katrien Laenen, and Marie-Francine Moens. 2024. “Align MacridVAE: Multimodal Alignment for Disentangled Recommendations.” In ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I, edited by Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, and Iadh Ounis, 14608:73–89. Cham: Springer. doi:10.1007/978-3-031-56027-9_5.
Vancouver
1.
Avas I, Allein L, Laenen K, Moens M-F. Align MacridVAE: multimodal alignment for disentangled recommendations. In: Goharian N, Tonellotto N, He Y, Lipani A, McDonald G, Macdonald C, et al., editors. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I. Cham: Springer; 2024. p. 73–89.
IEEE
[1]
I. Avas, L. Allein, K. Laenen, and M.-F. Moens, “Align MacridVAE: multimodal alignment for disentangled recommendations,” in ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I, Glasgow, UK, 2024, vol. 14608, pp. 73–89.
@inproceedings{01J5T06YMHV55KNNA6D0XRWNAK,
  abstract     = {{Explaining why items are recommended to users is challenging, especially when those items are described by multimodal data. Most recommendation systems leverage only one modality, typically textual or tabular data. In this work, we propose Align MacridVAE, a new model that exploits the complementarity of visual and textual item descriptions for item recommendation. The model projects both modalities onto a shared latent space, and a dedicated loss function aligns the text and image of the same item. Item aspects are then jointly disentangled for both modalities at a macro level, to learn interpretable categorical information about items, and at a micro level, to model user preferences for each of those categories. Experiments are conducted on six item recommendation datasets, and recommendation performance is compared against multiple baseline methods. The results show that our model increases recommendation accuracy by 18% in terms of NDCG on average across the studied datasets, and allows us to visualise user preferences by item aspect across modalities as well as the learned concept allocation. (The code implementation is available at https://github.com/igui/Align-MacridVAE.)}},
  author       = {{Avas, Ignacio and Allein, Liesbeth and Laenen, Katrien and Moens, Marie-Francine}},
  booktitle    = {{ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I}},
  editor       = {{Goharian, Nazli and Tonellotto, Nicola and He, Yulan and Lipani, Aldo and McDonald, Graham and Macdonald, Craig and Ounis, Iadh}},
  isbn         = {{9783031560262}},
  issn         = {{0302-9743}},
  keywords     = {{Multimodal Recommender System,Disentangled Representation Learning,Contrastive Learning}},
  language     = {{eng}},
  location     = {{Glasgow, UK}},
  pages        = {{73--89}},
  publisher    = {{Springer}},
  title        = {{Align MacridVAE: multimodal alignment for disentangled recommendations}},
  url          = {{https://doi.org/10.1007/978-3-031-56027-9_5}},
  volume       = {{14608}},
  year         = {{2024}},
}
