BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20250729T190220EDT-3114aTOuHH@132.216.98.100
DTSTAMP:20250729T230220Z
DESCRIPTION:LIVIA seminar\n\nSpeaker: Gnana Praveen Rajasekar\, Ph.D. candidate at the LIVIA\n\nAbstract: Automatic emotion recognition (ER) has recently gained a lot of interest due to its potential in many real-world applications. In this context\, multimodal approaches have been shown to improve performance (over unimodal approaches) by combining diverse and complementary sources of information\, providing some robustness to noisy and missing modalities. We focus on dimensional ER based on the fusion of facial and vocal modalities extracted from videos\, where complementary audio-visual (A-V) relationships are explored to predict an individual's emotional states in valence-arousal space. Most state-of-the-art fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of A-V modalities. To address this problem\, we introduce a joint cross-attentional model for A-V fusion that extracts the salient features across A-V modalities\, allowing the model to effectively leverage the inter-modal relationships while retaining the intra-modal relationships. In particular\, it computes the cross-attention weights based on the correlation between the joint feature representation and that of the individual modalities. By deploying the joint A-V feature representation in the cross-attention module\, the model simultaneously leverages both the intra- and inter-modal relationships\, thereby significantly improving the performance of the system over the vanilla cross-attention module. The effectiveness of our proposed approach is validated experimentally on challenging videos from the RECOLA and AffWild2 datasets. Results indicate that our joint cross-attentional A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches\, even when the modalities are noisy or absent.\n\nhttps://arxiv.org/pdf/2209.09068.pdf\n
DTSTART:20221102T160000Z
DTEND:20221102T160000Z
LOCATION:CA\, ZOOM
SUMMARY:Joint Attention for Dimensional Emotion Recognition using Audio Visual Fusion
URL:/cim/channels/event/joint-attention-dimensional-emotion-recognition-using-audio-visual-fusion-351841
END:VEVENT
END:VCALENDAR
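
For readers curious about the fusion mechanism the abstract describes, here is a minimal, hypothetical PyTorch sketch of joint cross-attentional A-V fusion. It is not the authors' reference implementation (see the linked arXiv paper for that); the class name, feature dimensions, and the tanh/softmax correlation formulation are illustrative assumptions. The key idea it mirrors is that attention weights are computed from the correlation between the joint (concatenated A-V) representation and each individual modality, and a residual connection retains the intra-modal information.

# Hypothetical sketch, not the paper's reference implementation.
import torch
import torch.nn as nn

class JointCrossAttentionFusion(nn.Module):
    """Fuses temporally aligned audio and visual feature sequences.

    The joint representation (audio and visual features concatenated)
    attends over each individual modality, so the attention weights
    reflect joint-to-unimodal correlation rather than plain
    unimodal-to-unimodal cross-attention.
    """

    def __init__(self, dim: int = 512):
        super().__init__()
        joint_dim = 2 * dim  # joint features: audio + visual concatenated
        # Learnable correlation maps between joint and unimodal features
        self.corr_a = nn.Linear(joint_dim, dim, bias=False)
        self.corr_v = nn.Linear(joint_dim, dim, bias=False)
        # Projection of the fused features for a downstream V-A regressor
        self.out = nn.Linear(joint_dim, dim)

    def forward(self, x_a: torch.Tensor, x_v: torch.Tensor) -> torch.Tensor:
        # x_a, x_v: (batch, seq_len, dim) clip-level A and V features
        joint = torch.cat([x_a, x_v], dim=-1)        # (B, L, 2*dim)
        scale = x_a.shape[-1] ** 0.5
        # Correlation of the joint representation with each modality
        corr_a = torch.tanh(self.corr_a(joint) @ x_a.transpose(1, 2) / scale)
        corr_v = torch.tanh(self.corr_v(joint) @ x_v.transpose(1, 2) / scale)
        # Attention weights over time, used to re-weight each modality
        att_a = torch.softmax(corr_a, dim=-1) @ x_a  # (B, L, dim)
        att_v = torch.softmax(corr_v, dim=-1) @ x_v
        # Residual connections retain the intra-modal relationships
        fused = torch.cat([x_a + att_a, x_v + att_v], dim=-1)
        return self.out(fused)                       # (B, L, dim)

if __name__ == "__main__":
    fusion = JointCrossAttentionFusion(dim=512)
    audio = torch.randn(2, 8, 512)  # e.g. pretrained audio-backbone features
    video = torch.randn(2, 8, 512)  # e.g. 3D-CNN facial features
    print(fusion(audio, video).shape)  # torch.Size([2, 8, 512])

Because the joint representation appears in both correlation computations, each modality's attention weights are informed by the other modality as well as by itself, which is what lets this style of fusion exploit inter-modal and intra-modal relationships at the same time.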