Uğur Güdükbay's Publications


Multimodal Assessment of Apparent Personality Using Feature Attention and Error Consistency Constraint

Süleyman Aslan, Uğur Güdükbay, and Hamdi Dibeklioğlu. Multimodal Assessment of Apparent Personality Using Feature Attention and Error Consistency Constraint. Image and Vision Computing, 110:Article no. 104163, 9 pages, June 2021.

Download

[PDF]

Abstract

Personality computing and affective computing, where the recognition of personality traits is essential, have gained increasing interest and attention in many research areas recently. We propose a novel approach to recognize the Big Five personality traits of people from videos. To this end, we use four different modalities, namely, ambient appearance (scene), facial appearance, voice, and transcribed speech. Through a specialized subnetwork for each of these modalities, our model learns reliable modality-specific representations and fuses them using an attention mechanism that re-weights each dimension of these representations to obtain an optimal combination of multimodal information. A novel loss function is employed to force the proposed model to give equal importance to each of the personality traits to be estimated, through a consistency constraint that keeps the trait-specific errors as close as possible. To further enhance the reliability of our model, we employ (pre-trained) state-of-the-art architectures (i.e., ResNet, VGGish, ELMo) as the backbones of the modality-specific subnetworks, which are complemented by multilayered Long Short-Term Memory networks to capture temporal dynamics. To minimize the computational complexity of multimodal optimization, we use two-stage modeling, where the modality-specific subnetworks are first trained individually, and the whole network is then fine-tuned to jointly model multimodal data. On the large-scale ChaLearn First Impressions V2 challenge dataset, we evaluate the reliability of our model as well as investigate the informativeness of the considered modalities. Experimental results show the effectiveness of the proposed attention mechanism and the error consistency constraint. While the best performance is obtained using facial information among individual modalities, with the use of all four modalities, our model achieves a mean accuracy of 91.8%, improving the state of the art in automatic personality analysis.
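
The dimension-wise attention fusion described in the abstract can be illustrated with a short sketch. The snippet below is not from the paper; it assumes PyTorch, and the class name, gating network, and feature sizes are illustrative guesses at one plausible way of re-weighting each dimension of the concatenated modality representations.

import torch
import torch.nn as nn

# Hypothetical sketch of dimension-wise feature attention for multimodal fusion;
# the paper's exact gating architecture may differ.
class FeatureAttentionFusion(nn.Module):
    def __init__(self, feat_dim: int, num_modalities: int = 4):
        super().__init__()
        fused_dim = feat_dim * num_modalities
        # Small gating network producing one weight per feature dimension.
        self.gate = nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.Sigmoid())

    def forward(self, modality_feats):
        # modality_feats: list of (batch, feat_dim) tensors, one per modality
        # (scene, face, voice, transcribed speech).
        fused = torch.cat(modality_feats, dim=-1)   # (batch, fused_dim)
        weights = self.gate(fused)                  # per-dimension attention weights
        return fused * weights                      # re-weighted multimodal representation

# Example: four modalities with 128-dimensional features and batch size 8.
feats = [torch.randn(8, 128) for _ in range(4)]
out = FeatureAttentionFusion(feat_dim=128)(feats)   # shape: (8, 512)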
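
The error consistency constraint can likewise be sketched as a loss term. The function below is a minimal, hypothetical reading of the abstract, again assuming PyTorch: the mean per-trait error acts as the regression term, and the variance of the five trait-specific errors penalizes their spread so that no single trait dominates training. The paper's actual formulation and weighting may differ.

import torch

# Hypothetical error-consistency loss: keep the five trait-specific errors close.
# The weighting factor `lam` and the variance-based penalty are assumptions.
def error_consistency_loss(pred: torch.Tensor, target: torch.Tensor,
                           lam: float = 0.5) -> torch.Tensor:
    # pred, target: (batch, 5) Big Five trait scores.
    per_trait_error = (pred - target).abs().mean(dim=0)   # (5,) mean error per trait
    base = per_trait_error.mean()                         # ordinary regression term
    consistency = per_trait_error.var(unbiased=False)     # spread among trait errors
    return base + lam * consistency

# Example with random predictions and targets.
loss = error_consistency_loss(torch.rand(8, 5), torch.rand(8, 5))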

BibTeX

@Article{Aslan2021,
	author    = {S{\"u}leyman Aslan and U{\u{g}}ur G{\"u}d{\"u}kbay and Hamdi Dibeklio{\u{g}}lu},
	title     = {Multimodal Assessment of Apparent Personality Using Feature Attention and Error Consistency Constraint},
	journal   = {Image and Vision Computing},
	volume    = {110},
	pages     = {Article no. 104163, 9 pages},
	articleno = {104163},
	numpages  = {9},
	year      = {2021},
	month     = {June},
	issn      = {0262-8856},
	doi       = {10.1016/j.imavis.2021.104163},
	url       = {https://www.sciencedirect.com/science/article/pii/S0262885621000688},
	keywords  = {Deep learning, Apparent personality, Multimodal modeling, Information fusion, Feature attention, Error consistency},
	abstract  = {Personality computing and affective computing, where the recognition of personality traits
	             is essential, have gained increasing interest and attention in many research areas recently. 
	             We propose a novel approach to recognize the Big Five personality traits of people from videos. 
	             To this end, we use four different modalities, namely, ambient appearance (scene), facial 
	             appearance, voice, and transcribed speech. Through a specialized subnetwork for each of 
	             these modalities, our model learns reliable modality-specific representations and fuses 
	             them using an attention mechanism that re-weights each dimension of these representations
	             to obtain an optimal combination of multimodal information. A novel loss function is employed
	             to force the proposed model to give equal importance to each of the personality 
	             traits to be estimated, through a consistency constraint that keeps the trait-specific errors
	             as close as possible. To further enhance the reliability of our model, we employ (pre-trained)
	             state-of-the-art architectures (i.e., ResNet, VGGish, ELMo) as the backbones of the modality-specific
	             subnetworks, which are complemented by multilayered Long Short-Term Memory networks to capture 
	             temporal dynamics. To minimize the computational complexity of multimodal optimization, we use 
	             two-stage modeling, where the modality-specific subnetworks are first trained individually, 
	             and the whole network is then fine-tuned to jointly model multimodal data. On the large-scale 
	             ChaLearn First Impressions V2 challenge dataset, we evaluate the reliability of our model 
	             as well as investigate the informativeness of the considered modalities. Experimental results
	             show the effectiveness of the proposed attention mechanism and the error consistency constraint. 
	             While the best performance is obtained using facial information among individual modalities, 
	             with the use of all four modalities, our model achieves a mean accuracy of 91.8%, improving 
	             the state of the art in automatic personality analysis.},
	bib2html_dl_pdf = {http://www.cs.bilkent.edu.tr/~gudukbay/publications/papers/journal_articles/Aslan_Et_Al_IMAVIS_2021.pdf},
	bib2html_pubtype = {Refereed Journal Articles},
	bib2html_rescat = {Computer Graphics}
}  

Generated by bib2html.pl (written by Patrick Riley) on Sun Apr 21, 2024 11:32:41