Violence detection in Hollywood movies by the fusion of visual and mid-level audio cues

Acar, Esra; Hopfgartner, Frank; Albayrak, Sahin

Detecting violent scenes in movies is an important video content understanding functionality, e.g., for providing automated youth protection services. One key issue in designing algorithms for violence detection is the choice of discriminative features. In this paper, we employ mid-level audio features and compare their discriminative power against low-level audio and visual features. We fuse these mid-level audio cues with low-level visual ones at the decision level in order to further improve the performance of violence detection. We use Mel-Frequency Cepstral Coefficients (MFCC) as audio features and average motion as visual features. To learn a violence model, we choose two-class support vector machines (SVMs). Our experimental results on detecting violent video shots in Hollywood movies show that mid-level audio features are more discriminative and provide more precise results than low-level ones. The detection performance is further enhanced by fusing the mid-level audio cues with low-level visual ones using an SVM-based decision fusion.
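The abstract does not spell out how the mid-level audio features are built, but the keyword list of this record mentions bag-of-audio-words. A minimal sketch of that general idea follows, assuming hard vector quantization of per-frame MFCC vectors against a precomputed codebook (for example k-means centroids). The function name `bag_of_audio_words`, the toy codebook, and the toy frames are hypothetical illustrations and are not taken from the paper.

```python
import numpy as np


def bag_of_audio_words(mfcc_frames, codebook):
    """Encode one shot's MFCC frames as a normalized codeword histogram.

    mfcc_frames: array of shape (n_frames, n_coeffs), one MFCC vector per frame.
    codebook:    array of shape (n_words, n_coeffs), e.g. k-means centroids
                 (assumption: the codebook was learned beforehand).
    """
    frames = np.asarray(mfcc_frames, dtype=float)
    words = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every frame to every codeword.
    diffs = frames[:, None, :] - words[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Hard assignment: each frame votes for its nearest codeword.
    nearest = np.argmin(dists, axis=1)
    counts = np.bincount(nearest, minlength=words.shape[0]).astype(float)
    # L1-normalize so shots of different lengths are comparable.
    total = counts.sum()
    return counts / total if total > 0 else counts


# Toy data (hypothetical values): 2-D stand-ins for MFCC vectors, 3 codewords.
toy_codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
toy_frames = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
histogram = bag_of_audio_words(toy_frames, toy_codebook)
```

In a pipeline of the kind the abstract describes, a histogram like this would serve as the per-shot audio input to a two-class SVM, whose output score could then be combined with a visual classifier's score at the decision level.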

algorithms, performance, experimentation, bag-of-audio-words, mel-frequency cepstral coefficients, motion, decision fusion, support vector machine

2013_acar_etal.pdf

Adobe PDF — 1010.28 KB

Published in: Proceedings of the 21st ACM international conference on Multimedia - MM ’13, 10.1145/2502081.2502187, ACM

Date Issued: 2013
In DepositOnce: 2018-04-17

FG Agententechnologien in betrieblichen Anwendungen und der Telekommunikation (AOT)
