Detecting violent content in Hollywood movies by mid-level audio representations

Authors: Acar, Esra; Hopfgartner, Frank; Alyabrak, Sahin
Dates: 2018-04-17; 2018-04-17; 2013
ISBN: 978-1-4799-0956-8
ISSN: 1949-3991
URI: https://depositonce.tu-berlin.de/handle/11303/7609
DOI: http://dx.doi.org/10.14279/depositonce-6799

Abstract: Detecting violent content in movies, e.g., for providing automated youth protection services, is a valuable video content analysis functionality. Choosing discriminative features for the representation of video segments is a key issue in designing violence detection algorithms. In this paper, we employ mid-level audio features which are based on a Bag-of-Audio Words (BoAW) method using Mel-Frequency Cepstral Coefficients (MFCC). BoAW representations are constructed with two different methods, namely the vector quantization-based (VQ-based) method and the sparse coding-based (SC-based) method. We choose two-class support vector machines (SVMs) for classifying video shots as violent or non-violent. Our experimental results on detecting violent video shots in Hollywood movies show that the mid-level audio features provide promising results. Additionally, we establish that the SC-based method outperforms the VQ-based one. More importantly, in terms of average precision, the SC-based method outperforms all unimodal submissions in the MediaEval Violent Scenes Detection (VSD) task except for one visual-based method.

Language: en
Subject: 000 Computer science, information science, general works
Keywords: video coding; feature extraction; mel frequency cepstral coefficient; visualization; support vector machines; dictionaries; training
Type: Conference Object
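
The VQ-based BoAW construction named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2-D vectors standing in for MFCC frames, and the hand-written codebook are all assumptions made for illustration. In practice the codebook would be learned from training data (for example by clustering, which the abstract does not specify). The sketch shows only the quantization and histogram step; the SC-based variant and the SVM classifier are not covered.

```python
import math
from collections import Counter


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to `frame` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))


def boaw_histogram(frames, codebook):
    """Quantize each frame-level vector and return a normalized word histogram."""
    if not frames:
        # No audio frames in the segment: return an all-zero histogram.
        return [0.0] * len(codebook)
    counts = Counter(nearest_codeword(f, codebook) for f in frames)
    return [counts[k] / len(frames) for k in range(len(codebook))]


# Hypothetical 3-word codebook and four toy "MFCC" frames for one video shot.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]

histogram = boaw_histogram(frames, codebook)
print(histogram)
```

The resulting fixed-length histogram is the kind of mid-level representation that could then be passed to a two-class classifier, one vector per video shot.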