Violence detection in Hollywood movies by the fusion of visual and mid-level audio cues
dc.contributor.author | Acar, Esra | |
dc.contributor.author | Hopfgartner, Frank | |
dc.contributor.author | Albayrak, Sahin | |
dc.date.accessioned | 2018-04-17T08:41:35Z | |
dc.date.available | 2018-04-17T08:41:35Z | |
dc.date.issued | 2013 | |
dc.description.abstract | Detecting violent scenes in movies is an important video content understanding functionality, e.g., for providing automated youth protection services. One key issue in designing algorithms for violence detection is the choice of discriminative features. In this paper, we employ mid-level audio features and compare their discriminative power against low-level audio and visual features. We fuse these mid-level audio cues with low-level visual ones at the decision level in order to further improve the performance of violence detection. We use Mel-Frequency Cepstral Coefficients (MFCC) as audio and average motion as visual features. In order to learn a violence model, we choose two-class support vector machines (SVMs). Our experimental results on detecting violent video shots in Hollywood movies show that mid-level audio features are more discriminative and provide more precise results than low-level ones. The detection performance is further enhanced by fusing the mid-level audio cues with low-level visual ones using an SVM-based decision fusion. | en |
dc.identifier.isbn | 978-1-4503-2404-5 | |
dc.identifier.uri | https://depositonce.tu-berlin.de/handle/11303/7608 | |
dc.identifier.uri | http://dx.doi.org/10.14279/depositonce-6798 | |
dc.language.iso | en | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject.ddc | 000 Informatik, Informationswissenschaft, allgemeine Werke | de |
dc.subject.other | algorithms | en |
dc.subject.other | performance | en |
dc.subject.other | experimentation | en |
dc.subject.other | bag-of-audio-words | en |
dc.subject.other | mel-frequency cepstral coefficients | en |
dc.subject.other | motion | en |
dc.subject.other | decision fusion | en |
dc.subject.other | support vector machine | en |
dc.title | Violence detection in Hollywood movies by the fusion of visual and mid-level audio cues | en |
dc.type | Conference Object | en |
dc.type.version | acceptedVersion | en |
dcterms.bibliographicCitation.doi | 10.1145/2502081.2502187 | en |
dcterms.bibliographicCitation.originalpublishername | ACM | en |
dcterms.bibliographicCitation.originalpublisherplace | New York, NY, USA | en |
dcterms.bibliographicCitation.pageend | 720 | en |
dcterms.bibliographicCitation.pagestart | 717 | en |
dcterms.bibliographicCitation.proceedingstitle | Proceedings of the 21st ACM international conference on Multimedia - MM ’13 | en |
dcterms.bibliographicCitation.volume | 2013 | en |
tub.accessrights.dnb | domain | en |
tub.affiliation | Fak. 4 Elektrotechnik und Informatik::Inst. Wirtschaftsinformatik und Quantitative Methoden::FG Agententechnologien in betrieblichen Anwendungen und der Telekommunikation (AOT) | de |
tub.affiliation.faculty | Fak. 4 Elektrotechnik und Informatik | de |
tub.affiliation.group | FG Agententechnologien in betrieblichen Anwendungen und der Telekommunikation (AOT) | de |
tub.affiliation.institute | Inst. Wirtschaftsinformatik und Quantitative Methoden | de |
tub.publisher.universityorinstitution | Technische Universität Berlin | en |