Please use this identifier to cite or link to this item: http://dx.doi.org/10.14279/depositonce-10860
Main Title: SD-RSIC: Summarization-Driven Deep Remote Sensing Image Captioning
Author(s): Sumbul, Gencer
Nayak, Sonali
Demir, Begüm
Type: Article
Language Code: en
Abstract: Deep neural networks (DNNs) have recently become popular for image captioning problems in remote sensing (RS). Existing DNN-based approaches rely on the availability of a training set made up of a large number of RS images with their captions. However, captions of training images may contain redundant information (they can be repetitive or semantically similar to each other), resulting in information deficiency while learning a mapping from the image domain to the language domain. To overcome this limitation, in this article, we present a novel summarization-driven RS image captioning (SD-RSIC) approach. The proposed approach consists of three main steps. The first step obtains the standard image captions by jointly exploiting convolutional neural networks (CNNs) with long short-term memory (LSTM) networks. The second step, unlike the existing RS image captioning methods, summarizes the ground-truth captions of each training image into a single caption by exploiting sequence-to-sequence neural networks and eliminates the redundancy present in the training set. The third step automatically defines the adaptive weights associated with each RS image to combine the standard captions with the summarized captions based on the semantic content of the image. This is achieved by a novel adaptive weighting strategy defined in the context of LSTM networks. Experimental results obtained on the RSICD, UCM-Captions, and Sydney-Captions data sets show the effectiveness of the proposed approach compared with the state-of-the-art RS image captioning approaches. The code of the proposed approach is publicly available at https://gitlab.tubit.tu-berlin.de/rsim/SD-RSIC.
URI: https://depositonce.tu-berlin.de/handle/11303/11978
http://dx.doi.org/10.14279/depositonce-10860
Issue Date: 26-Oct-2020
Date Available: 16-Nov-2020
DDC Class: 006 Spezielle Computerverfahren
Subject(s): caption summarization
deep learning
image captioning
remote sensing
RS
Sponsor/Funder: EC/H2020/759764/EU/Accurate and Scalable Processing of Big Data in Earth Observation/BigEarth
License: http://rightsstatements.org/vocab/InC/1.0/
Journal Title: IEEE Transactions on Geoscience and Remote Sensing
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publisher Place: New York, NY
Publisher DOI: 10.1109/TGRS.2020.3031111
EISSN: 1558-0644
ISSN: 0196-2892
Appears in Collections: FG Remote Sensing Image Analysis Group » Publications

Files in This Item:
sumbul_etal_2020.pdf

Accepted manuscript

Format: Adobe PDF | Size: 3.53 MB

Items in DepositOnce are protected by copyright, with all rights reserved, unless otherwise indicated.