Digital Cultural Travel-oriented Image and Text Cross-modal Retrieval Method

Citation: Gao Yunmei. Digital Cultural Travel-oriented Image and Text Cross-modal Retrieval Method [J]. Information and Documentation Services (情报资料工作), 2022(1): 71-80.

Author: Gao Yunmei
Institution: Library of Changshu Institute of Technology, Jiangsu 215500

Funding: Ministry of Education Humanities and Social Sciences Research Youth Fund project "Research on Core-Periphery Structure Measurement in Academic Information Networks" (No. 18YJC870011); Jiangsu Universities Philosophy and Social Sciences Research project "Research on 360-degree Teaching Evaluation Based on Video Content Analysis" (No. 2020SJA1425); Suzhou Library Society 2021 key project "Research on Cross-modal Intelligent Retrieval of Video Resources" (No. 21-A-02); Changshu Institute of Technology Higher Education Research project "360-degree Teaching Evaluation Based on Video Content Analysis" (No. GJ1905).

Abstract: [Purpose/significance] Image-text cross-modal retrieval (IT-CMR) is important for maximizing the use of digital culture and tourism resources. However, IT-CMR methods in this field face the challenges of long texts, missing data, and limited memory. To address these problems, we propose a new IT-CMR method for digital culture and tourism resources based on Transformer and MobileNet V3 models. [Method/process] First, a Two-Layer Multi-group Transformers (TLMT) model based on self-attention is proposed to learn complementary text features from titles, main text, and comments. Second, Fast R-CNN and MobileNet V3 models are designed to learn local fine-grained image features. Third, a multiple linear regression method is proposed to complete missing data in the shared subspace. A bidirectional triplet loss function covering both image-to-text and text-to-image retrieval is constructed to learn the model parameters. [Result/conclusion] Extensive experiments on the standard benchmark Flickr30k, our own dataset CulTour-Sha, and two datasets with missing data, Flickr30k-1 and CulTour-Sha-1, demonstrate that our method outperforms several state-of-the-art IT-CMR methods in recall, memory footprint, and computing speed.

Keywords: digital culture and tourism; cross-modal retrieval; deep learning features; bidirectional triplet loss function; fine-grained features
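The bidirectional triplet loss mentioned in the abstract can be sketched as follows. This is a minimal illustrative sketch only, assuming cosine similarity, a fixed margin of 0.2, and summation over all in-batch negatives; the function name, margin value, and negative-sampling scheme are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def bidirectional_triplet_loss(img_emb, txt_emb, margin=0.2):
    """Hinge-based bidirectional triplet loss over a batch of matched
    image/text embeddings (row i of each matrix is a matched pair).
    Illustrative sketch; margin and aggregation are assumptions."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    S = img @ txt.T                  # S[i, j] = sim(image_i, text_j)
    pos = np.diag(S)                 # similarity of the matched pairs
    # Image-to-text direction: other texts in the batch are negatives
    cost_i2t = np.maximum(0.0, margin + S - pos[:, None])
    # Text-to-image direction: other images in the batch are negatives
    cost_t2i = np.maximum(0.0, margin + S - pos[None, :])
    n = S.shape[0]
    mask = ~np.eye(n, dtype=bool)    # exclude each positive pair itself
    return (cost_i2t[mask].sum() + cost_t2i[mask].sum()) / n
```

With perfectly aligned embeddings the loss is zero; misranking any text above an image's true caption (or vice versa) incurs a hinge penalty in the corresponding direction, which is what drives the shared subspace to support retrieval both ways.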