Digital Cultural Travel-oriented Image and Text Cross-modal Retrieval Method

Citation: Gao Yunmei. Digital Cultural Travel-oriented Image and Text Cross-modal Retrieval Method [J]. Information and Documentation Services (情报资料工作), 2022(1): 71-80.

Author: Gao Yunmei
Institution: Library of Changshu Institute of Technology, Jiangsu 215500

Funding: Ministry of Education Humanities and Social Sciences Research Youth Fund project "Research on Core-Periphery Structure Measurement in Academic Information Networks" (No. 18YJC870011); Jiangsu Universities Philosophy and Social Sciences Research project "Research on 360-degree Teaching Evaluation Based on Video Content Analysis" (No. 2020SJA1425); Suzhou Library Society 2021 key project "Research on Cross-modal Intelligent Retrieval of Video Resources" (No. 21-A-02); Changshu Institute of Technology Higher Education Research project "360-degree Teaching Evaluation Based on Video Content Analysis" (No. GJ1905).

Abstract: [Purpose/significance] Image-text cross-modal retrieval (IT-CMR) is important for maximizing the use of digital culture and tourism resources. However, IT-CMR methods in this field face the challenges of long texts, missing data, and limited memory. To address these problems, we propose a new IT-CMR method for digital culture and tourism resources based on Transformer and MobileNet V3 models. [Method/process] First, a Two-Layer Multi-group Transformers (TLMT) model based on self-attention is proposed to learn complementary text features from titles, main text, and comments. Second, Fast R-CNN and MobileNet V3 models are designed to learn local fine-grained image features. Third, a multiple linear regression method is proposed to complete missing data in the shared subspace. A bidirectional triplet loss function covering both image-to-text and text-to-image retrieval is constructed to learn the model parameters. [Result/conclusion] Extensive experiments on the standard benchmark Flickr30k, our own dataset CulTour-Sha, and two datasets with missing data, Flickr30k-1 and CulTour-Sha-1, demonstrate that our method outperforms several state-of-the-art IT-CMR methods in recall, memory footprint, and computing speed.

Keywords: digital culture and tourism; cross-modal retrieval; deep learning features; bidirectional triplet loss function; fine-grained features
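The bidirectional triplet loss mentioned in the abstract can be sketched as follows. This is a minimal illustrative sketch only, assuming cosine similarity, a fixed margin of 0.2, and summation over all in-batch negatives; the function name, margin value, and negative-sampling scheme are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def bidirectional_triplet_loss(img_emb, txt_emb, margin=0.2):
    """Hinge-based bidirectional triplet loss over a batch of matched
    image/text embeddings (row i of each matrix is a matched pair).
    Illustrative sketch; margin and aggregation are assumptions."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    S = img @ txt.T                  # S[i, j] = sim(image_i, text_j)
    pos = np.diag(S)                 # similarity of the matched pairs
    # Image-to-text direction: other texts in the batch are negatives
    cost_i2t = np.maximum(0.0, margin + S - pos[:, None])
    # Text-to-image direction: other images in the batch are negatives
    cost_t2i = np.maximum(0.0, margin + S - pos[None, :])
    n = S.shape[0]
    mask = ~np.eye(n, dtype=bool)    # exclude each positive pair itself
    return (cost_i2t[mask].sum() + cost_t2i[mask].sum()) / n
```

With perfectly aligned embeddings the loss is zero; misranking any text above an image's true caption (or vice versa) incurs a hinge penalty in the corresponding direction, which is what drives the shared subspace to support retrieval both ways.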