Similar Documents
1.
As an information medium, video offers many more possible retrieval and browsing modalities than text, image or audio. Some of these, like searching the text of the spoken dialogue, are well developed; others, like keyframe browsing tools, are in their infancy; and others are not yet technically achievable. For the modalities we cannot yet achieve, we can only speculate about how useful they will actually be. In our work we have created a system that supports multiple modalities for video browsing and retrieval, including text search through the spoken dialogue, image matching against shot keyframes, and object matching against segmented video objects. For the last of these, automatic segmentation and tracking of video objects is a computationally demanding problem that remains unsolved for generic natural video material; once it is solved, it is expected to open up possibilities for user interaction with objects in video, including searching and browsing. In this paper we achieve object segmentation by working in a closed domain of animated cartoons. We describe an interactive user experiment on a medium-sized video corpus in which we measured users' use of video objects versus other modes of retrieval during multiple-iteration searching. The results show that although object searching is used far less than text searching in the first iteration of a user's search, it is a popular and useful search type once an initial set of relevant shots has been found.

2.
To address the inability of current ranking systems to support subtopic retrieval, two main post-processing techniques for search results have been investigated: clustering and diversification. In this paper we present a comparative study of their performance, using a set of complementary evaluation measures that can be applied to both partitions and ranked lists, and two specialized test collections focusing on broad and ambiguous queries, respectively. The main finding of our experiments is that diversification of top hits is more useful for quick coverage of distinct subtopics, whereas clustering is better for full retrieval of single subtopics; a better balance in performance is achieved by generating multiple subsets of diverse search results. We also found that there is little scope for improvement over the search engine baseline unless we are interested in strict full-subtopic retrieval, and that search-results clustering methods do not perform well on queries with low-divergence subtopics, mainly because of the difficulty of generating discriminative cluster labels.
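The diversification of top hits described above can be illustrated with a greedy Maximal Marginal Relevance (MMR) re-ranker, a standard diversification baseline rather than necessarily the exact method evaluated in this paper; the documents, scores, and trade-off parameter `lam` below are all hypothetical:

```python
def mmr(candidates, relevance, similarity, k, lam=0.7):
    """Greedy Maximal Marginal Relevance re-ranking.

    candidates: list of doc ids; relevance: dict id -> score;
    similarity: dict (id, id) -> score in [0, 1].
    Each step picks the doc maximising relevance minus redundancy
    with respect to the docs already selected.
    """
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(d):
            redundancy = max((similarity[(d, s)] for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

# Toy example: d1 and d2 are near-duplicates, so MMR interleaves d3.
rel = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
sim = {("d1", "d2"): 0.95, ("d2", "d1"): 0.95,
       ("d1", "d3"): 0.1, ("d3", "d1"): 0.1,
       ("d2", "d3"): 0.1, ("d3", "d2"): 0.1}
ranking = mmr(["d1", "d2", "d3"], rel, sim, k=3)
```

After selecting `d1`, the near-duplicate `d2` is penalised for redundancy, so the less relevant but novel `d3` is promoted to second place.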

3.
This paper focuses on temporal retrieval of activities in videos via sentence queries. Given a sentence query describing an activity, temporal moment retrieval aims at localizing the temporal segment within the video that best matches the textual query. This is a general yet challenging task, as it requires comprehension of both video and language. Existing research predominantly employs coarse frame-level features as the visual representation, obscuring the specific details (e.g., the desired objects "girl", "cup" and the action "pour") within the video that may provide critical cues for localizing the desired moment. In this paper, we propose a novel Spatial and Language-Temporal Tensor Fusion (SLTF) approach to resolve those issues. Specifically, the SLTF method first takes advantage of object-level local features and attends to the most relevant ones (e.g., the local features for "girl" and "cup") by spatial attention. We then encode the sequence of local features on consecutive frames with an LSTM network, which captures the motion information and the interactions among these objects (e.g., the interaction "pour" involving the two objects). Meanwhile, language-temporal attention is used to emphasize keywords based on moment context information. Thereafter, a tensor fusion network learns both the intra-modality and inter-modality dynamics, which enhances the learning of the moment-query representation. Our two attention sub-networks can therefore adaptively recognize the most relevant objects and interactions in the video while highlighting the keywords in the query for retrieving the desired moment. Experimental results on three public benchmark datasets (TACOS, Charades-STA, and DiDeMo) show that the SLTF model significantly outperforms current state-of-the-art approaches and demonstrate the benefits of the new components incorporated into SLTF.
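The spatial attention over object-level features can be sketched as generic soft attention: score each object feature against a query embedding, softmax the scores, and take the weighted sum. This is a minimal illustration, not the authors' exact SLTF network; the feature vectors and object labels in the comments are invented:

```python
import numpy as np

def spatial_attention(obj_feats, query_vec):
    """Attend over per-frame object features (n_objects x dim) with a
    query embedding (dim,); returns the attended feature and the weights."""
    scores = obj_feats @ query_vec                # relevance of each object
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over objects
    return weights @ obj_feats, weights

# Hypothetical 2-d features for three regions, e.g. "girl", "cup", background.
objs = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.5, 0.5]])
query = np.array([1.0, 0.0])                      # query favours the first object
attended, w = spatial_attention(objs, query)
```

The first object's feature aligns best with the query vector, so it receives the largest attention weight and dominates the attended representation.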

4.
A Survey of Content-Based Image Information Retrieval (total citations: 13; self-citations: 0; citations by others: 13)
刘伟成, 孙吉红. Information Science (《情报科学》), 2002, 20(4): 431-433, 437
Content-based image retrieval, i.e., retrieving images (or video clips) containing a target object from large collections of still or moving images, has become an indispensable technique for organizing and managing image information in today's highly information-driven world. This paper reviews the progress of content-based retrieval techniques and discusses their principal methods, including still-image retrieval based on color, shape, and texture, as well as video retrieval techniques.

5.
The problem of content-based video retrieval continues to pose a challenge to the research community: the performance of video retrieval systems remains low because of the semantic gap. In this paper we consider whether taking advantage of context can aid the video retrieval process by making the prediction of relevance easier; that is, if it is easier for a classification system to predict the relevance of a video shot under a given context, then that context also has potential to improve retrieval, since the underlying features better differentiate relevant from non-relevant video shots. We use an operational definition of context, in which datasets can be split into disjoint sub-collections that reflect a particular context. Contexts considered include task difficulty and user expertise, among others. In the classification process, four main types of features are used to represent video shots: conventional low-level visual features representing the physical properties of the shots, behavioral features based on user interaction with the shots, and two different bag-of-words features obtained from automatic speech recognition on the video's audio.

6.
Abnormal event detection in videos plays an essential role in public security. However, most weakly supervised learning methods ignore the relationship between complicated spatial correlations and the dynamic trends of temporal patterns in video data. In this paper, we provide a new perspective: spatial similarity and temporal consistency are adopted to construct Spatio-Temporal Graph-based CNNs (STGCNs). For feature extraction, we use Inflated 3D (I3D) convolutional networks, which better capture appearance and motion dynamics in videos. For the spatial and temporal graphs, each video segment is regarded as a vertex, and an attention mechanism is introduced to allocate attention to each segment. For the spatial-temporal fusion graph, we propose a self-adapting weighting scheme to fuse the two graphs. Finally, we combine a ranking loss and a classification loss to improve the robustness of STGCNs. We evaluate the performance of STGCNs on the UCF-Crime dataset (128 h in total) and the ShanghaiTech dataset (317,398 frames in total), achieving AUC scores of 84.2% and 92.3%, respectively. The experimental results also show the method's effectiveness and robustness under other evaluation metrics.
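A common form of ranking loss in weakly supervised anomaly detection is a hinge loss on the maximum segment scores of an abnormal and a normal video (multiple-instance ranking). The sketch below assumes this standard formulation, which may differ in detail from the loss actually used in STGCNs; the scores are invented:

```python
def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge ranking loss on bag-level maxima: the highest-scoring segment
    of an abnormal video should outrank every segment of a normal video
    by at least `margin`."""
    return max(0.0, margin - max(abnormal_scores) + max(normal_scores))

# Well-separated case: one abnormal segment clearly stands out.
loss_sep = mil_ranking_loss([0.1, 0.9, 0.2], [0.1, 0.05, 0.2])
# Violated ordering: a normal segment outscores every abnormal one.
loss_bad = mil_ranking_loss([0.3, 0.2, 0.1], [0.4, 0.5, 0.2])
```

Minimising this loss pushes the top abnormal-segment score above all normal-segment scores, which is what lets segment-level anomaly scores emerge from video-level labels alone.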

7.
In this paper, we present ViGOR (Video Grouping, Organisation and Recommendation), an exploratory video retrieval system. Exploratory video retrieval tasks are hampered by the lack of semantics associated with video and by the overwhelming number of video items stored in collections such as YouTube and MSN Video. To help facilitate these exploratory search tasks, we present a system that utilises two complementary approaches: first, a new search paradigm that allows the semantic grouping of videos, and second, the exploitation of past usage history to provide video recommendations. We present two types of recommendation techniques adapted to the grouping search paradigm: the first is a global recommendation, which couples the multi-faceted nature of exploratory video retrieval tasks with the user's current information need, and the second is a local recommendation, which exploits the organisational features of ViGOR to provide more localised recommendations based on a specific aspect of the user's task. Two user evaluations were carried out in order to (1) validate the new search paradigm provided by ViGOR, characterised by the grouping functionalities, and (2) evaluate the usefulness of the proposed recommendation approaches when integrated into ViGOR. The results show (1) that the grouping, organisational and recommendation functionalities can improve users' search performance without adversely affecting their perceptions of the system and (2) that both recommendation approaches are relevant to users at different stages of their search, demonstrating the importance of multi-faceted recommendations for video retrieval systems and illustrating the many uses of collaborative recommendations for exploratory video search tasks.

8.
We propose in this paper an architecture for near-duplicate video detection based on (i) index and query signature-based structures that integrate temporal and perceptual visual features and (ii) a matching framework that computes the logical inference between index and query documents. For indexing, instead of concatenating low-level visual features into high-dimensional spaces, which leads to curse-of-dimensionality and redundancy issues, we adopt a perceptual symbolic representation based on color and texture concepts. For matching, we instantiate a retrieval model based on logical inference by coupling an N-gram sliding-window process with theoretically sound lattice-based structures. The techniques we describe are robust and insensitive to general video editing and degradation, making them well suited to re-broadcast video search. Experiments are carried out on large quantities of video data collected from the TRECVID 02, 03 and 04 collections and on real-world video broadcasts recorded from two German TV stations. An empirical comparison against two state-of-the-art dynamic programming techniques is encouraging and demonstrates the advantage and feasibility of our method.
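The N-gram sliding-window matching can be sketched as follows, with each video signature reduced to a sequence of hypothetical color/texture symbols; the real system's lattice-based logical inference is more elaborate than this simple overlap score:

```python
def ngrams(symbols, n=3):
    """Set of n-grams of a symbol sequence."""
    return {tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)}

def sliding_match(index_sig, query_sig, n=3):
    """Slide the query signature over the indexed one and report the best
    n-gram overlap ratio, a rough containment score for near-duplicates."""
    q = ngrams(query_sig, n)
    best = 0.0
    for start in range(len(index_sig) - len(query_sig) + 1):
        window = index_sig[start:start + len(query_sig)]
        overlap = len(ngrams(window, n) & q) / max(len(q), 1)
        best = max(best, overlap)
    return best

# Symbolic colour/texture concepts per keyframe (hypothetical alphabet).
index_sig = list("AABCCDEEFG")
score_dup = sliding_match(index_sig, list("BCCDE"))    # clip embedded in index
score_other = sliding_match(index_sig, list("XYZXY"))  # unrelated clip
```

Because matching is done on short n-grams of perceptual symbols rather than raw feature vectors, a query clip embedded anywhere in the indexed broadcast still produces a high score.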

9.
This paper presents an investigation of how to automatically formulate effective queries using full or partial relevance information (i.e., the terms found in relevant documents) in the context of relevance feedback (RF). The effects of adding relevance information in the RF environment are studied via controlled experiments, whose conditions are formalized into a set of assumptions that form the framework of our study, called the idealized relevance feedback (IRF) framework. In our IRF settings, we confirm the previous findings of relevance feedback studies. In addition, our experiments show that better retrieval effectiveness can be obtained when (i) we normalize the term weights by their ranks, (ii) we select weighted terms from the top K retrieved documents, (iii) we include the terms of the initial title queries, and (iv) we use the best query size for each topic instead of the average best query size; the last of these yields at most five percentage points of improvement in mean average precision (MAP). We also reach a new level of retrieval effectiveness of about 55–60% MAP, compared with 40+% in previous findings. This level was found to be similar to that obtained using a TREC ad hoc test collection with about double the number of documents of the TREC-3 collection used in previous work.
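Points (i)-(iii) above can be illustrated with a toy query-expansion routine that keeps the title terms, ranks candidate terms from the relevant documents, and weights selected terms by reciprocal rank. This is a simplified stand-in for the paper's IRF estimators; the reciprocal-rank weighting is an assumption for illustration, not the exact formula studied:

```python
from collections import Counter

def expand_query(title_terms, relevant_docs, k_terms=5):
    """Build an expanded query: keep the initial title terms at full
    weight and add the top-ranked terms from the relevant documents,
    weighting each selected term by the reciprocal of its rank
    (a simple rank-based normalisation)."""
    counts = Counter(t for doc in relevant_docs for t in doc)
    ranked = [t for t, _ in counts.most_common() if t not in title_terms]
    query = {t: 1.0 for t in title_terms}          # title terms kept (iii)
    for rank, term in enumerate(ranked[:k_terms], start=1):
        query[term] = 1.0 / rank                   # rank-normalised weight (i)
    return query

# Two hypothetical relevant documents as term lists (ii).
docs = [["video", "retrieval", "shot", "shot"],
        ["retrieval", "shot", "feedback"]]
q = expand_query(["video"], docs, k_terms=2)
```

The most frequent non-title term ("shot") gets the full weight 1.0, the next ("retrieval") gets 0.5, and the original title term is retained unchanged.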

10.
The current study addresses the problem of retrieving a specific moment from an untrimmed video given a sentence query. Existing methods have achieved high performance by designing various structures to match visual-text relations, yet they tend to return an interval starting from 0 s, a phenomenon we name the "0s bias". In this paper, we propose a Circular Co-Teaching (CCT) mechanism that uses a captioner to improve an existing retrieval model (the localizer) from two angles: biased annotations and easy samples. Correspondingly, CCT contains two processes: (1) Pseudo Query Generation (captioner to localizer), which transfers knowledge from generated queries to the localizer to balance the annotations; and (2) Competence-based Curriculum Learning (localizer to captioner), which trains the captioner in an easy-to-hard fashion guided by the localization results, making pairs of false-positive moments and pseudo queries become easy samples for the localizer. Extensive experiments show that CCT can alleviate the "0s bias", with an average improvement of 4% for existing approaches on two public datasets (ActivityNet-Captions and Charades-STA) in terms of R@1, IoU=0.7. Notably, our method also outperforms baselines in an out-of-distribution scenario. We further quantitatively validate CCT's ability to cope with the "0s bias" using a proposed metric, DM. Our study not only contributes theoretically to detecting the "0s bias" but also provides a highly effective tool for video moment retrieval by alleviating it.

11.
In the web environment, most of the queries issued by users are implicit by nature. Inferring the different temporal intents of this type of query enhances the temporal quality of web search results. Previous work tackling this problem has usually focused on news queries, where retrieving the most recent results related to the query is usually sufficient to meet the user's information need. However, few works have studied the importance of time in queries such as "Philip Seymour Hoffman", where the results may require no recency at all. In this work, we focus on this type of query, named "time-sensitive queries", where the preferred results come from a diversified time span, not necessarily the most recent one. Unlike related work, we follow a content-based approach to identify the most important time periods of the query and integrate time into a re-ranking model to boost the retrieval of documents whose contents match the query time period. For that purpose, we define a linear combination of topical and temporal scores, which reflects the relevance of a web document in both the topical and the temporal dimension, thus improving the effectiveness of the ranked results across different types of queries. Our approach relies on a novel temporal similarity measure that can determine the most important dates for a query while filtering out non-relevant ones. Through extensive experimental evaluation over web corpora, we show that our model offers promising results compared with baseline approaches. As a result of our investigation, we publicly provide a set of web services and a web search interface so that the system can be explored graphically by the research community.
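The linear combination of topical and temporal scores can be sketched as below; the temporal score here is a simple date-overlap ratio standing in for the paper's temporal similarity measure, and all documents, dates and the mixing weight `alpha` are hypothetical:

```python
def rerank(results, query_dates, alpha=0.6):
    """Re-rank documents by a linear mix of topical score and temporal
    score (fraction of the document's dates that fall in the query's
    key time periods)."""
    def temporal_score(doc_dates):
        if not doc_dates:
            return 0.0
        return len(set(doc_dates) & set(query_dates)) / len(set(doc_dates))
    scored = [(alpha * topical + (1 - alpha) * temporal_score(dates), doc)
              for doc, topical, dates in results]
    return [doc for _, doc in sorted(scored, reverse=True)]

# (doc id, topical score, dates mentioned in the document)
results = [("recent-news", 0.70, [2014]),
           ("career-bio", 0.65, [1999, 2005]),
           ("off-topic", 0.20, [1999])]
# Query whose important periods are in the past, not the present.
ranking = rerank(results, query_dates=[1999, 2005], alpha=0.6)
```

With the query's key dates in the past, the slightly less topical but temporally matching "career-bio" document overtakes the recency-oriented one, which is exactly the behaviour wanted for time-sensitive (non-recency) queries.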

12.
With the rapid growth of video data, especially in cyberspace, video captioning, i.e., the representation of video data in natural language, has been receiving increasing interest in applications such as video retrieval, action recognition, and video understanding. In recent years, deep neural networks have been successfully applied to video captioning. However, most existing methods describe a video clip using only one sentence, which may not fully cover the semantic content of the clip. In this paper, a new multi-sentence video captioning algorithm is proposed using a content-oriented beam search approach and a multi-stage refining method. We use a new content-oriented beam search algorithm to update the probabilities of words generated by the trained deep networks; the algorithm leverages the high-level semantic information of an input video using an object detector and a structural dictionary of sentences. We also use a multi-stage refining approach to remove structurally wrong sentences as well as sentences that are weakly related to the semantic content of the video. To this end, a new two-branch deep neural network is proposed to measure the relevance score between a sentence and a video. We evaluated the proposed method on two popular video captioning datasets and compared the results with those of several state-of-the-art approaches. The experiments showed the superior performance of the proposed algorithm; for instance, on the MSVD dataset, the proposed method improves the best-1 sentences by 6% over the best state-of-the-art alternative.
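The core probability update of a content-oriented beam search, boosting words that name detected objects before renormalising, might look like the following; the boost factor `beta` and the multiplicative form are assumptions for illustration, not the authors' exact update:

```python
def content_boost(word_probs, detected_objects, beta=0.5):
    """Boost the language-model probabilities of words that name objects
    detected in the video, then renormalise so the result is still a
    probability distribution over the vocabulary."""
    boosted = {w: p * (1 + beta) if w in detected_objects else p
               for w, p in word_probs.items()}
    z = sum(boosted.values())
    return {w: p / z for w, p in boosted.items()}

# Hypothetical next-word distribution from a captioning decoder.
probs = {"dog": 0.2, "cat": 0.25, "runs": 0.55}
# An object detector reported a dog in the frame.
new_probs = content_boost(probs, detected_objects={"dog"})
```

After the boost, "dog" overtakes "cat" even though the language model originally preferred "cat", which is how detector evidence steers the beam toward content-relevant captions.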

13.
Advancements in search engines for traditional text documents have enabled effective retrieval of massive textual information in a resource-efficient manner. However, such conventional search methodologies often suffer from poor retrieval accuracy, especially when documents exhibit unique properties that call for specialized, deeper semantic extraction. Recently, AlgorithmSeer, a search engine for algorithms, was proposed; it extracts pseudo-codes and shallow textual metadata from scientific publications and treats them as traditional documents so that conventional search engine methodology can be applied. However, such a system fails to support user queries that seek algorithm-specific information, such as the datasets on which algorithms operate, the performance of algorithms, and their runtime complexity. In this paper, a set of enhancements to the previously proposed algorithm search engine is presented. Specifically, we propose methods to automatically identify and extract algorithmic pseudo-codes and the sentences that convey related algorithmic metadata using machine-learning techniques. In an experiment with over 93,000 text lines, we introduce 60 novel features, comprising content-based, font-style-based and structure-based feature groups, to extract algorithmic pseudo-codes. Our proposed pseudo-code extraction method achieves an F1-score of 93.32%, outperforming the state-of-the-art techniques by 28%. Additionally, we propose a method to extract algorithm-related sentences using deep neural networks and achieve an accuracy of 78.5%, outperforming a rule-based model and a support vector machine model by 28% and 16%, respectively.

14.
Research on Key Technologies for Content-Based Video Retrieval (total citations: 5; self-citations: 0; citations by others: 5)
李宗民, 于广斌, 刘玉杰. Information Science (《情报科学》), 2004, 22(7): 850-852, 861
Content-based video retrieval is one of the current research hotspots in computer vision, knowledge mining, and related fields. This paper systematically reviews the state of the art in this research direction, covering key technologies such as shot detection, motion-feature-based video retrieval, and shot clustering, and proposes directions for future research.

15.
Temporal information extraction and retrieval are two key problems in temporal information processing on the Web. This paper first analyzes the significance of temporal information for Web applications and then discusses in depth the state of the art of temporal information extraction and retrieval in the Web domain. On this basis, it examines the ontology-based representation of Web temporal information. Finally, it outlines several future research directions for Web temporal information extraction and retrieval.

16.
Based on panel data for 31 provinces, autonomous regions, and municipalities from 2001 to 2010, this paper empirically examines the relationship between influencing factors and the degree of corruption from three perspectives: overall, by stage, and by region. The study finds that, apart from civil-servant salaries, which remain significantly negatively correlated with the degree of corruption throughout, the other seven independent variables, such as fiscal decentralization and degree of openness, show clear spatio-temporal differences in both the direction and the magnitude of their effects over time and across regions. It is therefore necessary to curb corruption through measures such as raising civil-servant pay, formulating differentiated anti-corruption policies, and strengthening comprehensive anti-corruption efforts.

17.
The estimation of the query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective, in terms of high mean retrieval performance over all queries, but also stable, in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, namely the bias–variance tradeoff, a fundamental concept in statistics. We formulate the notion of bias and variance with respect to both retrieval performance and estimation quality of query models. We then investigate several estimated query models, analyzing when and why the bias–variance tradeoff occurs and how bias and variance can be reduced simultaneously. A series of experiments on four TREC collections has been conducted to systematically evaluate our bias–variance analysis. Our approach and results can potentially form an analysis framework and a novel evaluation strategy for query language modeling.
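The bias–variance notions invoked above can be made concrete with a small simulation comparing an unbiased estimator against a shrunk one: shrinking lowers variance at the cost of bias, which is the kind of tradeoff the paper studies for query model estimation (the Gaussian setup here is purely illustrative and unrelated to the paper's actual estimators):

```python
import numpy as np

rng = np.random.default_rng(0)
true_theta = 2.0

# Two estimators of the mean from n=5 samples: the sample mean
# (unbiased, higher variance) and a shrunk mean (biased toward 0,
# lower variance).
estimates_mean, estimates_shrunk = [], []
for _ in range(10000):
    x = rng.normal(true_theta, 1.0, size=5)
    estimates_mean.append(x.mean())
    estimates_shrunk.append(0.8 * x.mean())

def bias_variance(estimates, truth):
    """Squared bias and variance of an estimator over repeated trials."""
    est = np.asarray(estimates)
    bias = est.mean() - truth
    return bias ** 2, est.var()

b2_mean, var_mean = bias_variance(estimates_mean, true_theta)
b2_shrunk, var_shrunk = bias_variance(estimates_shrunk, true_theta)
```

The sample mean has near-zero squared bias but variance near 1/5 = 0.2, while the shrunk estimator trades a squared bias of about (0.8·2 − 2)² = 0.16 for a variance reduced by the factor 0.8² = 0.64; whichever has the lower sum wins on mean squared error.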

18.
With the rapid development of digital equipment and the continuous upgrading of online media, a growing number of people are willing to post videos on the web to share their daily lives (Jelodar, Paulius, & Sun, 2019). Generally, not all segments of a video are popular with audiences; some may be boring. If we can predict which segment of a newly generated video stream will be popular, viewers can jump straight to that segment rather than watch the whole video to find the highlight. Moreover, if we can predict the emotions a video will induce in its audience, this will be helpful for video analysis and for guiding video-makers in improving their videos. In recent years, crowd-sourced time-sync video comments have emerged worldwide, supporting further research on temporal video labeling. In this paper, we propose a novel framework for predicting which segment of a newly generated video stream (one that has not yet received time-sync comments) will be popular among audiences. Experimental results on real-world data demonstrate the effectiveness of the proposed framework and justify the idea of exploiting crowd-sourced time-sync comments as a bridge for predicting the popularity of segments in a video.

19.
Both general and domain-specific search engines have adopted query suggestion techniques to help users formulate effective queries. In the specific domain of literature search (e.g., finding academic papers), initial queries are usually based on a draft paper or abstract rather than on short lists of keywords. In this paper, we investigate phrasal-concept query suggestions for literature search. These suggestions explicitly specify important phrasal concepts related to an initial detailed query. The merits of phrasal-concept query suggestions in this domain are their readability and retrieval effectiveness: (1) phrasal concepts are natural for academic authors because of their frequent use of terminology and subject-specific phrases, and (2) academic papers describe their key ideas via these subject-specific phrases, so phrasal concepts can be used effectively to find those papers. We propose a novel phrasal-concept query suggestion technique that generates queries by identifying key phrasal concepts in pseudo-labeled documents and combining them with related phrases. The proposed technique is evaluated in terms of both user preference and retrieval effectiveness. We conduct user experiments to verify a preference for our approach over baseline query suggestion methods, and we demonstrate the effectiveness of the technique with retrieval experiments.

20.
The text retrieval method that uses the latent semantic indexing (LSI) technique with truncated singular value decomposition (SVD) has been studied intensively in recent years. The SVD reduces the noise contained in the original representation of the term–document matrix and improves information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small, homogeneous data collections; for large, inhomogeneous datasets, the performance of SVD-based text retrieval may deteriorate. We propose to partition a large inhomogeneous dataset into several smaller ones with clustered structure, on which we apply the truncated SVD. Our experimental results show that the clustered SVD strategies can enhance retrieval accuracy and reduce computing and storage costs.
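A minimal sketch of LSI retrieval with a truncated SVD, using a tiny hand-made term–document matrix; the clustered variant proposed in the paper would simply apply the same projection separately within each sub-collection:

```python
import numpy as np

def lsi_scores(term_doc, query_vec, k):
    """Truncated-SVD (LSI) retrieval: project documents and the query
    into a k-dimensional latent space and rank by cosine similarity."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    uk, sk = u[:, :k], s[:k]
    docs_k = np.diag(sk) @ vt[:k]                 # documents in latent space
    q_k = uk.T @ query_vec                        # folded-in query
    norms = np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k) + 1e-12
    return (q_k @ docs_k) / norms

# Tiny term-document matrix (terms x docs): docs 0 and 1 share
# vocabulary; doc 2 uses a disjoint term.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
q = np.array([1.0, 1.0, 0.0])                     # query over the shared terms
scores = lsi_scores(A, q, k=2)
```

Documents 0 and 1 collapse onto the same latent direction as the query and score near 1, while the unrelated document scores 0; truncation is what merges the two overlapping vocabularies into one concept.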


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), 京ICP备09084417号