Similar Documents
20 similar documents found (search time: 10 ms)
1.
Automatic review assignment can significantly improve the productivity of many people such as conference organizers, journal editors and grant administrators. A general setup of the review assignment problem involves assigning a set of reviewers on a committee to a set of documents to be reviewed, under a review quota constraint, so that the reviewers assigned to a document can collectively cover multiple topic aspects of the document. No previous work has addressed such a setup of committee review assignments while also considering matching multiple aspects of topics and expertise. In this paper, we tackle the problem of committee review assignment with multi-aspect expertise matching by casting it as an integer linear programming problem. The proposed algorithm can naturally accommodate any probabilistic or deterministic method for modeling multiple aspects to automate committee review assignments. Evaluation using a multi-aspect review assignment test set constructed from ACM SIGIR publications shows that the proposed algorithm is effective and efficient for committee review assignments based on multi-aspect expertise matching.
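The abstract above casts committee assignment as integer linear programming. As a minimal illustration of the same objective (not the paper's ILP solver), the toy sketch below enumerates committees and picks the one whose members collectively cover the paper's topic aspects best; the reviewer names, expertise vectors, and aspect weights are invented example data.

```python
from itertools import combinations

# Toy data (invented): per-reviewer expertise and the paper's aspect weights.
reviewer_expertise = {
    "r1": {"IR": 0.9, "ML": 0.2},
    "r2": {"IR": 0.1, "ML": 0.8},
    "r3": {"IR": 0.5, "ML": 0.5},
}
paper_aspects = {"IR": 0.6, "ML": 0.4}  # multi-aspect topic distribution

def coverage(committee):
    # Collective coverage: for each aspect, take the best reviewer's
    # expertise, weighted by how much the paper is about that aspect.
    return sum(w * max(reviewer_expertise[r].get(a, 0.0) for r in committee)
               for a, w in paper_aspects.items())

def best_committee(size=2):
    # Brute-force enumeration is feasible only for toy inputs; the paper
    # solves the same objective at scale with integer linear programming.
    return max(combinations(reviewer_expertise, size), key=coverage)

print(best_committee())  # committee covering both IR and ML aspects
```

Here the winning pair combines one IR-heavy and one ML-heavy reviewer, which is exactly the multi-aspect coverage behavior the ILP formulation optimizes for.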

2.
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem, where documents relevant to a given query might not be retrieved simply because they use different terminology to describe the same concepts. As such, semantic search techniques aim to address these limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while the sole consideration of semantic information might not improve retrieval performance over keyword-based search, it enables the retrieval of relevant documents that cannot be retrieved by keyword-based methods. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process for building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing work has proposed either to build one unified index encompassing both textual and semantic information or to build separate yet integrated indices for each information type, but both approaches face limitations such as increased query processing time. In this paper, we propose to use neural embedding-based representations of terms, semantic entities, semantic types and documents within the same embedding space to facilitate the development of a unified search index covering these four information types. We perform experiments on standard and widely used document collections, including Clueweb09-B and Robust04, to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives.
Based on our experiments, we find that when neural embeddings are used to build inverted indices, relaxing the requirement to explicitly observe the posting-list key in the indexed document, (a) retrieval efficiency improves compared to a standard inverted index, reducing both index size and query processing time, and (b) while efficiency, the main objective of an indexing mechanism, improves with our method, retrieval effectiveness remains competitive with the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.
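The core idea of the relaxed inverted index described above can be sketched in a few lines: a document is posted under an index key when its embedding is close to the key's embedding, even if the key never literally occurs in the document. The vectors and threshold below are invented toy data, not the paper's trained embeddings.

```python
import math

def cos(u, v):
    # Cosine similarity between two dense vectors.
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Invented 2-d embeddings for two index keys and three documents.
key_vecs = {"sports": (1.0, 0.1), "politics": (0.1, 1.0)}
doc_vecs = {"d1": (0.9, 0.2), "d2": (0.2, 0.8), "d3": (0.7, 0.6)}

def build_index(threshold=0.8):
    # Post a document under every key it is semantically close to,
    # rather than only under terms it lexically contains.
    index = {k: [] for k in key_vecs}
    for doc, dv in doc_vecs.items():
        for key, kv in key_vecs.items():
            if cos(dv, kv) >= threshold:
                index[key].append(doc)
    return index

print(build_index())
```

Note that `d3` lands in the `sports` posting list purely by embedding proximity, which is the "relaxed observation" behavior the abstract describes.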

3.
Cost optimization continues to be a critical concern for many human resources departments. The key is to balance costs against business value. In particular, computer science organizations prefer to hire people who are expert in one skill area and have superficial knowledge of other areas, which gives them the ability to collaborate across different aspects of a project. Community Question Answering networks provide good platforms for people and organizations to share knowledge and find experts. An important issue in expert finding is that an expert has to keep updating his knowledge, even after reaching saturation in his field of expertise, to still be identified as an expert; a person who fails to do so is likely to lose his expert status. This work investigates whether taking the concept of time into account improves the quality of expertise retrieval. We propose a new method for T-shaped expert finding that is based on temporal expert profiling. The proposed method takes the temporal property of expertise into account to mine the shape of expertise for each candidate expert based on his profile. To this end, for each candidate expert, we take snapshots of his expertise trees at regular time intervals and learn the relation between temporal changes in different expertise trees and the candidate's profile. Finally, we apply a filtering technique on top of the profiling method to find the shape of expertise for each candidate expert. Experimental results on a large test collection show the superiority of the proposed method, in terms of quality of results, over the state of the art.
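To make the "T-shaped" notion concrete, here is a hedged toy of the shape test applied to temporal snapshots: a candidate is T-shaped if, in the latest snapshot, they are deep in one area and at least shallow in the others. The topic names, scores, and thresholds are invented illustrative data, not the paper's profiling model.

```python
# Invented snapshots: time -> candidate's per-topic expertise score.
snapshots = {
    2019: {"java": 0.4, "sql": 0.2, "css": 0.1},
    2021: {"java": 0.9, "sql": 0.3, "css": 0.2},
}

def is_t_shaped(profile, deep=0.8, shallow=0.15):
    # T-shape: one deep specialty plus shallow-but-nonzero breadth.
    depths = sorted(profile.values(), reverse=True)
    return depths[0] >= deep and all(s >= shallow for s in depths[1:])

latest = snapshots[max(snapshots)]
print(is_t_shaped(latest))  # True: deep in java, shallow elsewhere
```

Comparing the two snapshots also shows why temporality matters: the same candidate is not T-shaped in 2019 but becomes T-shaped by 2021.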

4.
The purpose of this study is to provide automatic new topic identification for search engine query logs, and to estimate the effect of the statistical characteristics of search engine queries on new topic identification. By applying multiple linear regression and multi-factor ANOVA to a sample data log from the Excite search engine, we demonstrated that the statistical characteristics of Web search queries, such as time interval, search pattern and position of a query in a user session, are effective predictors of shifts to a new topic. Multiple linear regression is also a successful tool for estimating topic shifts and continuations. The findings of this study provide statistical proof of the relationship between the non-semantic characteristics of Web search queries and the occurrence of topic shifts and continuations.
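The regression idea above can be sketched with a single non-semantic feature: predicting a topic-shift indicator from the time interval between consecutive queries. The data points below are invented, and a one-feature least-squares fit stands in for the paper's multiple linear regression.

```python
# Invented training data: minutes since the previous query, and whether the
# session moved to a new topic (1) or continued the old one (0).
intervals = [1, 2, 8, 10, 15]
shift = [0, 0, 1, 1, 1]

# Ordinary least squares for one predictor (closed form).
n = len(intervals)
mx = sum(intervals) / n
my = sum(shift) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(intervals, shift)) /
         sum((x - mx) ** 2 for x in intervals))
intercept = my - slope * mx

def predict(minutes):
    # A fitted value above 0.5 suggests a topic shift.
    return intercept + slope * minutes

print(predict(12) > 0.5)  # long gap -> likely shift
```

The fitted slope is positive, matching the intuition in the abstract that longer inter-query gaps are associated with topic shifts.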

5.
6.
Assigning papers to suitable reviewers is of great significance for ensuring the accuracy and fairness of peer review results. In the past three decades, many researchers have produced a wealth of achievements on the reviewer assignment problem (RAP). In this survey, we provide a comprehensive review of the primary research achievements on reviewer assignment algorithms from 1992 to 2022. Specifically, this survey first discusses the background and necessity of automatic reviewer assignment, and then systematically summarizes the existing research work from three aspects, i.e., construction of the candidate reviewer database, computation of the matching degree between reviewers and papers, and reviewer assignment optimization algorithms, with objective comments on the advantages and disadvantages of the current algorithms. Afterwards, the evaluation metrics and datasets for reviewer assignment algorithms are summarized. To conclude, we outline potential research directions for RAP. Since there have been few comprehensive survey papers on reviewer assignment algorithms in the past ten years, this survey can serve as a valuable reference for related researchers and peer review organizers.

7.
In the context of social media, users usually post relevant information corresponding to the contents of events mentioned in a Web document. This information possesses two important properties: (i) it reflects the content of an event and (ii) it shares hidden topics with sentences in the main document. In this paper, we present a novel model to capture the nature of relationships between document sentences and post information (comments or tweets) in sharing hidden topics for summarization of Web documents, by utilizing relevant post information. Unlike previous methods, which are usually based on hand-crafted features, our approach ranks document sentences and user posts based on their importance to the topics. The sentence-user-post relation is formulated in a shared-topic matrix, which captures their mutual reinforcement. Our proposed matrix co-factorization algorithm computes the score of each document sentence and user post and extracts the top-ranked document sentences and comments (or tweets) as a summary. We apply the model to the task of summarization on three social context summarization datasets in two languages, English and Vietnamese, and also on DUC 2004 (a standard corpus for the traditional summarization task). According to the experimental results, our model significantly outperforms basic matrix factorization and achieves competitive ROUGE scores with state-of-the-art methods.
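The mutual-reinforcement intuition above can be illustrated without the full co-factorization machinery: sentences and posts reinforce each other's scores through a shared-topic matrix, iterated to a fixed point. The matrix entries below are invented toy overlaps, and the simple power-style iteration is a stand-in for the paper's algorithm.

```python
# Invented shared-topic matrix: share[i][j] = topic overlap between
# document sentence i and user post j.
share = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.1, 0.1],
]

def rank(iters=20):
    # Iterate mutual reinforcement: a sentence scores high if it overlaps
    # with high-scoring posts, and vice versa; normalize each round.
    s = [1.0] * len(share)       # sentence scores
    p = [1.0] * len(share[0])    # post scores
    for _ in range(iters):
        s = [sum(share[i][j] * p[j] for j in range(len(p)))
             for i in range(len(s))]
        p = [sum(share[i][j] * s[i] for i in range(len(s)))
             for j in range(len(p))]
        s = [v / max(s) for v in s]
        p = [v / max(p) for v in p]
    return s, p

s, p = rank()
print(max(range(len(s)), key=s.__getitem__))  # top-ranked sentence index
```

Sentence 0 wins because it overlaps strongly with the post that itself scores highest, which is the reinforcement effect the shared-topic formulation encodes.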

8.
Information management is the management of the organizational processes, technologies, and people which collectively create, acquire, integrate, organize, process, store, disseminate, access, and dispose of information. Information management is a vast, multi-disciplinary domain that brings together various subdomains and overlaps with other domains. This study aims to provide a comprehensive overview of the information management domain from 1970 to 2019. Drawing upon methodology from statistical text analysis research, this study summarizes the evolution of knowledge in this domain by examining publication trends by authors, institutions, countries, etc. Further, this study proposes a probabilistic generative model based on structural topic modeling to understand and extract the latent themes from research articles related to information management. Furthermore, this study graphically visualizes the variations in topic prevalence over the period from 1970 to 2019. The results highlight that the most common themes are data management, knowledge management, environmental management, project management, service management, and mobile and web management. The findings also identify themes such as knowledge management, environmental management, project management, and social communication as academic hotspots for future research.

9.
Recent developments have shown that entity-based models that rely on information from the knowledge graph can improve document retrieval performance. However, given the non-transitive nature of relatedness between entities on the knowledge graph, the use of semantic relatedness measures can lead to topic drift. To address this issue, we propose a relevance-based model for entity selection based on pseudo-relevance feedback, which is then used to systematically expand the input query, leading to improved retrieval performance. We perform our experiments on the widely used TREC Web corpora and empirically show that our proposed approach to entity selection significantly improves ad hoc document retrieval compared to strong baselines. More concretely, the contributions of this work are as follows: (1) We introduce a graphical probability model that captures dependencies between entities within the query and documents. (2) We propose an unsupervised entity selection method based on the graphical model for query entity expansion and then for ad hoc retrieval. (3) We thoroughly evaluate our method and compare it with state-of-the-art keyword and entity-based retrieval methods. We demonstrate that the proposed retrieval model shows improved performance over all the other baselines on ClueWeb09B and ClueWeb12B, two widely used Web corpora, on standard evaluation metrics. We also show that the proposed method is most effective on difficult queries. In addition, we compare our proposed entity selection with a state-of-the-art entity selection technique within the context of ad hoc retrieval using a basic query expansion method, and illustrate that it provides more effective retrieval for all expansion weights and different numbers of expansion entities.
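The pseudo-relevance-feedback step described above can be sketched simply: entities that recur across the top-ranked documents for the original query are appended to it. The entity lists and query below are invented examples, and frequency counting stands in for the paper's relevance-based graphical model.

```python
from collections import Counter

# Invented example: entities linked in each of the top-ranked documents
# retrieved for the original query (the pseudo-relevant set).
top_docs_entities = [
    ["Barack_Obama", "White_House"],
    ["Barack_Obama", "US_Senate"],
    ["White_House", "Barack_Obama"],
]

def expand(query, k=2):
    # Select the k entities most frequent in the feedback documents and
    # append them to the query as expansion terms.
    counts = Counter(e for doc in top_docs_entities for e in doc)
    expansion = [e for e, _ in counts.most_common(k)]
    return query + expansion

print(expand(["obama", "policy"]))
```

Restricting expansion to entities the feedback set agrees on is one simple way to limit the topic drift the abstract warns about.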

10.
Many existing systems for analyzing and summarizing customer reviews about products or services are based on a number of prominent review aspects. Conventionally, the prominent review aspects of a product type are determined manually. This costly approach cannot scale to large and cross-domain services such as Amazon.com, Taobao.com or Yelp.com, where there are a large number of product types and new products emerge almost every day. In this paper, we propose a novel method, empowered by knowledge sources such as Probase and WordNet, for extracting the most prominent aspects of a given product type from textual reviews. The proposed method, ExtRA (Extraction of Prominent Review Aspects), (i) extracts aspect candidates from text reviews based on a data-driven approach, (ii) builds an aspect graph utilizing Probase to narrow the aspect space, (iii) separates the space into reasonable aspect clusters by employing a set of proposed algorithms, and finally (iv) generates the K most prominent aspect terms or phrases, with little semantic overlap, from those aspect clusters, automatically and without supervision. ExtRA extracts high-quality prominent aspects as well as aspect clusters with little semantic overlap by exploiting knowledge sources. ExtRA can extract not only words but also phrases as prominent aspects. Furthermore, it is general-purpose and can be applied to almost any type of product and service. Extensive experiments show that ExtRA is effective and achieves state-of-the-art performance on a dataset consisting of different product types.
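One step of the pipeline above, grouping candidate aspect terms into clusters via a knowledge source and keeping the most prominent clusters, can be illustrated with a toy sketch. The hypernym map below stands in for Probase/WordNet lookups and, like the mention list, is entirely invented.

```python
from collections import Counter

# Invented stand-in for a knowledge-base lookup: candidate aspect term
# -> its hypernym (cluster label).
hypernym = {"battery": "hardware", "screen": "hardware",
            "price": "cost", "value": "cost", "shipping": "delivery"}

# Invented aspect-candidate mentions extracted from review text.
mentions = ["battery", "screen", "price", "battery", "shipping", "value"]

def top_aspects(k=2):
    # Cluster mentions by shared hypernym, then keep the K largest
    # clusters as the prominent aspects.
    counts = Counter(hypernym[m] for m in mentions)
    return [aspect for aspect, _ in counts.most_common(k)]

print(top_aspects())  # the K most prominent aspect clusters
```

Grouping by hypernym before counting is what keeps the final K aspects semantically non-overlapping: `battery` and `screen` fold into one cluster instead of competing as separate aspects.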

11.
Topic models are widely used for thematic structure discovery in text. But traditional topic models often require dedicated inference procedures for the specific tasks at hand. Also, they are not designed to generate word-level semantic representations. To address these limitations, we propose a neural topic modeling approach based on Generative Adversarial Nets (GANs), called the Adversarial-neural Topic Model (ATM), in this paper. To the best of our knowledge, this work is the first attempt to use adversarial training for topic modeling. The proposed ATM models topics with a Dirichlet prior and employs a generator network to capture the semantic patterns among latent topics. Meanwhile, the generator can also produce word-level semantic representations. Besides, to illustrate the feasibility of porting ATM to tasks other than topic modeling, we apply ATM to open domain event extraction. To validate the effectiveness of the proposed ATM, two topic modeling benchmark corpora and an event dataset are employed in the experiments. Our experimental results on the benchmark corpora show that ATM generates more coherent topics (considering five topic coherence measures), outperforming a number of competitive baselines. Moreover, the experiments on the event dataset also validate that the proposed approach is able to extract meaningful events from news articles.

12.
Relation extraction aims at finding meaningful relationships between two named entities within unstructured textual content. In this paper, we define the problem of information extraction as a matrix completion problem, where we employ the notion of universal schemas formed as a collection of patterns derived from open information extraction systems as well as additional features derived from grammatical clause patterns and statistical topic models. One of the challenges with earlier work that employs matrix completion methods is that such approaches require a sufficient number of observed relation instances to be able to make predictions. However, in practice there is often an insufficient amount of explicit evidence supporting each relation type that could be used within the matrix model. Hence, existing work suffers from low recall. In our work, we extend the state of the art by proposing novel ways of integrating two sets of features, i.e., topic models and grammatical clause structures, to alleviate the low-recall problem. More specifically, we propose that it is possible to (1) employ grammatical clause information from textual sentences to serve as an implicit indication of relation type and argument similarity, the basis being that similar relation types and arguments are likely to be observed within similar grammatical structures, and (2) benefit from statistical topic models to determine similarity between relation types and arguments. We employ statistical topic models to determine relation type and argument similarity based on their co-occurrence within the same topics. We have performed extensive experiments based on both gold standard and silver standard datasets. The experiments show that our approach has been able to address the low-recall problem in existing methods, showing an improvement of 21% on recall and 8% on F-measure over the state-of-the-art baseline.
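The matrix-completion framing above can be illustrated at toy scale: rows are entity pairs, columns are relation patterns, observed cells are 1, and a low-rank factorization predicts the missing cells. The data, rank, and SGD hyperparameters below are invented, and this tiny sketch omits the paper's clause and topic features entirely.

```python
import random

random.seed(0)

# Invented observations: (entity-pair row, relation-pattern column) -> seen.
# Pair 0 exhibits both patterns; pair 1 has only pattern 0 observed.
observed = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0}
rows, cols, rank = 2, 2, 1

# Rank-1 factors, initialized to small positive values.
U = [[random.uniform(0.01, 0.1) for _ in range(rank)] for _ in range(rows)]
V = [[random.uniform(0.01, 0.1) for _ in range(rank)] for _ in range(cols)]

def pred(i, j):
    return sum(U[i][k] * V[j][k] for k in range(rank))

# SGD on squared error over observed cells only.
for _ in range(2000):
    for (i, j), y in observed.items():
        err = pred(i, j) - y
        for k in range(rank):
            U[i][k], V[j][k] = (U[i][k] - 0.1 * err * V[j][k],
                                V[j][k] - 0.1 * err * U[i][k])

print(round(pred(1, 1), 2))  # completed cell: pair 1 likely fits pattern 1 too
```

The unobserved cell (1, 1) is driven toward 1 by the shared factors, which is how matrix completion generalizes beyond the explicitly observed relation instances.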

13.
Professional work is often regulated by procedures that shape the information seeking involved in performing a task. Yet, research on professionals' information seeking tends to bypass procedures and depict information seeking as an informal activity. In this study we analyze two healthcare tasks governed by procedures: triage and timeouts. While information seeking is central to both procedures, we find that the coordinating nurses rarely engage in information seeking when they triage patients. Conversely, the physicians value convening for timeouts to seek information. To explain these findings we distinguish between junior and expert professionals and between uncertain and equivocal tasks. The triage procedure specifies which information to retrieve, but expert professionals such as the coordinating nurses tend to perform triage, which is an uncertain task, by holistic pattern recognition rather than information seeking. For timeouts, which target an equivocal task, the procedure facilitates information seeking by creating a space for open-ended collaborative reflection. Both junior and expert physicians temporarily suspend patient treatment in favor of this opportunity to reflect on their actions, though partly for different reasons. We discuss implications for models of professionals' information seeking.

14.
Although deep learning breakthroughs in NLP are based on learning distributed word representations with neural language models, these methods suffer from a classic drawback of unsupervised learning techniques. Furthermore, the performance of general word embeddings has been shown to be heavily task-dependent. To tackle this issue, recent research has proposed learning sentiment-enhanced word vectors for sentiment analysis. However, the common limitation of these approaches is that they require external sentiment lexicon sources, and the construction and maintenance of these resources involve a set of complex, time-consuming, and error-prone tasks. In this regard, this paper proposes a method of sentiment lexicon embedding that represents sentiment words' semantic relationships better than existing word embedding techniques, without a manually annotated sentiment corpus. The major distinguishing factor of the proposed framework is that it jointly encodes morphemes and their POS tags, and trains only the important lexical morphemes in the embedding space. To verify the effectiveness of the proposed method, we conducted experiments comparing it with two baseline models. As a result, the revised embedding approach mitigated the problems of the conventional context-based word embedding method and, in turn, improved the performance of sentiment classification.

15.
In this paper, for solving future equation systems, two novel discrete-time advanced zeroing neural network models are proposed, analyzed and investigated. First of all, by using an integral-type error function and applying the zeroing neural network (also termed Zhang neural network) design formula twice, as the preliminaries and basis for future problem solving, two continuous-time advanced zeroing neural network models are presented for solving continuous time-variant equation systems. Secondly, a one-step-ahead numerical differentiation rule, termed the 5-instant discretization formula, is presented for first-order derivative approximation with higher computational precision. By exploiting the presented 5-instant discretization formula to discretize the continuous-time advanced zeroing neural network models, two novel discrete-time advanced zeroing neural network models are proposed. Theoretical analyses of the convergence and precision of the discrete-time advanced zeroing neural network models are presented. In addition, in the presence of disturbance, the proposed discrete-time advanced zeroing neural network models still possess excellent performance. Comparative numerical experimental results further substantiate the efficacy and superiority of the proposed discrete-time advanced zeroing neural network models for solving future equation systems.
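The zeroing-neural-network idea underlying the abstract can be shown on a scalar toy problem: drive the error e(t) = a(t)x(t) - b(t) to zero while a and b drift over time. The sketch below uses a plain Euler discretization of the ZNN design equation de/dt = -gamma*e, not the paper's 5-instant formula, and drops the time-derivative terms of a and b; functions and gains are illustrative choices.

```python
import math

# Invented time-variant coefficients of the scalar equation a(t)*x = b(t).
def a(t): return 2.0 + math.sin(t)
def b(t): return math.cos(t)

def track(steps=2000, tau=0.001, gamma=50.0):
    # Euler-discretized ZNN: step x so that e(t) = a(t)*x - b(t) decays
    # exponentially (de/dt = -gamma*e), even as a and b change.
    x = 0.0
    for k in range(steps):
        t = k * tau
        e = a(t) * x - b(t)
        x -= tau * gamma * e / a(t)  # derivative of e w.r.t. x is a(t)
    return x, b(steps * tau) / a(steps * tau)

x, exact = track()
print(abs(x - exact) < 0.05)  # tracked solution stays near b(t)/a(t)
```

Even this crude discretization tracks the drifting solution with a small lag; the paper's 5-instant formula exists precisely to shrink that lag to higher-order precision.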

16.
This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories utilizing deep learning. The model architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Acquisition of terminology integrated into the ontology extends the capabilities of semantically rich document representations with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in documents. The semantically rich representations obtained from the first module serve as input to the document classification module, which aims at finding the most appropriate category for the document through deep learning. Three different deep learning networks, each belonging to a different category of machine learning techniques, are used for ontological document classification with a real-life ontology. Multiple simulations are carried out with various deep neural network configurations, and our findings reveal that a three-hidden-layer feedforward network with 1024 neurons obtains the highest document classification performance on the INFUSE dataset. The performance in terms of F1 score is further increased by almost five percentage points, to 78.10%, for the same network configuration when the relevant terminology integrated into the ontology is applied to enrich the document representation. Furthermore, we conducted a comparative performance evaluation using various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.

17.
In addressing persistent gaps in existing theories, recent advances in data-driven research approaches offer novel perspectives and exciting insights across a spectrum of scientific fields concerned with technological change and its socio-economic impact. The present investigation suggests a novel approach to identify and analyze the evolution of technology sectors, in this case information and communications technology (ICT), considering international collaboration patterns and knowledge flows and spillovers via information inputs derived from patent documents. The objective is to utilize and explore information regarding inventors' geo-location, technology sector classifications, and patent citation records to construct various types of networks. This, in turn, will open up avenues to discover the nature of evolutionary pathways in ICT trajectories and will also provide evidence of how the overall ICT knowledge space, as well as directional knowledge flows within the ICT space, have evolved differently. It is expected that this data-driven inquiry will deliver intuitive results for decision makers seeking evidence for future resource allocation and interested in identifying well-suited collaborators for the development of potential next-generation technologies. Further, it will equip researchers in technology management, economic geography, and similar fields with a systematic approach to analyzing evolutionary pathways of technological advancement, and further enable the exploitation and development of new theories regarding technological change and its socio-economic consequences.

18.
19.
This study introduces a data-driven approach for assessing the practices and effectiveness of digital diplomacy, using the cases of South Korea and Japan. The study compared the networking power of public diplomacy organizations based on social media use, engagement with the public, interaction patterns among the public, and public perceptions of and attitudes toward the organizations. This was accomplished through a three-step method employing social network analysis and topic modeling. The network analysis found that the Korean public diplomacy organization generated a larger, more loosely connected, and decentralized comment network than the Japanese organization, which presented a "small-world" connectivity pattern with highly interconnected actors. The findings also suggest that, compared to the Japanese organization, the Korean organization was successful not only in enhancing its soft power through social media but also in building international networks among the foreign public.

20.
This paper studies the event-triggered synchronization control problem for delayed neural networks with quantization and actuator saturation. Firstly, in order to reduce the network load while retaining the required system performance, an event-triggered scheme is adopted to determine whether a sampled signal will be transmitted to the quantizer. Secondly, a synchronization error model is constructed to describe the master-slave synchronization system with the event-triggered scheme, quantization and input saturation in a unified framework. Thirdly, on the basis of a Lyapunov–Krasovskii functional, sufficient conditions for stabilization are derived which ensure synchronization of the master and slave systems; in particular, co-designed controller parameters and the corresponding event-trigger parameters are obtained under the above stability condition. Lastly, two numerical examples are employed to illustrate the effectiveness of the proposed approach.
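The load-reduction mechanism described above can be sketched as a simple relative-error trigger: a sampled state is transmitted only when it deviates enough from the last transmitted value. The threshold form and sample values below are invented for illustration; the paper's trigger is co-designed with the controller via the Lyapunov–Krasovskii conditions.

```python
# Invented relative-error event trigger: transmit sample x_k only when
# |x_k - last_transmitted| exceeds sigma * |x_k|.
def transmit_events(samples, sigma=0.5):
    last, events = None, []
    for k, x in enumerate(samples):
        if last is None or abs(x - last) > sigma * abs(x):
            events.append(k)   # trigger fires: send this sample
            last = x
    return events

samples = [1.0, 1.1, 1.2, 2.5, 2.6, 0.4]
print(transmit_events(samples))  # only a subset of samples is transmitted
```

Only three of the six samples are sent, which is the network-load saving that motivates event-triggered (rather than time-triggered) transmission.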


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号