Similar literature
1.
Question answering has become one of the most popular information retrieval applications. Although most question-answering systems try to improve the user experience and the technology used to find relevant results, many difficulties remain because of the continuous increase in the amount of web content. Question classification (QC) plays an important role in question-answering systems, and identifying question types is one of the major tasks in improving the classification process. A broad range of QC approaches has been proposed to address these classification problems; most are based on bag-of-words or dictionaries. In this research, we present an analysis of the different types of questions based on their grammatical structure. We identify different patterns and use machine learning algorithms to classify them. A framework is proposed for question classification using a grammar-based approach (GQCC) which exploits the structure of the questions. Our findings indicate that using syntactic categories related to different domain-specific types of Common Nouns, Numeral Numbers and Proper Nouns enables the machine learning algorithms to better differentiate between different question types. The paper presents a wide range of experiments; the results show that GQCC with the J48 classifier outperformed other classification methods, with 90.1% accuracy.

2.
Consumers evaluate products through online reviews, in addition to sharing their product experiences. Online reviews affect product marketing, and companies use online reviews to investigate consumer attitudes and perceptions of their products. However, review analyses often ignore the specific context, so the results yield little meaningful information. This study suggests a methodology for analyzing reviews in the context of comparing two competing products. In addition, by analyzing the discriminative attributes of competing products, we were able to derive more specific information than an overall product analysis. Analyzing the discriminative attributes in the context of comparing competing products clarifies the strengths and weaknesses of those products and provides realistic information that can help the company's management activities. For this purpose, this study collected reviews of the BB Cream product line in the cosmetics field. The analysis was carried out sequentially in three stages. First, we extracted words that represent discriminative attributes by analyzing the percentage difference of words. Second, different attribute words were classified according to the meaning used in the review by using latent semantic analysis. Finally, the polarity of discriminative attribute words was analyzed using Labeled-LDA. This analysis method can be used as a market research method, as it can extract more information than a traditional survey or interview method and can save cost and time through automation.
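
The first stage described above (finding words whose share of the text differs most between two competing products) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the exact formula, so the "percentage difference" is taken here to be the difference between each word's share of all tokens in the two review sets, and the function names `word_share` and `discriminative_words` are hypothetical.

```python
from collections import Counter


def word_share(reviews):
    """Return each word's share (0..1) of all tokens in a list of reviews."""
    counts = Counter(word for review in reviews for word in review.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def discriminative_words(reviews_a, reviews_b, top_k=3):
    """Rank words by the absolute difference of their shares in the two sets.

    Assumption: a positive difference means the word is more typical of
    product A, a negative one that it is more typical of product B.
    """
    share_a, share_b = word_share(reviews_a), word_share(reviews_b)
    vocab = set(share_a) | set(share_b)
    diff = {w: share_a.get(w, 0.0) - share_b.get(w, 0.0) for w in vocab}
    # Largest absolute difference first; ties are broken alphabetically.
    return sorted(diff.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top_k]


if __name__ == "__main__":
    product_a = ["light coverage light", "good coverage"]
    product_b = ["oily finish", "good finish"]
    for word, delta in discriminative_words(product_a, product_b):
        print(f"{word}: {delta:+.2f}")
```

A word such as "good" that appears at a similar rate in both sets gets a difference near zero and drops out of the ranking, which is the point of comparing the two products rather than analyzing each one alone.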

3.
Transductive classification is a useful way to classify texts when labeled training examples are insufficient. Several algorithms have been proposed to perform transductive classification on text collections represented in a vector space model. However, these algorithms are infeasible in practical applications because they assume independence among instances or terms and have other drawbacks. Network-based algorithms have emerged to avoid the drawbacks of algorithms based on the vector space model and to improve transductive classification. Networks are mostly used for label propagation, in which some labeled objects propagate their labels to other objects through the network connections. Bipartite networks are useful to represent text collections as networks and perform label propagation. Generating this type of network avoids requirements such as collections with hyperlinks or citations, computation of similarities among all texts in the collection, and the setup of a number of parameters. In a bipartite heterogeneous network, objects correspond to documents and terms, and the connections are given by the occurrences of terms in documents. The label propagation is performed from documents to terms and then from terms to documents iteratively. Nevertheless, instead of using terms just as a means of label propagation, in this article we propose using the bipartite network structure to define the relevance scores of terms for classes through an optimization process and then propagating these relevance scores to define labels for unlabeled documents. The new document labels are used to redefine the relevance scores of terms, which in turn redefine the labels of unlabeled documents in an iterative process. We demonstrated that the proposed approach surpasses the algorithms for transductive classification based on the vector space model or networks. Moreover, we demonstrated that the proposed algorithm effectively makes use of unlabeled documents to improve classification and is faster than other transductive algorithms.
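
The document-to-term and term-to-document propagation described above can be sketched as follows. This is a plain label-propagation baseline on a bipartite document-term network, not the optimization-based relevance scoring the article proposes; the function name `propagate_labels`, the count-weighted averaging, and the clamping of labeled documents are all assumptions made for illustration.

```python
def propagate_labels(doc_terms, seed_labels, classes, iterations=10):
    """Propagate class labels over a bipartite document-term network.

    doc_terms:   {doc_id: {term: occurrence_count}}
    seed_labels: {doc_id: class} for the labeled documents only
    Returns {doc_id: predicted_class} for every document.
    """
    def normalise(scores):
        total = sum(scores.values())
        if total == 0:
            return {k: 1.0 / len(classes) for k in classes}
        return {k: v / total for k, v in scores.items()}

    # Labeled documents start one-hot; unlabeled ones start uniform.
    doc_scores = {}
    for d in doc_terms:
        if d in seed_labels:
            doc_scores[d] = {k: 1.0 if k == seed_labels[d] else 0.0 for k in classes}
        else:
            doc_scores[d] = {k: 1.0 / len(classes) for k in classes}

    for _ in range(iterations):
        # Step 1: documents pass their class scores to the terms they contain.
        term_acc = {}
        for d, terms in doc_terms.items():
            for t, n in terms.items():
                bucket = term_acc.setdefault(t, {k: 0.0 for k in classes})
                for k in classes:
                    bucket[k] += n * doc_scores[d][k]
        term_scores = {t: normalise(s) for t, s in term_acc.items()}

        # Step 2: terms pass scores back; labeled documents stay fixed.
        for d, terms in doc_terms.items():
            if d in seed_labels:
                continue
            acc = {k: 0.0 for k in classes}
            for t, n in terms.items():
                for k in classes:
                    acc[k] += n * term_scores[t][k]
            doc_scores[d] = normalise(acc)

    return {d: max(classes, key=lambda k: doc_scores[d][k]) for d in doc_terms}


if __name__ == "__main__":
    docs = {
        "d1": {"goal": 2, "match": 1},
        "d2": {"vote": 2, "party": 1},
        "d3": {"goal": 1, "team": 1},    # unlabeled
        "d4": {"vote": 1, "senate": 1},  # unlabeled
    }
    print(propagate_labels(docs, {"d1": "sport", "d2": "politics"}, ["sport", "politics"]))
```

Note that no document-to-document similarity is ever computed: the only structure needed is which terms occur in which documents, which is the practical advantage of the bipartite representation mentioned in the abstract.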

4.
Queries submitted to search engines can be classified according to the user goals into three distinct categories: navigational, informational, and transactional. Such classification may be useful, for instance, as additional information for advertisement selection algorithms and for search engine ranking functions, among other possible applications. This paper presents a study of the impact of using several features extracted from the document collection and query logs on the task of automatically identifying the users’ goals behind their queries. We propose the use of new features not previously reported in the literature and study their impact on the quality of the query classification task. Further, we study the impact of each feature on different web collections, showing that the choice of the best set of features may change according to the target collection.

5.
Scholarly communication is undergoing transformation under the confluence of many forces. The purpose of this article is to explore trends in transforming scholarly publishing and their implications. It examines how collaboration and the volume of information production have changed over the past century. It also explores how older documents are used in today’s networked environment, where new information is easily accessible. Understanding these trends would help us design more effective electronic scholarly publishing systems and digital libraries, and serve the needs of scholars more responsively.

6.
Data availability and access to various platforms are changing the nature of Information Systems (IS) studies. Such studies often use large datasets, which may incorporate structured and unstructured data, from various platforms. The questions that such papers address, in turn, may attempt to use methods from computational science like sentiment mining, text mining, network science and image analytics to derive insights. However, there is often a weak theoretical contribution in many of these studies. We point out the need for such studies to contribute back to the IS discipline, whereby findings can explain more about the phenomena surrounding the interaction of people with technology artefacts and the ecosystem within which this contextual usage is situated. Our opinion paper attempts to address this gap and provide insights on the methodological adaptations required for “big data studies” to be converted into “IS research” and contribute to theory building in information systems.

7.
Automatic text classification is the task of organizing documents into pre-determined classes, generally using machine learning algorithms. It is one of the most important methods to organize and make use of the gigantic amounts of information that exist in unstructured textual format. Text classification is a widely studied research area of language processing and text mining. In traditional text classification, a document is represented as a bag of words, where the words (in other words, terms) are cut off from their finer context, i.e., their location in a sentence or in a document. Only the broader context of the document is used, with some type of term frequency information in the vector space. Consequently, the semantics of words that can be inferred from the finer context of their location in a sentence and their relations with neighboring words are usually ignored. However, the meaning of words and the semantic connections between words, documents and even classes are important, since methods that capture semantics generally reach better classification performance. Several surveys have been published to analyze diverse approaches to traditional text classification. Most of these surveys cover the application of different semantic term relatedness methods in text classification to a certain degree. However, they do not specifically target semantic text classification algorithms and their advantages over traditional text classification. In order to fill this gap, we undertake a comprehensive discussion of semantic text classification vs. traditional text classification. This survey explores past and recent advancements in semantic text classification and attempts to organize existing approaches under five fundamental categories: domain knowledge-based approaches, corpus-based approaches, deep learning based approaches, word/character sequence enhanced approaches and linguistic enriched approaches. Furthermore, this survey highlights the advantages of semantic text classification algorithms over traditional text classification algorithms.
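
The loss of finer context in a bag-of-words representation can be seen in a tiny example. The sketch below is only an illustration, with a hypothetical helper `bag_of_words` and naive whitespace tokenization assumed: two sentences with opposite meanings map to the same term-frequency vector because word order is discarded.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to term frequencies, discarding word order and position."""
    return Counter(text.lower().split())


if __name__ == "__main__":
    first = bag_of_words("dog bites man")
    second = bag_of_words("man bites dog")
    # Both sentences yield identical term counts, so a classifier that
    # sees only these counts cannot tell them apart.
    print(first == second)
```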

8.
The great contemporary organizational challenge for enterprises is to create a conceptual and methodological framework that allows the management of knowledge by means of networks designed for social interaction. This statement is based on the premise that the competitive drive and sustainable success of the company depend on the introduction of new forms of production and innovative processes, which can only be ensured through integrated approaches to knowledge management and the incorporation of information technologies (IT). This is a reality that has already been accepted by the Brazilian Agricultural Research Corporation (Embrapa, its acronym in Portuguese), a Brazilian research, development, and innovation (RD&I) institution supporting the agricultural sector. For some years now, Embrapa has been incorporating what it has learned about knowledge management into its strategic planning process. In this paper, we present a new approach to managing knowledge and information, and we analyze the need for research institutions to administer the knowledge they produce through an RD&I management model based on multi- and inter-disciplinary teams and multi-institutional research networks.

9.
Trust has been shown to play an important role in the adoption of information and communication technology (ICT) on an individual and firm level, but has received relatively little attention on a national level. In this paper we examine the impact of generalized trust, as measured by the World Values Survey, on the adoption of ICT products and related phenomena (e.g., such issues as Telecommuting and services such as E-Government Readiness) at a national level, while controlling for a nation's wealth. Because national trust levels have changed over time, we also examine how the rate of change in trust has affected the adoption of ICT and ICT-related phenomena. Our findings provide strong empirical support for the argument that trust affects national-level adoption. The results are robust, as we consider multiple variables and data sources. We also show that changes in trust rates are generally associated with corresponding changes in ICT adoption.

10.
Over the past few years, data mining has moved from corporations to other organizations. This paper looks at the integration of data mining in digital library services. First, bibliomining, or the combination of bibliometrics and data mining techniques to understand library services, is defined and the concept explored. Second, the conceptual frameworks for bibliomining from the viewpoint of the library decision-maker and the library researcher are presented and compared. Finally, a research agenda to resolve many of the common bibliomining issues and to move the field forward in a mindful manner is developed. The result is not only a roadmap for understanding the integration of data mining in digital library services, but also a template for other cross-discipline data mining researchers to follow for systematic exploration in their own subject domains.

11.
In recent years, electronic journals have come into common use in scholarly communication, and we can interpret this situation in various ways. On the one hand, we can say that scholarly communication is now heavily dependent on electronic resources. On the other hand, it would be too simplistic to say so, because researchers seldom use other electronic resources. The purpose of this article is to show the position of electronic journals in scholarly communication based on Japanese researchers’ information behavior and estimation. The main focus is on distinguishing the function of the scholarly journal from its electronic form. A questionnaire was sent to 1427 physicists, 1026 chemists and 1276 pathologists in universities and other research institutes all over Japan, of whom 775 (54.3%), 494 (48.1%) and 541 (42.4%), respectively, supplied answers. The main results are as follows. Japanese researchers in STM fields use electronic journals as a matter of course, and other electronic resources to some extent, for accessing information; but this shift to electronic resources appears to be a modification of traditional patterns of use rather than a transformation. Researchers still rely on traditional scholarly journals for accessing information and for publication, although their recognition has begun to change.

12.
In this paper we present a relevance ranking algorithm named PolarityRank. This algorithm is inspired by PageRank, the web page relevance computation method used by Google, and generalizes it to deal with graphs having not only positive but also negative weighted arcs. Besides the definition of our algorithm, this paper includes its algebraic justification, a proof of convergence and an empirical study in which PolarityRank is applied to two unrelated tasks where a graph with positive and negative weights can be built: the calculation of word semantic orientation and instance selection from a learning dataset.
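
One way to picture a PageRank-style ranking over arcs with negative weights is to keep two scores per node, a positive and a negative one, and let a negative arc swap them during propagation. The sketch below follows that idea under loud assumptions: the abstract does not state the update rule, so the function `signed_rank`, the seed dictionaries, the damping value and the normalization by total absolute outgoing weight are illustrative choices and should not be read as the published definition of PolarityRank.

```python
def signed_rank(arcs, seeds_pos, seeds_neg, damping=0.85, iterations=50):
    """Iteratively compute a positive and a negative score for each node.

    arcs:      {(source, target): weight}; weights may be negative
    seeds_pos: {node: prior positive mass}
    seeds_neg: {node: prior negative mass}
    Assumption: a positive arc passes each score on unchanged, while a
    negative arc passes the source's positive score to the target's
    negative score and vice versa.
    """
    nodes = sorted({node for arc in arcs for node in arc})
    out_total = {n: 0.0 for n in nodes}
    for (src, _dst), w in arcs.items():
        out_total[src] += abs(w)

    pos = {n: seeds_pos.get(n, 0.0) for n in nodes}
    neg = {n: seeds_neg.get(n, 0.0) for n in nodes}

    for _ in range(iterations):
        new_pos = {n: (1 - damping) * seeds_pos.get(n, 0.0) for n in nodes}
        new_neg = {n: (1 - damping) * seeds_neg.get(n, 0.0) for n in nodes}
        for (src, dst), w in arcs.items():
            if w == 0:
                continue  # a zero-weight arc carries nothing
            share = damping * abs(w) / out_total[src]
            if w > 0:
                new_pos[dst] += share * pos[src]
                new_neg[dst] += share * neg[src]
            else:
                new_pos[dst] += share * neg[src]
                new_neg[dst] += share * pos[src]
        pos, neg = new_pos, new_neg
    return pos, neg


if __name__ == "__main__":
    # Hypothetical word graph: "good" agrees with "great" and opposes "bad".
    graph = {("good", "great"): 1.0, ("good", "bad"): -1.0}
    positive, negative = signed_rank(graph, {"good": 1.0}, {})
    for word in ("great", "bad"):
        print(word, "positive" if positive[word] > negative[word] else "negative")
```

In the word-orientation setting mentioned in the abstract, a handful of seed words with known polarity would play the role of `seeds_pos` and `seeds_neg`, and the sign of the larger score would give each remaining word its orientation.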

13.
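The technological catch-up process of latecomer countries can be described as a multi-agent, multi-stage game among catching-up firms, leading firms, and the latecomer country's government. The process comprises two successive stages: "paired learning" and "paired competition", followed by "secondary deep learning" and "multiple competition". By constructing a game model and analyzing the case of China's large-scale wind power equipment manufacturing industry, this paper reveals how catching-up firms and leading firms choose their strategies according to the technology gap between them, the catching-up firms' learning speed, and market opportunities. It also reveals how the latecomer country's government adjusts its industrial policy in a timely manner according to the leading firms' willingness to transfer technology and whether the leading firms have formed a technology-protection alliance.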

