Question answering has become one of the most popular information retrieval applications. Although most question-answering systems try to improve the user experience and the technology used to find relevant results, many difficulties remain because of the continuous growth of web content. Question Classification (QC) plays an important role in question-answering systems, and identifying question types is one of the major tasks in improving the classification process. A broad range of QC approaches has been proposed to address the classification problem; most of them are based on bag-of-words or dictionaries. In this research, we present an analysis of the different types of questions based on their grammatical structure. We identify different patterns and use machine learning algorithms to classify them. A framework is proposed for question classification using a grammar-based approach (GQCC) that exploits the structure of the questions. Our findings indicate that using syntactic categories related to different domain-specific types of Common Nouns, Numeral Numbers and Proper Nouns enables the machine learning algorithms to better differentiate between question types. The paper presents a wide range of experiments; the results show that GQCC with the J48 classifier outperformed other classification methods, reaching 90.1% accuracy.

Consumers evaluate products through online reviews, in addition to sharing their product experiences. Online reviews affect product marketing, and companies use online reviews to investigate consumer attitudes and perceptions of their products. However, review analyses often ignore the specific context, so the results yield little meaningful information. This study suggests a methodology for analyzing reviews in the context of comparing two competing products. In addition, by analyzing the discriminative attributes of competing products, we were able to derive more specific information than an overall product analysis provides. Analyzing the discriminative attributes in the context of comparing competing products clarifies the strengths and weaknesses of those products and provides realistic information that can help the company's management activities. For this purpose, this study collected reviews of the BB Cream product line in the cosmetics field. The analysis was carried out sequentially in three stages. First, we extracted words that represent discriminative attributes by analyzing the percentage difference of words. Second, the different attribute words were classified according to the meaning used in the reviews by using latent semantic analysis. Finally, the polarity of the discriminative attribute words was analyzed using Labeled-LDA. This analysis method can be used as a market research method, as it can extract more information than a traditional survey or interview method and can save cost and time through automation.
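
The first stage described above (extracting discriminative words from the percentage difference of words) can be sketched as follows. This is a minimal illustration, not the study's implementation: the tokenizer, the function names `word_shares` and `discriminative_words`, and the toy reviews are all assumptions, and the abstract does not say how the percentage difference was actually computed.

```python
import re
from collections import Counter


def word_shares(reviews):
    """Return each word's share (in percent) of all tokens in a list of reviews."""
    tokens = [t for text in reviews for t in re.findall(r"[a-z']+", text.lower())]
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: 100.0 * count / total for word, count in counts.items()}


def discriminative_words(reviews_a, reviews_b, top_n=5):
    """Rank words by the absolute gap between their shares in the two review sets.

    A positive gap means the word is relatively more frequent for product A,
    a negative gap means it is relatively more frequent for product B.
    """
    share_a, share_b = word_shares(reviews_a), word_shares(reviews_b)
    vocabulary = set(share_a) | set(share_b)
    gaps = {w: share_a.get(w, 0.0) - share_b.get(w, 0.0) for w in vocabulary}
    # Largest absolute gap first; ties are broken alphabetically.
    return sorted(gaps.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top_n]


# Hypothetical toy reviews for two competing products.
product_a = ["great coverage but oily finish", "coverage is great"]
product_b = ["light texture and matte finish", "matte and light"]

for word, gap in discriminative_words(product_a, product_b):
    print(f"{word}: {gap:+.1f} percentage points")
```

Words that both products share at similar rates (here "finish") get a gap near zero and drop out of the ranking, which is the point of comparing the two review sets rather than analyzing one product in isolation.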

This paper investigates the role of process coordination dynamics and information exchanges in maritime logistics. To this end, a case study was developed in a mid-sized port supported by a Port Community System (PCS). Exploiting data retrieved from the PCS, the methodology combined three data-driven techniques, namely Process Mining (PM), Social Network Analysis (SNA) and Text Mining, to draw handover social networks among the port logistics players and to assess the export process efficiency and significant process deviations. Then, two sets of regression models were developed to explore the effects of network dynamics on process performance. Preliminary results indicate that process fragmentation and frequent communication switching among the port actors could negatively affect the efficiency and effectiveness of the export process. Finally, the study proposes practical solutions for reducing process fragmentation and improving information exchange among port actors.

Transductive classification is a useful way to classify texts when labeled training examples are insufficient. Several algorithms have been proposed to perform transductive classification on text collections represented in a vector space model. However, the use of these algorithms is infeasible in practical applications because of the independence assumption among instances or terms and other drawbacks of these algorithms. Network-based algorithms have emerged to avoid the drawbacks of the algorithms based on the vector space model and to improve transductive classification. Networks are mostly used for label propagation, in which some labeled objects propagate their labels to other objects through the network connections. Bipartite networks are useful to represent text collections as networks and perform label propagation. Generating this type of network avoids requirements such as collections with hyperlinks or citations, the computation of similarities among all texts in the collection, and the setup of a number of parameters. In a bipartite heterogeneous network, objects correspond to documents and terms, and the connections are given by the occurrences of terms in documents. Label propagation is performed from documents to terms and then from terms to documents iteratively. Nevertheless, instead of using terms just as a means of label propagation, in this article we propose using the bipartite network structure to define the relevance scores of terms for classes through an optimization process and then propagating these relevance scores to define labels for unlabeled documents. The new document labels are used to redefine the relevance scores of terms, which in turn redefine the labels of unlabeled documents in an iterative process. We demonstrated that the proposed approach surpasses the algorithms for transductive classification based on the vector space model or on networks. Moreover, we demonstrated that the proposed algorithm makes effective use of unlabeled documents to improve classification and is faster than other transductive algorithms.
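
The basic document-to-term-to-document propagation on a bipartite network can be sketched as below. This is a simplified baseline under stated assumptions, not the optimization-based relevance-score algorithm the article proposes: the function name `propagate_labels`, the count-weighted averaging, the fixed iteration count, and the toy collection are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def propagate_labels(doc_terms, seed_labels, classes, iterations=10):
    """Propagate class scores documents -> terms -> documents on a bipartite graph.

    doc_terms:   {doc_id: {term: count}}, the document-term edges.
    seed_labels: {doc_id: class} for the labeled documents (kept fixed).
    Returns {doc_id: predicted class}; a document that no score reaches
    falls back to the first class in `classes`.
    """
    # Labeled documents start one-hot, unlabeled documents start at zero.
    doc_scores = {d: {c: 0.0 for c in classes} for d in doc_terms}
    for d, c in seed_labels.items():
        doc_scores[d][c] = 1.0

    for _ in range(iterations):
        # Documents -> terms: each term accumulates count-weighted class scores.
        term_scores = defaultdict(lambda: {c: 0.0 for c in classes})
        for d, terms in doc_terms.items():
            for t, n in terms.items():
                for c in classes:
                    term_scores[t][c] += n * doc_scores[d][c]
        for scores in term_scores.values():
            total = sum(scores.values())
            if total > 0:
                for c in classes:
                    scores[c] /= total

        # Terms -> documents: only unlabeled documents are updated.
        for d, terms in doc_terms.items():
            if d in seed_labels:
                continue
            new = {c: sum(n * term_scores[t][c] for t, n in terms.items())
                   for c in classes}
            total = sum(new.values())
            doc_scores[d] = {c: (v / total if total > 0 else 0.0)
                             for c, v in new.items()}

    return {d: max(classes, key=lambda c: doc_scores[d][c]) for d in doc_terms}


# Hypothetical toy collection: two labeled and two unlabeled documents.
docs = {
    "d1": {"goal": 2, "match": 1},
    "d2": {"election": 2, "vote": 1},
    "d3": {"match": 1, "goal": 1},
    "d4": {"vote": 2},
}
seeds = {"d1": "sports", "d2": "politics"}
labels = propagate_labels(docs, seeds, ["sports", "politics"])
print(labels)
```

Note that the network is built directly from term occurrences, so no pairwise document similarities, hyperlinks, or citations are needed, which matches the motivation given in the abstract.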

Queries submitted to search engines can be classified according to the user's goals into three distinct categories: navigational, informational, and transactional. Such classification may be useful, for instance, as additional information for advertisement selection algorithms and for search engine ranking functions, among other possible applications. This paper presents a study of the impact of using several features extracted from the document collection and query logs on the task of automatically identifying the users’ goals behind their queries. We propose the use of new features not previously reported in the literature and study their impact on the quality of the query classification task. Further, we study the impact of each feature on different web collections, showing that the choice of the best set of features may change according to the target collection.

Scholarly communication is undergoing transformation under the confluence of many forces. The purpose of this article is to explore trends in the transformation of scholarly publishing and their implications. It examines how collaboration and the volume of information production have changed over the past century. It also explores how older documents are used in today’s networked environment, where new information is easily accessible. Understanding these trends would help us design more effective electronic scholarly publishing systems and digital libraries, and serve the needs of scholars more responsively.

Data availability and access to various platforms are changing the nature of Information Systems (IS) studies. Such studies often use large datasets, which may incorporate structured and unstructured data from various platforms. The questions that such papers address, in turn, may draw on methods from computational science such as sentiment mining, text mining, network science and image analytics to derive insights. However, many of these studies make only a weak theoretical contribution. We point out the need for such studies to contribute back to the IS discipline, so that their findings can explain more about the phenomena surrounding the interaction of people with technology artefacts and the ecosystem within which this contextual usage is situated. Our opinion paper attempts to address this gap and provides insights on the methodological adaptations required for “big data studies” to be converted into “IS research” and contribute to theory building in information systems.

Social media have been adopted by many businesses. More and more companies are using social media tools such as Facebook and Twitter to provide various services and interact with customers. As a result, a large amount of user-generated content is freely available on social media sites. To increase competitive advantage and effectively assess the competitive environment of businesses, companies need to monitor and analyze not only the customer-generated content on their own social media sites, but also the textual information on their competitors’ social media sites. In an effort to help companies understand how to perform a social media competitive analysis and transform social media data into knowledge for decision makers and e-marketers, this paper describes an in-depth case study which applies text mining to analyze unstructured text content on Facebook and Twitter sites of the three largest pizza chains: Pizza Hut, Domino's Pizza and Papa John's Pizza. The results reveal the value of social media competitive analysis and the power of text mining as an effective technique to extract business value from the vast amount of available social media data. Recommendations are also provided to help companies develop their social media competitive analysis strategy.

Quickly and accurately summarizing representative opinions is a key step for assessing microblog sentiments. The Ortony-Clore-Collins (OCC) model of emotion can offer a rule-based emotion export mechanism. In this paper, we propose an OCC model and a Convolutional Neural Network (CNN) based opinion summarization method for Chinese microblogging systems. We test the proposed method using real world microblog data. We then compare the accuracy of manual sentiment annotation to the accuracy using our OCC-based sentiment classification rule library. Experimental results from analyzing three real-world microblog datasets demonstrate the efficacy of our proposed method. Our study highlights the potential of combining emotion cognition with deep learning in sentiment analysis of social media data.

Connoisseur consumption is continuing to grow in popularity, with more niche retailers and specialty firms servicing increasingly discerning consumers. Despite the wealth of consumer data from social media platforms, there has been little empirical focus on how consumers make sense of their experiences after interacting with cultural interlocutors from niche industries with highly specialized knowledge. In order to scrutinize the process of distinction making in practice and reception, this study employs a mixed methods approach to triangulate the production, reception, and practice of taste-making at four coffee fairs held in Toronto, Ontario, and Hamilton, Ontario. Through ethnographic fieldwork, conventional content analysis, and a discourse network analysis of social media usage from attendees, this study finds that there are important contextual differences that affect which discourses are present in-person and appear online.

Automatic text classification is the task of organizing documents into pre-determined classes, generally using machine learning algorithms. It is one of the most important methods for organizing and making use of the gigantic amounts of information that exist in unstructured textual format. Text classification is a widely studied research area of language processing and text mining. In traditional text classification, a document is represented as a bag of words, where the words (in other words, terms) are cut off from their finer context, i.e., their location in a sentence or in a document. Only the broader context of the document is used, with some type of term frequency information in the vector space. Consequently, the semantics of a word that can be inferred from the finer context of its location in a sentence and its relations with neighboring words are usually ignored. However, the meaning of words and the semantic connections between words, documents and even classes are clearly important, since methods that capture semantics generally reach better classification performance. Several surveys have been published that analyze diverse approaches to traditional text classification. Most of these surveys cover the application of different semantic term relatedness methods in text classification to a certain degree. However, they do not specifically target semantic text classification algorithms and their advantages over traditional text classification. In order to fill this gap, we undertake a comprehensive discussion of semantic text classification vs. traditional text classification. This survey explores past and recent advancements in semantic text classification and attempts to organize existing approaches under five fundamental categories: domain knowledge-based approaches, corpus-based approaches, deep learning based approaches, word/character sequence enhanced approaches and linguistic enriched approaches. Furthermore, this survey highlights the advantages of semantic text classification algorithms over the traditional text classification algorithms.

According to Freud, “words were originally magic and to this day words have retained much of their ancient magical power”. By words, behaviors are transformed and problems are solved. The way we use words reveals our intentions, goals and values. Novel tools for text analysis help us understand the magical power of words. This power is multiplied if it is combined with the study of social networks, i.e., with the analysis of relationships among social units. This special issue of the International Journal of Information Management, entitled “Combining Social Network Analysis and Text Mining: from Theory to Practice”, includes heterogeneous and innovative research at the nexus of text mining and social network analysis. It aims to enrich work at the intersection of these fields, which still lags behind in theoretical, empirical, and methodological foundations. The nine articles accepted for inclusion in this special issue all present methods and tools that have business applications. They are summarized in this editorial introduction.

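This paper proposes a method for analyzing the topics of science and technology news based on the LDA topic model. Using polar expedition news from China, Australia, the United Kingdom and the United States from 2009 to 2018, it analyzes topic evolution in terms of topic type and topic strength. In the Chinese-language news, topics such as polar surveying and mapping rose in popularity, while the polar glacier expedition topic declined; in the English-language news, the most popular topics were polar glacier expeditions and polar ocean expeditions; the popularity of the remaining topics was relatively stable. The results show that the method can effectively identify the topics of science and technology news and reveal their evolution trends, and can effectively improve the degree of automation of science and technology intelligence analysis in a networked environment.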

The great contemporary organizational challenge for enterprises is to create a conceptual and methodological framework that allows the management of knowledge by means of networks designed for social interaction. This statement is based on the premise that the competitive drive and sustainable success of the company depend on the introduction of new forms of production and innovative processes, which can only be ensured through integrated approaches to knowledge management and the incorporation of information technologies (IT). This is a reality that has already been accepted by the Brazilian Agricultural Research Corporation (Embrapa, its acronym in Portuguese), a Brazilian research, development, and innovation (RD&I) institution supporting the agricultural sector. For some years now, Embrapa has been incorporating what it has learned about knowledge management into its strategic planning process. In this paper, we present a new approach to managing knowledge and information, and we analyze the need for research institutions to administer the knowledge they produce through an RD&I management model based on multi- and inter-disciplinary teams and multi-institutional research networks.

朱秀华 《现代情报》2009,29(5):163-165
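To address the problem of automatic web page classification in information mining, a classification method based on the vector space model and a parallel BP network is proposed. The network consists of multiple sub-networks connected in parallel; each sub-network is responsible for extracting the features of one class of patterns, the sub-networks process all patterns in parallel, and the classification results are presented at the overall output layer. The effectiveness of the method is verified using the classification of tourism web pages on the Internet as an example.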
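
The vector space model step can be illustrated with a short sketch that turns tokenized pages into TF-IDF vectors and compares them by cosine similarity. The weighting scheme, the function names `tfidf_vectors` and `cosine`, and the toy pages are assumptions, since the abstract does not give the paper's exact term weighting; the parallel BP network that consumes such vectors is not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to sparse TF-IDF vectors ({term: weight})."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zero."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical tokenized pages: two travel pages and one finance page.
pages = [
    ["hotel", "beach", "tour"],
    ["tour", "ticket", "hotel"],
    ["stock", "market", "bank"],
]
vectors = tfidf_vectors(pages)
print(cosine(vectors[0], vectors[1]))  # travel vs. travel
print(cosine(vectors[0], vectors[2]))  # travel vs. finance
```

In a setup like the one the abstract describes, each such vector would be the input pattern presented to every sub-network, with one sub-network per page category.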

Vendors of mobile communication applications/services (apps) aim to improve their designs to attract and retain users, and thus achieve the critical mass needed to ensure the success of their services. Despite the significant number of prior mobile service studies, few works have examined the effects of inertia and satisfaction on users’ continuance intention with regard to specific mobile communication apps from a mobile-service-quality perspective. By integrating the mobile service quality framework, inertia, and user satisfaction, this study develops a model for interpreting the development of the continuance intention of users of mobile communication apps. Data collected from 238 users of such apps provided support for the model. The results indicated that interaction quality, environment quality, inertia, and user satisfaction are key determinants of continuance intention, while outcome quality is not. The theoretical and practical implications of this work are discussed.

Social media data have recently attracted considerable attention as an emerging voice of the customer, as social media have rapidly become a channel for exchanging and storing customer-generated, large-scale, and unregulated opinions about products. Although product planning studies using social media data have applied systematic methods, these methods have limitations, such as the difficulty of identifying latent product features when only term-level analysis is used and insufficient analysis of the opportunity potential of the identified features. Therefore, an opportunity mining approach is proposed in this study to identify product opportunities based on topic modeling and sentiment analysis of social media data. For a multifunctional product, this approach uses topic modeling to identify latent product topics discussed by customers in social media and quantifies the importance of each product topic. Next, the satisfaction level of each product topic is evaluated using sentiment analysis. Finally, the opportunity value and improvement direction of each product topic from a customer-centered view are identified by an opportunity algorithm based on the importance and satisfaction of the product topics. We expect that our approach will contribute to the systematic identification of product opportunities from large-scale customer-generated social media data and will be used as a real-time tool for monitoring changing customer needs in rapidly evolving product environments.
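
The final step, scoring each topic from its importance and satisfaction, can be sketched with one widely cited form of opportunity algorithm (from outcome-driven innovation), in which opportunity = importance + max(importance - satisfaction, 0). Whether the study uses exactly this formula is an assumption, as the abstract does not state it; the 0 to 10 scales, the topic names, and the scores below are hypothetical.

```python
def opportunity_score(importance, satisfaction):
    """Opportunity = importance + max(importance - satisfaction, 0).

    Both inputs are assumed to be on the same scale (for example 0 to 10).
    Topics that matter a lot but satisfy poorly score highest; being
    over-satisfied never pushes the score below the importance itself.
    """
    return importance + max(importance - satisfaction, 0.0)


# Hypothetical (importance, satisfaction) pairs per product topic.
topics = {
    "battery": (8.0, 3.0),
    "camera": (9.0, 8.5),
    "design": (4.0, 7.0),
}

ranked = sorted(topics, key=lambda t: opportunity_score(*topics[t]), reverse=True)
for topic in ranked:
    importance, satisfaction = topics[topic]
    print(topic, opportunity_score(importance, satisfaction))
```

In the pipeline the abstract describes, importance would come from the topic model (how much customers discuss a topic) and satisfaction from the sentiment analysis of the posts assigned to that topic.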

With the rapid development of information technology, customers not only shop online but also post reviews on social media. This user-generated content (UGC) can be useful for understanding customers’ shopping experiences and can influence future customers’ purchase intentions. Therefore, business intelligence and analytics are increasingly being advocated as a way to analyze customers’ UGC in social media and support firms’ marketing activities. However, because of its open structure, UGC such as customer reviews can be difficult to analyze, and firms find it challenging to harness UGC. To fill this gap, this study aims to examine customer satisfaction and dissatisfaction toward attributes of hotel products and services based on online customer textual reviews. Using a text mining approach, latent semantic analysis (LSA), we identify the key attributes driving customer satisfaction and dissatisfaction toward hotel products and services. Additionally, using a regression approach, we examine the effects of travel purposes, hotel types, star level, and editor recommendations on customers’ perceptions of attributes of hotel products and services. This study bridges customer online textual reviews with customers’ perceptions to help business managers better understand customers’ needs through UGC.

Multimodal sentiment analysis aims to judge the sentiment of multimodal data uploaded by Internet users on various social media platforms. On the one hand, existing studies focus on the fusion mechanism of multimodal data such as text, audio and visual data, but ignore the similarity between text and audio and between text and visual data, as well as the heterogeneity of audio and visual data, resulting in deviations in sentiment analysis. On the other hand, multimodal data bring noise irrelevant to sentiment analysis, which affects the effectiveness of fusion. In this paper, we propose a Polar-Vector and Strength-Vector mixer model called PS-Mixer, which is based on MLP-Mixer, to achieve better communication between different modal data for multimodal sentiment analysis. Specifically, we design a Polar-Vector (PV) and a Strength-Vector (SV) for judging the polarity and the strength of sentiment separately. PV is obtained from the communication of text and visual features to decide whether the sentiment is positive, negative, or neutral. SV is obtained from the communication between the text and audio features to analyze the sentiment strength in the range of 0 to 3. Furthermore, we devise an MLP-Communication module (MLP-C) composed of several fully connected layers and activation functions to make the different modal features fully interact in both the horizontal and the vertical directions, which is a novel attempt to use MLP for multimodal information communication. Finally, we mix PV and SV to obtain a fusion vector to judge the sentiment state. The proposed PS-Mixer is tested on two publicly available datasets, CMU-MOSEI and CMU-MOSI, and achieves state-of-the-art (SOTA) performance on CMU-MOSEI compared with baseline methods. The code is available at: https://github.com/metaphysicser/PS-Mixer.
