Similar literature
 20 similar documents found (search time: 15 ms)
1.
Question categorization, which assigns one of a set of predefined categories to a user’s question according to the question’s topic or content, is a useful technique in user-interactive question answering systems. In this paper, we propose an automatic method for question categorization in a user-interactive question answering system. This method includes four steps: feature space construction, topic-wise word identification and weighting, semantic mapping, and similarity calculation. We first construct the feature space based on all accumulated questions and calculate the feature vector of each predefined category, which contains certain accumulated questions. When a new question is posted, the semantic pattern of the question is used to identify and weight the important words of the question. After that, the question is semantically mapped into the constructed feature space to enrich its representation. Finally, the similarity between the question and each category is calculated based on their feature vectors. The category with the highest similarity is assigned to the question. The experimental results show that our proposed method achieves good categorization precision and outperforms the traditional categorization methods on the selected test questions.
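The final step described above (similarity calculation and picking the best category) can be illustrated with a minimal sketch. It assumes a plain bag-of-words feature space and cosine similarity, neither of which the abstract specifies, and it leaves out the semantic-pattern weighting and semantic mapping steps. The function names and the sample questions are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed stand-in for the paper's feature space)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def category_vectors(questions_by_category):
    """Build one feature vector per category by summing its accumulated questions."""
    vectors = {}
    for category, questions in questions_by_category.items():
        total = Counter()
        for question in questions:
            total.update(vectorize(question))
        vectors[category] = total
    return vectors


def categorize(question, vectors):
    """Return the category whose vector is most similar to the question."""
    q = vectorize(question)
    return max(vectors, key=lambda c: cosine(q, vectors[c]))


# Hypothetical accumulated questions, grouped by predefined category.
history = {
    "cooking": ["how long to boil an egg", "best way to bake bread"],
    "computers": ["how to install python", "why is my laptop slow"],
}
vecs = category_vectors(history)
print(categorize("how do i bake a cake", vecs))  # prints "cooking"
```

Summing raw counts per category is the simplest possible choice; a weighting scheme such as TF-IDF could be swapped into `vectorize` without changing the rest of the sketch.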

2.
This study theorized and validated a model of knowledge sharing continuance in a special type of online community, the online question answering (Q&A) community, in which knowledge exchange is reflected mainly by asking and answering specific questions. We created a model that integrated knowledge sharing factors and knowledge self-efficacy into the expectation confirmation theory. The hypotheses derived from this model were empirically validated using an online survey conducted among users of a well-known online Q&A community in China, “Yahoo! Answers China”. The results suggested that users’ intention to continue sharing knowledge (i.e., answering questions) was directly influenced by users’ ex-post feelings, which consist of two dimensions: satisfaction and knowledge self-efficacy. We also found that knowledge self-efficacy and confirmation mediated the relationship between benefits and satisfaction.

3.
The purpose of the current study is to identify the user criteria and data-driven features, both textual and non-textual, for assessing the quality of answers posted on social questioning and answering sites (social Q&A) across four different knowledge domains: Science, Technology, Art and Recreation. A comprehensive review of literature on quality assessment of information produced in social contexts was carried out to develop the theoretical framework for the current study. A total of 23 user criteria and 24 data features were proposed and tested with high-quality answers obtained from four social Q&A sites in Stack Exchange. Findings indicate that content-related criteria and user and review features were the most frequently used in quality assessments, while the importance of user criteria and data features varied across the knowledge domains. In the Technology Q&A site, which contains mostly self-help questions, the utility class was the most frequently used group of criteria. The popularity of the socio-emotional class was more apparent in discussion-oriented topic categories such as Art and Recreation, where people seek others’ opinions or advice. Users of Art and Recreation Q&A sites in Stack Exchange appear to place more value on answerers’ efforts and time, good attitudes or manners, personal experience, and shared taste. On the Science Q&A site, user features were important and the answerer's expertise was emphasized. Examining the connection or gap between user quality criteria and data features across the knowledge domains could help to better understand users’ evaluation behaviors for their preferred answers, and identify the potential of social Q&A for user education/intervention in answer quality evaluation. This examination also offers practical guidance for designing more effective social Q&A platforms, considering how to customize community support systems, motivate contributions, and control content quality.

4.
Currently, many software companies are looking to assemble a team of experts who can collaboratively carry out an assigned project in an agile manner. The ideal members for an agile team are T-shaped experts, who not only have expertise in one skill-area but also have general knowledge in a number of related skill-areas. Existing related methods have only used heuristic, non-machine-learning models to form an agile team from candidates, although machine learning has been successful in similar tasks. In addition, they have only used the number of candidates’ documents in various skill-areas as a resource to estimate the candidates’ T-shaped knowledge for working in an agile team, although the content of their documents is also very important. To this end, we propose a multi-step method that rectifies these drawbacks. In this method, we first pick out the best possible candidates using a state-of-the-art model, then we re-estimate their relevant knowledge for working in the team with the help of a deep learning model, which uses the content of the candidates’ posts on StackOverflow. Finally, we select the best possible members for the given agile team from among these candidates using an integer linear programming model. We perform our experiments on two large datasets, C# and Java, which comprise 2,217,366 and 2,320,883 posts from StackOverflow, respectively. On datasets C# and Java, our method selects, respectively, 68.6% and 55.2% of the agile team members from among T-shaped experts, while the best baseline method only selects, respectively, 49.1% and 40.2% of the agile team members from among T-shaped experts. In addition, the results show that our method outperforms the best baseline method by 8.1% and 11.4% in terms of F-measure on datasets C# and Java, respectively.

5.
Tables in documents are a widely available and rich source of information, but they are not yet well utilised computationally because of the difficulty of automatically extracting their structure and data content. A plethora of systems has been proposed to solve the problem, but current methods offer low usability and accuracy and lack precision in detecting data from diverse layouts. We propose a component-based design and implementation of table processing concepts which can offer flexibility and re-usability as well as high performance on a wide range of table types. In this paper, we describe TEXUS, a fully automated table processing system that takes a PDF document and detects tables in a layout-independent manner. We introduce TEXUS’s own table-processing-specific document model and the two-phased processing pipeline design. Through an extensive evaluation on a dataset comprising complex financial tables, we show the performance of the system on different table types.

6.
7.
Due to the worldwide accessibility of the Internet, along with continuous advances in mobile technologies, the physical and digital worlds have become completely blended, and the proliferation of social media platforms has taken a leading role in this evolution. In this paper, we undertake a thorough analysis towards better visualising and understanding the factors that characterise and differentiate social media users affected by mental disorders. We perform different experiments studying multiple dimensions of language, including vocabulary uniqueness, word usage, linguistic style, psychometric attributes, and emotions’ co-occurrence patterns, as well as online behavioural traits, including social engagement and posting trends. Our findings reveal significant differences in the use of function words, such as adverbs and verb tense, and topic-specific vocabulary, such as biological processes. As for emotional expression, we observe that affected users tend to share emotions more regularly than control individuals on average. Overall, the monthly posting variance of the affected groups is higher than that of the control groups. Moreover, we found evidence suggesting that the language use of users who have a mental disorder is less distinguishable on micro-blogging platforms than on other, less restrictive platforms. In particular, we observe fewer quantifiable differences between affected and control groups on Twitter than on Reddit.

8.
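Because orchard land in plain areas is difficult to distinguish from deciduous trees in residential areas, this paper proposes a feature index, suited to medium-resolution remote sensing imagery, for extracting orchard land information in plain areas: the plain-area orchard index. Using the multi-temporal characteristics of the imagery together with an object-oriented classification method, an extraction model for plain-area orchard land is constructed. An extraction experiment in Dangshan County, Anhui Province shows that the method is simple and practical, effectively avoids the "salt-and-pepper effect", and improves classification accuracy. It has significant practical value for accurately determining the area and distribution of orchard land in plain areas.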

9.
Named Entity Recognition (NER) aims to automatically extract specific entities from unstructured text. Compared with NER in English, Chinese NER is more challenging in recognizing entity boundaries because there are no explicit delimiters between Chinese characters. However, most previous research focused on the semantic information of the Chinese language at the character level but ignored the importance of its phonetic characteristics. To address these issues, we integrated phonetic features of Chinese characters with lexicon information to help disambiguate entity boundaries by fully exploring the potential of Chinese as a pictophonetic language. In addition, a novel multi-tagging-scheme learning method was proposed, based on the multi-task learning paradigm, to alleviate the data sparsity and error propagation problems that occurred in previous tagging schemes, by separately annotating the segmentation information of entities and their corresponding entity types. Extensive experiments performed on four Chinese NER benchmark datasets (OntoNotes4.0, MSRA, Resume, and Weibo) show that our proposed method consistently outperforms the existing state-of-the-art baseline models. The ablation experiments further demonstrated that the introduction of the phonetic feature and the multi-tagging scheme has a significant positive effect on the Chinese NER task.

10.
In Mongolian, two different alphabets are used, Cyrillic and Mongolian. In this paper, we focus solely on the Mongolian language using the Cyrillic alphabet, in which a content word can be inflected when concatenated with one or more suffixes. Identifying the original form of content words is crucial for natural language processing and information retrieval. We propose a lemmatization method for Mongolian. The advantage of our lemmatization method is that it does not rely on noun dictionaries, enabling us to lemmatize out-of-dictionary words. We also apply our method to indexing for information retrieval. We use newspaper articles and technical abstracts in experiments that show the effectiveness of our method. Our research is the first significant exploration of the effectiveness of lemmatization for information retrieval in Mongolian.

11.
Recording search histories, presenting them to the searcher, and building additional interface tools on them offer many opportunities for supporting user tasks in information seeking and use. This study investigated the use of search history information in legal information seeking. Qualitative methods were used to explore how attorneys and law librarians used their memory and external memory aids while searching for information and in transferring to information use. Based on the findings, interface design recommendations were made for information systems.

12.
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.

13.
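A comparative study of extraction methods for polysaccharides from the southern Chinese medicinal herb Morinda officinalis   Total citations: 10, self-citations: 0, citations by others: 10
Using Morinda officinalis as the raw material, crude polysaccharides were extracted by several physicochemical methods. Comparison and identification by infrared and ultraviolet absorption spectroscopy confirmed that the extracts show the characteristic absorption of polysaccharides. The contents of polysaccharide and protein in the crude polysaccharides were determined, and the extraction methods were compared using polysaccharide purity and extraction yield as the main indicators. The results show that treatment with distilled water is the preferable extraction method.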

14.
Qualitative researchers in information management research often need to evaluate inter-coder reliability (ICR) to test the trustworthiness of their content analysis. A suitable method of evaluating ICR enables researchers to rigorously assess the degree of agreement among two or more independent qualitative coders. This allows researchers to identify mistakes in the content analysis before the codes are used in developing and testing a theory or a measurement model and avoid any associated time, effort and financial cost. Different methods have been proposed, but little guidance is available on which approach to evaluating ICR should be used. In this paper, we review and compare leading ICR methods that are suitable for qualitative information management research. We propose an approach for selecting and using an ICR method, supported by an illustrative example. The five steps in our proposed approach include: selecting an ICR method based on its characteristics and requirements of a project; developing a coding scheme; selecting and training independent coders; calculating the ICR coefficient and resolving discrepancies; and reporting the process of evaluating ICR and its results.
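As an illustration of the "calculating the ICR coefficient" step, here is a minimal sketch that computes Cohen's kappa for two coders. The abstract does not say which coefficient the authors recommend; Cohen's kappa is used here only as one widely known ICR measure, and the function name and sample codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per item."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items on which both coders chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: probability both pick the same code given their own code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two independent coders to six text segments.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(round(cohens_kappa(coder_a, coder_b), 3))  # prints 0.667
```

Kappa corrects raw percentage agreement (5 of 6 items here) for the agreement expected by chance (0.5 here), which is why it is lower than the raw figure. For more than two coders or for missing codes, a different coefficient would be needed.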

15.
16.
Modelling is a classic approach to understanding complex problems; it can be done diagrammatically, to visualise concepts, or mathematically, to analyse the attributes of concepts. An organisation, as a communicating entity, is made up of constructs in which people can access information and speak to each other. Modelling information flow for organisations is a challenging task, but it enables analysts and managers to better understand how to organise and coordinate processes, eliminate redundant information flows and processes, minimise the duplication of information, and manage the sharing of intra- and inter-organisational information.

17.
Although stigma is widely accepted to be a multidimensional construct, the implications of its dimensions for social support warrant greater consideration. We conducted a meta-analysis of 31 content analyses to investigate the association between specific dimensions of stigma and the types of social support messages shared in health-related contexts online. Among health conditions where character stigma was greater, information, network, and tangible support were more prevalent. Physical stigma was associated with a higher prevalence of esteem support. Information, emotional, network, and tangible support were more prevalent among health conditions where concealable stigma was greater. Among health conditions where visible stigma was greater, information and esteem support were more prevalent. Our study contributes to stigma and social support research by providing evidence that health-related stigma has multiple dimensions, each with distinct implications for social support. More broadly, this project offers a framework that can be used to examine the ways in which social meanings of health conditions may be translated into digital behavior.

18.
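兰小筠, 胡奕, 易曙. Information Science (《情报科学》), 2001, 19(7): 681-685
Based on a survey of the information industry in Changsha, this paper analyzes the current state of the information industry in small and medium-sized cities and discusses strategies for its development.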

19.
This paper is concerned with the problem of adaptive disturbance attenuation for a class of nonlinear systems. Traditional adaptive methods, which design parameter adaptive laws, can hardly compensate for a time-varying unknown disturbance without a priori knowledge of the bounds of the external disturbance. To solve the problem, a new strategy is proposed by constructing an augmented system in which the external disturbance is treated as another component of the augmented state vector. Based on this, a double-gain nonlinear observer is employed to estimate the state of the augmented nonlinear system. Further, an output feedback control strategy is designed, and it is proved that the proposed strategy ensures that all the signals are bounded and that the tracking error converges exponentially to an adjustable compact set. Finally, an example is presented to demonstrate the validity of the proposed scheme.
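The augmented-system construction can be sketched generically as follows. The abstract does not give the system model, so the plant form and the symbols $f$, $g$, $h$, $d$ and $\bar{x}$ below are assumptions chosen only for illustration, not the paper's formulation.

```latex
% Assumed plant: state x, input u, output y, unknown time-varying disturbance d(t)
\dot{x} = f(x) + g(x)\,u + d(t), \qquad y = h(x)

% Augmented state: the disturbance is appended to the state vector
\bar{x} = \begin{bmatrix} x \\ d \end{bmatrix}, \qquad
\dot{\bar{x}} = \begin{bmatrix} f(x) + g(x)\,u + d \\ \dot{d} \end{bmatrix}
```

Under these assumptions, an observer built for $\bar{x}$ estimates $x$ and $d$ together, so the controller can use the disturbance estimate directly instead of relying on a known bound for $d$.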

20.