Similar Literature
 20 similar documents found (search time: 31 ms)
1.
A challenge for sentence categorization and novelty mining is to detect not only when text is relevant to the user’s information need, but also when it contains something new that the user has not seen before. This involves two tasks: identifying relevant sentences (categorization), and identifying new information in those relevant sentences (novelty mining). Many previous studies of relevant sentence retrieval and novelty mining have been conducted on English, but few papers have addressed multilingual sentence categorization and novelty mining. This is an important issue in global business environments, where mining knowledge from text in a single language is not sufficient. In this paper, we perform the first task by categorizing Malay and Chinese sentences and comparing their performance with that of English. Thereafter, we conduct novelty mining to identify the sentences with new information. Experimental results on TREC 2004 Novelty Track data show similar categorization performance on Malay and English sentences, both of which greatly outperform Chinese. In the second task, we achieve similar novelty mining results for all three languages, which indicates that our algorithm is suitable for novelty mining of multilingual sentences. In addition, benchmarking our results against novelty mining without categorization shows that categorization is necessary for successful novelty mining.
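As a rough illustration of the two-stage pipeline this abstract describes (relevance categorization first, novelty detection second), the sketch below uses bag-of-words cosine similarity. The abstract does not state which algorithm the authors used; the function names `bag_of_words`, `cosine`, and `mine_novel`, the whitespace tokenizer, and both thresholds are assumptions made only for illustration.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased, whitespace-tokenized term counts (an illustrative simplification)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mine_novel(sentences, topic, relevance_min=0.1, novelty_max=0.6):
    """Return relevant sentences that are not near-duplicates of earlier relevant ones.

    Stage 1 (categorization): keep a sentence only if it is similar enough to the topic.
    Stage 2 (novelty mining): keep it only if it is not too similar to any
    relevant sentence seen before it.
    Both thresholds are hypothetical values, not taken from the paper.
    """
    topic_vec = bag_of_words(topic)
    seen = []   # vectors of all relevant sentences processed so far
    novel = []
    for sentence in sentences:
        vec = bag_of_words(sentence)
        if cosine(vec, topic_vec) < relevance_min:
            continue  # irrelevant: dropped before the novelty stage
        if all(cosine(vec, old) <= novelty_max for old in seen):
            novel.append(sentence)
        seen.append(vec)
    return novel
```

For languages without whitespace word boundaries, such as Chinese, `bag_of_words` would need a proper word segmenter in place of `split()`.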

2.
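Du Hui (杜辉), Chen Demin (陈德敏). 《资源科学》 (Resources Science), 2012, 34(1): 58-64
Problems in the legislative philosophy, framework, and core institutions of the Mineral Resources Law have impeded the rule-of-law governance of mineral resource development, utilization, and protection in China. In terms of legal model, the revised Mineral Resources Law should be a complete statute that accommodates both public interests and private rights and combines the attributes of civil law and administrative law. Clarifying the three sets of jurisprudential relationships involved in mineral resource development and utilization, namely the relationship between ownership and usufructuary rights, the benefit-sharing relationship, and the relationship among resource development, resource security, and environmental protection, is a prerequisite for revising the law. Mining rights, mineral resource taxes and fees, mineral resource planning, mineral reserves, mine environmental protection, and legal liability are the focal points of the revision.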

3.
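Application of Text Mining in the Analysis of Online Public Opinion Information   Total citations: 15, self-citations: 0, citations by others: 15
Online public opinion has become an important form of social intelligence. Mining techniques provide methods and technical support for analyzing the large volume of public opinion information that appears online as unstructured data. The paper introduces the characteristics and role of online public opinion, analyzes the main functions of text mining technology, proposes a model for mining and analyzing online public opinion information, and uses an example to illustrate the application of text mining in online public opinion analysis.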

4.
The SALOMON system automatically summarizes Belgian criminal cases in order to improve access to the large number of existing and future court decisions. SALOMON extracts relevant text units from the case text to form a case summary. Such a case profile facilitates the rapid determination of the relevance of the case or may be employed in text search. In a first important abstracting step, SALOMON performs an initial categorization of legal criminal cases and structures the case text into separate legally relevant and irrelevant components. A text grammar represented as a semantic network is used to automatically determine the category of the case and its components. In this way, we are able to extract general data from the case and to identify text portions relevant for further abstracting. It is argued that prior knowledge of the text structure and its indicative cues may support automatic abstracting. A text grammar is a promising form for representing the knowledge involved.

5.
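The paper introduces the concept and research status of online public opinion monitoring; presents the relevant techniques from five aspects: information extraction, text mining processing, public opinion classification, text representation and topic discovery, and opinion mining and viewpoint analysis; integrates the key mining techniques involved in online public opinion monitoring; and discusses the practical application and significance of these monitoring techniques.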

6.
Since changes in job characteristics in areas such as Industry 4.0 are rapid, a fast tool for the analysis of job advertisements is needed. Current knowledge about the competencies required in Industry 4.0 is scarce. The goal of this paper is to develop a profile of Industry 4.0 job advertisements, using text mining on publicly available job advertisements, which are often used as a channel for collecting relevant information about the required knowledge and skills in rapidly changing industries. We searched a website that publishes job advertisements related to Industry 4.0 and performed text mining analysis on the data collected from those job advertisements. Analysis of the job advertisements revealed that most of them were for full-time entry, associate, and mid-senior level management positions and mainly came from the United States and Germany. Text mining analysis resulted in two groups of job profiles. The first group of job profiles was focused solely on the knowledge related to Industry 4.0: cyberphysical systems and the Internet of things for robotized production; and smart production design and production control. The second group of job profiles was focused on more general knowledge areas, which are adapted to Industry 4.0: supply chain management, customer satisfaction, and enterprise software. Topic mining was conducted on the extracted phrases, generating various multidisciplinary job profiles. Higher educational institutions, human resources professionals, and experts who are already employed or aspire to be employed in Industry 4.0 organizations would benefit from the results of our analysis.

7.
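When presenting a citizen science project, researchers need to explain the project budget in detail, including the overall funding amount, the budget items, and the budget descriptions. To analyze in depth how budgets affect willingness to participate in citizen science, this paper uses 850 citizen science projects and their 3,356 budget records from the well-known citizen science platform Experiment and carries out three studies: (1) budget amount setting; (2) budget item setting; (3) textual description of the budget. Data mining, text mining, and econometric methods are used to analyze citizen science project budgets and their influence on public willingness to participate. The empirical results show that the budget amount has a positive effect on public willingness to participate, and that the richer the budget items, the more participants a project attracts. Detailed textual budget descriptions increase both the number of participants and the funds raised, but this effect follows an inverted U shape. The readability of the budget text significantly increases public willingness to participate in citizen science projects. Subjective descriptions (relative to objective descriptions) reduce a project's attractiveness, again with an inverted-U effect. Compared with predictive descriptions, factual descriptions in the budget significantly increase a project's attractiveness. The paper demonstrates the influence of citizen science project budgets and offers a theoretical basis and practical reference for researchers who set up and write attractive project budgets.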

8.
Since previous studies in cognitive psychology show that individuals’ affective states can help analyze and predict their future behaviors, researchers have explored emotion mining for predicting online activities, firm profitability, and so on. Existing emotion mining methods are divided into two categories: feature-based approaches that rely on handcrafted annotations and deep learning-based methods that thrive on computational resources and big data. However, neither category can effectively detect emotional expressions captured in text (e.g., social media postings). In addition, the utilization of these methods in downstream explanatory and predictive applications is also rare. To fill the aforementioned research gaps, we develop a novel deep learning-based emotion detector named DeepEmotionNet that can simultaneously leverage contextual, syntactic, semantic, and document-level features and lexicon-based linguistic knowledge to bootstrap the overall emotion detection performance. Based on three emotion detection benchmark corpora, our experimental results confirm that DeepEmotionNet outperforms state-of-the-art baseline methods by 4.9% to 29.8% in macro-averaged F-score. For the downstream application of DeepEmotionNet to a real-world financial application, our econometric analysis highlights that top executives’ emotions of fear and anger embedded in their social media postings are significantly associated with corporate financial performance. Furthermore, these two emotions can significantly improve the predictive power of corporate financial performance when compared to sentiments. To the best of our knowledge, this is the first study to develop a deep learning-based emotion detection method and successfully apply it to enhance corporate performance prediction.
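The abstract above reports gains in macro-averaged F-score. As a reminder of what that metric measures, here is a minimal sketch that computes per-class F1 and averages the classes with equal weight. The function name `macro_f1` and the toy labels in the usage note are illustrative assumptions and have no connection to the paper's data or code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)  # class never predicted correctly
    return sum(scores) / len(scores)
```

For example, with true labels `["fear", "fear", "anger", "joy"]` and predictions `["fear", "anger", "anger", "joy"]`, the per-class F1 values are 2/3 (anger), 2/3 (fear), and 1 (joy), so the macro average is 7/9. Because every class counts equally, rare emotions influence the score as much as frequent ones.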

9.
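[Purpose/Significance] Understanding how sentiment in university-related public opinion evolves on Weibo is of great significance for helping the relevant authorities strengthen monitoring and supervision of such opinion and for enabling universities to take timely measures against the harmful spread of negative public opinion incidents. [Method/Process] This paper analyzes text features through text mining and word-cloud visualization; classifies the sentiment of online users' comment texts with a naive Bayes classifier; and, by combining the analysis of user sentiment evolution with the development cycle of public opinion events, dynamically presents a sentiment evolution map of university public opinion. [Result/Conclusion] The share of negative sentiment among netizens peaks during the spreading stage of public opinion, the share of neutral sentiment is lowest during the spreading stage, and the share of positive sentiment hardly changes over the public opinion cycle. The study of the sentiment evolution map of university public opinion on Weibo provides new theoretical support for research on this topic and, at the practical level, helps public opinion regulators monitor university public opinion in a timely manner and guide its direction effectively.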
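The abstract above names a naive Bayes classifier for sentiment classification of comments. The following is a minimal multinomial naive Bayes sketch with Laplace (add-one) smoothing. The class name `NaiveBayes`, the pre-tokenized English input, and the three labels are assumptions for illustration; the paper's features, training data, and word segmentation are not described in the abstract.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over token lists, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # log prior of the class
            score = math.log(self.class_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # tokens never seen in training are ignored
                count = self.word_counts[label][tok]
                # smoothed log likelihood of the token given the class
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Real Weibo comments would first need Chinese word segmentation to produce the token lists; the sketch assumes that step has already been done.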

10.
Automated legal text classification is a prominent research topic in the legal field. It lays the foundation for building an intelligent legal system. Current literature focuses on international legal texts, such as Chinese cases, European cases, and Australian cases. Little attention is paid to text classification for U.S. legal texts. Deep learning has been applied to improving text classification performance. Its effectiveness needs further exploration in domains such as the legal field. This paper investigates legal text classification with a large collection of labeled U.S. case documents by comparing the effectiveness of different text classification techniques. We propose a machine learning algorithm using domain concepts as features and random forests as the classifier. Our experimental results on 30,000 full U.S. case documents in 50 categories demonstrate that our approach significantly outperforms a deep learning system built on multiple pre-trained word embeddings and deep neural networks. In addition, applying only the top 400 domain concepts as features for building the random forests could achieve the best performance. This study provides a reference for selecting machine learning techniques to build high-performance text classification systems in the legal domain or other fields.

11.
12.
Social emotion refers to the emotion evoked in the reader by a textual document. In contrast to the emotion cause extraction task, which analyzes the cause of the author's sentiments based on the expressions in text, identifying the causes of social emotion evoked in the reader by text has not been explored previously. Social emotion mining and its cause analysis are not only an important research topic in Web-based social media analytics and text mining but also have a number of applications in multiple domains. As the focus of social emotion cause identification is on analyzing the causes of the reader's emotions elicited by a text that are not explicitly or implicitly expressed, it is a challenging task fundamentally different from previous research. Tackling it also requires a deeper understanding of the cognitive process underlying the inference of social emotion and its cause analysis. In this paper, we propose the new task of social emotion cause identification (SECI). Inspired by the cognitive structure of emotions (OCC) theory, we present a Cognitive Emotion model Enhanced Sequential (CogEES) method for SECI. Specifically, based on the implications of the OCC model, our method first establishes the correspondence between words/phrases in text and emotional dimensions identified in OCC and builds the emotional dimension lexicons with 1,676 distinct words/phrases. Then, our method utilizes lexicon information and discourse coherence for the semantic segmentation of the document and the enhancement of clause representation learning. Finally, our method combines text segmentation and clause representation into a sequential model for cause clause prediction. We construct the SECI dataset for this new task and conduct experiments to evaluate CogEES. Our method outperforms the baselines and achieves over 10% F1 improvement on average, with better interpretability of the prediction results.

13.
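[Purpose/Significance] Accurately grasping the public views reflected in comments on government Weibo posts and summarizing the focal points of public opinion helps in obtaining and guiding the state of public opinion in a timely manner, and supports improvements in government credibility, rapid-response capability, and execution. [Method/Process] Addressing the practical need for government Weibo comments to fulfill their social function and the technical need to mine their textual features, and starting from deep learning-based intelligent semantic understanding and mining of text, the paper proposes a suitable fine-grained four-tuple annotation strategy and builds POF-BiLSTM-CRF, a deep learning model for extracting viewpoints from government Weibo comments and presenting focal points. The pipeline determines the fine-grained annotation strategy, trains word vectors with Word2vec, uses a BiLSTM to learn comment features and output labels with their probabilities, uses a CRF to learn context and optimize the labeling of comments, and finally presents the focal points of public opinion after viewpoint clustering and topic-word extraction. [Result/Conclusion] Experiments on comments from the "中国警方在线" (China Police Online) Weibo account show that the proposed framework and model can extract public viewpoints intelligently and effectively, providing a reference for quickly grasping public views and for government decision-making.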

14.
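Institutional Dilemmas in the Supply of Rural Public Goods in China and Policy Options   Total citations: 6, self-citations: 0, citations by others: 6
Su Xiaoyan (苏晓艳), Fan Zhaobin (范兆斌). 《软科学》 (Soft Science), 2005, 19(2): 51-53, 59
The low efficiency of rural public goods supply is a long-standing problem within China's "three rural issues" (agriculture, rural areas, and farmers). It reflects both the general problems of public goods supply and the particular contradictions of rural public goods supply in China. The paper analyzes the dilemmas of rural public goods supply in China from the perspective of the supply system and offers a preliminary discussion of corresponding supporting reforms.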

15.
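Lu Mingchun (卢明纯). 《现代情报》 (Journal of Modern Information), 2010, 30(7): 34-38
Building on an analysis of research results on legal knowledge bases in China and abroad, and drawing on Chinese laws and regulations, the paper constructs a prototype legal knowledge base system based on an OWL ontology. The prototype incorporates the acts and penalties covered by some domestic regulations and implements the representation of, and reasoning over, legal knowledge.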

16.
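Deng Kaiying (邓凯英), Peng Chao (彭超). 《现代情报》 (Journal of Modern Information), 2013, 33(11): 38-41
As an important form of public opinion, online public opinion forms quickly and reaches a wide audience, and its influence on the state and society keeps growing. Internet users can freely express their attitudes and views on all kinds of real-world social issues on Weibo, forums, blogs, and similar platforms. The main means of monitoring online public opinion is to use web crawlers to mine page data from target sites, then classify the mined data and compile statistics on the public opinion information in a sound way. The paper analyzes the characteristics of online public opinion and strategies for handling it, and explores the principles of an online public opinion monitoring system and implements such a system using web crawling, full-text retrieval, keyword scoring, and statistical methods.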

17.
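Evaluation of the Sustainable Development Capacity of Mining Cities in Northeast China Based on MFA and DEA   Total citations: 2, self-citations: 1, citations by others: 1
Qiu Fangdao (仇方道), Li Bo (李博), Tong Lianjun (佟连军). 《资源科学》 (Resources Science), 2009, 31(11): 1898-1906
Using MFA and DEA models and taking 14 typical mining cities as cases, the paper gives a preliminary discussion of how the sustainable development capacity of mining cities in Northeast China evolved from 1995 to 2006 and what factors influenced it. The findings are as follows. (1) The average sustainable development capacity of mining cities in Northeast China is relatively high and shows an overall upward trend. (2) By resource type, the capacity of coal cities rose rapidly, while that of petroleum, metallurgical, and comprehensive cities declined. By development stage, the capacity of mining cities in the old-age stage rose substantially, while that of cities in the middle-age and young stages declined, most markedly in the young stage. By city size, the capacity of megacity-scale and medium-sized mining cities declined, while that of large mining cities rose markedly. By region, the capacity of mining cities in Liaoning changed fairly steadily, that of Jilin Province declined, and that of Heilongjiang Province increased considerably. (3) Pure technical efficiency is the dominant factor driving improvement in the sustainable development capacity of mining cities, while scale inefficiency is the main cause of DEA inefficiency; rational regulation of input scale is therefore the path toward sustainable development for mining cities in Northeast China.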

18.
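Industry associations for science and technology play an increasingly important role in integrating industry science and technology resources and in promoting enterprise technological innovation and the overall development of industries. At present, however, they have not obtained legal status and cannot carry out activities independently. Building on an analysis of the concept and general situation of industry associations for science and technology, the paper focuses on their legal status. Through a comparison with trade associations, it argues that an industry association for science and technology should become an independent incorporated association (social organization legal person). Finally, it analyzes several models that such associations can adopt for carrying out activities under current conditions, with a view to providing a reference for practice.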

19.
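Sun Ruiying (孙瑞英), Li Jieru (李杰茹). 《情报科学》 (Information Science), 2021, 39(11): 157-166
[Purpose/Significance] As the legal basis for protecting citizens' personal information, personal information protection policy has important research value. Applying word frequency analysis, social network analysis, and content analysis to the interpretation and analysis of policy promotes the further development of personal information protection work in China. [Method/Process] Taking the policy text of the Personal Information Protection Law of the People's Republic of China (Draft for Second Review), released on April 29, 2021, as the research object, the paper studies the draft law with word frequency analysis, social network analysis, and content analysis, in order to mine and analyze the meaning of its legal and policy provisions from more perspectives, and uses the analysis of the legal text as the basis for describing the current state of progress on personal information protection in China. [Result/Conclusion] Combining quantitative and qualitative methods, the paper interprets the Draft for Second Review, summarizes and reveals the operating mechanism of personal information protection work embodied in the legal text, and clarifies the progress of personal information protection legislation in China. [Innovation/Limitation] The paper combines word frequency analysis, social network analysis, and content analysis to interpret the content of the Draft for Second Review. However, it only analyzes the current state of personal information protection in China and does not extend to further discussion of strategies or methods for improvement.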
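The abstract above combines word frequency analysis with social network analysis of a legal text. One common way to link the two, shown here only as an assumed sketch and not as the authors' procedure, is to count term frequencies and then build a co-occurrence network in which two terms are connected whenever they appear in the same clause. The function names `term_frequencies` and `cooccurrence_edges`, and the toy clauses in the usage note, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_frequencies(segments):
    """Count how often each term occurs across all tokenized segments."""
    freq = Counter()
    for tokens in segments:
        freq.update(tokens)
    return freq


def cooccurrence_edges(segments):
    """Weight each unordered term pair by the number of segments containing both.

    Keys are (term_a, term_b) tuples in alphabetical order, so each pair
    appears once regardless of the order of terms within a segment.
    """
    edges = Counter()
    for tokens in segments:
        for pair in combinations(sorted(set(tokens)), 2):
            edges[pair] += 1
    return edges
```

Given the toy clauses `[["personal", "information", "processor"], ["personal", "information", "consent"], ["processor", "consent"]]`, the pair `("information", "personal")` receives weight 2 and every term has frequency 2. The weighted pairs can then be passed to a graph tool to compute centrality measures.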

20.
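The Spring and Autumn period was an important transitional period in ancient Chinese history, during which the economy, politics, culture, and other domains all underwent rapid change. Studying social change in this period is of great significance and value for explaining the historical origins, lines of development, and basic direction of Chinese culture and for constructing a view of traditional culture in the context of socialism with Chinese characteristics. Using the 《左传》 (Zuo Zhuan) as the corpus and text mining as the means, the paper analyzes the patterns of social evolution in the Spring and Autumn period. Drawing on theories of social change, it describes the period along dimensions such as structure, manifestation, and driving forces, and then constructs corresponding quantitative indicators from the perspective of text analysis. The indicators are computed by combining several text mining techniques, including word frequency analysis, cluster analysis, time series analysis, and community structure mining. The experimental results show that the text computation methods designed in the study describe the evolution of Spring and Autumn social structure, its driving forces, and its manifestations well, and are broadly consistent with the findings of humanities scholars. The work has some theoretical value and practical significance for humanities computing, but model construction, feature mining methods, and the evaluation of results still need further improvement.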
