1. Introduction

There are numerous information extraction projects that use a variety of techniques to extract knowledge from large text corpora and the World Wide Web [1]. Example projects include YAGO [2], DBpedia [3], NELL [4], open information extraction [5], and Knowledge Vault [6]. These projects provide automatically constructed knowledge bases (KBs) containing massive collections of entities and facts, where each entity or fact carries a confidence score. However, machine-constructed knowledge bases contain noisy and unreliable facts due to the variable quality of source information and the limited accuracy of extractors. Transforming these candidate facts into useful knowledge is a formidable challenge [7]. To reduce the noise in automatically extracted facts, these projects often employ ad hoc heuristics to reason about uncertainty and contradiction, given the large scale of the extractions. There is significant work on developing effective algorithms that perform joint probabilistic inference over candidate facts [7, 8]. Automated approaches have improved in quality but remain far from perfect, so effective methods for obtaining high-quality knowledge are still needed.

It is easy for a human expert to determine whether a given fact is correct. However, it is infeasible to hire experts to verify all of them. With the availability of Internet platforms such as Amazon Mechanical Turk (MTurk), which enable the participation of human workers at large scale, crowdsourcing has proven to be a viable and cost-effective alternative. Crowdsourcing is commonly used to create labelled datasets for machine learning and has become an effective way to handle computer-hard tasks [9–11] such as sentiment analysis [12], image classification [13], and entity resolution [14]. The limitations of machine-based approaches and the availability of easily accessible crowdsourcing platforms motivate us to exploit crowdsourcing to improve the quality of automatically extracted knowledge bases.

In this paper, we study the problem of refining knowledge bases using crowdsourcing. Specifically, given a collection of noisy extractions (entities and their relationships) and a budget, we aim to obtain a set of high-quality facts from these extractions via crowdsourcing. Two subproblems must be addressed:

- Error Detection: How can we effectively detect potentially erroneous candidate facts that need to be verified by the crowd? Information extraction systems produce massive collections of interrelated facts; some are correct, while others are clearly incorrect or contradictory. Asking humans to verify every candidate fact is generally infeasible given the size of the extractions, so one of the key challenges is determining which subset of the knowledge should be presented to the crowd for verification.
- Knowledge Inference: How can we accurately infer consistent knowledge from crowd feedback? Errors introduced during extraction cause inconsistencies in the knowledge base, which may contain duplicate entities and violate key ontological constraints such as subsumption, mutual exclusion, inverse, and domain and range constraints.

To address these problems, we first introduce the concept of semantic constraints, which are analogous to integrity constraints in data cleaning. We then propose rank-based and graph-based algorithms that judiciously select candidate facts for crowdsourcing based on these semantic constraints.
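To make the role of semantic constraints concrete, the following is a minimal sketch, not the paper's implementation: the triple format, the constraint set, and the helper name are illustrative assumptions. It shows how a mutual-exclusion constraint can flag conflicting candidate facts as targets for crowd verification.

```python
# Hypothetical sketch: flag candidate facts that violate a
# mutual-exclusion constraint so they can be routed to the crowd.
# Fact format and constraints are illustrative, not from the paper.

from itertools import combinations

# Candidate facts as (subject, relation, object, extractor confidence).
facts = [
    ("Pittsburgh", "isA", "City", 0.92),
    ("Pittsburgh", "isA", "Company", 0.55),   # conflicts with "City"
    ("Steelers", "teamPlaysIn", "Pittsburgh", 0.88),
]

# Mutual-exclusion constraint: an entity cannot belong to both categories.
mutually_exclusive = {frozenset({"City", "Company"})}

def violating_pairs(facts):
    """Yield pairs of 'isA' facts about the same entity whose
    categories are declared mutually exclusive."""
    isa = [f for f in facts if f[1] == "isA"]
    for f1, f2 in combinations(isa, 2):
        if f1[0] == f2[0] and frozenset({f1[2], f2[2]}) in mutually_exclusive:
            yield f1, f2

# Facts involved in a violation are candidates for crowd verification.
for f1, f2 in violating_pairs(facts):
    print("conflict:", f1, "<->", f2)
```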
Our method automatically assigns the most "beneficial" tasks to the crowd and infers the answers to some candidate facts from crowd feedback. Experiments on NELL's knowledge base show that our method significantly improves the quality of the knowledge and outperforms state-of-the-art automatic methods at a reasonable crowdsourcing cost.
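As an illustration of how a crowd answer can settle more than one candidate fact, here is a hypothetical sketch (again not NELL's or the paper's actual algorithm; the toy ontology is an assumption) that propagates one verified fact through subsumption and mutual-exclusion constraints:

```python
# Hypothetical sketch: propagate one crowd answer through subsumption
# and mutual-exclusion constraints so related candidate facts need not
# be asked separately. The ontology below is an assumed toy example.

subsumption = {"City": "Location"}             # City is subsumed by Location
mutually_exclusive = {frozenset({"City", "Company"})}

def infer(entity, category, answer, labels):
    """Record the crowd's answer for (entity, isA, category) and
    derive labels for related facts from the constraints."""
    labels[(entity, category)] = answer
    if answer:  # fact confirmed true
        # Subsumption: the entity also belongs to every super-category.
        parent = subsumption.get(category)
        while parent is not None:
            labels[(entity, parent)] = True
            parent = subsumption.get(parent)
        # Mutual exclusion: conflicting categories must be false.
        for pair in mutually_exclusive:
            if category in pair:
                (other,) = pair - {category}
                labels[(entity, other)] = False

labels = {}
infer("Pittsburgh", "City", True, labels)
print(labels)
# {('Pittsburgh', 'City'): True, ('Pittsburgh', 'Location'): True,
#  ('Pittsburgh', 'Company'): False}
```

In this toy example a single crowd answer labels three candidate facts, which is the intuition behind selecting the most "beneficial" questions: facts whose verification resolves the most related candidates.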