Author Ren, Xiang, author
Title Mining structures of factual knowledge from text : an effort-light approach / Xiang Ren, Jiawei Han
Imprint [San Rafael, California] : Morgan & Claypool, 2018
ISBN 9781681733937 ebook
9781681733944 hardcover
9781681733920 paperback
Standard No. 10.2200/S00860ED1V01Y201806DMK015 doi
Description 1 online resource (xv, 183 pages) : illustrations
text rdacontent
electronic isbdmedia
online resource rdacarrier
Series Synthesis lectures on data mining and knowledge discovery, 2151-0075 ; # 15
Synthesis digital library of engineering and computer science
Synthesis lectures on data mining and knowledge discovery ; # 15. 2151-0075
Note Part of: Synthesis digital library of engineering and computer science
Includes bibliographical references (pages 167-181)
1. Introduction -- 1.1 Overview of the book -- 1.1.1 Part I: Identifying typed entities -- 1.1.2 Part II: Extracting typed entity relationships -- 1.1.3 Part III: Toward automated factual structure mining -- 2. Background -- 2.1 Entity structures -- 2.2 Relation structures -- 2.3 Distant supervision from knowledge bases -- 2.4 Mining entity and relation structures -- 2.5 Common notations -- 3. Literature review -- 3.1 Hand-crafted methods -- 3.2 Traditional supervised learning methods -- 3.2.1 Sequence labeling methods -- 3.2.2 Supervised relation extraction methods -- 3.3 Weakly supervised extraction methods -- 3.3.1 Semi-supervised learning -- 3.3.2 Pattern-based bootstrapping -- 3.4 Distantly supervised learning methods -- 3.5 Learning with noisy labeled data -- 3.6 Open-domain information extraction --
Part I. Identifying typed entities -- 4. Entity recognition and typing with knowledge bases -- 4.1 Overview and motivation -- 4.2 Problem definition -- 4.3 Relation phrase-based graph construction -- 4.3.1 Candidate generation -- 4.3.2 Mention-name subgraph -- 4.3.3 Name-relation phrase subgraph -- 4.3.4 Mention correlation subgraph -- 4.4 Clustering-integrated type propagation on graphs -- 4.4.1 Seed mention generation -- 4.4.2 Relation phrase clustering -- 4.4.3 The joint optimization problem -- 4.4.4 The ClusType algorithm -- 4.4.5 Computational complexity analysis -- 4.5 Experiments -- 4.5.1 Data preparation -- 4.5.2 Experimental settings -- 4.5.3 Experiments and performance study -- 4.6 Discussion -- 4.7 Summary -- 5. Fine-grained entity typing with knowledge bases -- 5.1 Overview and motivation -- 5.2 Preliminaries -- 5.3 The AFET framework -- 5.3.1 Text feature generation -- 5.3.2 Training set partition -- 5.3.3 The joint mention-type model -- 5.3.4 Modeling type correlation -- 5.3.5 Modeling noisy type labels -- 5.3.6 Hierarchical partial-label embedding -- 5.4 Experiments -- 5.4.1 Data preparation -- 5.4.2 Evaluation settings -- 5.4.3 Performance comparison and analyses -- 5.5 Discussion and case analysis -- 5.6 Summary -- 6. Synonym discovery from large corpus / Meng Qu -- 6.1 Overview and motivation -- 6.1.1 Challenges -- 6.1.2 Proposed solution -- 6.2 The DPE framework -- 6.2.1 Synonym seed collection -- 6.2.2 Joint optimization problem -- 6.2.3 Distributional module -- 6.2.4 Pattern module -- 6.3 Experiment -- 6.4 Summary --
Part II. Extracting typed relationships -- 7. Joint extraction of typed entities and relationships -- 7.1 Overview and motivation -- 7.2 Preliminaries -- 7.3 The CoType framework -- 7.3.1 Candidate generation -- 7.3.2 Joint entity and relation embedding -- 7.3.3 Model learning and type inference -- 7.4 Experiments -- 7.4.1 Data preparation and experiment setting -- 7.4.2 Experiments and performance study -- 7.5 Discussion -- 7.6 Summary -- 8. Pattern-enhanced embedding learning for relation extraction / Meng Qu -- 8.1 Overview and motivation -- 8.1.1 Challenges -- 8.1.2 Proposed solution -- 8.2 The REPEL framework -- 8.3 Experiment -- 8.4 Summary -- 9. Heterogeneous supervision for relation extraction / Liyuan Liu -- 9.1 Overview and motivation -- 9.2 Preliminaries -- 9.2.1 Relation extraction -- 9.2.2 Heterogeneous supervision -- 9.2.3 Problem definition -- 9.3 The REHession framework -- 9.3.1 Modeling relation mention -- 9.3.2 True label discovery -- 9.3.3 Modeling relation type -- 9.3.4 Model learning -- 9.3.5 Relation type inference -- 9.4 Experiments -- 9.5 Summary -- 10. Indirect supervision: leveraging knowledge from auxiliary tasks / Zeqiu Wu -- 10.1 Overview and motivation -- 10.1.1 Challenges -- 10.1.2 Proposed solution -- 10.2 The proposed approach -- 10.2.1 Heterogeneous network construction -- 10.2.2 Joint RE and QA embedding -- 10.2.3 Type inference -- 10.3 Experiments -- 10.4 Summary --
Part III. Toward automated factual structure mining -- 11. Mining entity attribute values with meta patterns / Meng Jiang -- 11.1 Overview and motivation -- 11.1.1 Challenges -- 11.1.2 Proposed solution -- 11.1.3 Problem formulation -- 11.2 The MetaPAD framework -- 11.2.1 Generating meta patterns by context-aware segmentation -- 11.2.2 Grouping synonymous meta patterns -- 11.2.3 Adjusting type levels for preciseness -- 11.3 Summary -- 12. Open information extraction with global structure cohesiveness / Qi Zhu -- 12.1 Overview and motivation -- 12.1.1 Proposed solution -- 12.2 The ReMine framework -- 12.2.1 The joint optimization problem -- 12.3 Summary -- 13. Applications -- 13.1 Structuring life science papers: the Life-iNet system -- 13.2 Extracting document facets from technical corpora -- 13.3 Comparative document analysis -- 14. Conclusions -- 14.1 Effort-light StructMine: summary -- 14.2 Conclusion -- 15. Vision and future work -- 15.1 Extracting implicit patterns from massive unlabeled corpora -- 15.2 Enriching factual structure representation --
Bibliography -- Authors' biographies
Abstract freely available; full-text restricted to subscribers or individual document purchasers
Compendex
INSPEC
Google scholar
Google book search
Mode of access: World Wide Web
System requirements: Adobe Acrobat Reader
The real-world data, though massive, is largely unstructured, in the form of natural-language text. It is challenging but highly desirable to mine structures from massive text data, without extensive human annotation and labeling. In this book, we investigate the principles and methodologies of mining structures of factual knowledge (e.g., entities and their relationships) from massive, unstructured text corpora. Departing from many existing structure extraction methods that rely heavily on human-annotated data for model training, our effort-light approach leverages human-curated facts stored in external knowledge bases as distant supervision and exploits rich data redundancy in large text corpora for context understanding. This effort-light mining approach leads to a series of new principles and powerful methodologies for structuring text corpora, including: (1) entity recognition, typing, and synonym discovery; (2) entity relation extraction; and (3) open-domain attribute-value mining and information extraction. This book introduces this new research frontier and points out some promising research directions.
Also available in print
Title from PDF title page (viewed on August 1, 2018)
Link Print version: 9781681733920 9781681733944
Subject Electronic information resource searching
Data mining
Data structures (Computer science)
mining factual structures
information extraction
knowledge bases
entity recognition and typing
relation extraction
entity synonym mining
distant supervision
effort-light approach
classification
clustering
real-world applications
scalable algorithms
Alt Author Han, Jiawei, author