Short Description

Text technology, computational linguistics and computer science have, partly in close cooperation, modelled textual data in order to grasp natural language texts as the prime information unit of written communication. The book “Modelling, Learning and Processing of Text-Technological Data Structures” deals with this challenging information unit. It focuses on the theoretical foundations of representing natural language texts and on the procedures that operate on these representations. Following this procedural stance, the book integrates a wide range of topics in the processing of textual data. These include the learning of ontologies from natural language texts, the complex annotation and automatic parsing of such texts, and the detection and tracking of topics in texts and hypertexts. Special emphasis is placed on learning document structures, which challenge present approaches to structure mining because of the specific complexity of natural language texts and of the semi-structured data derived from them. In a nutshell, the book is unique in bringing together a wide range of approaches to procedural aspects of text technology as an emerging scientific discipline. It includes contributions on the following topics:

• lexical and terminological resources of information modelling
• ontology mining
• document structure learning
• document classification and categorization
• text and hypertext parsing
• topic detection, tracking and chaining

The book is addressed to researchers who want to become acquainted with theoretical developments, computational models and their empirical evaluation in this field of research. It is intended for those who are interested in standards for representing textual data structures, the use of these standards in various fields of application (such as topic tracking, ontology learning and document classification) and their formal-mathematical modelling. In this sense, the book is relevant to readers from many disciplines, such as text and language technology, natural language processing, computational linguistics and computer science.