LEADER 00000cam a2200769Mi 4500
001 ocn779828976
003 OCoLC
005 20160527040429.3
006 m o d
007 cr |n|---|||||
008 120312s2012 ne ob 001 0 eng d
019 787847218|a794545620|a817078477
020 9789027274991|q(electronic book)
020 9027274991|q(electronic book)
020 1280497661
020 9781280497667
020 |z9789027203540|q(alkaline paper)
035 (OCoLC)779828976|z(OCoLC)787847218|z(OCoLC)794545620|z(OCoLC)817078477
040 EBLCP|beng|epn|cEBLCP|dOCLCQ|dN$T|dOCLCQ|dIDEBK|dCDX|dYDXCP|dE7B|dOCLCQ|dOCLCA
043 e-no---
049 RIDW
050 4 PD2914|b.E97 2012
072 7 FOR|x039000|2bisacsh
072 7 FOR|x022000|2bisacsh
072 7 CFX|2bicssc
082 04 439.8/20188|223
090 PD2914|b.E97 2012
245 00 Exploring newspaper language :|busing the web to create and investigate a large corpus of modern Norwegian /|cedited by Gisle Andersen.
264 1 Amsterdam ;|aPhiladelphia :|bJohn Benjamins Pub. Co.,|c2012.
300 1 online resource (362 pages).
336 text|btxt|2rdacontent
337 computer|bc|2rdamedia
338 online resource|bcr|2rdacarrier
347 text file|2rdaft
490 1 Studies in corpus linguistics,|x1388-0373 ;|vv. 49
500 6. Data and experimental evaluation.
504 Includes bibliographical references and indexes.
505 0 Exploring Newspaper Language; Editorial page; Title page; LCC data; Table of contents; Building a large corpus based on newspapers from the web; 1. Introduction; 2. An overview of the Norwegian Newspaper Corpus and its system architecture; 2.1 Text harvesting; 2.2 Boilerplate and duplicate removal; 2.3 Language classification; 2.4 Text annotation; 2.4.1 Annotation of source, date and author information; 2.4.2 Topic classification; 2.4.3 Part-of-speech tagging; 2.5 Search system and user interface; 2.5.1 Corpus WorkBench; 2.5.2 Corpuscle; 2.6 Extraction of new words.
505 8 2.7 Classification of new words; 2.7.1 Anglicism detection; 2.8 Frequency profiling and lexical database entry; 2.9 Identification of multiword expressions; 3. The content of the research contributions to this book; 4. Concluding remarks; References; Part II.
Exploiting the web as a corpus -- Methods and tools; Corpuscle -- a new corpus management platform for annotated corpora; 1. Introduction; 2. Design principles; 3. Querying the corpus; 4. API and Web interface; 4.1 The API; 4.2 The Web interface; 5. Editing and manual annotation; 6. Evaluation and concluding remarks; References; OBT+stat.
505 8 1. Introduction; 2. Background; 2.1 The history of the Oslo-Bergen Tagger; 2.2 State of the art for Norwegian POS taggers; 3. The architecture of the Oslo-Bergen Constraint Grammar Tagger; 4. Methodology of improvements to the Oslo-Bergen Tagger; 5. Dealing with left-over ambiguities in the Oslo-Bergen Tagger; 5.1 Morphological ambiguities; 5.2 Lemma ambiguities; 6. Statistical disambiguation; 7. Modelling challenges and engineering concerns; 8. Evaluation of the statistical module; 8.1 How to evaluate; 8.2 Evaluation results; 9. Conclusion; References.
505 8 Exploring corpora through syntactic annotation; 1. Introduction; 2. Treebanking; 3. INESS -- the Norwegian treebanking infrastructure; 4. Searching for complex syntactic constructions in a treebank; 4.1 Passive constructions; 4.2 Relative clauses; 5. Conclusion; References; Collocations and statistical analysis of n-grams; 1. Introduction; 2. Background; 2.1 Multiword Expressions (MWEs); 2.2 Collocations; 3. Methodology; 3.1 Data and n-gram extraction; 3.2 Post-processing of n-gram lists; 3.3 Contingency tables; 3.3.1 Bigram Contingency Tables; 3.3.2 Trigram Contingency Tables.
505 8 3.4 Bigram Association Measures; 3.5 Trigram Association Measures; 4. Results; 4.1 Bigrams; 4.2 Trigrams; 5. Conclusion and Future Work; References; Automatic topic classification of a large newspaper corpus; 1. Introduction; 2. Background and related work; 2.1 The rule-based approach; 2.2 The pattern-matching approach; 2.3 Promising results; 3. Material; 3.1 Manual annotation; 3.2 Feature extraction; 3.3 Cleaning the text; 3.4 The gold standard; 4. Overview of our final approach; 5.
Our approach in detail; 5.1 Hypothesis; 5.2 Defining categories; 5.3 Tools; 5.4 Programming and experimenting.
520 This book describes new methodological and technological approaches to corpus building and presents recent research based on the Norwegian Newspaper Corpus. This is a large monitor corpus of contemporary Norwegian language, compiled through daily harvesting of web newspapers. The book gives an overview of the corpus and its system architecture, and presents tools used for tasks such as text harvesting, annotation, topic classification and extraction and frequency profiling of new words and phrases. Among the innovative technologies is Corpuscle, a corpus query engine and management system whic.
588 0 Print version record.
590 eBooks on EBSCOhost|bEBSCO eBook Subscription Academic Collection - North America
650 0 Norwegian language (Nynorsk)|0https://id.loc.gov/authorities/subjects/sh85092723|xUsage.|0https://id.loc.gov/authorities/subjects/sh2002006425
650 0 Norwegian language (Nynorsk)|0https://id.loc.gov/authorities/subjects/sh85092723|xSyntax.|0https://id.loc.gov/authorities/subjects/sh99005599
650 0 Newspapers|zNorway.|0https://id.loc.gov/authorities/subjects/sh2010015442
650 0 Mass media|0https://id.loc.gov/authorities/subjects/sh85081863|zNorway.|0https://id.loc.gov/authorities/names/n79021528-781
650 0 Information technology|0https://id.loc.gov/authorities/subjects/sh87002293|zNorway.|0https://id.loc.gov/authorities/names/n79021528-781
650 7 Norwegian language (Nynorsk)|2fast|0https://id.worldcat.org/fast/1039437
650 7 Grammar, Comparative and general|xSyntax.|2fast|0https://id.worldcat.org/fast/946258
650 7 Newspapers.|2fast|0https://id.worldcat.org/fast/1037111
650 7 Mass media.|2fast|0https://id.worldcat.org/fast/1011219
650 7 Information technology.|2fast|0https://id.worldcat.org/fast/973089
651 7 Norway.|2fast|0https://id.worldcat.org/fast/1204556
655 4 Electronic books.
700 1 Andersen, Gisle.|0https://id.loc.gov/authorities/names/n00024950
776 08 |iPrint version:|aAndersen, Gisle.|tExploring Newspaper Language : Using the web to create and investigate a large corpus of modern Norwegian.|dAmsterdam/Philadelphia : John Benjamins Publishing Company, ©2012|z9789027203540
830 0 Studies in corpus linguistics ;|0https://id.loc.gov/authorities/names/n98023070|vv. 49.|x1388-0373
856 40 |uhttps://rider.idm.oclc.org/login?url=http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=439344|zOnline eBook. Access restricted to current Rider University students, faculty, and staff.
856 42 |3Instructions for reading/downloading this eBook|uhttp://guides.rider.edu/ebooks/ebsco
901 MARCIVE 20231220
948 |d20160607|cEBSCO|tebscoebooksacademic|lridw
994 92|bRID