Rider University Library / All Locations

LEADER 00000cam a2200769Mi 4500 
001    ocn779828976 
003    OCoLC 
005    20160527040429.3 
006    m     o  d         
007    cr |n|---||||| 
008    120312s2012    ne      ob    001 0 eng d 
019    787847218|a794545620|a817078477 
020    9789027274991|q(electronic book) 
020    9027274991|q(electronic book) 
020    1280497661 
020    9781280497667 
020    |z9789027203540|q(alkaline paper) 
035    (OCoLC)779828976|z(OCoLC)787847218|z(OCoLC)794545620
       |z(OCoLC)817078477 
040    EBLCP|beng|epn|cEBLCP|dOCLCQ|dN$T|dOCLCQ|dIDEBK|dCDX
       |dYDXCP|dE7B|dOCLCQ|dOCLCA 
043    e-no--- 
049    RIDW 
050  4 PD2914|b.E97 2012 
072  7 FOR|x039000|2bisacsh 
072  7 FOR|x022000|2bisacsh 
072  7 CFX|2bicssc 
082 04 439.8/20188|223 
090    PD2914|b.E97 2012 
245 00 Exploring newspaper language :|busing the web to create 
       and investigate a large corpus of modern Norwegian /
       |cedited by Gisle Andersen. 
264  1 Amsterdam ;|aPhiladelphia :|bJohn Benjamins Pub. Co.,
       |c2012. 
300    1 online resource (362 pages). 
336    text|btxt|2rdacontent 
337    computer|bc|2rdamedia 
338    online resource|bcr|2rdacarrier 
347    text file|2rdaft 
490 1  Studies in corpus linguistics,|x1388-0373 ;|vv. 49 
500    6. Data and experimental evaluation. 
504    Includes bibliographical references and indexes. 
505 0  Exploring Newspaper Language; Editorial page; Title page; 
       LCC data; Table of contents; Building a large corpus based
       on newspapers from the web; 1. Introduction; 2. An 
       overview of the Norwegian Newspaper Corpus and its system 
       architecture; 2.1 Text harvesting; 2.2 Boilerplate and 
       duplicate removal; 2.3 Language classification; 2.4 Text 
       annotation; 2.4.1 Annotation of source, date and author 
       information; 2.4.2 Topic classification; 2.4.3 Part-of-
       speech tagging; 2.5 Search system and user interface; 
       2.5.1 Corpus WorkBench; 2.5.2 Corpuscle; 2.6 Extraction of
       new words. 
505 8  2.7 Classification of new words; 2.7.1 Anglicism detection; 
       2.8 Frequency profiling and lexical database entry; 2.9 
       Identification of multiword expressions; 3. The content of
       the research contributions to this book; 4. Concluding 
       remarks; References; Part II. Exploiting the web as a 
       corpus -- Methods and tools; Corpuscle -- a new corpus 
       management platform for annotated corpora; 1. 
       Introduction; 2. Design principles; 3. Querying the 
       corpus; 4. API and Web interface; 4.1 The API; 4.2 The Web
       interface; 5. Editing and manual annotation; 6. Evaluation
       and concluding remarks; References; OBT+stat. 
505 8  1. Introduction; 2. Background; 2.1 The history of the Oslo-
       Bergen Tagger; 2.2 State of the art for Norwegian POS 
       taggers; 3. The architecture of the Oslo-Bergen Constraint
       Grammar Tagger; 4. Methodology of improvements to the Oslo
       -Bergen Tagger; 5. Dealing with left-over ambiguities in 
       the Oslo-Bergen Tagger; 5.1 Morphological ambiguities; 5.2
       Lemma ambiguities; 6. Statistical disambiguation; 7. 
       Modelling challenges and engineering concerns; 8. 
       Evaluation of the statistical module; 8.1 How to evaluate;
       8.2 Evaluation results; 9. Conclusion; References. 
505 8  Exploring corpora through syntactic annotation; 1. 
       Introduction; 2. Treebanking; 3. INESS -- the Norwegian 
       treebanking infrastructure; 4. Searching for complex 
       syntactic constructions in a treebank; 4.1 Passive 
       constructions; 4.2 Relative clauses; 5. Conclusion; 
       References; Collocations and statistical analysis of n-
       grams; 1. Introduction; 2. Background; 2.1 Multiword 
       Expressions (MWEs); 2.2 Collocations; 3. Methodology; 3.1 
       Data and n-gram extraction; 3.2 Post-processing of n-gram 
       lists; 3.3 Contingency tables; 3.3.1 Bigram Contingency 
       Tables; 3.3.2 Trigram Contingency Tables. 
505 8  3.4 Bigram Association Measures; 3.5 Trigram Association 
       Measures; 4. Results; 4.1 Bigrams; 4.2 Trigrams; 5. 
       Conclusion and Future Work; References; Automatic topic 
       classification of a large newspaper corpus; 1. 
       Introduction; 2. Background and related work; 2.1 The rule
       -based approach; 2.2 The pattern-matching approach; 2.3 
       Promising results; 3. Material; 3.1 Manual annotation; 3.2
       Feature extraction; 3.3 Cleaning the text; 3.4 The gold 
       standard; 4. Overview of our final approach; 5. Our 
       approach in detail; 5.1 Hypothesis; 5.2 Defining 
       categories; 5.3 Tools; 5.4 Programming and experimenting. 
520    This book describes new methodological and technological 
       approaches to corpus building and presents recent research
       based on the Norwegian Newspaper Corpus. This is a large 
       monitor corpus of contemporary Norwegian language, 
       compiled through daily harvesting of web newspapers. The 
       book gives an overview of the corpus and its system 
       architecture, and presents tools used for tasks such as 
       text harvesting, annotation, topic classification and 
       extraction and frequency profiling of new words and 
       phrases. Among the innovative technologies is Corpuscle, a
       corpus query engine and management system which ... 
588 0  Print version record. 
590    eBooks on EBSCOhost|bEBSCO eBook Subscription Academic 
       Collection - North America 
650  0 Norwegian language (Nynorsk)|0https://id.loc.gov/
       authorities/subjects/sh85092723|xUsage.|0https://
       id.loc.gov/authorities/subjects/sh2002006425 
650  0 Norwegian language (Nynorsk)|0https://id.loc.gov/
       authorities/subjects/sh85092723|xSyntax.|0https://
       id.loc.gov/authorities/subjects/sh99005599 
650  0 Newspapers|zNorway.|0https://id.loc.gov/authorities/
       subjects/sh2010015442 
650  0 Mass media|0https://id.loc.gov/authorities/subjects/
       sh85081863|zNorway.|0https://id.loc.gov/authorities/names/
       n79021528-781 
650  0 Information technology|0https://id.loc.gov/authorities/
       subjects/sh87002293|zNorway.|0https://id.loc.gov/
       authorities/names/n79021528-781 
650  7 Norwegian language (Nynorsk)|2fast|0https://
       id.worldcat.org/fast/1039437 
650  7 Grammar, Comparative and general|xSyntax.|2fast|0https://
       id.worldcat.org/fast/946258 
650  7 Newspapers.|2fast|0https://id.worldcat.org/fast/1037111 
650  7 Mass media.|2fast|0https://id.worldcat.org/fast/1011219 
650  7 Information technology.|2fast|0https://id.worldcat.org/
       fast/973089 
651  7 Norway.|2fast|0https://id.worldcat.org/fast/1204556 
655  4 Electronic books. 
700 1  Andersen, Gisle.|0https://id.loc.gov/authorities/names/
       n00024950 
776 08 |iPrint version:|aAndersen, Gisle.|tExploring Newspaper 
       Language : Using the web to create and investigate a large
       corpus of modern Norwegian.|dAmsterdam/Philadelphia : John
       Benjamins Publishing Company, ©2012|z9789027203540 
830  0 Studies in corpus linguistics ;|0https://id.loc.gov/
       authorities/names/n98023070|vv. 49.|x1388-0373 
856 40 |uhttps://rider.idm.oclc.org/login?url=http://
       search.ebscohost.com/login.aspx?direct=true&scope=site&
       db=nlebk&AN=439344|zOnline eBook. Access restricted to 
       current Rider University students, faculty, and staff. 
856 42 |3Instructions for reading/downloading this eBook|uhttp://
       guides.rider.edu/ebooks/ebsco 
901    MARCIVE 20231220 
948    |d20160607|cEBSCO|tebscoebooksacademic|lridw 
994    92|bRID