LEADER 00000cam a2200601Ii 4500 001 on1019128795 003 OCoLC 005 20200717185543.1 006 m o d 007 cr unu|||||||| 008 180111s2017 enka o 000 0 eng d 015 GBB820007|2bnb 016 7 018649654|2Uk 020 1788294114 020 1788296400 020 9781788296403 020 9781788294119|q(electronic book) 035 (OCoLC)1019128795 037 CL0500000927|bSafari Books Online 040 UMI|beng|erda|epn|cUMI|dIDEBK|dTOH|dNLE|dSTF|dCEF|dOCLCF|dKSU|dDEBBG|dUKMGB|dG3B|dLVT|dS9I|dUAB|dUKAHL|dN$T 049 RIDW 050 4 QA76.9.D343 082 04 005.1|223 090 QA76.9.D343 100 1 Weise, Thomas,|eauthor. 245 10 Learning Apache Apex :|bReal-time streaming applications with Apex /|cThomas Weise, Munagala V. Ramanath, David Yan, Kenneth Knowles. 264 1 Birmingham, UK :|bPackt Publishing,|c2017. 300 1 online resource (1 volume) :|billustrations 336 text|btxt|2rdacontent 337 computer|bc|2rdamedia 338 online resource|bcr|2rdacarrier 347 data file|2rda 505 0 Cover -- Title Page -- Copyright -- Credits -- About the Authors -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Introduction to Apex -- Unbounded data and continuous processing -- Stream processing -- Stream processing systems -- What is Apex and why is it important?
-- Use cases and case studies -- Real-time insights for Advertising Tech (PubMatic) -- Industrial IoT applications (GE) -- Real-time threat detection (Capital One) -- Silver Spring Networks (SSN) -- Application Model and API -- Directed Acyclic Graph (DAG) -- Apex DAG Java API -- High-level Stream Java API -- SQL -- JSON -- Windowing and time -- Value proposition of Apex -- Low latency and stateful processing -- Native streaming versus micro-batch -- Performance -- Where Apex excels -- Where Apex is not suitable -- Summary -- Chapter 2: Getting Started with Application Development -- Development process and methodology -- Setting up the development environment -- Creating a new Maven project -- Application specifications -- Custom operator development -- The Apex operator model -- CheckpointListener/CheckpointNotificationListener -- ActivationListener -- IdleTimeHandler -- Application configuration -- Testing in the IDE -- Writing the integration test -- Running the application on YARN -- Execution layer components -- Installing Apex Docker sandbox -- Running the application -- Working on the cluster -- YARN web UI -- Apex CLI -- Logging -- Dynamically adjusting logging levels -- Summary -- Chapter 3: The Apex Library -- An overview of the library -- Integrations -- Apache Kafka -- Kafka input -- Kafka output -- Other streaming integrations -- JMS (ActiveMQ, SQS, and so on) -- Kinesis streams -- Files -- File input -- File splitter and block reader -- File writer -- Databases -- JDBC input -- JDBC output -- Other databases.
505 8 Transformations -- Parser -- Filter -- Enrichment -- Map transform -- Custom functions -- Windowed transformations -- Windowing -- Global Window -- Time Windows -- Sliding Time Windows -- Session Windows -- Window propagation -- State -- Accumulation -- Accumulation Mode -- State storage -- Watermarks -- Allowed lateness -- Triggering -- Merging of streams -- The windowing example -- Dedup -- Join -- State Management -- Summary -- Chapter 4: Scalability, Low Latency, and Performance -- Partitioning and how it works -- Elasticity -- Partitioning toolkit -- Configuring and triggering partitioning -- StreamCodec -- Unifier -- Custom dynamic partitioning -- Performance optimizations -- Affinity and anti-affinity -- Low-latency versus throughput -- Sample application for dynamic partitioning -- Performance -- other aspects for custom operators -- Summary -- Chapter 5: Fault Tolerance and Reliability -- Distributed systems need to be resilient -- Fault-tolerance components and mechanism in Apex -- Checkpointing -- When to checkpoint -- How to checkpoint -- What to checkpoint -- Incremental state saving -- Incremental recovery -- Processing guarantees -- Example -- exactly-once counting -- The exactly-once output to JDBC -- Summary -- Chapter 6: Example Project -- Real-Time Aggregation and Visualization -- Streaming ETL and beyond -- The application pattern in a real-world use case -- Analyzing Twitter feed -- Top Hashtags -- TweetStats -- Running the application -- Configuring Twitter API access -- Enabling WebSocket output -- The Pub/Sub server -- Grafana visualization -- Installing Grafana -- Installing Grafana Simple JSON Datasource -- The Grafana Pub/Sub adapter server -- Setting up the dashboard -- Summary -- Chapter 7: Example Project -- Real-Time Ride Service Data Processing -- The goal -- Datasource -- The pipeline.
505 8 Simulation of a real-time feed using historical data -- Parsing the data -- Looking up of the zip code and preparing for the windowing operation -- Windowed operator configuration -- Serving the data with WebSocket -- Running the application -- Running the application on GCP Dataproc -- Summary -- Chapter 8: Example Project -- ETL Using SQL -- The application pipeline -- Building and running the application -- Application configuration -- The application code -- Partitioning -- Application testing -- Understanding application logs -- Calcite integration -- Summary -- Chapter 9: Introduction to Apache Beam -- Introduction to Apache Beam -- Beam concepts -- Pipelines, PTransforms, and PCollections -- ParDo -- elementwise computation -- GroupByKey/CombinePerKey -- aggregation across elements -- Windowing, watermarks, and triggering in Beam -- Windowing in Beam -- Watermarks in Beam -- Triggering in Beam -- Advanced topic -- stateful ParDo -- WordCount in Apache Beam -- Setting up your pipeline -- Reading the works of Shakespeare in parallel -- Splitting each line on spaces -- Eliminating empty strings -- Counting the occurrences of each word -- Format your results -- Writing to a sharded text file in parallel -- Testing the pipeline at small scale with DirectRunner -- Running Apache Beam WordCount on Apache Apex -- Summary -- Chapter 10: The Future of Stream Processing -- Lower barrier for building streaming pipelines -- Visual development tools -- Streaming SQL -- Better programming API -- Bridging the gap between data science and engineering -- Machine learning integration -- State management -- State query and data consistency -- Containerized infrastructure -- Management tools -- Summary -- Index. 588 Description based on online resource; title from title page (viewed January 9, 2018). 590 eBooks on EBSCOhost|bEBSCO eBook Subscription Academic Collection - North America 630 00 Apache Apex.
650 0 Data mining.|0https://id.loc.gov/authorities/subjects/sh97002073 650 0 Big data.|0https://id.loc.gov/authorities/subjects/sh2012003227 650 7 Data mining.|2fast|0https://id.worldcat.org/fast/887946 650 7 Big data.|2fast|0https://id.worldcat.org/fast/1892965 655 4 Electronic books. 700 1 Ramanath, Munagala V.,|eauthor. 700 1 Yan, David,|0https://id.loc.gov/authorities/names/nb2017021640|eauthor. 700 1 Knowles, Kenneth,|eauthor. 856 40 |uhttps://rider.idm.oclc.org/login?url=http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1643015|zOnline ebook via EBSCO. Access restricted to current Rider University students, faculty, and staff. 856 42 |3Instructions for reading/downloading the EBSCO version of this ebook|uhttp://guides.rider.edu/ebooks/ebsco 901 MARCIVE 20231220 948 00 |d20200727|cEBSCO|tEBSCOebooksacademic NEW June-July 17 7032|lridw 994 92|bRID