Author Morgan, Andrew.

Title Mastering Spark for data science / Andrew Morgan, Antoine Amend, Matthew Hallett, David George ; foreword by Harry Powell.

Publication Info. Birmingham, UK : Packt Publishing Ltd., 2017.

Description 1 online resource
text file
Note Includes index.
Summary "Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products."
Contents Cover; Copyright; Credits; Foreword; About the Authors; About the Reviewer; www.PacktPub.com; Customer Feedback; Table of Contents; Preface;
Chapter 1: The Big Data Science Ecosystem; Introducing the Big Data ecosystem; Data management; Data management responsibilities; The right tool for the job; Overall architecture; Data Ingestion; Data Lake; Reliable storage; Scalable data processing capability; Data science platform; Data Access; Data technologies; The role of Apache Spark; Companion tools; Apache HDFS; Advantages; Disadvantages; Installation; Amazon S3; Advantages; Disadvantages; Installation; Apache Kafka; Advantages; Disadvantages; Installation; Apache Parquet; Advantages; Disadvantages; Installation; Apache Avro; Advantages; Disadvantages; Installation; Apache NiFi; Advantages; Disadvantages; Installation; Apache YARN; Advantages; Disadvantages; Installation; Apache Lucene; Advantages; Disadvantages; Installation; Kibana; Advantages; Disadvantages; Installation; Elasticsearch; Advantages; Disadvantages; Installation; Accumulo; Advantages; Disadvantages; Installation; Summary;
Chapter 2: Data Acquisition; Data pipelines; Universal ingestion framework; Introducing the GDELT news stream; Discovering GDELT in real-time; Our first GDELT feed; Improving with publish and subscribe; Content registry; Choices and more choices; Going with the flow; Metadata model; Kibana dashboard; Quality assurance; Example 1 -- Basic quality checking, no contending users; Example 2 -- Advanced quality checking, no contending users; Example 3 -- Basic quality checking, 50% utility due to contending users; Summary;
Chapter 3: Input Formats and Schema; A structured life is a good life; GDELT dimensional modeling; GDELT model; First look at the data; Core global knowledge graph model; Hidden complexity; Denormalized models; Challenges with flattened data; Issue 1 -- Loss of contextual information; Issue 2: Re-establishing dimensions; Issue 3: Including reference data; Loading your data; Schema agility; Reality check; GKG ELT; Position matters; Avro; Spark-Avro method; Pedagogical method; When to perform Avro transformation; Parquet; Summary;
Chapter 4: Exploratory Data Analysis; The problem, principles and planning; Understanding the EDA problem; Design principles; General plan of exploration; Preparation; Introducing mask based data profiling; Introducing character class masks; Building a mask based profiler; Setting up Apache Zeppelin; Constructing a reusable notebook; Exploring GDELT; GDELT GKG datasets; The files; Special collections; Reference data; Exploring the GKG v2.1; The Translingual files; A configurable GCAM time series EDA; Plot.ly charting on Apache Zeppelin; Exploring translation sourced GCAM sentiment with plot.ly; Concluding remarks; A configurable GCAM Spatio-Temporal EDA; Introducing GeoGCAM; Does our spatial pivot work?; Summary;
Chapter 5: Spark for Geographic Analysis.
Local Note eBooks on EBSCOhost EBSCO eBook Subscription Academic Collection - North America
Subject Spark (Electronic resource : Apache Software Foundation)
Data mining.
Machine learning.
Big data.
Genre/Form Electronic books.
Added Author Amend, Antoine.
George, David.
Hallett, Matthew.
Other Form: Print version: Morgan, Andrew. Mastering Spark for Data Science. Birmingham : Packt Publishing, ©2017
ISBN 1785888285 (electronic book)
9781785888281 (electronic book)
1785882147