Description |
1 online resource (288 pages) |
Contents |
Cover; Title Page; Copyright and Credits; Dedication; About Packt; Contributors; Table of Contents; Preface; Section 1: Scala and Data Analysis Life Cycle; Chapter 1: Scala Overview; Getting started with Scala; Running Scala code online; Scastie; ScalaFiddle; Installing Scala on your computer; Installing command-line tools; Installing IDE; Overview of object-oriented and functional programming; Object-oriented programming using Scala; Functional programming using Scala; Scala case classes and the collection API; Scala case classes; Scala collection API; Array; List; Map |
|
Overview of Scala libraries for data analysis; Apache Spark; Breeze; Breeze-viz; DeepLearning; Epic; Saddle; Scalalab; Smile; Vegas; Summary; Chapter 2: Data Analysis Life Cycle; Data journey; Sourcing data; Data formats; XML; JSON; CSV; Understanding data; Using statistical methods for data exploration; Using Scala; Other Scala tools; Using data visualization for data exploration; Using the vegas-viz library for data visualization; Other libraries for data visualization; Using ML to learn from data; Setting up Smile; Running Smile; Creating a data pipeline; Summary; Chapter 3: Data Ingestion |
|
Data extraction; Pull-oriented data extraction; Push-oriented data delivery; Data staging; Why is the staging important?; Cleaning and normalizing; Enriching; Organizing and storing; Summary; Chapter 4: Data Exploration and Visualization; Sampling data; Selecting the sample; Selecting samples using Saddle; Performing ad hoc analysis; Finding a relationship between data elements; Visualizing data; Vegas viz for data visualization; Spark Notebook for data visualization; Downloading and installing Spark Notebook; Creating a Spark Notebook with simple visuals; More charts with Spark Notebook |
|
Box plot; Histogram; Bubble chart; Summary; Chapter 5: Applying Statistics and Hypothesis Testing; Basics of statistics; Summary level statistics; Correlation statistics; Vector level statistics; Random data generation; Pseudorandom numbers; Random numbers with normal distribution; Random numbers with Poisson distribution; Hypothesis testing; Summary; Section 2: Advanced Data Analysis and Machine Learning; Chapter 6: Introduction to Spark for Distributed Data Analysis; Spark setup and overview; Spark core concepts; Spark Datasets and DataFrames; Sourcing data using Spark; Parquet file format |
|
Avro file format; Spark JDBC integration; Using Spark to explore data; Summary; Chapter 7: Traditional Machine Learning for Data Analysis; ML overview; Characteristics of ML; Categories or types of ML; Decision trees; Implementing decision trees; Decision tree algorithms; Implementing decision tree algorithms in our example; Evaluating the results; Using our model with a decision tree; Random forest; Random forest algorithms; Ridge and lasso regression; Characteristics of ridge regression; Characteristics of lasso regression; k-means cluster analysis |
Note |
Natural language processing for data analysis |
Summary |
This book will help you perform effective data analysis with Scala using practical examples. You will come across different challenges and their effective solutions for a variety of data processing tasks - be it data exploration, data manipulation, or real-time data analysis using Apache Spark. |
Local Note |
eBooks on EBSCOhost EBSCO eBook Subscription Academic Collection - North America |
Subject |
Data mining.
|
|
Scala (Computer program language)
|
|
SQL.
|
|
Data mining. |
|
Information visualization. |
|
Data capture & analysis. |
|
Computers -- Database Management -- Data Mining. |
|
Computers -- Data Processing. |
|
Data mining |
|
Scala (Computer program language) |
Other Form: |
Print version: Gupta, Rajesh. Hands-On Data Analysis with Scala : Perform Data Collection, Processing, Manipulation, and Visualization with Scala. Birmingham : Packt Publishing, Limited, ©2019 9781789346114 |
ISBN |
1789344263 |
|
9781789344264 (electronic bk.) |
|