Name: Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python 2nd Edition
Brand: O'Reilly Media
SKU: EB1357
Price: 4.00 USD
Availability: InStock

Format

PDF

Author(s)

Peter Gedeck, Peter Bruce, Andrew Bruce

Publisher

O'Reilly Media

ISBN-10

149207294X

ISBN-13

978-1492072942

Pages

363

Language

English

Edition

2nd edition | June 2, 2020

File Size

15 MB

Amazon Price

$52

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you’ll learn:

Why exploratory data analysis is a key preliminary step in data science
How random sampling can reduce bias and yield a higher-quality dataset, even with big data
How the principles of experimental design yield definitive answers to questions
How to use regression to estimate outcomes and detect anomalies
Key classification techniques for predicting which categories a record belongs to
Statistical machine learning methods that “learn” from data
Unsupervised learning methods for extracting meaning from unlabeled data

Additional ISBNs:

eText ISBN-10: 1492072893; ISBN-13: 978-1492072898

Contents

Preface
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. Exploratory Data Analysis
Elements of Structured Data
Further Reading
Rectangular Data
Data Frames and Indexes
Nonrectangular Data Structures
Further Reading
Estimates of Location
Mean
Median and Robust Estimates
Example: Location Estimates of Population and Murder Rates
Further Reading
Estimates of Variability
Standard Deviation and Related Estimates
Estimates Based on Percentiles
Example: Variability Estimates of State Population
Further Reading
Exploring the Data Distribution
Percentiles and Boxplots
Frequency Tables and Histograms
Density Plots and Estimates
Further Reading
Exploring Binary and Categorical Data
Mode
Expected Value
Probability
Further Reading
Correlation
Scatterplots
Further Reading
Exploring Two or More Variables
Hexagonal Binning and Contours (Plotting Numeric Versus Numeric Data)
Two Categorical Variables
Categorical and Numeric Data
Visualizing Multiple Variables
Further Reading
Summary
2. Data and Sampling Distributions
Random Sampling and Sample Bias
Bias
Random Selection
Size Versus Quality: When Does Size Matter?
Sample Mean Versus Population Mean
Further Reading
Selection Bias
Regression to the Mean
Further Reading
Sampling Distribution of a Statistic
Central Limit Theorem
Standard Error
Further Reading
The Bootstrap
Resampling Versus Bootstrapping
Further Reading
Confidence Intervals
Further Reading
Normal Distribution
Standard Normal and QQ-Plots
Long-Tailed Distributions
Further Reading
Student’s t-Distribution
Further Reading
Binomial Distribution
Further Reading
Chi-Square Distribution
Further Reading
F-Distribution
Further Reading
Poisson and Related Distributions
Poisson Distributions
Exponential Distribution
Estimating the Failure Rate
Weibull Distribution
Further Reading
Summary
3. Statistical Experiments and Significance Testing
A/B Testing
Why Have a Control Group?
Why Just A/B? Why Not C, D,…?
Further Reading
Hypothesis Tests
The Null Hypothesis
Alternative Hypothesis
One-Way Versus Two-Way Hypothesis Tests
Further Reading
Resampling
Permutation Test
Example: Web Stickiness
Exhaustive and Bootstrap Permutation Tests
Permutation Tests: The Bottom Line for Data Science
Further Reading
Statistical Significance and p-Values
p-Value
Alpha
Type 1 and Type 2 Errors
Data Science and p-Values
Further Reading
t-Tests
Further Reading
Multiple Testing
Further Reading
Degrees of Freedom
Further Reading
ANOVA
F-Statistic
Two-Way ANOVA
Further Reading
Chi-Square Test
Chi-Square Test: A Resampling Approach
Chi-Square Test: Statistical Theory
Fisher’s Exact Test
Relevance for Data Science
Further Reading
Multi-Arm Bandit Algorithm
Further Reading
Power and Sample Size
Sample Size
Further Reading
Summary
4. Regression and Prediction
Simple Linear Regression
The Regression Equation
Fitted Values and Residuals
Least Squares
Prediction Versus Explanation (Profiling)
Further Reading
Multiple Linear Regression
Example: King County Housing Data
Assessing the Model
Cross-Validation
Model Selection and Stepwise Regression
Weighted Regression
Further Reading
Prediction Using Regression
The Dangers of Extrapolation
Confidence and Prediction Intervals
Factor Variables in Regression
Dummy Variables Representation
Factor Variables with Many Levels
Ordered Factor Variables
Interpreting the Regression Equation
Correlated Predictors
Multicollinearity
Confounding Variables
Interactions and Main Effects
Regression Diagnostics
Outliers
Influential Values
Heteroskedasticity, Non-Normality, and Correlated Errors
Partial Residual Plots and Nonlinearity
Polynomial and Spline Regression
Polynomial
Splines
Generalized Additive Models
Further Reading
Summary
5. Classification
Naive Bayes
Why Exact Bayesian Classification Is Impractical
The Naive Solution
Numeric Predictor Variables
Further Reading
Discriminant Analysis
Covariance Matrix
Fisher’s Linear Discriminant
A Simple Example
Further Reading
Logistic Regression
Logistic Response Function and Logit
Logistic Regression and the GLM
Generalized Linear Models
Predicted Values from Logistic Regression
Interpreting the Coefficients and Odds Ratios
Linear and Logistic Regression: Similarities and Differences
Assessing the Model
Further Reading
Evaluating Classification Models
Confusion Matrix
The Rare Class Problem
Precision, Recall, and Specificity
ROC Curve
AUC
Lift
Further Reading
Strategies for Imbalanced Data
Undersampling
Oversampling and Up/Down Weighting
Data Generation
Cost-Based Classification
Exploring the Predictions
Further Reading
Summary
6. Statistical Machine Learning
K-Nearest Neighbors
A Small Example: Predicting Loan Default
Distance Metrics
One Hot Encoder
Standardization (Normalization, z-Scores)
Choosing K
KNN as a Feature Engine
Tree Models
A Simple Example
The Recursive Partitioning Algorithm
Measuring Homogeneity or Impurity
Stopping the Tree from Growing
Predicting a Continuous Value
How Trees Are Used
Further Reading
Bagging and the Random Forest
Bagging
Random Forest
Variable Importance
Hyperparameters
Boosting
The Boosting Algorithm
XGBoost
Regularization: Avoiding Overfitting
Hyperparameters and Cross-Validation
Summary
7. Unsupervised Learning
Principal Components Analysis
A Simple Example
Computing the Principal Components
Interpreting Principal Components
Correspondence Analysis
Further Reading
K-Means Clustering
A Simple Example
K-Means Algorithm
Interpreting the Clusters
Selecting the Number of Clusters
Hierarchical Clustering
A Simple Example
The Dendrogram
The Agglomerative Algorithm
Measures of Dissimilarity
Model-Based Clustering
Multivariate Normal Distribution
Mixtures of Normals
Selecting the Number of Clusters
Further Reading
Scaling and Categorical Variables
Scaling the Variables
Dominant Variables
Categorical Data and Gower’s Distance
Problems with Clustering Mixed Data
Summary
Bibliography
Index

About the Author

Peter Bruce, Andrew Bruce, Peter Gedeck

Peter Bruce is the Founder and Chief Academic Officer of the Institute for Statistics Education at Statistics.com, which offers about 80 courses in statistics and analytics, roughly half of which are aimed at data scientists. He has authored or co-authored several books in statistics and analytics. He earned his bachelor’s degree at Princeton and master’s degrees at Harvard and the University of Maryland.

Andrew Bruce, Principal Research Scientist at Amazon, has over 30 years of experience in statistics and data science in academia, government, and business. The co-author of Applied Wavelet Analysis with S-PLUS, he earned his bachelor’s degree at Princeton and his PhD in statistics at the University of Washington.

Peter Gedeck, Senior Data Scientist at Collaborative Drug Discovery, specializes in developing machine learning algorithms that predict biological and physicochemical properties of drug candidates. The co-author of Data Mining for Business Analytics, he earned PhDs in chemistry from the University of Erlangen-Nürnberg in Germany and in mathematics from Fernuniversität Hagen, Germany.

Notice

Immediately after payment, you can download the Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python 2nd Edition e-book (PDF).

CURRENT Medical Diagnosis and Treatment 2023 62nd Edition

CURRENT Medical Diagnosis and Treatment 2022 61st Edition

Principles of Neural Science, Sixth Edition