https://probability4datascience.com/index.html
Introduction to Probability for Data Science
Stanley H. Chan. Michigan Publishing, 2021
ISBN 978-1-60785-746-4 (hardcover): Purchase from Amazon
ISBN 978-1-60785-747-1 (electronic): Free download from University of Michigan Publishing
Table of Contents
Chapter 1 Mathematical Background
1.1 Infinite series
1.1.1 Geometric series
1.1.2 Binomial series
1.2 Approximation
1.2.1 Taylor approximation
1.2.2 Exponential series
1.2.3 Logarithmic approximation
1.3 Integration
1.3.1 Odd and even functions
1.3.2 Fundamental theorem of calculus
1.4 Linear Algebra
1.4.1 Why do we need linear algebra in data science?
1.4.2 Everything you need to know about linear algebra
1.4.3 Inner products and norms
1.4.4 Matrix calculus
1.5 Basic Combinatorics
1.5.1 Birthday paradox
1.5.2 Permutation
1.5.3 Combination
Chapter 2 Probability
2.1 Set Theory
2.1.1 Why study set theory?
2.1.2 Basic concepts of a set
2.1.3 Subsets
2.1.4 Empty set and universal set
2.1.5 Union
2.1.6 Intersection
2.1.7 Complement and difference
2.1.8 Disjoint and partition
2.1.9 Set operations
2.1.10 Closing remark
2.2 Probability Space
2.2.1 Sample space
2.2.2 Event space
2.2.3 Probability law
2.2.4 Measure zero sets
2.2.5 Summary of the probability space
2.3 Axioms of Probability
2.3.1 Why these three probability axioms?
2.3.2 Axioms through the lens of measure
2.3.3 Corollaries derived from axioms
2.4 Conditional Probability
2.4.1 Definition of conditional probability
2.4.2 Independence
2.4.3 Bayes’ theorem and law of total probability
2.4.4 The Three Prisoners problem
Chapter 3 Discrete Random Variables
3.1 Random Variables
3.1.1 A motivating example
3.1.2 Definition of a random variable
3.1.3 Probability measure on random variables
3.2 Probability Mass Function
3.2.1 Definition
3.2.2 PMF and probability measure
3.2.3 Normalization property
3.2.4 PMF vs histogram
3.2.5 Estimating histograms from real data
3.3 Cumulative Distribution Function
3.3.1 Definition
3.3.2 Properties of CDF
3.3.3 Converting between PMF and CDF
3.4 Expectation
3.4.1 Definition
3.4.2 Existence of expectation
3.4.3 Properties of expectation
3.4.4 Moments and variance
3.5 Common Discrete Random Variables
3.5.1 Bernoulli random variable
3.5.2 Binomial random variable
3.5.3 Geometric random variable
3.5.4 Poisson random variable
Chapter 4 Continuous Random Variables
4.1 Probability Density Function
4.1.1 Some intuition about probability density functions
4.1.2 More in-depth discussion about PDFs
4.1.3 Connecting with PMF
4.2 Expectation, Moment, and Variance
4.2.1 Definition and properties
4.2.2 Existence of expectation
4.2.3 Moment and variance
4.3 Cumulative Distribution Function
4.3.1 CDF for continuous random variables
4.3.2 Properties of CDF
4.3.3 Retrieving PDF from CDF
4.3.4 CDF: Unifying discrete and continuous random variables
4.4 Median, Mode, and Mean
4.4.1 Median
4.4.2 Mode
4.4.3 Mean
4.5 Uniform and Exponential Random Variables
4.5.1 Uniform random variable
4.5.2 Exponential random variable
4.5.3 Origin of exponential random variable
4.5.4 Applications of exponential random variables
4.6 Gaussian Random Variables
4.6.1 Definition of a Gaussian random variable
4.6.2 Standard Gaussian
4.6.3 Skewness and kurtosis
4.6.4 Origin of Gaussian random variables
4.7 Functions of Random Variables
4.7.1 General principle
4.7.2 Worked examples
4.8 Generating Random Numbers
4.8.1 Principle
4.8.2 Examples
Chapter 5 Joint Distributions
5.1 Joint PMF and Joint PDF
5.1.1 Probability measure in 2D
5.1.2 Discrete random variables
5.1.3 Continuous random variables
5.1.4 Normalization
5.1.5 Marginal PMF and marginal PDF
5.1.6 Independent random variables
5.1.7 Joint CDF
5.2 Joint Expectation
5.2.1 Definition and interpretation
5.2.2 Covariance and correlation coefficient
5.2.3 Independence and correlation
5.2.4 Computing correlation from data
5.3 Conditional PMF and PDF
5.3.1 Conditional PMF
5.3.2 Conditional PDF
5.4 Conditional Expectation
5.4.1 Definition
5.4.2 Law of total expectation
5.5 Sum of Two Random Variables
5.5.1 Intuition through convolution
5.5.2 Main result
5.5.3 Sum of common distributions
5.6 Random Vector and Covariance Matrices
5.6.1 PDF of random vectors
5.6.2 Expectation of random vectors
5.6.3 Covariance matrix
5.6.4 Multi-dimensional Gaussian
5.7 Transformation of Multi-dimensional Gaussian
5.7.1 Linear transformation of mean and covariance
5.7.2 Eigenvalues and eigenvectors
5.7.3 Covariance matrices are always positive semi-definite
5.7.4 Gaussian whitening
5.8 Principal Component Analysis
5.8.1 The main idea: Eigen-decomposition
5.8.2 The Eigenface problem
5.8.3 What cannot be analyzed by PCA?
Chapter 6 Sample Statistics
6.1 Moment Generating and Characteristic Functions
6.1.1 Moment generating function
6.1.2 Sum of independent variables via MGF
6.1.3 Characteristic functions
6.2 Probability Inequalities
6.2.1 Union bound
6.2.2 Cauchy-Schwarz inequality
6.2.3 Jensen's inequality
6.2.4 Markov's inequality
6.2.5 Chebyshev's inequality
6.2.6 Chernoff's bound
6.2.7 Comparing Chernoff and Chebyshev
6.2.8 Hoeffding's inequality
6.3 Law of Large Numbers
6.3.1 Sample average
6.3.2 Weak law of large numbers (WLLN)
6.3.3 Convergence in probability
6.3.4 Can we prove WLLN using Chernoff's bound?
6.3.5 Does the weak law of large numbers always hold?
6.3.6 Strong law of large numbers
6.3.7 Almost sure convergence
6.3.8 Proof of the strong law of large numbers
6.4 Central Limit Theorem
6.4.1 Convergence in distribution
6.4.2 Central Limit Theorem
6.4.3 Examples
6.4.4 Limitation of the Central Limit Theorem
Chapter 7 Regression
7.1 Principles of Regression
7.1.1 Intuition: how to fit a straight line?
7.1.2 Solving the linear regression problem
7.1.3 Extension: Beyond a straight line
7.1.4 Over-determined and under-determined systems
7.1.5 Robust linear regression
7.2 Overfitting
7.2.1 Overview of overfitting
7.2.2 Analysis of the linear case
7.2.3 Interpreting the linear analysis results
7.3 Bias and Variance Trade-Off
7.3.1 Decomposing the testing error
7.3.2 Analysis of the bias
7.3.3 Variance
7.3.4 Bias and variance on the learning curve
7.4 Regularization
7.4.1 Ridge regularization
7.4.2 LASSO regularization
Chapter 8 Estimation
8.1 Maximum-Likelihood Estimation
8.1.1 Likelihood function
8.1.2 Maximum-likelihood estimate
8.1.3 Application 1: Social network analysis
8.1.4 Application 2: Reconstructing images
8.1.5 More examples on ML estimation
8.1.6 Regression vs ML estimation
8.2 Properties of ML Estimates
8.2.1 Estimators
8.2.2 Unbiased estimators
8.2.3 Consistent estimators
8.2.4 Invariance principle
8.3 Maximum-A-Posteriori Estimation
8.3.1 The trio of likelihood, prior, and posterior
8.3.2 Understanding the priors
8.3.3 MAP formulation and solution
8.3.4 Analyzing the MAP solution
8.3.5 Analysis of the posterior distribution
8.3.6 Conjugate prior
8.3.7 Linking MAP with regression
8.4 Mean-Square Error Estimation
8.4.1 Positioning the mean square error estimation
8.4.2 Mean square error
8.4.3 MMSE solution = conditional expectation
8.4.4 MMSE estimator for multi-dimensional Gaussian
8.4.5 Linking MMSE and neural networks
Chapter 9 Confidence and Hypothesis
9.1 Confidence Interval
9.1.1 The randomness of an estimator
9.1.2 Understanding confidence intervals
9.1.3 Constructing a confidence interval
9.1.4 Properties of the confidence interval
9.1.5 Student's t-distribution
9.1.6 Comparing Student's t-distribution and Gaussian
9.2 Bootstrap
9.2.1 A brute force approach
9.2.2 Bootstrap
9.3 Hypothesis Testing
9.3.1 What is a hypothesis?
9.3.2 Critical-value test
9.3.3 p-value test
9.3.4 Z-test and T-test
9.4 Neyman-Pearson Test
9.4.1 Null and alternative distributions
9.4.2 Type 1 and type 2 errors
9.4.3 Neyman-Pearson decision
9.5 ROC and Precision-Recall Curve
9.5.1 Receiver Operating Characteristic (ROC)
9.5.2 Comparing ROC curves
9.5.3 ROC curve in practice
9.5.4 Precision-Recall (PR) curve
Chapter 10 Random Processes
10.1 Basic Concepts
10.1.1 Everything you need to know about a random process
10.1.2 Statistical and temporal perspectives
10.2 Mean and Correlation Functions
10.2.1 Mean function
10.2.2 Autocorrelation function
10.2.3 Independent processes
10.3 Wide Sense Stationary Processes
10.3.1 Definition of a WSS process
10.3.2 Properties of R_X(τ)
10.3.3 Physical interpretation of R_X(τ)
10.4 Power Spectral Density
10.4.1 Basic concepts
10.4.2 Origin of the power spectral density
10.5 WSS Process through LTI Systems
10.5.1 Review of a linear time-invariant (LTI) system
10.5.2 Mean and autocorrelation through LTI systems
10.5.3 Power spectral density through LTI systems
10.5.4 Cross-correlation through LTI systems
10.6 Optimal Linear Filter
10.6.1 Discrete-time random processes
10.6.2 Problem formulation
10.6.3 Yule-Walker equation
10.6.4 Linear prediction
10.6.5 Wiener filter