Thursday, October 21, 2021

Matlab, Python, Julia - Book: Introduction to Probability for Data Science

 

https://probability4datascience.com/index.html



Introduction to Probability for Data Science

Stanley H. Chan
An undergraduate textbook on probability for data science.

Michigan Publishing, 2021

ISBN 978-1-60785-746-4 (hardcover): Purchase from Amazon
ISBN 978-1-60785-747-1 (electronic): Free download from Michigan Publishing


Table of Contents

Chapter 1 Mathematical Background

  • 1.1 Infinite series

    • 1.1.1 Geometric series

    • 1.1.2 Binomial Series

  • 1.2 Approximation

    • 1.2.1 Taylor approximation

    • 1.2.2 Exponential series

    • 1.2.3 Logarithmic approximation

  • 1.3 Integration

    • 1.3.1 Odd and even functions

    • 1.3.2 Fundamental theorem of calculus

  • 1.4 Linear Algebra

    • 1.4.1 Why do we need linear algebra in data science?

    • 1.4.2 Everything you need to know about linear algebra

    • 1.4.3 Inner products and norms

    • 1.4.4 Matrix calculus

  • 1.5 Basic Combinatorics

    • 1.5.1 Birthday paradox

    • 1.5.2 Permutation

    • 1.5.3 Combination

Chapter 2 Probability

  • 2.1 Set Theory

    • 2.1.1 Why study set theory?

    • 2.1.2 Basic concepts of a set

    • 2.1.3 Subsets

    • 2.1.4 Empty set and universal set

    • 2.1.5 Union

    • 2.1.6 Intersection

    • 2.1.7 Complement and difference

    • 2.1.8 Disjoint and partition

    • 2.1.9 Set operations

    • 2.1.10 Closing Remark

  • 2.2 Probability Space

    • 2.2.1 Sample space Ω

    • 2.2.2 Event space ℱ

    • 2.2.3 Probability law P

    • 2.2.4 Measure zero sets

    • 2.2.5 Summary of the probability space

  • 2.3 Axioms of Probability

    • 2.3.1 Why these three probability axioms?

    • 2.3.2 Axioms through the lens of measure

    • 2.3.3 Corollaries derived from axioms

  • 2.4 Conditional Probability

    • 2.4.1 Definition of conditional probability

    • 2.4.2 Independence

    • 2.4.3 Bayes’ theorem and law of total probability

    • 2.4.4 Prisoner's dilemma

Chapter 3 Discrete Random Variables

  • 3.1 Random Variables

    • 3.1.1 A motivating example

    • 3.1.2 Definition of a random variable

    • 3.1.3 Probability measure on random variables

  • 3.2 Probability Mass Function

    • 3.2.1 Definition

    • 3.2.2 PMF and probability measure

    • 3.2.3 Normalization Property

    • 3.2.4 PMF vs Histogram

    • 3.2.5 Estimating histograms from real data

  • 3.3 Cumulative Distribution Function

    • 3.3.1 Definition

    • 3.3.2 Properties of CDF

    • 3.3.3 Converting between PMF and CDF

  • 3.4 Expectation

    • 3.4.1 Definition

    • 3.4.2 Existence of Expectation

    • 3.4.3 Properties of Expectation

    • 3.4.4 Moments and Variance

  • 3.5 Common Discrete Random Variables

    • 3.5.1 Bernoulli random variable

    • 3.5.2 Binomial random variable

    • 3.5.3 Geometric random variable

    • 3.5.4 Poisson random variable

Chapter 4 Continuous Random Variables

  • 4.1 Probability Density Function

    • 4.1.1 Some intuition about probability density functions

    • 4.1.2 More in-depth discussion about PDFs

    • 4.1.3 Connecting with PMF

  • 4.2 Expectation, Moment, and Variance

    • 4.2.1 Definition and properties

    • 4.2.2 Existence of Expectation

    • 4.2.3 Moment and Variance

  • 4.3 Cumulative Distribution Function

    • 4.3.1 CDF for continuous random variables

    • 4.3.2 Properties of CDF

    • 4.3.3 Retrieving PDF from CDF

    • 4.3.4 CDF: Unifying discrete and continuous random variables

  • 4.4 Median, Mode, and Mean

    • 4.4.1 Median

    • 4.4.2 Mode

    • 4.4.3 Mean

  • 4.5 Uniform and Exponential Random Variables

    • 4.5.1 Uniform Random Variable

    • 4.5.2 Exponential Random Variable

    • 4.5.3 Origin of exponential random variable

    • 4.5.4 Applications of exponential random variables

  • 4.6 Gaussian Random Variables

    • 4.6.1 Definition of a Gaussian random variable

    • 4.6.2 Standard Gaussian

    • 4.6.3 Skewness and Kurtosis

    • 4.6.4 Origin of Gaussian random variables

  • 4.7 Functions of Random Variables

    • 4.7.1 General principle

    • 4.7.2 Worked examples

  • 4.8 Generating Random Numbers

    • 4.8.1 Principle

    • 4.8.2 Examples

Chapter 5 Joint Distributions

  • 5.1 Joint PMF and Joint PDF

    • 5.1.1 Probability measure in 2D

    • 5.1.2 Discrete random variables

    • 5.1.3 Continuous random variables

    • 5.1.4 Normalization

    • 5.1.5 Marginal PMF and marginal PDF

    • 5.1.6 Independent random variables

    • 5.1.7 Joint CDF

  • 5.2 Joint Expectation

    • 5.2.1 Definition and interpretation

    • 5.2.2 Covariance and correlation coefficient

    • 5.2.3 Independence and correlation

    • 5.2.4 Computing correlation from data

  • 5.3 Conditional PMF and PDF

    • 5.3.1 Conditional PMF

    • 5.3.2 Conditional PDF

  • 5.4 Conditional Expectation

    • 5.4.1 Definition

    • 5.4.2 Law of total expectation

  • 5.5 Sum of Two Random Variables

    • 5.5.1 Intuition through convolution

    • 5.5.2 Main result

    • 5.5.3 Sum of common distributions

  • 5.6 Random Vector and Covariance Matrices

    • 5.6.1 PDF of random vectors

    • 5.6.2 Expectation of random vectors

    • 5.6.3 Covariance matrix

    • 5.6.4 Multi-dimensional Gaussian

  • 5.7 Transformation of Multi-dimensional Gaussian

    • 5.7.1 Linear transformation of mean and covariance

    • 5.7.2 Eigenvalues and eigenvectors

    • 5.7.3 Covariance matrices are always positive semi-definite

    • 5.7.4 Gaussian whitening

  • 5.8 Principal Component Analysis

    • 5.8.1 The main idea: Eigen-decomposition

    • 5.8.2 The Eigenface problem

    • 5.8.3 What cannot be analyzed by PCA?

Chapter 6 Sample Statistics

  • 6.1 Moment Generating and Characteristic Functions

    • 6.1.1 Moment Generating Function

    • 6.1.2 Sum of independent variables via MGF

    • 6.1.3 Characteristic Functions

  • 6.2 Probability Inequalities

    • 6.2.1 Union bound

    • 6.2.2 Cauchy-Schwarz inequality

    • 6.2.3 Jensen's inequality

    • 6.2.4 Markov's inequality

    • 6.2.5 Chebyshev's inequality

    • 6.2.6 Chernoff's bound

    • 6.2.7 Comparing Chernoff and Chebyshev

    • 6.2.8 Hoeffding's inequality

  • 6.3 Law of Large Numbers

    • 6.3.1 Sample average

    • 6.3.2 Weak law of large numbers (WLLN)

    • 6.3.3 Convergence in probability

    • 6.3.4 Can we prove WLLN using Chernoff's bound?

    • 6.3.5 Does the weak law of large numbers always hold?

    • 6.3.6 Strong law of large numbers

    • 6.3.7 Almost sure convergence

    • 6.3.8 Proof of strong law of large numbers

  • 6.4 Central Limit Theorem

    • 6.4.1 Convergence in distribution

    • 6.4.2 Central Limit Theorem

    • 6.4.3 Examples

    • 6.4.4 Limitation of the Central Limit Theorem

Chapter 7 Regression

  • 7.1 Principles of Regression

    • 7.1.1 Intuition: how to fit a straight line?

    • 7.1.2 Solving the linear regression problem

    • 7.1.3 Extension: Beyond a straight line

    • 7.1.4 Over-determined and under-determined systems

    • 7.1.5 Robust linear regression

  • 7.2 Overfitting

    • 7.2.1 Overview of overfitting

    • 7.2.2 Analysis of the linear case

    • 7.2.3 Interpreting the linear analysis results

  • 7.3 Bias and Variance Trade-Off

    • 7.3.1 Decomposing the testing error

    • 7.3.2 Analysis of the bias

    • 7.3.3 Variance

    • 7.3.4 Bias and variance on the learning curve

  • 7.4 Regularization

    • 7.4.1 Ridge regularization

    • 7.4.2 LASSO regularization

Chapter 8 Estimation

  • 8.1 Maximum-Likelihood Estimation

    • 8.1.1 Likelihood function

    • 8.1.2 Maximum-likelihood estimate

    • 8.1.3 Application 1: Social network analysis

    • 8.1.4 Application 2: Reconstructing images

    • 8.1.5 More examples on ML estimation

    • 8.1.6 Regression vs ML estimation

  • 8.2 Properties of ML Estimates

    • 8.2.1 Estimators

    • 8.2.2 Unbiased estimators

    • 8.2.3 Consistent estimators

    • 8.2.4 Invariance principle

  • 8.3 Maximum-A-Posteriori Estimation

    • 8.3.1 The trio of likelihood, prior, and posterior

    • 8.3.2 Understanding the priors

    • 8.3.3 MAP formulation and solution

    • 8.3.4 Analyzing the MAP solution

    • 8.3.5 Analysis of the posterior distribution

    • 8.3.6 Conjugate Prior

    • 8.3.7 Linking MAP with regression

  • 8.4 Mean-Square Error Estimation

    • 8.4.1 Positioning the mean square error estimation

    • 8.4.2 Mean square error

    • 8.4.3 MMSE solution = conditional expectation

    • 8.4.4 MMSE estimator for multi-dimensional Gaussian

    • 8.4.5 Linking MMSE and neural networks

Chapter 9 Confidence and Hypothesis

  • 9.1 Confidence Interval

    • 9.1.1 The randomness of an estimator

    • 9.1.2 Understanding confidence intervals

    • 9.1.3 Constructing a confidence interval

    • 9.1.4 Properties of the confidence interval

    • 9.1.5 Student's t-distribution

    • 9.1.6 Comparing Student's t-distribution and Gaussian

  • 9.2 Bootstrap

    • 9.2.1 A brute force approach

    • 9.2.2 Bootstrap

  • 9.3 Hypothesis Testing

    • 9.3.1 What is a hypothesis?

    • 9.3.2 Critical-value test

    • 9.3.3 p-value test

    • 9.3.4 Z-test and t-test

  • 9.4 Neyman-Pearson Test

    • 9.4.1 Null and alternative distributions

    • 9.4.2 Type 1 and type 2 error

    • 9.4.3 Neyman-Pearson decision

  • 9.5 ROC and Precision-Recall Curve

    • 9.5.1 Receiver Operating Characteristic (ROC)

    • 9.5.2 Comparing ROC curves

    • 9.5.3 ROC curve in practice

    • 9.5.4 Precision-Recall (PR) curve

Chapter 10 Random Processes

  • 10.1 Basic Concepts

    • 10.1.1 Everything you need to know about a random process

    • 10.1.2 Statistical and temporal perspectives

  • 10.2 Mean and Correlation Functions

    • 10.2.1 Mean function

    • 10.2.2 Autocorrelation function

    • 10.2.3 Independent Processes

  • 10.3 Wide Sense Stationary Processes

    • 10.3.1 Definition of a WSS process

    • 10.3.2 Properties of R_X(τ)

    • 10.3.3 Physical interpretation of R_X(τ)

  • 10.4 Power Spectral Density

    • 10.4.1 Basic concepts

    • 10.4.2 Origin of the power spectral density

  • 10.5 WSS Process through LTI Systems

    • 10.5.1 Review of a linear time-invariant (LTI) system

    • 10.5.2 Mean and autocorrelation through LTI systems

    • 10.5.3 Power spectral density through LTI systems

    • 10.5.4 Cross-correlation through LTI systems

  • 10.6 Optimal Linear Filter

    • 10.6.1 Discrete-time random processes

    • 10.6.2 Problem formulation

    • 10.6.3 Yule-Walker equation

    • 10.6.4 Linear prediction

    • 10.6.5 Wiener Filter
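
Since the post pairs the book with Matlab, Python, and Julia code, here is a minimal Python sketch (my own illustration, not taken from the book's companion material) of the Central Limit Theorem covered in Section 6.4. It assumes only NumPy is installed: it draws exponential samples, forms standardized sample means, and checks that they behave like a standard Gaussian.

import numpy as np

rng = np.random.default_rng(0)

n = 1000         # samples per average
trials = 50000   # number of sample averages
lam = 2.0        # rate of the underlying exponential distribution

# Draw exponential samples and form standardized sample means.
# An Exponential(lam) variable has mean 1/lam and standard deviation 1/lam.
x = rng.exponential(scale=1.0 / lam, size=(trials, n))
sample_means = x.mean(axis=1)
z = (sample_means - 1.0 / lam) / ((1.0 / lam) / np.sqrt(n))

# If the CLT holds, z is approximately standard normal:
# roughly 68% of the values should fall within one standard deviation.
print("mean of z   ~ %.3f (expect ~0)" % z.mean())
print("std of z    ~ %.3f (expect ~1)" % z.std())
print("P(|Z| < 1)  ~ %.3f (expect ~0.683)" % np.mean(np.abs(z) < 1))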


