This book is widely known as a standard textbook for machine learning learners. It covers a wide range of algorithms and the theory underlying them, and it is also hard to work through. So I wanted a difficulty rating for each part in order to study more productively.

I found the guideline and difficulty reference on this Japanese page, so I translated and copied it here. Having read a small part of the book myself, the guideline seems fair enough IMO.

Table of Contents

1. Levels

| Level | Content |
| --- | --- |
| 😄 Basic | The basic theory and methods for beginners. These parts should be read first. |
| 😊 Intermediate | Basic Bayesian inference and somewhat more advanced content. Besides that, it also contains some useful methods for special cases. These parts are aimed at doctoral students. |
| 😰 Advanced | Advanced content with deep underlying theory, so these parts are suitable for doctoral students, researchers, and machine learning engineers. |

2. Chapters

2.1. Chapter 1: Introduction

| Level | Index | Title |
| --- | --- | --- |
| 😄 | 1 | Introduction |
| 😄 | 1.1 | Example: Polynomial Curve Fitting |
| 😄 | 1.2 | Probability Theory |
| 😄 | 1.2.1 | Probability densities |
| 😄 | 1.2.2 | Expectations and covariances |
| 😄 | 1.2.3 | Bayesian probabilities |
| 😄 | 1.2.4 | The Gaussian distribution |
| 😄 | 1.2.5 | Curve fitting re-visited |
| 😊 | 1.2.6 | Bayesian curve fitting |
| 😄 | 1.3 | Model Selection |
| 😄 | 1.4 | The Curse of Dimensionality |
| 😄 | 1.5 | Decision Theory |
| 😄 | 1.5.1 | Minimizing the misclassification rate |
| 😄 | 1.5.2 | Minimizing the expected loss |
| 😰 | 1.5.3 | The reject option |
| 😊 | 1.5.4 | Inference and decision |
| 😄 | 1.5.5 | Loss functions for regression |
| 😄 | 1.6 | Information Theory |
| 😄 | 1.6.1 | Relative entropy and mutual information |

2.2. Chapter 2: Probability Distributions

| Level | Index | Title |
| --- | --- | --- |
| 😄 | 2 | Probability Distributions |
| 😄 | 2.1 | Binary Variables |
| 😄 | 2.1.1 | The beta distribution |
| 😄 | 2.2 | Multinomial Variables |
| 😄 | 2.2.1 | The Dirichlet distribution |
| 😄 | 2.3 | The Gaussian Distribution |
| 😄 | 2.3.1 | Conditional Gaussian distributions |
| 😄 | 2.3.2 | Marginal Gaussian distributions |
| 😄 | 2.3.3 | Bayes' theorem for Gaussian variables |
| 😄 | 2.3.4 | Maximum likelihood for the Gaussian |
| 😊 | 2.3.5 | Sequential estimation |
| 😊 | 2.3.6 | Bayesian inference for the Gaussian |
| 😊 | 2.3.7 | Student's t-distribution |
| 😰 | 2.3.8 | Periodic variables |
| 😄 | 2.3.9 | Mixtures of Gaussians |
| 😄 | 2.4 | The Exponential Family |
| 😄 | 2.4.1 | Maximum likelihood and sufficient statistics |
| 😊 | 2.4.2 | Conjugate priors |
| 😊 | 2.4.3 | Noninformative priors |
| 😄 | 2.5 | Nonparametric Methods |
| 😄 | 2.5.1 | Kernel density estimators |
| 😄 | 2.5.2 | Nearest-neighbour methods |

2.3. Chapter 3: Linear Models for Regression

| Level | Index | Title |
| --- | --- | --- |
| 😄 | 3 | Linear Models for Regression |
| 😄 | 3.1 | Linear Basis Function Models |
| 😄 | 3.1.1 | Maximum likelihood and least squares |
| 😰 | 3.1.2 | Geometry of least squares |
| 😊 | 3.1.3 | Sequential learning |
| 😊 | 3.1.4 | Regularized least squares |
| 😰 | 3.1.5 | Multiple outputs |
| 😄 | 3.2 | The Bias-Variance Decomposition |
| 😊 | 3.3 | Bayesian Linear Regression |
| 😊 | 3.3.1 | Parameter distribution |
| 😊 | 3.3.2 | Predictive distribution |
| 😊 | 3.3.3 | Equivalent kernel |
| 😰 | 3.4 | Bayesian Model Comparison |
| 😰 | 3.5 | The Evidence Approximation |
| 😰 | 3.5.1 | Evaluation of the evidence function |
| 😰 | 3.5.2 | Maximizing the evidence function |
| 😰 | 3.5.3 | Effective number of parameters |
| 😊 | 3.6 | Limitations of Fixed Basis Functions |

2.4. Chapter 4: Linear Models for Classification

| Level | Index | Title |
| --- | --- | --- |
| 😄 | 4 | Linear Models for Classification |
| 😄 | 4.1 | Discriminant Functions |
| 😄 | 4.1.1 | Two classes |
| 😄 | 4.1.2 | Multiple classes |
| 😄 | 4.1.3 | Least squares for classification |
| 😄 | 4.1.4 | Fisher's linear discriminant |
| 😄 | 4.1.5 | Relation to least squares |
| 😄 | 4.1.6 | Fisher's discriminant for multiple classes |
| 😄 | 4.1.7 | The perceptron algorithm |
| 😄 | 4.2 | Probabilistic Generative Models |
| 😄 | 4.2.1 | Continuous inputs |
| 😄 | 4.2.2 | Maximum likelihood solution |
| 😄 | 4.2.3 | Discrete features |
| 😄 | 4.2.4 | Exponential family |
| 😄 | 4.3 | Probabilistic Discriminative Models |
| 😄 | 4.3.1 | Fixed basis functions |
| 😄 | 4.3.2 | Logistic regression |
| 😄 | 4.3.3 | Iterative reweighted least squares |
| 😄 | 4.3.4 | Multiclass logistic regression |
| 😰 | 4.3.5 | Probit regression |
| 😰 | 4.3.6 | Canonical link functions |
| 😊 | 4.4 | The Laplace Approximation |
| 😊 | 4.4.1 | Model comparison and BIC |
| 😊 | 4.5 | Bayesian Logistic Regression |
| 😊 | 4.5.1 | Laplace approximation |
| 😊 | 4.5.2 | Predictive distribution |

2.5. Chapter 5: Neural Networks

| Level | Index | Title |
| --- | --- | --- |
| 😄 | 5 | Neural Networks |
| 😄 | 5.1 | Feed-forward Network Functions |
| 😄 | 5.1.1 | Weight-space symmetries |
| 😄 | 5.2 | Network Training |
| 😄 | 5.2.1 | Parameter optimization |
| 😄 | 5.2.2 | Local quadratic approximation |
| 😄 | 5.2.3 | Use of gradient information |
| 😄 | 5.2.4 | Gradient descent optimization |
| 😄 | 5.3 | Error Backpropagation |
| 😄 | 5.3.1 | Evaluation of error-function derivatives |
| 😄 | 5.3.2 | A simple example |
| 😄 | 5.3.3 | Efficiency of backpropagation |
| 😰 | 5.3.4 | The Jacobian matrix |
| 😰 | 5.4 | The Hessian Matrix |
| 😰 | 5.4.1 | Diagonal approximation |
| 😰 | 5.4.2 | Outer product approximation |
| 😰 | 5.4.3 | Inverse Hessian |
| 😰 | 5.4.4 | Finite differences |
| 😰 | 5.4.5 | Exact evaluation of the Hessian |
| 😰 | 5.4.6 | Fast multiplication by the Hessian |
| 😊 | 5.5 | Regularization in Neural Networks |
| 😊 | 5.5.1 | Consistent Gaussian priors |
| 😊 | 5.5.2 | Early stopping |
| 😰 | 5.5.3 | Invariances |
| 😰 | 5.5.4 | Tangent propagation |
| 😰 | 5.5.5 | Training with transformed data |
| 😰 | 5.5.6 | Convolutional networks |
| 😰 | 5.5.7 | Soft weight sharing |
| 😰 | 5.6 | Mixture Density Networks |
| 😰 | 5.7 | Bayesian Neural Networks |
| 😰 | 5.7.1 | Posterior parameter distribution |
| 😰 | 5.7.2 | Hyperparameter optimization |
| 😰 | 5.7.3 | Bayesian neural networks for classification |

2.6. Chapter 6: Kernel Methods

| Level | Index | Title |
| --- | --- | --- |
| 😄 | 6 | Kernel Methods |
| 😄 | 6.1 | Dual Representations |
| 😄 | 6.2 | Constructing Kernels |
| 😄 | 6.3 | Radial Basis Function Networks |
| 😄 | 6.3.1 | Nadaraya-Watson model |
| 😰 | 6.4 | Gaussian Processes |
| 😰 | 6.4.1 | Linear regression revisited |
| 😰 | 6.4.2 | Gaussian processes for regression |
| 😰 | 6.4.3 | Learning the hyperparameters |
| 😰 | 6.4.4 | Automatic relevance determination |
| 😰 | 6.4.5 | Gaussian processes for classification |
| 😰 | 6.4.6 | Laplace approximation |
| 😰 | 6.4.7 | Connection to neural networks |

2.7. Chapter 7: Sparse Kernel Machines

| Level | Index | Title |
| --- | --- | --- |
| 😄 | 7 | Sparse Kernel Machines |
| 😄 | 7.1 | Maximum Margin Classifiers |
| 😄 | 7.1.1 | Overlapping class distributions |
| 😊 | 7.1.2 | Relation to logistic regression |
| 😊 | 7.1.3 | Multiclass SVMs |
| 😊 | 7.1.4 | SVMs for regression |
| 😊 | 7.1.5 | Computational learning theory |
| 😰 | 7.2 | Relevance Vector Machines |
| 😰 | 7.2.1 | RVM for regression |
| 😰 | 7.2.2 | Analysis of sparsity |
| 😰 | 7.2.3 | RVM for classification |

2.8. Chapter 8: Graphical Models

| Level | Index | Title |
| --- | --- | --- |
| 😊 | 8 | Graphical Models |
| 😊 | 8.1 | Bayesian Networks |
| 😊 | 8.1.1 | Example: Polynomial regression |
| 😊 | 8.1.2 | Generative models |
| 😊 | 8.1.3 | Discrete variables |
| 😊 | 8.1.4 | Linear-Gaussian models |
| 😊 | 8.2 | Conditional Independence |
| 😊 | 8.2.1 | Three example graphs |
| 😊 | 8.2.2 | D-separation |
| 😊 | 8.3 | Markov Random Fields |
| 😊 | 8.3.1 | Conditional independence properties |
| 😊 | 8.3.2 | Factorization properties |
| 😊 | 8.3.3 | Illustration: Image de-noising |
| 😊 | 8.3.4 | Relation to directed graphs |
| 😊 | 8.4 | Inference in Graphical Models |
| 😊 | 8.4.1 | Inference on a chain |
| 😊 | 8.4.2 | Trees |
| 😊 | 8.4.3 | Factor graphs |
| 😊 | 8.4.4 | The sum-product algorithm |
| 😊 | 8.4.5 | The max-sum algorithm |
| 😰 | 8.4.6 | Exact inference in general graphs |
| 😰 | 8.4.7 | Loopy belief propagation |
| 😰 | 8.4.8 | Learning the graph structure |

2.9. Chapter 9: Mixture Models and EM

| Level | Index | Title |
| --- | --- | --- |
| 😄 | 9 | Mixture Models and EM |
| 😄 | 9.1 | K-means Clustering |
| 😄 | 9.1.1 | Image segmentation and compression |
| 😄 | 9.2 | Mixtures of Gaussians |
| 😄 | 9.2.1 | Maximum likelihood |
| 😄 | 9.2.2 | EM for Gaussian mixtures |
| 😄 | 9.3 | An Alternative View of EM |
| 😄 | 9.3.1 | Gaussian mixtures revisited |
| 😄 | 9.3.2 | Relation to K-means |
| 😰 | 9.3.3 | Mixtures of Bernoulli distributions |
| 😰 | 9.3.4 | EM for Bayesian linear regression |
| 😊 | 9.4 | The EM Algorithm in General |

2.10. Chapter 10: Approximate Inference

| Level | Index | Title |
| --- | --- | --- |
| 😊 | 10 | Approximate Inference |
| 😊 | 10.1 | Variational Inference |
| 😊 | 10.1.1 | Factorized distributions |
| 😊 | 10.1.2 | Properties of factorized approximations |
| 😊 | 10.1.3 | Example: The univariate Gaussian |
| 😊 | 10.1.4 | Model comparison |
| 😊 | 10.2 | Illustration: Variational Mixture of Gaussians |
| 😊 | 10.2.1 | Variational distribution |
| 😊 | 10.2.2 | Variational lower bound |
| 😊 | 10.2.3 | Predictive density |
| 😰 | 10.2.4 | Determining the number of components |
| 😰 | 10.2.5 | Induced factorizations |
| 😰 | 10.3 | Variational Linear Regression |
| 😰 | 10.3.1 | Variational distribution |
| 😰 | 10.3.2 | Predictive distribution |
| 😰 | 10.3.3 | Lower bound |
| 😰 | 10.4 | Exponential Family Distributions |
| 😰 | 10.4.1 | Variational message passing |
| 😰 | 10.5 | Local Variational Methods |
| 😰 | 10.6 | Variational Logistic Regression |
| 😰 | 10.6.1 | Variational posterior distribution |
| 😰 | 10.6.2 | Optimizing the variational parameters |
| 😰 | 10.6.3 | Inference of hyperparameters |
| 😰 | 10.7 | Expectation Propagation |
| 😰 | 10.7.1 | Example: The clutter problem |
| 😰 | 10.7.2 | Expectation propagation on graphs |

2.11. Chapter 11: Sampling Methods

| Level | Index | Title |
| --- | --- | --- |
| 😊 | 11 | Sampling Methods |
| 😊 | 11.1 | Basic Sampling Algorithms |
| 😊 | 11.1.1 | Standard distributions |
| 😊 | 11.1.2 | Rejection sampling |
| 😰 | 11.1.3 | Adaptive rejection sampling |
| 😰 | 11.1.4 | Importance sampling |
| 😰 | 11.1.5 | Sampling-importance-resampling |
| 😰 | 11.1.6 | Sampling and the EM algorithm |
| 😊 | 11.2 | Markov Chain Monte Carlo |
| 😊 | 11.2.1 | Markov chains |
| 😊 | 11.2.2 | The Metropolis-Hastings algorithm |
| 😊 | 11.3 | Gibbs Sampling |
| 😰 | 11.4 | Slice Sampling |
| 😰 | 11.5 | The Hybrid Monte Carlo Algorithm |
| 😰 | 11.5.1 | Dynamical systems |
| 😰 | 11.5.2 | Hybrid Monte Carlo |
| 😰 | 11.6 | Estimating the Partition Function |

2.12. Chapter 12: Continuous Latent Variables

| Level | Index | Title |
| --- | --- | --- |
| 😄 | 12 | Continuous Latent Variables |
| 😄 | 12.1 | Principal Component Analysis |
| 😄 | 12.1.1 | Maximum variance formulation |
| 😄 | 12.1.2 | Minimum-error formulation |
| 😄 | 12.1.3 | Applications of PCA |
| 😄 | 12.1.4 | PCA for high-dimensional data |
| 😰 | 12.2 | Probabilistic PCA |
| 😰 | 12.2.1 | Maximum likelihood PCA |
| 😰 | 12.2.2 | EM algorithm for PCA |
| 😰 | 12.2.3 | Bayesian PCA |
| 😰 | 12.2.4 | Factor analysis |
| 😊 | 12.3 | Kernel PCA |
| 😰 | 12.4 | Nonlinear Latent Variable Models |
| 😰 | 12.4.1 | Independent component analysis |
| 😰 | 12.4.2 | Autoassociative neural networks |
| 😰 | 12.4.3 | Modelling nonlinear manifolds |

2.13. Chapter 13: Sequential Data

| Level | Index | Title |
| --- | --- | --- |
| 😊 | 13 | Sequential Data |
| 😊 | 13.1 | Markov Models |
| 😊 | 13.2 | Hidden Markov Models |
| 😊 | 13.2.1 | Maximum likelihood for the HMM |
| 😊 | 13.2.2 | The forward-backward algorithm |
| 😰 | 13.2.3 | The sum-product algorithm for the HMM |
| 😰 | 13.2.4 | Scaling factors |
| 😊 | 13.2.5 | The Viterbi algorithm |
| 😰 | 13.2.6 | Extensions of the hidden Markov model |
| 😊 | 13.3 | Linear Dynamical Systems |
| 😊 | 13.3.1 | Inference in LDS |
| 😊 | 13.3.2 | Learning in LDS |
| 😰 | 13.3.3 | Extensions of LDS |
| 😰 | 13.3.4 | Particle filters |

2.14. Chapter 14: Combining Models

| Level | Index | Title |
| --- | --- | --- |
| 😊 | 14 | Combining Models |
| 😊 | 14.1 | Bayesian Model Averaging |
| 😊 | 14.2 | Committees |
| 😊 | 14.3 | Boosting |
| 😊 | 14.3.1 | Minimizing exponential error |
| 😊 | 14.3.2 | Error functions for boosting |
| 😄 | 14.4 | Tree-based Models |
| 😰 | 14.5 | Conditional Mixture Models |
| 😰 | 14.5.1 | Mixtures of linear regression models |
| 😰 | 14.5.2 | Mixtures of logistic models |
| 😰 | 14.5.3 | Mixtures of experts |