This book is known as a standard textbook for people learning machine learning. It covers a wide range of algorithms and the theory underlying them. It is also hard to learn from! So I wanted to know how difficult each part is in order to study more productively.

I found these guidelines and difficulty ratings on this Japanese page, so I have simply translated and copied them here. I have read only a small part of the book so far, but the ratings seem fair to me, IMO.

1. Levels

😄 Basic theory and methods for beginners. These parts should be read first.
😊 Basic Bayesian inference and slightly more advanced content, plus some methods that are useful in special cases. These parts are aimed at doctoral students.
😰 Advanced content with deep underlying theory. These parts suit doctoral students, researchers, and machine learning engineers.

2. Chapters

2.1. Chapter 1: Introduction

😄 1 Introduction
😄 1.1 Example: Polynomial Curve Fitting
😄 1.2 Probability Theory
😄 1.2.1 Probability densities
😄 1.2.2 Expectations and covariances
😄 1.2.3 Bayesian probabilities
😄 1.2.4 The Gaussian distribution
😄 1.2.5 Curve fitting re-visited
😊 1.2.6 Bayesian curve fitting
😄 1.3 Model Selection
😄 1.4 The Curse of Dimensionality
😄 1.5 Decision Theory
😄 1.5.1 Minimizing the misclassification rate
😄 1.5.2 Minimizing the expected loss
😰 1.5.3 The reject option
😊 1.5.4 Inference and decision
😄 1.5.5 Loss functions for regression
😄 1.6 Information Theory
😄 1.6.1 Relative entropy and mutual information

2.2. Chapter 2: Probability Distributions

😄 2 Probability Distributions
😄 2.1 Binary Variables
😄 2.1.1 The beta distribution
😄 2.2 Multinomial Variables
😄 2.2.1 The Dirichlet distribution
😄 2.3 The Gaussian Distribution
😄 2.3.1 Conditional Gaussian distributions
😄 2.3.2 Marginal Gaussian distributions
😄 2.3.3 Bayes’ theorem for Gaussian variables
😄 2.3.4 Maximum likelihood for the Gaussian
😊 2.3.5 Sequential estimation
😊 2.3.6 Bayesian inference for the Gaussian
😊 2.3.7 Student’s t-distribution
😰 2.3.8 Periodic variables
😄 2.3.9 Mixtures of Gaussians
😄 2.4 The Exponential Family
😄 2.4.1 Maximum likelihood and sufficient statistics
😊 2.4.2 Conjugate priors
😊 2.4.3 Noninformative priors
😄 2.5 Nonparametric Methods
😄 2.5.1 Kernel density estimators
😄 2.5.2 Nearest-neighbour methods

2.3. Chapter 3: Linear Models for Regression

😄 3 Linear Models for Regression
😄 3.1 Linear Basis Function Models
😄 3.1.1 Maximum likelihood and least squares
😰 3.1.2 Geometry of least squares
😊 3.1.3 Sequential learning
😊 3.1.4 Regularized least squares
😰 3.1.5 Multiple outputs
😄 3.2 The Bias-Variance Decomposition
😊 3.3 Bayesian Linear Regression
😊 3.3.1 Parameter distribution
😊 3.3.2 Predictive distribution
😊 3.3.3 Equivalent kernel
😰 3.4 Bayesian Model Comparison
😰 3.5 The Evidence Approximation
😰 3.5.1 Evaluation of the evidence function
😰 3.5.2 Maximizing the evidence function
😰 3.5.3 Effective number of parameters
😊 3.6 Limitations of Fixed Basis Functions

2.4. Chapter 4: Linear Models for Classification

😄 4 Linear Models for Classification
😄 4.1 Discriminant Functions
😄 4.1.1 Two classes
😄 4.1.2 Multiple classes
😄 4.1.3 Least squares for classification
😄 4.1.4 Fisher’s linear discriminant
😄 4.1.5 Relation to least squares
😄 4.1.6 Fisher’s discriminant for multiple classes
😄 4.1.7 The perceptron algorithm
😄 4.2 Probabilistic Generative Models
😄 4.2.1 Continuous inputs
😄 4.2.2 Maximum likelihood solution
😄 4.2.3 Discrete features
😄 4.2.4 Exponential family
😄 4.3 Probabilistic Discriminative Models
😄 4.3.1 Fixed basis functions
😄 4.3.2 Logistic regression
😄 4.3.3 Iterative reweighted least squares
😄 4.3.4 Multiclass logistic regression
😰 4.3.5 Probit regression
😰 4.3.6 Canonical link functions
😊 4.4 The Laplace Approximation
😊 4.4.1 Model comparison and BIC
😊 4.5 Bayesian Logistic Regression
😊 4.5.1 Laplace approximation
😊 4.5.2 Predictive distribution

2.5. Chapter 5: Neural Networks

😄 5 Neural Networks
😄 5.1 Feed-forward Network Functions
😄 5.1.1 Weight-space symmetries
😄 5.2 Network Training
😄 5.2.1 Parameter optimization
😄 5.2.2 Local quadratic approximation
😄 5.2.3 Use of gradient information
😄 5.2.4 Gradient descent optimization
😄 5.3 Error Backpropagation
😄 5.3.1 Evaluation of error-function derivatives
😄 5.3.2 A simple example
😄 5.3.3 Efficiency of backpropagation
😰 5.3.4 The Jacobian matrix
😰 5.4 The Hessian Matrix
😰 5.4.1 Diagonal approximation
😰 5.4.2 Outer product approximation
😰 5.4.3 Inverse Hessian
😰 5.4.4 Finite differences
😰 5.4.5 Exact evaluation of the Hessian
😰 5.4.6 Fast multiplication by the Hessian
😊 5.5 Regularization in Neural Networks
😊 5.5.1 Consistent Gaussian priors
😊 5.5.2 Early stopping
😰 5.5.3 Invariances
😰 5.5.4 Tangent propagation
😰 5.5.5 Training with transformed data
😰 5.5.6 Convolutional networks
😰 5.5.7 Soft weight sharing
😰 5.6 Mixture Density Networks
😰 5.7 Bayesian Neural Networks
😰 5.7.1 Posterior parameter distribution
😰 5.7.2 Hyperparameter optimization
😰 5.7.3 Bayesian neural networks for classification

2.6. Chapter 6: Kernel Methods

😄 6 Kernel Methods
😄 6.1 Dual Representations
😄 6.2 Constructing Kernels
😄 6.3 Radial Basis Function Networks
😄 6.3.1 Nadaraya-Watson model
😰 6.4 Gaussian Processes
😰 6.4.1 Linear regression revisited
😰 6.4.2 Gaussian processes for regression
😰 6.4.3 Learning the hyperparameters
😰 6.4.4 Automatic relevance determination
😰 6.4.5 Gaussian processes for classification
😰 6.4.6 Laplace approximation
😰 6.4.7 Connection to neural networks

2.7. Chapter 7: Sparse Kernel Machines

😄 7 Sparse Kernel Machines
😄 7.1 Maximum Margin Classifiers
😄 7.1.1 Overlapping class distributions
😊 7.1.2 Relation to logistic regression
😊 7.1.3 Multiclass SVMs
😊 7.1.4 SVMs for regression
😊 7.1.5 Computational learning theory
😰 7.2 Relevance Vector Machines
😰 7.2.1 RVM for regression
😰 7.2.2 Analysis of sparsity
😰 7.2.3 RVM for classification

2.8. Chapter 8: Graphical Models

😊 8 Graphical Models
😊 8.1 Bayesian Networks
😊 8.1.1 Example: Polynomial regression
😊 8.1.2 Generative models
😊 8.1.3 Discrete variables
😊 8.1.4 Linear-Gaussian models
😊 8.2 Conditional Independence
😊 8.2.1 Three example graphs
😊 8.2.2 D-separation
😊 8.3 Markov Random Fields
😊 8.3.1 Conditional independence properties
😊 8.3.2 Factorization properties
😊 8.3.3 Illustration: Image de-noising
😊 8.3.4 Relation to directed graphs
😊 8.4 Inference in Graphical Models
😊 8.4.1 Inference on a chain
😊 8.4.2 Trees
😊 8.4.3 Factor graphs
😊 8.4.4 The sum-product algorithm
😊 8.4.5 The max-sum algorithm
😰 8.4.6 Exact inference in general graphs
😰 8.4.7 Loopy belief propagation
😰 8.4.8 Learning the graph structure

2.9. Chapter 9: Mixture Models and EM

😄 9 Mixture Models and EM
😄 9.1 K-means Clustering
😄 9.1.1 Image segmentation and compression
😄 9.2 Mixtures of Gaussians
😄 9.2.1 Maximum likelihood
😄 9.2.2 EM for Gaussian mixtures
😄 9.3 An Alternative View of EM
😄 9.3.1 Gaussian mixtures revisited
😄 9.3.2 Relation to K-means
😰 9.3.3 Mixtures of Bernoulli distributions
😰 9.3.4 EM for Bayesian linear regression
😊 9.4 The EM Algorithm in General

2.10. Chapter 10: Approximate Inference

😊 10 Approximate Inference
😊 10.1 Variational Inference
😊 10.1.1 Factorized distributions
😊 10.1.2 Properties of factorized approximations
😊 10.1.3 Example: The univariate Gaussian
😊 10.1.4 Model comparison
😊 10.2 Illustration: Variational Mixture of Gaussians
😊 10.2.1 Variational distribution
😊 10.2.2 Variational lower bound
😊 10.2.3 Predictive density
😰 10.2.4 Determining the number of components
😰 10.2.5 Induced factorizations
😰 10.3 Variational Linear Regression
😰 10.3.1 Variational distribution
😰 10.3.2 Predictive distribution
😰 10.3.3 Lower bound
😰 10.4 Exponential Family Distributions
😰 10.4.1 Variational message passing
😰 10.5 Local Variational Methods
😰 10.6 Variational Logistic Regression
😰 10.6.1 Variational posterior distribution
😰 10.6.2 Optimizing the variational parameters
😰 10.6.3 Inference of hyperparameters
😰 10.7 Expectation Propagation
😰 10.7.1 Example: The clutter problem
😰 10.7.2 Expectation propagation on graphs

2.11. Chapter 11: Sampling Methods

😊 11 Sampling Methods
😊 11.1 Basic Sampling Algorithms
😊 11.1.1 Standard distributions
😊 11.1.2 Rejection sampling
😰 11.1.3 Adaptive rejection sampling
😰 11.1.4 Importance sampling
😰 11.1.5 Sampling-importance-resampling
😰 11.1.6 Sampling and the EM algorithm
😊 11.2 Markov Chain Monte Carlo
😊 11.2.1 Markov chains
😊 11.2.2 The Metropolis-Hastings algorithm
😊 11.3 Gibbs Sampling
😰 11.4 Slice Sampling
😰 11.5 The Hybrid Monte Carlo Algorithm
😰 11.5.1 Dynamical systems
😰 11.5.2 Hybrid Monte Carlo
😰 11.6 Estimating the Partition Function

2.12. Chapter 12: Continuous Latent Variables

😄 12 Continuous Latent Variables
😄 12.1 Principal Component Analysis
😄 12.1.1 Maximum variance formulation
😄 12.1.2 Minimum-error formulation
😄 12.1.3 Applications of PCA
😄 12.1.4 PCA for high-dimensional data
😰 12.2 Probabilistic PCA
😰 12.2.1 Maximum likelihood PCA
😰 12.2.2 EM algorithm for PCA
😰 12.2.3 Bayesian PCA
😰 12.2.4 Factor analysis
😊 12.3 Kernel PCA
😰 12.4 Nonlinear Latent Variable Models
😰 12.4.1 Independent component analysis
😰 12.4.2 Autoassociative neural networks
😰 12.4.3 Modelling nonlinear manifolds

2.13. Chapter 13: Sequential Data

😊 13 Sequential Data
😊 13.1 Markov Models
😊 13.2 Hidden Markov Models
😊 13.2.1 Maximum likelihood for the HMM
😊 13.2.2 The forward-backward algorithm
😰 13.2.3 The sum-product algorithm for the HMM
😰 13.2.4 Scaling factors
😊 13.2.5 The Viterbi algorithm
😰 13.2.6 Extensions of the hidden Markov model
😊 13.3 Linear Dynamical Systems
😊 13.3.1 Inference in LDS
😊 13.3.2 Learning in LDS
😰 13.3.3 Extensions of LDS
😰 13.3.4 Particle filters

2.14. Chapter 14: Combining Models

😊 14 Combining Models
😊 14.1 Bayesian Model Averaging
😊 14.2 Committees
😊 14.3 Boosting
😊 14.3.1 Minimizing exponential error
😊 14.3.2 Error functions for boosting
😄 14.4 Tree-based Models
😰 14.5 Conditional Mixture Models
😰 14.5.1 Mixtures of linear regression models
😰 14.5.2 Mixtures of logistic models
😰 14.5.3 Mixtures of experts