I’ve decided to maintain a wishlist of things I’d like to work on whenever my interest is piqued. Clearly, this is a weakly organized scratchpad.


These are works that I’ve only skimmed and never dug into. Sometime in the future, I’d like to understand them better.

Gaussian Processes

GP priors are unusually interesting. They lead to smooth interpolators with uncertainty estimates. Some parts of the inference are embarrassingly parallelizable. See references for my readings. The background section of Exact GPs on a Million Points is also full of great references.
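
To make "smooth interpolators with uncertainty estimates" concrete, here is a minimal sketch of exact GP regression in NumPy. It assumes a zero-mean prior, a squared-exponential kernel, and fixed hyperparameters; the function names, data, and noise level are illustrative choices, not taken from the referenced readings.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and marginal variance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)

    # Cholesky factorization avoids forming an explicit inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha

    v = np.linalg.solve(chol, k_ts)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, var


# Toy data (an assumption for illustration): noiseless samples of sin(x).
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([0.0, 0.5, 5.0])

mean, var = gp_posterior(x_train, y_train, x_test)
print(mean, var)
```

The posterior variance is small near the training inputs and reverts towards the prior variance far away from them (here, at `x = 5.0`), which is the uncertainty behaviour that makes GPs attractive.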


Kernels can be applied to almost any kind of data and provide a notion of “similarity”. They are like the grand-old-daddy of feature representation (and quite powerful at that). The famous Representer Theorem and its generalizations provide a powerful list of results, many of which I don’t fully understand. See references for my readings.
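
As a small illustration of a kernel on non-vector data, the sketch below implements a simple substring-counting kernel on strings (often called a spectrum kernel). The word list and the choice of `k=2` are assumptions made purely for the example.

```python
from collections import Counter

import numpy as np


def spectrum_kernel(s, t, k=2):
    """Inner product of the length-k substring count vectors of s and t."""
    grams_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    grams_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(count * grams_t[g] for g, count in grams_s.items())


# Hypothetical toy vocabulary.
words = ["kernel", "kernels", "colonel", "gaussian"]
gram = np.array(
    [[spectrum_kernel(a, b) for b in words] for a in words], dtype=float
)

# A valid kernel yields a symmetric positive semi-definite Gram matrix.
eigenvalues = np.linalg.eigvalsh(gram)
print(gram)
print(eigenvalues)
```

Because the kernel is an explicit inner product of count vectors, the Gram matrix is positive semi-definite by construction, and strings sharing more substrings score as more "similar".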

Statistical Learning Theory

For this topic, I don’t have a particularly well-defined scope. My primary qualm with most treatments of this topic is the reliance on plenty of obscure-sounding inequalities (except for a few popular ones like Markov’s, Hoeffding’s, Chebyshev’s, Azuma’s). Oftentimes, I think this is more the art of posing the question “what should we put a bound on?” and then stating results as a function of the error tolerance $\epsilon$ and confidence $\delta$. I might have a myopic view on this and am still looking for a big-picture treatment. See references for my readings. I’m looking for a general recipe for attacking this kind of bounds analysis.
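
One of the simplest instances of an $\epsilon$-$\delta$ statement comes from Hoeffding’s inequality: for i.i.d. samples in $[0, 1]$, $P(|\hat{\mu}_n - \mu| \geq \epsilon) \leq 2\exp(-2n\epsilon^2)$, so $n \geq \ln(2/\delta) / (2\epsilon^2)$ samples suffice for the empirical mean to be within $\epsilon$ of the truth with probability at least $1 - \delta$. The sketch below computes that sample size and checks it empirically; the Bernoulli parameter, seed, and trial count are arbitrary assumptions for illustration.

```python
import math

import numpy as np


def hoeffding_sample_size(epsilon, delta):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon**2))


epsilon, delta = 0.05, 0.05
n = hoeffding_sample_size(epsilon, delta)

# Empirical check with Bernoulli(0.3) samples (an arbitrary choice).
true_mean = 0.3
trials = 2000
rng = np.random.default_rng(0)
samples = rng.random((trials, n)) < true_mean
deviations = np.abs(samples.mean(axis=1) - true_mean)
failure_rate = np.mean(deviations >= epsilon)

print(n, failure_rate)
```

The observed failure rate typically sits well below $\delta$, a reminder that such bounds hold for every distribution on $[0, 1]$ and can therefore be loose for any particular one.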

Bayesian Monte Carlo

MCMC methods have perhaps been the most popular way to do inference in intractable probabilistic models; however, they have their fair share of criticism. I kind of share the same feelings (I spent a lot of time with MCMC in my Master’s thesis), though I wouldn’t rule it out just yet. There’s another family of work called Bayesian Monte Carlo. I believe variants are also known as Adaptive Bayesian Quadrature, which in principle allows us to encode smoothness priors into the algorithm for sample efficiency, albeit at a computational cost. Quite frankly, I have not found enough friendly resources to dig into this topic, but it definitely is an exciting direction to pursue.
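
To give a flavour of the idea, here is a minimal, non-adaptive sketch of Bayesian quadrature in one dimension. It assumes a zero-mean GP prior with a squared-exponential kernel on the integrand and a standard normal integration measure, for which the kernel mean has a closed form. The node placement, lengthscale, jitter, and test integrand are all illustrative assumptions, and this is a toy version rather than the algorithm from any specific paper.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale):
    """Unit-variance squared-exponential kernel on 1-D inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def bayesian_quadrature(f, nodes, lengthscale=1.0, jitter=1e-8):
    """Posterior mean and variance of the integral of f against N(0, 1)."""
    ell2 = lengthscale**2
    gram = rbf_kernel(nodes, nodes, lengthscale) + jitter * np.eye(len(nodes))

    # Kernel mean z_i = integral of k(x, x_i) N(x; 0, 1) dx, in closed form.
    z = np.sqrt(ell2 / (ell2 + 1.0)) * np.exp(-0.5 * nodes**2 / (ell2 + 1.0))

    weights = np.linalg.solve(gram, z)
    estimate = weights @ f(nodes)

    # Prior variance of the integral is the double integral of the kernel.
    prior_var = np.sqrt(ell2 / (ell2 + 2.0))
    variance = max(prior_var - weights @ z, 0.0)  # clip rounding error
    return estimate, variance


nodes = np.linspace(-4.0, 4.0, 15)
bq_estimate, bq_variance = bayesian_quadrature(np.cos, nodes)
exact = np.exp(-0.5)  # E[cos(x)] for x ~ N(0, 1)

# Plain Monte Carlo with the same number of function evaluations.
rng = np.random.default_rng(0)
mc_estimate = np.cos(rng.standard_normal(len(nodes))).mean()

print(abs(bq_estimate - exact), abs(mc_estimate - exact))
```

The computational cost mentioned above shows up in the linear solve against the Gram matrix, which scales cubically in the number of nodes, whereas plain Monte Carlo only averages function values.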


Most of these are gaps in my knowledge. I’m looking for resources and perspectives to improve my understanding, and I hope to summarize answers to these as blog posts.


Updates and Corrections

If you see mistakes or want to suggest changes, please create issues or revisions against source.