Jia-Bin Huang gives a very good intuition on KL divergence, starting with entropy and finishing with an unbiased estimator.
What I learned:
- cross-entropy is the entropy of one distribution measured with the log-probabilities of a different distribution; it decomposes as H(p, q) = H(p) + KL(p || q) (quick check after this list)
- asymmetric measures can be more informative than symmetric ones: “10 cm taller/shorter” vs. “10 cm difference”
- approximating KL divergence comes with a bias/variance trade-off between estimators (sketch after this list)
- related blog post: http://joschu.net/blog/kl-approx.html
- “The general way to lower variance is with a control variate. I.e.,
take k1 and add something that has expectation zero but is negatively
correlated with k1”
- “The idea of measuring distance by looking at the difference between a convex
function and its tangent plane appears in many places. It’s called a Bregman
divergence and has many beautiful properties.” (worked out below)
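
A quick check of the cross-entropy point, a minimal sketch assuming NumPy (the two distributions below are made up for illustration):

```python
# Cross-entropy H(p, q): average surprise of data from p when scored with q's
# log-probabilities. It decomposes as entropy plus KL divergence.
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # "true" distribution
q = np.array([0.5, 0.3, 0.2])   # a different ("model") distribution

entropy = -np.sum(p * np.log(p))            # H(p)
cross_entropy = -np.sum(p * np.log(q))      # H(p, q)
kl = np.sum(p * np.log(p / q))              # KL(p || q)

assert np.isclose(cross_entropy, entropy + kl)   # H(p, q) = H(p) + KL(p || q)
```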
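
And a minimal sketch of the estimator comparison, assuming NumPy/SciPy; the two Gaussians are my own choice, and the names k1/k2/k3 follow http://joschu.net/blog/kl-approx.html:

```python
# Estimate KL(q || p) from samples x ~ q, comparing three single-sample estimators.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
q = norm(loc=0.0, scale=1.0)                  # sampling distribution q
p = norm(loc=0.1, scale=1.0)                  # target distribution p
true_kl = 0.5 * (q.mean() - p.mean()) ** 2    # closed form for unit-variance Gaussians

x = rng.normal(q.mean(), q.std(), size=500_000)
logr = p.logpdf(x) - q.logpdf(x)              # log r, where r = p(x) / q(x)
r = np.exp(logr)

k1 = -logr                                    # unbiased, high variance, can go negative
k2 = 0.5 * logr ** 2                          # biased, lower variance, always >= 0
k3 = (r - 1) - logr                           # k1 plus the control variate (r - 1)

for name, k in [("k1", k1), ("k2", k2), ("k3", k3)]:
    print(f"{name}: mean={k.mean():.4f}  (true KL {true_kl:.4f})  std={k.std():.3f}")
```

k3 stays unbiased because E_q[r − 1] = 0, while (r − 1) is negatively correlated with −log r, which is exactly the control-variate trick in the quote above.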
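
Working out the Bregman-divergence quote in my own notation (not from the video):

```latex
% Bregman divergence of a convex F: the gap between F at p and the tangent
% plane of F taken at q.
D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle

% With the negative entropy F(p) = \sum_i p_i \log p_i and \sum_i p_i = \sum_i q_i = 1,
% the linear terms cancel and the Bregman divergence is exactly KL:
D_F(p, q) = \sum_i p_i \log \frac{p_i}{q_i} = \mathrm{KL}(p \,\|\, q)

% The same picture explains k3 in the sketch above: for the scalar convex
% function f(x) = -\log x with tangent point 1 (so f'(1) = -1),
D_f(r, 1) = f(r) - f(1) - f'(1)\,(r - 1) = (r - 1) - \log r = k_3
```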