Jia-Bin Huang gives a very good intuition on KL divergence, starting with entropy and finishing with an unbiased estimator.
What I learned:
- cross-entropy is the entropy of one distribution measured with the log-probabilities of a different distribution; it decomposes as H(p, q) = H(p) + KL(p || q) (quick check after this list)
- asymmetric measures can be more informative than symmetric ones: “10 cm taller/shorter” vs. “10 cm difference”
- approximating KL divergence comes with a bias/variance trade-off between estimators (sketch after this list)
- related blog post: http://joschu.net/blog/kl-approx.html
- “The general way to lower variance is with a control variate. I.e.,
take k1 and add something that has expectation zero but is negatively
correlated with k1”
- “The idea of measuring distance by looking at the difference between a convex
function and its tangent plane appears in many places. It’s called a Bregman
divergence and has many beautiful properties.” (worked out below)
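
A quick check of the cross-entropy point, a minimal sketch assuming NumPy (the two distributions below are made up for illustration):

```python
# Cross-entropy H(p, q): average surprise of data from p when scored with q's
# log-probabilities. It decomposes as entropy plus KL divergence.
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # "true" distribution
q = np.array([0.5, 0.3, 0.2])   # a different ("model") distribution

entropy = -np.sum(p * np.log(p))            # H(p)
cross_entropy = -np.sum(p * np.log(q))      # H(p, q)
kl = np.sum(p * np.log(p / q))              # KL(p || q)

assert np.isclose(cross_entropy, entropy + kl)   # H(p, q) = H(p) + KL(p || q)
```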
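
And a minimal sketch of the estimator comparison, assuming NumPy/SciPy; the two Gaussians are my own choice, and the names k1/k2/k3 follow http://joschu.net/blog/kl-approx.html:

```python
# Estimate KL(q || p) from samples x ~ q, comparing three single-sample estimators.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
q = norm(loc=0.0, scale=1.0)                  # sampling distribution q
p = norm(loc=0.1, scale=1.0)                  # target distribution p
true_kl = 0.5 * (q.mean() - p.mean()) ** 2    # closed form for unit-variance Gaussians

x = rng.normal(q.mean(), q.std(), size=500_000)
logr = p.logpdf(x) - q.logpdf(x)              # log r, where r = p(x) / q(x)
r = np.exp(logr)

k1 = -logr                                    # unbiased, high variance, can go negative
k2 = 0.5 * logr ** 2                          # biased, lower variance, always >= 0
k3 = (r - 1) - logr                           # k1 plus the control variate (r - 1)

for name, k in [("k1", k1), ("k2", k2), ("k3", k3)]:
    print(f"{name}: mean={k.mean():.4f}  (true KL {true_kl:.4f})  std={k.std():.3f}")
```

k3 stays unbiased because E_q[r − 1] = 0, while (r − 1) is negatively correlated with −log r, which is exactly the control-variate trick in the quote above.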
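
Working out the Bregman-divergence quote in my own notation (not from the video):

```latex
% Bregman divergence of a convex F: the gap between F at p and the tangent
% plane of F taken at q.
D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle

% With the negative entropy F(p) = \sum_i p_i \log p_i and \sum_i p_i = \sum_i q_i = 1,
% the linear terms cancel and the Bregman divergence is exactly KL:
D_F(p, q) = \sum_i p_i \log \frac{p_i}{q_i} = \mathrm{KL}(p \,\|\, q)

% The same picture explains k3 in the sketch above: for the scalar convex
% function f(x) = -\log x with tangent point 1 (so f'(1) = -1),
D_f(r, 1) = f(r) - f(1) - f'(1)\,(r - 1) = (r - 1) - \log r = k_3
```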