Author: LYY
Activation Functions
Sigmoid
Transforms the logit z into a probability by mapping it to (0, 1).
Definition:
$$P = \frac{1}{1+\exp(-z)}$$
Derivative:
$$\frac{\partial P}{\partial z} = -(1+\exp(-z))^{-2} \cdot \exp(-z) \cdot (-1) = \frac{\exp(-z)}{(1+\exp(-z))^2} = \frac{1}{1+\exp(-z)} \cdot \frac{\exp(-z)}{1+\exp(-z)} = P(1-P) > 0 \quad \{P \in (0,1)\}$$
It means $z\uparrow, P\uparrow$: the sigmoid is monotonically increasing.
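As a quick sanity check of the derivative identity, here is a minimal NumPy sketch (the function name and test values are just for illustration) that compares $P(1-P)$ against a central finite difference:

```python
import numpy as np

def sigmoid(z):
    """Map a logit z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
P = sigmoid(z)

analytic = P * (1.0 - P)                             # dP/dz = P(1 - P)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)

print(np.allclose(analytic, numeric))                # True
```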
SoftMax
Transforms logits $z_i$ into probabilities by normalizing the sequence into (0, 1) so that the values sum to 1.
Definition:
$$P_k = \frac{\exp(z_k)}{\sum_i \exp(z_i)}$$
Derivative:
$$\frac{\partial P_k}{\partial z_k} = \frac{\exp(z_k)\sum_i \exp(z_i) - \exp(z_k)^2}{\left(\sum_i \exp(z_i)\right)^2} = \frac{\exp(z_k)}{\sum_i \exp(z_i)} \cdot \frac{\sum_i \exp(z_i) - \exp(z_k)}{\sum_i \exp(z_i)} = P_k(1-P_k) > 0 \quad \{P_i \in (0,1)\}$$
It means $z_k\uparrow, P_k\uparrow$.
$$\frac{\partial P_k}{\partial z_j} = \frac{-\exp(z_k)\exp(z_j)}{\left(\sum_i \exp(z_i)\right)^2} = -P_k P_j < 0 \quad \{P_i \in (0,1)\},\ j \neq k$$
It means $z_j\uparrow, P_k\downarrow$ for $j \neq k$.
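The two derivative cases together form the Jacobian $\mathrm{diag}(P) - PP^\top$. A small NumPy sketch (with illustrative logit values) verifies both the diagonal $P_k(1-P_k)$ and the off-diagonal $-P_k P_j$ entries against finite differences:

```python
import numpy as np

def softmax(z):
    """Normalize logits into probabilities that sum to 1."""
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

z = np.array([0.5, -1.0, 2.0])
P = softmax(z)

# Diagonal: P_k(1 - P_k); off-diagonal: -P_k * P_j
analytic = np.diag(P) - np.outer(P, P)

eps = 1e-6
numeric = np.zeros((len(z), len(z)))
for j in range(len(z)):
    dz = np.zeros(len(z))
    dz[j] = eps
    numeric[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2.0 * eps)

print(np.allclose(analytic, numeric))  # True
```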
Odds
Introduction:
- Win rate: 4 to 1, odds = {number of wins}/{number of losses} = 4
Definition: Assume we have probability p of winning; then
$$\text{odds} = \frac{p}{1-p}$$
We usually use log(odds) because it has a better symmetry property:
- Win rate: 4 to 1, odds = 4, log(odds) = log 4.
- Win rate: 1 to 4, odds = 1/4, log(odds) = −log 4.

The two cases are reciprocals in odds (4 vs. 1/4) but symmetric around 0 in log(odds) (±log 4).
Relationship between probability and log(odds): the sigmoid σ
$$p = \frac{\text{odds}}{1+\text{odds}} = \frac{\exp(\log(\text{odds}))}{1+\exp(\log(\text{odds}))} = \frac{1}{1+\exp(-\log(\text{odds}))} = \sigma(\log(\text{odds}))$$
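A small sketch (with an arbitrary p = 0.8) showing both the sign symmetry of log(odds) and the round trip from probability to log(odds) and back through the sigmoid:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p = 0.8                       # arbitrary probability of winning
odds = p / (1.0 - p)          # 4.0, i.e. "4 to 1"
log_odds = np.log(odds)       # +log 4

print(np.log((1.0 - p) / p))  # -log 4: swapping win/lose just flips the sign
print(sigmoid(log_odds))      # 0.8: the sigmoid recovers the probability
```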
Examples in machine learning:
1. Logistic Regression: a log-linear model
$$P(\text{positive class}) = P = f(x) = \frac{1}{1+e^{-z}}, \quad z = wx + b$$
It is easy to see that
$$z = \log\left(\frac{p}{1-p}\right)$$
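In other words, the linear score of logistic regression is exactly the log(odds) of the positive class. A minimal sketch (the weights w, b and input x are made-up numbers):

```python
import numpy as np

w, b = 2.0, -1.0                  # made-up parameters
x = 0.75
z = w * x + b                     # linear score (the logit), here 0.5

p = 1.0 / (1.0 + np.exp(-z))      # P(positive class)
print(np.log(p / (1.0 - p)), z)   # both 0.5: log(p/(1-p)) recovers z
```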
2. XGBoost for Classification
$$\begin{aligned}
\text{Loss} = L(y_i, p_i) &= -\left(y_i \log(p_i) + (1-y_i)\log(1-p_i)\right) \\
&= -\left(y_i \log\frac{p_i}{1-p_i} + \log(1-p_i)\right) \\
&= -\left(y_i \log(\text{odds}_i) - \log\left(1+\frac{p_i}{1-p_i}\right)\right) \\
&= -\left(y_i \log(\text{odds}_i) - \log(1+\text{odds}_i)\right) \\
&= -y_i \log(\text{odds}_i) + \log(1+\text{odds}_i)
\end{aligned}$$
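A one-off numerical check (with an arbitrary label and probability) that the rewritten loss in terms of odds matches the original log loss:

```python
import numpy as np

y, p = 1.0, 0.3                  # arbitrary label and predicted probability
odds = p / (1.0 - p)

lhs = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))   # original log loss
rhs = -y * np.log(odds) + np.log(1.0 + odds)           # odds form

print(np.isclose(lhs, rhs))      # True
```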
Derivative:
$$g_i = \frac{\partial L}{\partial \log(\text{odds}_i)} = -y_i + \frac{e^{\log(\text{odds}_i)}}{1+e^{\log(\text{odds}_i)}} = -y_i + \frac{\text{odds}_i}{1+\text{odds}_i} = -y_i + p_i$$
Second-order derivative:
$$h_i = \frac{\partial^2 L}{\partial \log(\text{odds}_i)^2} = \frac{\partial g_i}{\partial \log(\text{odds}_i)} = \sigma(\log(\text{odds}_i))\left(1-\sigma(\log(\text{odds}_i))\right) = p_i(1-p_i)$$
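These $g_i = p_i - y_i$ and $h_i = p_i(1-p_i)$ are the per-sample gradient and hessian pairs a gradient-boosting step plugs into each tree fit. A sketch (with an arbitrary label and raw score) verifies them by finite differences on the loss written in log(odds):

```python
import numpy as np

def loss(y, log_odds):
    """Log loss in terms of log(odds), as derived above."""
    return -y * log_odds + np.log(1.0 + np.exp(log_odds))

y, s = 1.0, 0.4                      # label and raw score s = log(odds)
p = 1.0 / (1.0 + np.exp(-s))

g = p - y                            # first-order derivative
h = p * (1.0 - p)                    # second-order derivative

eps = 1e-4
g_num = (loss(y, s + eps) - loss(y, s - eps)) / (2.0 * eps)
h_num = (loss(y, s + eps) - 2.0 * loss(y, s) + loss(y, s - eps)) / eps**2

print(np.isclose(g, g_num), np.isclose(h, h_num))  # True True
```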