MAUTISTE | So we need to compute the gradient of the CE loss with respect to each CNN class score in \(s\)


Once the loss is defined, we have to compute its gradient with respect to the output neurons of the CNN in order to backpropagate it through the net and optimize the loss function by tuning the net parameters. The loss terms coming from the negative classes are zero. However, the gradient of the loss with respect to those negative classes is not cancelled, since the Softmax of the positive class also depends on the negative class scores.
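To see that the negative classes still receive a gradient, here is a minimal sketch in plain Python that estimates the gradient by finite differences. The function names, the example scores and the positive class index are all hypothetical choices made for illustration, not part of any library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ce_loss(scores, positive):
    """Cross-entropy loss when a single class (index `positive`) is the ground truth."""
    return -math.log(softmax(scores)[positive])

def numerical_grad(scores, positive, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(score_k) for every class k."""
    grads = []
    for k in range(len(scores)):
        up = list(scores)
        up[k] += eps
        down = list(scores)
        down[k] -= eps
        grads.append((ce_loss(up, positive) - ce_loss(down, positive)) / (2 * eps))
    return grads

scores = [2.0, 1.0, -1.0]  # hypothetical CNN outputs for 3 classes
positive = 0               # hypothetical ground-truth class index
grads = numerical_grad(scores, positive)
print(grads)
```

The entries for the two negative classes come out positive (they match the Softmax probabilities of those classes), while the entry for the positive class is negative.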

The gradient expression will be the same for all classes in \(C\) except for the ground truth class \(C_p\), because the score of \(C_p\) (\(s_p\)) is in the numerator.
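Written out, and assuming the single-label loss is \(CE = -\log\left(e^{s_p} / \sum_{j}^{C} e^{s_j}\right)\), with \(s_n\) denoting the score of any negative class (a symbol introduced here for illustration), the two cases are:

\[
\frac{\partial}{\partial s_p}\left(-\log\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right) = \frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}} - 1
\]

\[
\frac{\partial}{\partial s_n}\left(-\log\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right) = \frac{e^{s_n}}{\sum_{j}^{C} e^{s_j}}
\]

Both follow from rewriting the loss as \(-s_p + \log\sum_{j}^{C} e^{s_j}\): the second term contributes the Softmax probability of whichever class we differentiate against, and the first term contributes the extra \(-1\) only for the positive class.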

  • Caffe: SoftmaxWithLoss Layer. Limited to multi-class classification.
  • PyTorch: CrossEntropyLoss. Limited to multi-class classification.
  • TensorFlow: softmax_cross_entropy. Limited to multi-class classification.

In this Facebook paper they claim that, despite being counter-intuitive, Categorical Cross-Entropy loss, or Softmax loss, worked better than Binary Cross-Entropy loss in their multi-label classification problem.

> Skip this part if you are not interested in Facebook or me using Softmax Loss for multi-label classification, which is not standard.

When Softmax loss is used in a multi-label scenario, the gradients get a bit more complex, since the loss contains an element for each positive class. Consider that \(M\) are the positive classes of a sample. The CE Loss with Softmax activations would be:

\[
CE = \frac{1}{M} \sum_{p}^{M} -\log\left(\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right)
\]

Where each \(s_p\) in \(M\) is the CNN score for each positive class. As in the Facebook paper, I introduce a scaling factor \(1/M\) to make the loss invariant to the number of positive classes.
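For reference, here is a sketch of the resulting gradients, assuming the loss is \(CE = \frac{1}{M}\sum_{p}^{M} -\log\left(e^{s_p} / \sum_{j}^{C} e^{s_j}\right)\) with binary labels, and writing \(s_n\) for the score of a negative class (a symbol introduced here for illustration). For a positive class, its own term contributes the Softmax probability minus 1, and each of the other \(M - 1\) positive terms contributes the Softmax probability:

\[
\frac{\partial CE}{\partial s_p} = \frac{1}{M}\left(\left(\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}} - 1\right) + (M - 1)\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right) = \frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}} - \frac{1}{M}
\]

\[
\frac{\partial CE}{\partial s_n} = \frac{e^{s_n}}{\sum_{j}^{C} e^{s_j}}
\]

With \(M = 1\) these reduce to the single-label case.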

As neither the Caffe Softmax with Loss layer nor the Multinomial Logistic Loss layer accepts multi-label targets, I implemented my own PyCaffe Softmax loss layer, following the specifications of the Facebook paper. Caffe Python layers let us easily customize the operations done in the forward and backward passes of the layer:

Forward pass: Loss computation

We first compute Softmax activations for each class and store them in probs. Then we compute the loss for each image in the batch, considering there might be more than one positive label. We use a scale_factor (\(1/M\)) and we also multiply losses by the labels, which can be binary or real numbers, so they can be used for instance to introduce class balancing. The batch loss will be the mean loss of the elements in the batch. We then save the data_loss to display it and the probs to use them in the backward pass.
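The forward pass described above can be sketched in plain Python as follows. This is not the PyCaffe layer itself: it leaves out the Caffe `bottom`/`top` blobs, and the function name `forward`, its arguments and the example batch are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(batch_scores, batch_labels):
    """Return (data_loss, probs) for a batch of multi-label samples.

    batch_scores: one list of class scores per sample.
    batch_labels: same shape; a nonzero entry marks a positive class and
                  its value is used as a per-class weight.
    """
    probs = [softmax(scores) for scores in batch_scores]
    losses = []
    for sample_probs, labels in zip(probs, batch_labels):
        num_pos = sum(1 for t in labels if t != 0)
        scale_factor = 1.0 / num_pos  # assumes at least one positive label
        loss = 0.0
        for p, t in zip(sample_probs, labels):
            if t != 0:  # only positive classes contribute to the loss
                loss += -math.log(p) * t * scale_factor
        losses.append(loss)
    data_loss = sum(losses) / len(losses)  # mean loss over the batch
    return data_loss, probs

# Hypothetical batch: 2 samples, 3 classes, uniform scores.
scores = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels = [[1, 1, 0], [0, 0, 1]]
data_loss, probs = forward(scores, labels)
print(data_loss)
```

In this example the first sample has two positive classes and the second has one, yet both contribute the same loss, \(\log 3\), which is the invariance the scale factor is meant to provide.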

Backward pass: Gradients computation

In the backward pass we need to compute the gradients of each element of the batch with respect to each one of the class scores \(s\). As the gradient for all the classes \(C\) except the positive classes \(M\) is equal to probs, we assign the probs values to delta. For the positive classes in \(M\) we subtract 1 from the corresponding probs value and use scale_factor to match the gradient expression. We compute the mean gradients of all the batch to run the backpropagation.
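A matching sketch of the backward pass, again in plain Python and without the Caffe blobs. The function name `backward`, its arguments and the example batch are hypothetical, and the code assumes binary labels and that `scale_factor` is one over the number of positive classes of each sample.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def backward(probs, batch_labels):
    """Gradient of the mean batch loss with respect to every class score.

    Assumes binary labels (nonzero = positive class).
    """
    batch_size = len(probs)
    # For negative classes the gradient is simply the Softmax probability.
    delta = [list(row) for row in probs]
    for r, labels in enumerate(batch_labels):
        num_pos = sum(1 for t in labels if t != 0)
        scale_factor = 1.0 / num_pos  # assumes at least one positive label
        for c, t in enumerate(labels):
            if t != 0:
                # Own term gives (prob - 1); the other positive terms give prob.
                delta[r][c] = (scale_factor * (delta[r][c] - 1)
                               + (1 - scale_factor) * delta[r][c])
    # Average over the batch.
    return [[g / batch_size for g in row] for row in delta]

# Hypothetical batch: 2 samples, 3 classes.
batch_scores = [[2.0, 1.0, -1.0], [0.0, 0.0, 0.0]]
batch_labels = [[1, 1, 0], [0, 0, 1]]
probs = [softmax(s) for s in batch_scores]
grads = backward(probs, batch_labels)
print(grads)
```

The update for a positive class simplifies to `probs - scale_factor`, so the gradients of each sample sum to zero across classes, which is a quick sanity check for an implementation like this.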

Binary Cross-Entropy Loss

Also called Sigmoid Cross-Entropy loss. It is a Sigmoid activation plus a Cross-Entropy loss. Unlike Softmax loss, it is independent for each vector component (class), meaning that the loss computed for every CNN output vector component is not affected by other component values. That’s why it is used for multi-label classification, where the insight of an element belonging to a certain class should not influence the decision for another class. It’s called Binary Cross-Entropy Loss because it sets up a binary classification problem between \(C’ = 2\) classes for every class in \(C\), as explained above. So when using this loss, the formulation of Cross-Entropy Loss for binary problems is often used:

\[
CE = -\sum_{i=1}^{C’=2} t_i \log(f(s_i)) = -t_1 \log(f(s_1)) - (1 - t_1) \log(1 - f(s_1))
\]

Here \(s_1\) and \(t_1\) are the score and the ground truth label of the class being evaluated, and \(f\) is the Sigmoid function.
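As a small illustration of that independence, here is a sketch in plain Python that computes the Sigmoid Cross-Entropy per class. The function names and the example scores and targets are hypothetical, and the naive `sigmoid` is only meant for moderate score values.

```python
import math

def sigmoid(s):
    """Plain sigmoid; fine for moderate scores, not hardened against overflow."""
    return 1.0 / (1.0 + math.exp(-s))

def binary_ce(score, target):
    """Sigmoid cross-entropy for one class; target is 0 or 1."""
    p = sigmoid(score)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

scores = [2.0, -1.0, 0.5]  # hypothetical CNN outputs for 3 classes
targets = [1, 0, 1]        # multi-label ground truth
per_class = [binary_ce(s, t) for s, t in zip(scores, targets)]

# Change only the score of class 1 and recompute every per-class loss.
changed = [2.0, 3.0, 0.5]
per_class_changed = [binary_ce(s, t) for s, t in zip(changed, targets)]

print(per_class)
print(per_class_changed)
```

Only the loss of the class whose score changed moves; the losses of the other two classes stay exactly the same, which would not be the case with Softmax.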
