
Commit 04f9a77

[docs] clarify handling of bias and scaling by BiasLayer, ScaleLayer
A bias/scaling can be applied wherever desired by defining the respective layers, and `ScaleLayer` can handle both as a memory optimization.
Parent: 048530a

3 files changed: 15 additions & 15 deletions


include/caffe/layers/batch_norm_layer.hpp

Lines changed: 3 additions & 5 deletions
@@ -27,11 +27,9 @@ namespace caffe {
  * param {lr_mult: 0} three times in the layer definition.
  *
  * Note that the original paper also included a per-channel learned bias and
- * scaling factor. It is possible (though a bit cumbersome) to implement
- * this in caffe using a single-channel DummyDataLayer filled with zeros,
- * followed by a Convolution layer with output the same size as the current.
- * This produces a channel-specific value that can be added or multiplied by
- * the BatchNorm layer's output.
+ * scaling factor. To implement this in Caffe, define a `ScaleLayer` configured
+ * with `bias_term: true` after each `BatchNormLayer` to handle both the bias
+ * and scaling factor.
  *
  * [1] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network
  * Training by Reducing Internal Covariate Shift." arXiv preprint
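
In net prototxt, the pattern the new comment describes could look like the following sketch (layer and blob names such as `conv1` are illustrative, not part of this commit):

layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
}
# Scale with bias_term: true supplies the learned per-channel
# scale (gamma) and bias (beta) that BatchNormLayer itself omits.
layer {
  name: "conv1_scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }
}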

include/caffe/layers/bias_layer.hpp

Lines changed: 5 additions & 5 deletions
@@ -10,13 +10,13 @@
 namespace caffe {

 /**
- * @brief Computes a sum of two input Blobs, with the shape of the
- *        latter Blob "broadcast" to match the shape of the former.
- *        Equivalent to tiling the latter Blob, then computing the elementwise
- *        sum.
+ * @brief Computes a sum of two input Blobs, with the shape of the latter Blob
+ *        "broadcast" to match the shape of the former. Equivalent to tiling
+ *        the latter Blob, then computing the elementwise sum.
  *
  * The second input may be omitted, in which case it's learned as a parameter
- * of the layer.
+ * of the layer. Note: in case bias and scaling are desired, both operations can
+ * be handled by `ScaleLayer` configured with `bias_term: true`.
  */
 template <typename Dtype>
 class BiasLayer : public Layer<Dtype> {
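
For the single-input case mentioned in the comment, a sketch of a Bias layer that learns its offset as a parameter (blob names and the zero filler here are assumptions for illustration):

layer {
  name: "add_bias"
  type: "Bias"
  bottom: "data"
  top: "data_biased"
  bias_param {
    axis: 1              # broadcast along the channel axis
    num_axes: 1          # one learned offset per channel
    filler { value: 0 }  # initialize the bias to zero
  }
}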

include/caffe/layers/scale_layer.hpp

Lines changed: 7 additions & 5 deletions
@@ -12,13 +12,15 @@
 namespace caffe {

 /**
- * @brief Computes a product of two input Blobs, with the shape of the
- *        latter Blob "broadcast" to match the shape of the former.
+ * @brief Computes the elementwise product of two input Blobs, with the shape of
+ *        the latter Blob "broadcast" to match the shape of the former.
  *        Equivalent to tiling the latter Blob, then computing the elementwise
- *        product.
+ *        product. Note: for efficiency and convenience, this layer can
+ *        additionally perform a "broadcast" sum too when `bias_term: true`
+ *        is set.
  *
- * The second input may be omitted, in which case it's learned as a parameter
- * of the layer.
+ * The latter, scale input may be omitted, in which case it's learned as
+ * parameter of the layer (as is the bias, if it is included).
  */
 template <typename Dtype>
 class ScaleLayer: public Layer<Dtype> {
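
Correspondingly, a sketch of the two-input form described above, where the second bottom supplies the scale to broadcast (blob names and shapes assumed for illustration):

layer {
  name: "scale_by_channel"
  type: "Scale"
  bottom: "data"         # e.g. shape N x C x H x W
  bottom: "channel_wts"  # e.g. shape C; broadcast over N, H, W
  top: "data_scaled"
  scale_param { axis: 1 }  # align bottom[1] with the channel axis
}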
