@@ -27,11 +27,9 @@ namespace caffe {
  * param {lr_mult: 0} three times in the layer definition.
  *
  * Note that the original paper also included a per-channel learned bias and
- * scaling factor. It is possible (though a bit cumbersome) to implement
- * this in caffe using a single-channel DummyDataLayer filled with zeros,
- * followed by a Convolution layer with output the same size as the current.
- * This produces a channel-specific value that can be added or multiplied by
- * the BatchNorm layer's output.
+ * scaling factor. To implement this in Caffe, define a `ScaleLayer` configured
+ * with `bias_term: true` after each `BatchNormLayer` to handle both the bias
+ * and scaling factor.
  *
  * [1] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network
  *     Training by Reducing Internal Covariate Shift." arXiv preprint
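In a network definition, the pattern recommended by the new comment is usually written as a `BatchNorm` layer followed immediately by a `Scale` layer with `bias_term: true`. A minimal prototxt sketch (the layer and blob names `bn1`, `scale1`, and `conv1` are illustrative, not taken from this commit):

```
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }
}
```

The `Scale` layer here supplies both the learned per-channel scaling factor and, via `bias_term: true`, the learned per-channel bias from the original paper.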
@@ -10,13 +10,13 @@
 namespace caffe {
 
 /**
- * @brief Computes a sum of two input Blobs, with the shape of the
- *        latter Blob "broadcast" to match the shape of the former.
- *        Equivalent to tiling the latter Blob, then computing the elementwise
- *        sum.
+ * @brief Computes a sum of two input Blobs, with the shape of the latter Blob
+ *        "broadcast" to match the shape of the former. Equivalent to tiling
+ *        the latter Blob, then computing the elementwise sum.
  *
  * The second input may be omitted, in which case it's learned as a parameter
- * of the layer.
+ * of the layer. Note: in case bias and scaling are desired, both operations can
+ * be handled by `ScaleLayer` configured with `bias_term: true`.
  */
 template <typename Dtype>
 class BiasLayer : public Layer<Dtype> {
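The "tiling" equivalence described in the `BiasLayer` comment can be sanity-checked with a short NumPy sketch. This is an illustration of the broadcast semantics, not the actual Caffe implementation; the function name `bias_forward` and the `axis=1` default (matching Caffe's usual N x C x H x W channel axis) are assumptions.

```python
import numpy as np

def bias_forward(bottom, bias, axis=1):
    """Sketch of BiasLayer's forward pass: broadcast `bias` against
    `bottom` starting at `axis`, then add elementwise."""
    # Pad the bias's shape with singleton axes so NumPy broadcasting
    # performs the tiling described in the comment above.
    shape = [1] * axis + list(bias.shape)
    shape += [1] * (bottom.ndim - len(shape))
    return bottom + bias.reshape(shape)

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)  # e.g. N=2, C=3, spatial=4
b = np.array([10.0, 20.0, 30.0], dtype=np.float32)    # one bias per channel

# Equivalent to explicitly tiling the bias to x's shape, then summing:
tiled = np.tile(b.reshape(1, 3, 1), (2, 1, 4))
assert np.array_equal(bias_forward(x, b), x + tiled)
```

The explicit `np.tile` at the end is exactly the equivalence the doc-comment claims: broadcasting the smaller Blob is the same as tiling it to the larger Blob's shape and summing.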
@@ -12,13 +12,15 @@
 namespace caffe {
 
 /**
- * @brief Computes a product of two input Blobs, with the shape of the
- *        latter Blob "broadcast" to match the shape of the former.
+ * @brief Computes the elementwise product of two input Blobs, with the shape of
+ *        the latter Blob "broadcast" to match the shape of the former.
  *        Equivalent to tiling the latter Blob, then computing the elementwise
- *        product.
+ *        product. Note: for efficiency and convenience, this layer can
+ *        additionally perform a "broadcast" sum too when `bias_term: true`
+ *        is set.
  *
- * The second input may be omitted, in which case it's learned as a parameter
- * of the layer.
+ * The latter, scale input may be omitted, in which case it's learned as a
+ * parameter of the layer (as is the bias, if it is included).
  */
 template <typename Dtype>
 class ScaleLayer : public Layer<Dtype> {
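The combined scale-plus-bias behavior described in the `ScaleLayer` comment can be sketched the same way. Again, this is a NumPy illustration of the semantics under assumed names (`scale_forward`, `axis=1`), not Caffe's implementation:

```python
import numpy as np

def scale_forward(bottom, scale, bias=None, axis=1):
    """Sketch of ScaleLayer's forward pass: broadcast-multiply by `scale`,
    and, when a bias is given (cf. `bias_term: true`), broadcast-add it."""
    shape = [1] * axis + list(scale.shape)
    shape += [1] * (bottom.ndim - len(shape))
    top = bottom * scale.reshape(shape)
    if bias is not None:
        top = top + bias.reshape(shape)
    return top

x = np.ones((2, 3, 4), dtype=np.float32)
gamma = np.array([2.0, 3.0, 4.0], dtype=np.float32)  # per-channel scale
beta = np.array([0.5, 0.5, 0.5], dtype=np.float32)   # per-channel bias
y = scale_forward(x, gamma, beta)  # each channel c becomes gamma[c] * x + beta[c]
```

With `bias=None` this reduces to the pure broadcast product; passing a bias folds the `BiasLayer` operation into the same layer, which is the efficiency/convenience point the comment makes.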