2021-10-22 07:18 AM
I would like to compare many CNN models in terms of RAM usage.
When I analyze a single model, the result I am most interested in is the ram (total) value, since that effectively determines whether the model fits into RAM.
weights (ro) : 1,544 B (1.51 KiB)
activations (rw) : 134,964 B (131.80 KiB)
ram (total) : 397,108 B (387.80 KiB) = 134,964 + 131,072 + 131,072
However, I don't quite understand how the total RAM is calculated.
The input buffer holds 1x256x64x2 32-bit floating-point values => 131,072 B.
The same goes for the output buffer => 131,072 B.
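Just to make the arithmetic explicit, here is a minimal sketch of how I compute those buffer sizes (assuming a dense layout with no padding; `buffer_bytes` is my own helper, not part of any tool):

```python
def buffer_bytes(shape, dtype_bytes):
    """Size in bytes of a densely packed tensor buffer."""
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype_bytes

# Input and output tensors: 1x256x64x2 float32 (4 bytes per value)
io_bytes = buffer_bytes((1, 256, 64, 2), 4)
print(io_bytes)  # 131072
```

That matches the 131,072 B terms in the ram (total) sum above.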
The weights are also clear to me (1,544 B).
What I don't understand is how the maximum RAM for the activations is calculated, in this example 134,964 B.
For example, the second convolution gets a [1,256,64,8] (8-bit int) buffer as input and outputs a [1,256,64,8] (8-bit int) buffer.
Shouldn't there be two [1,256,64,8] buffers (one for input and one for output)? Or is there only a single buffer?
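As a back-of-the-envelope check (again assuming dense int8 tensors, 1 byte per value), two separate buffers for that convolution would already exceed the reported activation size, which makes me suspect the buffers are being reused:

```python
def buffer_bytes(shape, dtype_bytes):
    """Size in bytes of a densely packed tensor buffer (my own helper)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype_bytes

conv_io = buffer_bytes((1, 256, 64, 8), 1)  # 8-bit int tensor
print(conv_io)          # 131072 -> one buffer alone is 131,072 B
print(2 * conv_io)      # 262144 -> two buffers exceed the reported 134,964 B
```

So either the input and output share one buffer, or the memory planner overlaps their lifetimes somehow.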
My question is how exactly to understand the memory graph. I guess what I'm missing is what role the scratch buffer plays, and why the scratch buffer does not consume any memory.
In various publications, scratch buffers are described as holding temporary values that need to be stored during a calculation.