As you may know, TensorFlow has a graph optimization module called "Grappler". It provides many kinds of optimization passes, such as layout optimization, memory optimization, model pruning, and so on. In this experiment, we can see the effect of enabling some of the memory options on a simple CNN model using the MNIST dataset.
Spoiler: in my case, HEURISTICS seems to be the best choice for optimizing memory usage when the batch size becomes extremely large, at least in TensorFlow 1.8 (see the update at the end for TensorFlow 1.9).
Here is the simple CNN model:
import tensorflow as tf

height = 28
width = 28
channels = 1
n_inputs = height * width  # 784 pixels per MNIST image

conv1_fmaps = 32
conv1_ksize = 3
conv1_stride = 1
conv1_pad = "SAME"

conv2_fmaps = 64
conv2_ksize = 3
conv2_stride = 1
conv2_pad = "SAME"
conv2_dropout_rate = 0.25

pool3_fmaps = conv2_fmaps

n_fc1 = 128
fc1_dropout_rate = 0.5

n_outputs = 10

# Keep the input placeholders on the CPU.
with tf.device('/cpu:0'):
    with tf.name_scope("inputs"):
        X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
        X_reshaped = tf.reshape(X, shape=[-1, height, width, channels])
        y = tf.placeholder(tf.int32, shape=[None], name="y")
        training = tf.placeholder_with_default(False, shape=[], name='training')

# The compute-heavy layers and the training op run on the GPU.
with tf.device('/gpu:0'):
    conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                             strides=conv1_stride, padding=conv1_pad,
                             activation=tf.nn.relu, name="conv1")
    conv2 = tf.layers.conv2d(conv1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                             strides=conv2_stride, padding=conv2_pad,
                             activation=tf.nn.relu, name="conv2")

    with tf.name_scope("pool3"):
        pool3 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                               padding="VALID")
        # 2x2 max pooling halves 28x28 down to 14x14.
        pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 14 * 14])
        pool3_flat_drop = tf.layers.dropout(pool3_flat, conv2_dropout_rate,
                                            training=training)

    with tf.name_scope("fc1"):
        fc1 = tf.layers.dense(pool3_flat_drop, n_fc1, activation=tf.nn.relu, name="fc1")
        fc1_drop = tf.layers.dropout(fc1, fc1_dropout_rate, training=training)

    with tf.name_scope("output"):
        logits = tf.layers.dense(fc1_drop, n_outputs, name="output")
        Y_proba = tf.nn.softmax(logits, name="Y_proba")

    with tf.name_scope("train"):
        xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
        loss = tf.reduce_mean(xentropy)
        optimizer = tf.train.AdamOptimizer()
        training_op = optimizer.minimize(loss)

# Evaluation and bookkeeping stay on the CPU.
with tf.device('/cpu:0'):
    with tf.name_scope("eval"):
        correct = tf.nn.in_top_k(logits, y, 1)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.name_scope("init_and_save"):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
There are several memory options for you to use as follows:
1. NO_MEM_OPT
2. DEFAULT_MEM_OPT
3. SWAPPING_HEURISTICS
4. RECOMPUTATION_HEURISTICS
5. SCHEDULING_HEURISTICS
6. HEURISTICS
7. Third Party: Gradient-Checkpointing

P.S.: Gradient-Checkpointing is not related to Grappler's memory optimizer; it is just another approach.
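By the way, items 1 to 6 are values of the MemOptType enum defined in Grappler's RewriterConfig protobuf. If you want to double-check which values your TensorFlow build actually provides, a quick sketch like this should work:

from tensorflow.core.protobuf import rewriter_config_pb2

# List every memory-optimization mode this TensorFlow build knows about.
for name in rewriter_config_pb2.RewriterConfig.MemOptType.keys():
    print(name)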
You can pick any of items 1 through 6 and put it in place of the <Put memory option here> placeholder in the following snippet:
from tensorflow.core.protobuf import rewriter_config_pb2

# Build a RewriterConfig with the desired memory-optimization mode,
# then pass it down through GraphOptions into the session config.
rewrite_options = rewriter_config_pb2.RewriterConfig(disable_model_pruning=True)
rewrite_options.memory_optimization = rewriter_config_pb2.RewriterConfig.<Put memory option here>
graph_options = tf.GraphOptions(rewrite_options=rewrite_options)  # , infer_shapes=True
config = tf.ConfigProto(graph_options=graph_options)
config.gpu_options.allow_growth = True
config.allow_soft_placement = True
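To actually run an experiment, pass the config when creating the session. Here is a minimal training-loop sketch, assuming the placeholder above has been filled in (e.g. with SWAPPING_HEURISTICS); the MNIST loader path and batch_size below are just example choices:

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/")
batch_size = 9000  # vary this to reproduce the tables below

# config, init, training_op, X, y, and training come from the snippets above.
with tf.Session(config=config) as sess:
    init.run()
    for _ in range(mnist.train.num_examples // batch_size):
        X_batch, y_batch = mnist.train.next_batch(batch_size)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch, training: True})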
For the Gradient-Checkpointing approach, you should add the following code at the top of your program:
import tensorflow as tf
import memory_saving_gradients  # from https://github.com/openai/gradient-checkpointing
from tensorflow.contrib.memory_stats.python.ops import memory_stats_ops  # for measuring GPU memory

# Monkey-patch tf.gradients to point to the custom version,
# with automatic checkpoint selection.
def grads(ys, xs, grad_ys=None, **kwargs):
    return memory_saving_gradients.gradients(ys, xs, grad_ys,
                                             checkpoints='memory', **kwargs)

old_grads = tf.gradients
tf.__dict__["gradients"] = grads
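The memory_stats_ops import above can be used to measure peak GPU memory, which is presumably how MB numbers like the ones in the update below are collected. A minimal sketch (config and init come from the snippets above):

# Place the measurement op on the GPU we want to inspect.
with tf.device('/gpu:0'):
    max_bytes_in_use = memory_stats_ops.MaxBytesInUse()

with tf.Session(config=config) as sess:
    init.run()
    # ... run some training steps here ...
    peak_bytes = sess.run(max_bytes_in_use)
    print("Peak GPU memory: %.2f MB" % (peak_bytes / 1024.0 / 1024.0))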
I picked several batch sizes and compared the GPU memory usage with each memory option enabled. The results are in the table below:
TensorFlow version = 1.8

| Memory Option for Optimizer          | Batch Size: 9000 | Batch Size: 11000 | Batch Size: 11100 | Batch Size: 11105 |
|--------------------------------------|------------------|-------------------|-------------------|-------------------|
| NO_MEM_OPT                           | OK               | OOM               | OOM               | OOM               |
| DEFAULT_MEM_OPT                      | OK               | OK                | OOM               | OOM               |
| SWAPPING_HEURISTICS                  | OK               | OK                | OOM               | OOM               |
| RECOMPUTATION_HEURISTICS             | OK               | OK                | OOM               | OOM               |
| SCHEDULING_HEURISTICS                | OK               | OK                | OOM               | OOM               |
| HEURISTICS *                         | OK               | OK                | OK                | OK                |
| Third Party: Gradient-Checkpointing  | OK               | OOM               | OOM               | OOM               |

(* = the winner in TensorFlow 1.8; OOM = out of memory)
Update:
The situation is a little different in TensorFlow 1.9; maybe I need to dig into the source code more. The maximum batch size changes to 11052, and the winner is no longer HEURISTICS. Here you go:
Max Batch Size: 11052
- NO_MEM_OPT: out of memory
- SWAPPING_HEURISTICS: 6755.58 MB
- RECOMPUTATION_HEURISTICS: 6723.38 MB
- SCHEDULING_HEURISTICS: 6755.58 MB
- HEURISTICS: out of memory