Monday, September 17, 2018

[TFLMS] Large Model Support in TensorFlow by Graph Rewriting

This post introduces the paper "Large Model Support in TensorFlow by Graph Rewriting" (TFLMS), whose implementation was published as a pull request to the TensorFlow repository as a contribution to the TensorFlow community. According to the authors, TFLMS let them train ResNet-50 and 3DUnet with 4.7x and 2x larger batch sizes, respectively. Quite amazing...
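
For a rough idea of how the module is used, here is a minimal sketch of session-based training with LMS enabled. It assumes the tf.contrib.lms module path, the LMS class, its optimizer-scope argument, and the run() method as described in the pull request linked below; the exact names and signatures may differ from what was finally proposed or merged.

import tensorflow as tf
# Assumption: the module lands under tf.contrib.lms as proposed in the PR.
from tensorflow.contrib.lms import LMS

# A tiny MNIST-style CNN, just to have a graph to rewrite.
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.int64, [None])
conv = tf.layers.conv2d(x, 32, 3, activation=tf.nn.relu)
flat = tf.layers.flatten(tf.layers.max_pooling2d(conv, 2, 2))
logits = tf.layers.dense(flat, 10)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)

with tf.name_scope('adam_optimizer'):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

# Rewrite the graph BEFORE creating the session: LMS inserts swap-out/swap-in
# ops so activations sit in host memory until the backward pass needs them.
lms_obj = LMS({'adam_optimizer'})          # name scope(s) holding the training ops
lms_obj.run(graph=tf.get_default_graph())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... feed batches as usual; the rewritten graph tolerates larger batches.

The nice part of the graph-rewriting approach is that the training loop itself does not change; the only addition is the rewrite step before the session starts.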




The paper: TFLMS: Large Model Support in TensorFlow by Graph Rewriting
https://arxiv.org/pdf/1807.02037.pdf

The concept is from this paper:
Training Deeper Models by GPU Memory Optimization on TensorFlow 
http://learningsys.org/nips17/assets/papers/paper_18.pdf

For more information, please check out the following links.

Their source code is in the "lms-contrib" branch of this repo:
https://github.com/tungld/tensorflow/tree/lms-contrib

The IBM developers have implemented the feature and opened a pull request at the following URL:
https://github.com/tensorflow/tensorflow/pull/19845

TensorFlow Large Model Support Case Study with 3D Image Segmentation
https://developer.ibm.com/linuxonpower/2018/07/27/tensorflow-large-model-support-case-study-3d-image-segmentation/

P.S.: I tested TFLMS with my MNIST_CNN model and it works. The maximum batch size increased from 69 to 118, an improvement of about 71%.
