The complete implementation is in branch r1.8 of my TensorFlow fork:
https://github.com/teyenliu/tensorflow/tree/r1.8
If you want to see exactly what was implemented, please check out this commit:
https://github.com/teyenliu/tensorflow/commit/3941debe3001d52fe9a6d4048bd679a5a1f0f075
Basically, it can be used the same way as TFRecordDataset or TextLineDataset.
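As a rough sketch of that usage (hedged: this assumes the patched r1.8 build exposes the new op as tf.contrib.data.LMDBDataset and that the dataset yields (key, value) string pairs; check the commit above for the actual symbol and output shapes):

```python
def make_lmdb_batch_op(lmdb_path, batch_size=128, num_epochs=10):
    """Build a tf.data input pipeline over an LMDB file (sketch)."""
    # Imported inside the function so this sketch only needs the
    # patched TensorFlow build when it is actually called.
    import tensorflow as tf

    # Assumption: LMDBDataset yields (key, value) string pairs, the
    # value being the serialized record -- analogous to how
    # TFRecordDataset yields serialized tf.train.Example protos.
    dataset = tf.contrib.data.LMDBDataset(lmdb_path)
    dataset = dataset.map(lambda key, value: value)  # keep only the record
    dataset = dataset.repeat(num_epochs)
    dataset = dataset.batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()
```

From there the batched serialized records can be parsed with tf.parse_example, exactly as in a TFRecordDataset pipeline.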
By the way, I also provide some samples for those who want to benchmark the performance of TFRecordDataset, LMDBDataset, and others. Please also check the following:
https://github.com/teyenliu/tensorflow/tree/r1.8/tensorflow/examples/how_tos/reading_data
convert_to_records_lmdb.py: converts the MNIST dataset into LMDB format, from which the input pipeline yields datapoints.
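The write side of such a conversion can be sketched with the py-lmdb package (hypothetical: write_examples_to_lmdb and its zero-padded key scheme are illustrative, not the script's actual code; examples is assumed to be an iterable of already-serialized tf.train.Example byte strings):

```python
def write_examples_to_lmdb(examples, lmdb_path):
    """Store serialized records in an LMDB file, keyed by index (sketch)."""
    # Imported inside the function so the sketch loads without py-lmdb.
    import lmdb

    env = lmdb.open(lmdb_path, map_size=1 << 30)  # up to 1 GiB of data
    with env.begin(write=True) as txn:
        for i, serialized in enumerate(examples):
            # Zero-padded keys keep records in insertion order when the
            # reader iterates the database lexicographically.
            txn.put("{:08d}".format(i).encode("ascii"), serialized)
    env.close()
```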
fully_connected_reader_lmdb.py: trains a fully connected neural network on the MNIST data stored in LMDB. It adds a new argument, --perf, that can restrict the run to measuring only the performance of the input data pipeline.
Example 1: to train on the MNIST dataset, run:
$ python fully_connected_reader_lmdb.py --train_dir ./lmdb_data --num_epochs 10 --batch_size 128 --perf training
Example 2: to measure only the performance of the data pipeline on the MNIST dataset, run:
$ python fully_connected_reader_lmdb.py --train_dir ./lmdb_data --num_epochs 10 --batch_size 128 --perf datapipeline
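The --perf datapipeline mode boils down to timing how fast the input pipeline alone can produce batches, with no training step attached. A framework-free sketch of that measurement (all names here are illustrative, not the script's actual code):

```python
import time

def batches_per_second(batch_iter, warmup=10, measured=100):
    """Time how fast an iterator yields batches (sketch of --perf datapipeline)."""
    # Drain a few batches first so one-time setup cost is excluded.
    for _ in range(warmup):
        next(batch_iter)
    start = time.perf_counter()
    for _ in range(measured):
        next(batch_iter)
    return measured / (time.perf_counter() - start)

# Usage with a dummy stand-in for the dataset iterator:
rate = batches_per_second(iter(range(1000)))
```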
The performance results show that the TFRecordDataset API is still the fastest of these options in the speed test.