The complete implementation is in the r1.8 branch of my TensorFlow fork:
If you want to see what's implemented, please check it out:
Basically, it can be used in the same way as TFRecordDataset or TextLineDataset. The following is an example of using TFRecordDataset:
By the way, I also provide some sample scripts for those who want to benchmark the performance of TFRecordDataset, LMDBDataset, and others. Please also check the following:
convert_to_records_lmdb.py: converts the MNIST data into LMDB format, producing a database that yields datapoints.
fully_connected_reader_lmdb.py: trains a fully connected neural network on the MNIST data stored in LMDB. It adds a new perf argument that can restrict the run to measuring only the performance of the input data pipeline.
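The idea behind the perf measurement can be sketched without TensorFlow: iterate over the input pipeline and skip the training step, so the measured time reflects only reading and decoding. The names below are illustrative, not the script's actual code.

```python
import time

def fake_pipeline(num_batches):
    # Stand-in for an input pipeline; a real run would pull decoded
    # batches from TFRecordDataset, LMDBDataset, or similar.
    for i in range(num_batches):
        yield [i] * 128  # one "batch" of 128 datapoints

start = time.time()
count = 0
for batch in fake_pipeline(100):
    count += len(batch)  # consume the batch; no training step
elapsed = time.time() - start
rate = count / elapsed  # datapoints per second, the benchmark metric
```

Comparing this datapoints-per-second figure across dataset classes, with the model removed from the loop, is what makes the pipeline comparison fair.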
Example 1: to train on the MNIST dataset, you may give the following command:
$ python fully_connected_reader_lmdb.py --train_dir ./lmdb_data --num_epochs 10 --batch_size 128 --perf training

Example 2: to check the performance of the data pipeline on the MNIST dataset, you may give the following command:
$ python fully_connected_reader_lmdb.py --train_dir ./lmdb_data --num_epochs 10 --batch_size 128 --perf datapipeline
The performance results show that the TFRecordDataset API is still the fastest option in this speed test.