If you are new to LLVM and interested in it, as I am, take a look at my LLVM study list. It took me a while to track down the relevant resources and documents, so I hope the list saves you some time. Note that most of the items are written in Chinese, so the list may not suit readers who only read English.
Wednesday, October 24, 2018
Tuesday, October 23, 2018
[TensorFlow] Does increasing the number of CUDA streams in TensorFlow improve processing time and transmission time?
Before increasing the number of CUDA streams in TensorFlow, I want to recap some ideas about the Executor module. When a TensorFlow session runs, it builds an Executor. If CUDA is enabled in the TensorFlow build configuration, the Executor also adds the visible GPU devices and creates TF device objects (GPUDevice objects) that map to the physical GPU devices. There are 4 kinds of streams inside GPUDevice:
- CUDA stream
- Host_to_Device stream
- Device_to_Host stream
- Device_to_Device stream
Thursday, October 18, 2018
[TensorFlow Grappler] How to do topological sorting in TensorFlow Grappler?
If you want to implement an optimizer in TensorFlow Grappler, you need to know how to work with the directed computation graph. One of the most important tools for that is topological sorting.
The definition from Wiki: Topological sorting
https://en.wikipedia.org/wiki/Topological_sorting
"In the field of computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering."
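To make the definition concrete, here is a minimal sketch of Kahn's algorithm, one common way to compute a topological order. It is written in Python on a plain edge list, not against Grappler's C++ API, and every node name except conv1/Conv2D is a made-up placeholder.

```python
from collections import deque


def topological_sort(nodes, edges):
    """Return the nodes ordered so that u precedes v for every edge (u, v)."""
    successors = {n: [] for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        in_degree[v] += 1

    # Start with every node that has no incoming edge.
    ready = deque(n for n in nodes if in_degree[n] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        # "Remove" u from the graph; successors with no remaining
        # incoming edges become ready.
        for v in successors[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                ready.append(v)

    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; no topological order exists")
    return order


# Hypothetical graph: two inputs feed a convolution, followed by two more ops.
nodes = ["conv1/Relu", "conv1/BiasAdd", "conv1/Conv2D", "conv1/kernel", "input"]
edges = [
    ("input", "conv1/Conv2D"),
    ("conv1/kernel", "conv1/Conv2D"),
    ("conv1/Conv2D", "conv1/BiasAdd"),
    ("conv1/BiasAdd", "conv1/Relu"),
]
order = topological_sort(nodes, edges)
```

A node is emitted only after all of its producers have been emitted, which is exactly the property quoted above. A cyclic graph has no such ordering, so the sketch raises an error instead of returning a partial result.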
[Tool] Drawing a sequence diagram with the online tool sequencediagram
This website provides a free online tool for drawing sequence diagrams:
https://sequencediagram.org/
Basically, you can follow the instructions available from the button in the top-left corner. Check it out.
Here is my example: a sequence diagram that traces part of the XLA AOT source code in TensorFlow.
Wednesday, October 17, 2018
[TensorFlow Grappler] The ways to traverse all nodes' input and output in the graph using C++ in TensorFlow Grappler
Here I want to introduce 2 ways to traverse all nodes' input and output in the graph using C++ in Grappler.
P.S.: you need access to the GrapplerItem and GraphDef objects in your code.
First, check my example node name in TensorBoard as follows:
conv1/Conv2D
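As a rough illustration of what traversing inputs and outputs involves, here is a sketch in Python rather than Grappler's C++. It assumes only that each node records the names of its inputs, as a NodeDef does, so the outputs (consumers) have to be derived by inverting that relation. The dictionaries stand in for NodeDef entries, and every node name other than conv1/Conv2D is hypothetical.

```python
from collections import defaultdict


def producer_name(input_name):
    """Map an input string such as '^node' or 'node:1' to the bare node name."""
    if input_name.startswith("^"):  # '^' marks a control dependency
        input_name = input_name[1:]
    return input_name.split(":")[0]  # drop the ':port' suffix if present


def build_io_maps(graph):
    """Return (inputs, outputs): node name -> producer names / consumer names."""
    inputs = {}
    outputs = defaultdict(list)
    for node in graph:
        producers = [producer_name(i) for i in node["input"]]
        inputs[node["name"]] = producers
        for p in producers:
            outputs[p].append(node["name"])
    return inputs, dict(outputs)


# Hypothetical stand-in for the node list of a GraphDef.
graph = [
    {"name": "input", "input": []},
    {"name": "conv1/kernel", "input": []},
    {"name": "conv1/Conv2D", "input": ["input", "conv1/kernel:0"]},
    {"name": "conv1/BiasAdd", "input": ["conv1/Conv2D", "^input"]},
]
inputs, outputs = build_io_maps(graph)

for name in inputs:
    print(name, "<-", inputs[name], "->", outputs.get(name, []))
```

The inversion is done once up front, so looking up the consumers of a node afterwards is a single dictionary access instead of a scan over the whole graph.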
Tuesday, October 2, 2018
[NUMACTL] How to use numactl in practice?
I recently attended the Intel AI workshop, where they advised using NUMACTL to improve the performance of deep learning training and inference with Intel Caffe. Here I post some related information:
Tuesday, September 18, 2018
[XLA Study] How to use XLA AOT compilation in TensorFlow ( Part II )
My previous post, [XLA Study] How to use XLA AOT compilation in TensorFlow, walks through a simple example of using XLA AOT. If you want to see a more complicated example, please take a look at this: https://gist.github.com/carlthome/6ae8a570e21069c60708017e3f96c9fd
Monday, September 17, 2018
[TFLMS] Large Model Support in TensorFlow by Graph Rewriting
This post introduces the paper "Large Model Support in TensorFlow by Graph Rewriting", whose implementation was published as a pull request to the TensorFlow repository as a contribution to the TensorFlow community. The authors report: "With TFLMS, we were able to train ResNet-50 and 3DUnet with 4.7x and 2x larger batch size, respectively." Quite amazing...