Tuesday, July 17, 2018

[Confusion Matrix] How to calculate confusion matrix, precision and recall list from scratch

I'll go straight to an example with 10 categories, as in CIFAR-10 and MNIST. It shows how to calculate the confusion matrix and the per-class precision and recall lists from scratch in Python. My data is randomly generated; replace it with your own. Here it goes:

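import numpy
import json

CATEGORY = 10
SAMPLES = 1000
label_list = [i for i in range(CATEGORY)]

# NOTE: the upper bound of randint is exclusive, so these calls draw labels
# 0..CATEGORY-2 only. Class 9 never occurs, which is why its row and column
# are all zeros in the output below. Pass CATEGORY to cover every class.
pred_list = numpy.random.randint(0, CATEGORY-1, size=SAMPLES)
y_batch_list = numpy.random.randint(0, CATEGORY-1, size=SAMPLES)
print(pred_list, y_batch_list)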

class confusion_matrix:
  def __init__(self, pred_list, y_batch_list, label_list):
    if len(pred_list) != len(y_batch_list):
      raise Exception('Prediction length is different from Label list!')
    self.pred_list = pred_list
    self.y_batch_list = y_batch_list
    self.matrix_size = len(label_list)
    # the matrix has 2 dimensions: [y_batch, pred]
    self.confusion_matrix = [[0] * self.matrix_size for _ in range(self.matrix_size)]
    self.precision_list = [0] * self.matrix_size
    self.recall_list = [0] * self.matrix_size

  def calculate_confusion_matrix(self):
    for i in range(len(self.pred_list)):
      # dimension => [y_batch, pred]
      self.confusion_matrix[self.y_batch_list[i]][self.pred_list[i]] += 1

  def calculate_recall_precision_list(self):
    # recall of class i = diagonal / sum of row i (samples whose true label is i)
    for i in range(self.matrix_size):
      tmp_value = 0
      for j in range(self.matrix_size):
        tmp_value += self.confusion_matrix[i][j]
      if tmp_value != 0:
        self.recall_list[i] = float(self.confusion_matrix[i][i]) / tmp_value

    # precision of class j = diagonal / sum of column j (samples predicted as j)
    for j in range(self.matrix_size):
      tmp_value = 0
      for i in range(self.matrix_size):
        tmp_value += self.confusion_matrix[i][j]
      if tmp_value != 0:
        self.precision_list[j] = float(self.confusion_matrix[j][j]) / tmp_value

  def gen_json_data(self):
    data = {'confusion_matrix': self.confusion_matrix,
            'precision_list': self.precision_list,
            'recall_list': self.recall_list}
    return data

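ret = confusion_matrix(pred_list.tolist(), y_batch_list.tolist(), label_list)
ret.calculate_confusion_matrix()
ret.calculate_recall_precision_list()
print(ret.gen_json_data())

Sample output from one run: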

{'precision_list': [0.0625, 0.14912280701754385, 0.02654867256637168, 0.1452991452991453, 0.07377049180327869, 0.10526315789473684, 0.11320754716981132, 0.13, 0.13725490196078433, 0], 
 'confusion_matrix': [[7, 14, 10, 15, 17, 19, 17, 18, 14, 0], [10, 17, 14, 9, 5, 11, 9, 12, 12, 0], [11, 11, 3, 19, 16, 13, 4, 11, 7, 0], [13, 18, 16, 17, 13, 12, 11, 11, 12, 0], [15, 12, 15, 14, 9, 13, 17, 9, 11, 0], [19, 8, 11, 11, 17, 12, 13, 10, 8, 0], [9, 9, 10, 11, 14, 11, 12, 7, 15, 0], [20, 14, 13, 10, 18, 10, 11, 13, 9, 0], [8, 11, 21, 11, 13, 13, 12, 9, 14, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 
 'recall_list': [0.05343511450381679, 0.1717171717171717, 0.031578947368421054, 0.13821138211382114, 0.0782608695652174, 0.11009174311926606, 0.12244897959183673, 0.11016949152542373, 0.125, 0]}
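
As a cross-check on the loops above, the same per-class precision and recall can be computed with NumPy row and column sums. This is only a sketch: the function name precision_recall is mine, and it assumes the matrix is indexed [true label, predicted label] as in the class above.

```python
import numpy as np

def precision_recall(matrix):
    """Per-class precision and recall from a [true, predicted] confusion matrix."""
    m = np.asarray(matrix, dtype=float)
    diag = np.diag(m).copy()      # correctly classified samples per class
    row_sums = m.sum(axis=1)      # samples per true class (recall denominator)
    col_sums = m.sum(axis=0)      # samples per predicted class (precision denominator)
    # Classes with an empty row or column keep the value 0 instead of dividing by zero
    recall = np.divide(diag, row_sums, out=np.zeros(len(diag)), where=row_sums != 0)
    precision = np.divide(diag, col_sums, out=np.zeros(len(diag)), where=col_sums != 0)
    return precision, recall

if __name__ == "__main__":
    # Tiny 2-class example: rows are true labels, columns are predictions
    precision, recall = precision_recall([[2, 1], [0, 3]])
    print(precision, recall)
```

Feeding it the 'confusion_matrix' list from gen_json_data() should give the same numbers as precision_list and recall_list.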

Saturday, July 14, 2018

[Qt5] How to develop Qt5 GUI with TensorFlow C++ library?

Here I give a simple, complete example of how to develop a Qt5 GUI with the TensorFlow C++ library on Linux. Please check out my GitHub repository:

For building the TensorFlow C++ API library, you can refer to my previous post:

I think the key point is how to prepare CMakeLists.txt, and you can refer to mine. If you open this project in Qt Creator and build it, the GUI will look like this when running.

Monday, July 9, 2018

[TensorFlow] How to implement LMDBDataset in tf.data API?

I have finished implementing LMDBDataset in the tf.data API. It may not be bug-free, but it is my first attempt at implementing C++ and Python functions in TensorFlow. The API architecture looks like this:

The complete code is on the r1.8 branch of my TensorFlow fork:

If you want to see what's implemented, please check it out:

Basically, it can be used in the same way as TFRecordDataset and TextLineDataset. The following is an example that uses TFRecordDataset:

By the way, I also provide some samples for those who want to benchmark the performance of TFRecordDataset, LMDBDataset, or other datasets. Please also check the following:

convert_to_records_lmdb.py: This Python file converts the MNIST data format into LMDB, which yields datapoints.

fully_connected_reader_lmdb.py: This Python file trains a fully connected neural net with MNIST data in LMDB. It takes a new argument, perf, to measure only the performance of the input data pipeline.

Example 1: to train on the MNIST dataset, you may give the following command:

$ python fully_connected_reader_lmdb.py --train_dir ./lmdb_data --num_epochs 10 --batch_size 128 --perf training

Example 2: to check the performance of the data pipeline on the MNIST dataset, you may give the following command:

$ python fully_connected_reader_lmdb.py --train_dir ./lmdb_data --num_epochs 10 --batch_size 128 --perf datapipeline

The performance results show that TFRecordDataset is still faster than the others in this speed test.
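
To make the idea of measuring only the input data pipeline concrete, here is a minimal, framework-free sketch. It is not the code in fully_connected_reader_lmdb.py; measure_throughput is a hypothetical helper that simply drains any Python iterable of batches and reports examples per second, with no training step involved.

```python
import time

def measure_throughput(batches, batch_size):
    """Drain an iterable of batches without training on them.

    Returns (number of batches, examples per second).
    """
    start = time.perf_counter()
    num_batches = 0
    for _ in batches:
        num_batches += 1
    elapsed = time.perf_counter() - start
    if elapsed > 0:
        rate = num_batches * batch_size / elapsed
    else:
        rate = float("inf")
    return num_batches, rate

if __name__ == "__main__":
    # Stand-in pipeline: 1000 dummy batches of 128 examples each
    dummy_pipeline = ([0] * 128 for _ in range(1000))
    count, rate = measure_throughput(dummy_pipeline, batch_size=128)
    print("batches: %d, examples/sec: %.0f" % (count, rate))
```

Running the same loop over two different dataset readers gives a rough like-for-like comparison of their input speed.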

Wednesday, July 4, 2018

[TensorFlow] How to build your C++ program or application with TensorFlow library using CMake

When you want to build your C++ program or application using the TensorFlow library, you will probably run into missing header files or linking problems. Here is the list of steps that I have verified to work.

1. Prepare TensorFlow and its third-party libraries
$ git clone --recursive https://github.com/tensorflow/tensorflow
$ cd tensorflow/contrib/makefile
$ ./build_all_linux.sh

2. Build the TensorFlow C++ API library
$ cd tensorflow
$ ./configure
<<< Configure the items in this step according to your requirements >>>
$ bazel build //tensorflow:libtensorflow_cc.so

3. Set up the header files and libraries
$ sudo mkdir /usr/local/tensorflow
$ sudo mkdir /usr/local/tensorflow/include
$ sudo cp -r tensorflow/contrib/makefile/downloads/eigen/Eigen /usr/local/tensorflow/include/
$ sudo cp -r tensorflow/contrib/makefile/downloads/eigen/unsupported /usr/local/tensorflow/include/
$ sudo cp -r tensorflow/contrib/makefile/gen/protobuf/include/google /usr/local/tensorflow/include/
$ sudo cp tensorflow/contrib/makefile/downloads/nsync/public/* /usr/local/tensorflow/include/
$ sudo cp -r bazel-genfiles/tensorflow /usr/local/tensorflow/include/
$ sudo cp -r tensorflow/cc /usr/local/tensorflow/include/tensorflow
$ sudo cp -r tensorflow/core /usr/local/tensorflow/include/tensorflow
$ sudo mkdir /usr/local/tensorflow/include/third_party
$ sudo cp -r third_party/eigen3 /usr/local/tensorflow/include/third_party/
$ sudo mkdir /usr/local/tensorflow/lib
$ sudo cp bazel-bin/tensorflow/libtensorflow_*.so /usr/local/tensorflow/lib

Once you finish the steps above, you can build your C++ program or application.
I provide a simple project and CMakeLists.txt for your reference:

If you git clone it, you will see these files in the folder:
├── CMakeLists.txt
├── data_set.cc
├── data_set.h
├── model.cc
├── normalized_car_features.csv
└── README.md

My CMakeLists.txt is here:
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)

project(DNN_Tensorflow_CPP LANGUAGES CXX)

# Headers copied in step 3 (assumed location; adjust if you installed them elsewhere)
include_directories("/usr/local/tensorflow/include")

add_executable(${PROJECT_NAME} model.cc data_set.cc data_set.h)

# Copy the data file next to the binary so it is found at run time
configure_file(normalized_car_features.csv ${CMAKE_CURRENT_BINARY_DIR}/normalized_car_features.csv COPYONLY)

# Only needed when building with MSVC (the if() guard is assumed)
if(MSVC)
    target_compile_definitions(${PROJECT_NAME} PRIVATE COMPILER_MSVC)
endif()

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -g -fPIC")

target_link_libraries(${PROJECT_NAME} "/usr/local/tensorflow/lib/libtensorflow_cc.so")
target_link_libraries(${PROJECT_NAME} "/usr/local/tensorflow/lib/libtensorflow_framework.so")

Finally, I can build and run it successfully.
$ mkdir build
$ cd build
$ cmake ..
$ make
$ ./DNN_Tensorflow_CPP

P.S. For more details about the C++ example, please check out this blog:

Tuesday, June 26, 2018

[XLA JIT] How to turn on XLA JIT compilation at multiple GPUs training

Before I discuss this question, let's recall how to turn on XLA JIT compilation and use it in the TensorFlow Python API.

1. Session
Turning on JIT compilation at the session level will result in all possible operators being greedily compiled into XLA computations. Each XLA computation will be compiled into one or more kernels for the underlying device.

2. Manual
JIT compilation can also be turned on manually for one or more operators. This is done by tagging the operators to compile with the attribute _XlaCompile=true. The simplest way to do this is via the tf.contrib.compiler.jit.experimental_jit_scope() scope defined in tensorflow/contrib/compiler/jit.py.

3. Placing operators on XLA devices (we won't consider this option because it is too tedious)
Another way to run computations via XLA is to place an operator on a specific XLA device. This method is normally only used for testing.

Basically, I tried the first two options in this script, cifar10_multi_gpu_train.py, because it already contains code for multi-GPU training with synchronous updates.

The first option (Session) doesn't work with multi-GPU training. It forces TensorFlow to compile all possible operators into XLA computations, and we can't tell whether the design in cifar10_multi_gpu_train.py for multiple GPUs with synchronous updates is still preserved.

The second option (Manual), however, does work with multi-GPU training using the JIT scope.

Here is my experiment using the second option (Manual, with the JIT scope):
Source: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
Batch Size: 6000
Total Iterations: 2000

How to turn on XLA JIT compilation in cifar10_multi_gpu_train.py?
# Add the jit_scope definition near the beginning of your code
jit_scope = tf.contrib.compiler.jit.experimental_jit_scope

# Wrap each tower in jit_scope()
    with tf.variable_scope(tf.get_variable_scope()):
      for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
          with jit_scope():  # <-- Add this line
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
              # Dequeues one batch for the GPU
              image_batch, label_batch = batch_queue.dequeue()

Case1: Turning on XLA JIT ( using jit scope )
Training time: 910 seconds
Avg. images/sec: 36647
Memory Usage and GPU-Util%:

Case2: Turning off XLA JIT ( no XLA )
Training time: 1120 seconds
Avg. images/sec: 25203
Memory Usage and GPU-Util%:

In summary:
With XLA JIT turned on for multi-GPU training in this experiment, training time dropped from 1120 to 910 seconds, an improvement of more than 18%.
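
The improvement figure can be checked with a few lines of arithmetic on the numbers reported above. The two helper names are mine; the inputs are the training times and average images/sec from Case1 and Case2.

```python
def relative_time_reduction(baseline_s, improved_s):
    """Fraction of the baseline training time that was saved."""
    return (baseline_s - improved_s) / baseline_s

def throughput_gain(baseline_rate, improved_rate):
    """Relative increase in images/sec over the baseline."""
    return improved_rate / baseline_rate - 1.0

# Case2 (no XLA) is the baseline, Case1 (jit scope) is the improved run
time_cut = relative_time_reduction(1120, 910)
rate_gain = throughput_gain(25203, 36647)

print("training time reduced by {:.2%}".format(time_cut))
print("avg. images/sec increased by {:.2%}".format(rate_gain))
```

The "more than 18%" figure corresponds to the reduction in training time; measured by average images/sec, the relative gain comes out larger.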

Sunday, June 24, 2018

[PCIe] How to read/write PCIe Switch Configuration Space?

Here is a question: how do we read/write the PCIe switch configuration space? Let's look at this picture first.

The memory map shows the entire physical address space of the root complex. Only the green block at the bottom is system DRAM. The yellow areas above it are memory-mapped peripherals, including the PCIe switch, so the CPU can read the PCIe switch configuration space via MMIO in host memory. That is why the Base Address Registers (BARs) are so important. My laptop doesn't have a PCIe switch device, so I picked a SATA device instead; the following is a very simple example that reads its 256 bytes of configuration space:

P.S. If you have a PCIe switch device ID, just substitute it in the code.

reference: http://telesky.pixnet.net/blog/post/7022197-a-simple-linux-driver-example-on-fpga%3A-adder

#include <linux/init.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/io.h>

#define    OUR_SATA_VENDOR_ID    0x14e4
#define    OUR_SATA_PROD_ID    0x165f

/* Hex-dump size bytes starting at src, 16 bytes per row */
static void print_addr_func(const u8 *src, int size)
{
    int i;

    if (size < 0) {
        printk(KERN_ALERT "The size should be greater than 0!\n");
        return;
    }
    for (i = 0; i < size; i++) {
        if (!(i & 15))
            printk(KERN_ALERT " %02x:", i);
        printk(KERN_CONT " %02x", src[i]);
        if ((i & 15) == 15)
            printk(KERN_CONT "\n");
    }
}

static int __init aboutpci_init(void)
{
    u8 config_arr[256];
    u8 mmio_buf[16];
    int i;
    unsigned long mmio_start, mmio_end, mmio_flags, mmio_len;
    void __iomem *ioaddr;
    struct pci_dev *pdev = NULL;

    //Finding the device by Vendor/Device ID Pair
    pdev = pci_get_device(OUR_SATA_VENDOR_ID, OUR_SATA_PROD_ID, pdev);
    if (pdev != NULL) {
        printk(KERN_ALERT "Our SATA HBA found!\n");
        if (pdev->dma_mask == DMA_BIT_MASK(64))
            printk(KERN_ALERT "64-bit addressing capable!\n");
        else if (pdev->dma_mask == DMA_BIT_MASK(32))
            printk(KERN_ALERT "32-bit addressing capable!\n");

        printk(KERN_ALERT "Use pci_read_config_byte() to print bytes in configuration space\n");
        for (i = 0; i < 256; i++)
            pci_read_config_byte(pdev, i, &config_arr[i]);
        print_addr_func(config_arr, 256);

        printk(KERN_ALERT "Use pci_resource_XXX() to access BAR 0\n");
        mmio_start = pci_resource_start(pdev, 0);
        mmio_end = pci_resource_end(pdev, 0);
        mmio_flags = pci_resource_flags(pdev, 0);
        mmio_len = pci_resource_len(pdev, 0);
        printk(KERN_ALERT "MMIO region size of BAR 0 is :%lu\n", mmio_len);
        printk(KERN_ALERT "MMIO region of BAR 0 is %lx to %lx\n", mmio_start, mmio_end);

        /* make sure BAR 0 is MMIO */
        if (!(mmio_flags & IORESOURCE_MEM)) {
            printk(KERN_ALERT "region #0 not an MMIO resource, aborting\n");
            pci_dev_put(pdev);
            return -ENODEV;
        }
        /* ioremap MMIO region of BAR 0 */
        ioaddr = ioremap(mmio_start, mmio_len);
        if (ioaddr == NULL) {
            printk(KERN_ALERT "MMIO region is error!! \n");
            pci_dev_put(pdev);
            return -ENOMEM;
        }
        printk(KERN_ALERT "MMIO Remap addr is %p\n", ioaddr);
        /* copy the first bytes out of the remapped (virtual) address and print them */
        i = min_t(unsigned long, mmio_len, sizeof(mmio_buf));
        memcpy_fromio(mmio_buf, ioaddr, i);
        print_addr_func(mmio_buf, i);
        iounmap(ioaddr);
        pci_dev_put(pdev);    /* drop the reference taken by pci_get_device() */
    } else {
        printk(KERN_ALERT "Our SATA HBA Not found!\n");
    }

    //Finding the device by its class code
    pdev = NULL;
    pdev = pci_get_class(PCI_CLASS_STORAGE_SATA_AHCI, pdev);
    if (pdev != NULL) {
        printk(KERN_ALERT "SATA HBA Class device found!\n");
        printk(KERN_ALERT "Device Vendor ID: 0x%X\n", pdev->vendor);
        printk(KERN_ALERT "Device Product ID: 0x%X\n", pdev->device);
        pci_dev_put(pdev);
    } else {
        printk(KERN_ALERT "SATA HBA Class device Not found!\n");
    }

    return 0;
}

static void __exit aboutpci_exit(void)
{
    printk(KERN_ALERT "Goodbye, pci hackers\n");
}

module_init(aboutpci_init);
module_exit(aboutpci_exit);
MODULE_LICENSE("GPL");



Use this Makefile to build your module:
obj-m := aboutpci.o
KERNELDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

# The target name "all" is assumed; the recipe line must start with a tab
all:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules

Once the build is done, you will get the following files:
$ ls -al
-rw-rw-r-- 1 liudanny liudanny  3798 Aug 16  2017 aboutpci.c
-rw-rw-r-- 1 liudanny liudanny  6464 Jun 25 11:10 aboutpci.ko
-rw-rw-r-- 1 liudanny liudanny   363 Jun 25 11:10 .aboutpci.ko.cmd
-rw-rw-r-- 1 liudanny liudanny   542 Jun 25 11:10 aboutpci.mod.c
-rw-rw-r-- 1 liudanny liudanny  2536 Jun 25 11:10 aboutpci.mod.o
-rw-rw-r-- 1 liudanny liudanny 28760 Jun 25 11:10 .aboutpci.mod.o.cmd
-rw-rw-r-- 1 liudanny liudanny  5784 Jun 25 11:10 aboutpci.o
-rw-rw-r-- 1 liudanny liudanny 42695 Jun 25 11:10 .aboutpci.o.cmd
-rw-rw-r-- 1 liudanny liudanny   191 Jul 28  2017 Makefile
-rw-rw-r-- 1 liudanny liudanny    77 Jun 25 11:10 modules.order
-rw-rw-r-- 1 liudanny liudanny     0 Jul 28  2017 Module.symvers

Now you can insert the module and check the kernel messages:
$ sudo insmod aboutpci.ko
$ dmesg
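
As a userspace cross-check of the dump printed by the module, the first bytes of configuration space can be decoded with a short script. This is a sketch under assumptions: parse_pci_header and read_config_space are names I made up, the sysfs file /sys/bus/pci/devices/<address>/config is where Linux exposes the configuration space, and the sample bytes below are synthetic (they reuse the vendor/device IDs from the defines above, with an invented BAR 0 value).

```python
import struct

def parse_pci_header(config):
    """Decode a few standard fields from the start of PCI configuration space."""
    if len(config) < 0x14:
        raise ValueError("need at least 20 bytes of configuration space")
    # Offsets follow the standard PCI header layout; all fields are little-endian
    vendor_id, device_id = struct.unpack_from("<HH", config, 0x00)
    revision, prog_if, subclass, base_class = struct.unpack_from("<BBBB", config, 0x08)
    (bar0,) = struct.unpack_from("<I", config, 0x10)
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "revision": revision,
        "class_code": (base_class << 16) | (subclass << 8) | prog_if,
        "bar0_raw": bar0,
        "bar0_is_mmio": (bar0 & 0x1) == 0,  # bit 0 clear means a memory BAR
    }

def read_config_space(address, size=256):
    """Read raw configuration space from sysfs (address like '0000:00:1f.2')."""
    with open("/sys/bus/pci/devices/{}/config".format(address), "rb") as f:
        return f.read(size)

if __name__ == "__main__":
    # Synthetic 256-byte configuration space, for illustration only
    sample = bytearray(256)
    struct.pack_into("<HH", sample, 0x00, 0x14E4, 0x165F)
    struct.pack_into("<BBBB", sample, 0x08, 0x00, 0x01, 0x06, 0x01)
    struct.pack_into("<I", sample, 0x10, 0xF7D00000)
    for key, value in parse_pci_header(bytes(sample)).items():
        print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

On a real machine, pass the bytes returned by read_config_space() to parse_pci_header(); reading the whole 256 bytes typically requires root.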

Thursday, June 21, 2018

[TensorFlow] How to get CPU configuration flags (such as SSE4.1, SSE4.2, and AVX...) in a bash script for building TensorFlow from source

Have you ever wondered which CPU configuration flags (such as SSE4.1, SSE4.2, and AVX) you should use on your machine when building TensorFlow from source? If so, here is a quick solution.

1. Create a bash shell script file ( get_tf_build_cpu_opt.sh ) as below:
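#!/usr/bin/env bash

# Detect platform
if [ "$(uname)" == "Darwin" ]; then
        # MacOS
        raw_cpu_flags=`sysctl -a | grep machdep.cpu.features | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]'`
elif [ "$(uname)" == "Linux" ]; then
        # GNU/Linux
        raw_cpu_flags=`grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]'`
else
        echo "Unknown platform: $(uname)"
        exit 1
fi

# Always start from -march=native (it is the first option in the sample output)
COPT="--copt=-march=native"

# Append one --copt per recognized feature. Linux spells some features
# differently (sse4_1, sse4_2, avx), so the patterns below do not match them.
for cpu_feature in $raw_cpu_flags
do
        case "$cpu_feature" in
                "sse4.1" | "sse4.2" | "ssse3" | "fma" | "cx16" | "popcnt" | "maes")
                    COPT+=" --copt=-m$cpu_feature"
                ;;
                "avx1.0")  # assumed pattern: the name macOS reports for AVX
                    COPT+=" --copt=-mavx"
                ;;
                *)
                        # noop
                        :
                ;;
        esac
done
echo $COPT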

2. Execute it:
$ ./get_tf_build_cpu_opt.sh
==>  On my machine, I got these:
--copt=-march=native --copt=-mssse3 --copt=-mfma --copt=-mcx16 --copt=-mpopcnt

3. Now you can put these strings in your bazel build command to build TensorFlow from source, such as:
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
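
The mapping done by the script in step 1 can also be sketched in Python, which makes it easy to try on any flags string. This is an illustration, not a replacement for the script: copts_from_flags and FEATURE_TO_OPTION are names I made up, and the Linux spellings (sse4_1, sse4_2, avx, avx2, aes) are an assumption I added on top of what the bash script matches.

```python
# Feature name (as reported by the OS) -> GCC/Clang machine option
FEATURE_TO_OPTION = {
    # names matched by the bash script (macOS style)
    "sse4.1": "-msse4.1",
    "sse4.2": "-msse4.2",
    "ssse3": "-mssse3",
    "fma": "-mfma",
    "cx16": "-mcx16",
    "popcnt": "-mpopcnt",
    "avx1.0": "-mavx",
    # assumption: Linux /proc/cpuinfo spellings, added for illustration
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "aes": "-maes",
}

def copts_from_flags(raw_flags):
    """Turn a whitespace-separated CPU flags string into bazel --copt options."""
    copts = ["--copt=-march=native"]
    for feature in raw_flags.lower().split():
        option = FEATURE_TO_OPTION.get(feature)
        if option is not None:
            copts.append("--copt=" + option)
    return " ".join(copts)

if __name__ == "__main__":
    # On Linux, feed it the first "flags" line of /proc/cpuinfo
    try:
        with open("/proc/cpuinfo") as f:
            line = next((l for l in f if l.startswith("flags")), "")
        flags = line.partition(":")[2]
    except OSError:
        flags = ""
    print(copts_from_flags(flags))
```

Options are emitted in the order the features appear in the input, the same way the bash loop builds COPT.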