Tuesday, October 2, 2018

[NUMACTL] How to use numactl in practice?

I recently attended an Intel AI workshop where they advised using numactl to improve the performance of deep learning training and inference with Intel Caffe. Here are some related notes:



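First, run the command below to check your machine's NUMA layout: which CPUs (logical cores) belong to each node, how much memory each node has, and the distances between nodes. The distances are relative memory-access costs (10 for a node's own memory, 21 here for the other socket's memory), so a program that accesses remote-socket memory too often loses performance. For example: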
$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 0 size: 257855 MB
node 0 free: 36833 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 1 size: 258041 MB
node 1 free: 228846 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
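If you want to pass a node's CPUs to `numactl -C` explicitly, the `node N cpus:` line above can be condensed into a compact CPU list. The following is a small sketch, assuming bash and awk are available; `node_cpus` is a hypothetical helper name, not part of numactl. It reads `numactl -H` output on stdin, so it can also be tried on saved output.

```shell
# Hypothetical helper (not part of numactl): print the CPUs of one NUMA node
# from `numactl -H` output as comma-separated ranges, e.g. "0-19,40-59".
# Usage: numactl -H | node_cpus NODE
node_cpus() {
  awk -v n="$1" '
    # Only the "node N cpus: ..." line for the requested node is used.
    $1 == "node" && $2 == n && $3 == "cpus:" {
      out = ""; start = ""; prev = ""
      for (i = 4; i <= NF; i++) {
        c = $i + 0
        if (start == "") { start = c; prev = c; continue }
        if (c == prev + 1) { prev = c; continue }   # still inside a run
        # Run ended: emit it as "a" or "a-b", then start a new run.
        out = out (out == "" ? "" : ",") (start == prev ? start : start "-" prev)
        start = c; prev = c
      }
      if (start != "") out = out (out == "" ? "" : ",") (start == prev ? start : start "-" prev)
      print out
    }'
}
```

On the machine above, `numactl -H | node_cpus 0` prints `0-19,40-59`. If you simply want every CPU of a node, `numactl --cpunodebind=0` (`-N 0`) binds to all of them without listing them, and `lscpu` also reports a `NUMA node0 CPU(s):` line in range form.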

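In this example the machine has 2 CPU sockets (NUMA nodes 0 and 1), each with 40 logical CPUs. Because node 0 lists CPUs 0-19 and 40-59, this most likely means 20 physical cores per socket with Hyper-Threading, where CPUs 40-59 are the sibling hardware threads of cores 0-19.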

Before launching with numactl, set the OpenMP environment variables to suit your application. For example, if you want your application to use only 16 cores, set them as follows:
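# Choose an appropriate number of OpenMP threads (one per core you will use).
$ export OMP_NUM_THREADS=16
# Bind each thread to one logical CPU (granularity=fine); compact,1,0 places one thread per physical core before using Hyper-Threading siblings.
$ export KMP_AFFINITY=granularity=fine,compact,1,0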
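OMP_NUM_THREADS should normally equal the number of CPUs you later hand to `numactl -C`; with more threads than CPUs the threads share cores, and with fewer some bound CPUs sit idle. A minimal sketch for counting the CPUs in such a list, assuming a POSIX-style shell such as bash; `count_cpus` is a hypothetical helper, and it only understands plain numbers and `a-b` ranges, not every CPU-list form numactl accepts.

```shell
# Hypothetical helper: count the CPUs in a list like "0-15" or "0-19,40-59",
# so OMP_NUM_THREADS can match the CPUs given to `numactl -C`.
# Only plain numbers and "a-b" ranges separated by commas are handled.
count_cpus() {
  local list="$1" total=0 part lo hi
  local IFS=','               # split the list on commas
  for part in $list; do
    case "$part" in
      *-*)                    # range "lo-hi" contributes hi - lo + 1 CPUs
        lo="${part%-*}"
        hi="${part#*-}"
        total=$((total + hi - lo + 1))
        ;;
      *)                      # single CPU number
        total=$((total + 1))
        ;;
    esac
  done
  echo "$total"
}
```

For the setup above, `export OMP_NUM_THREADS=$(count_cpus 0-15)` sets it to 16.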

Then launch your application on the first socket (NUMA node 0), pinned to CPUs 0 to 15 and with its memory allocated only from node 0:


# Bind to CPUs 0-15 (-C, --physcpubind) and allocate memory only on node 0 (-m, --membind) to avoid remote-socket memory access.
$ numactl -C 0-15 -m 0 python ....(your python program)...
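The thread count, the CPU list and the memory node all have to agree with each other. Here is a sketch of a wrapper that derives them from one set of arguments, assuming bash; `numa_launch` and the `DRY_RUN` switch are hypothetical names for illustration, and the caller is responsible for choosing CPUs that actually belong to the given node (check with `numactl -H`).

```shell
# Hypothetical wrapper (not part of numactl): run a command on NCORES
# consecutive CPUs starting at FIRST_CPU, with memory bound to NODE and
# OMP_NUM_THREADS matched to the number of bound CPUs.
# Set DRY_RUN=1 to print the numactl command instead of running it.
numa_launch() {
  if [ "$#" -lt 4 ]; then
    echo "usage: numa_launch NODE FIRST_CPU NCORES command [args...]" >&2
    return 2
  fi
  local node="$1" first="$2" ncores="$3"
  shift 3
  if [ "$ncores" -lt 1 ]; then
    echo "NCORES must be at least 1" >&2
    return 2
  fi
  local last=$((first + ncores - 1))
  # CPUs first..last are assumed to belong to NODE.
  OMP_NUM_THREADS="$ncores" \
  KMP_AFFINITY="granularity=fine,compact,1,0" \
    ${DRY_RUN:+echo} numactl -C "${first}-${last}" -m "$node" "$@"
}
```

For example, `DRY_RUN=1 numa_launch 0 0 16 python train.py` (with `train.py` standing in for your program) prints `numactl -C 0-15 -m 0 python train.py`; without `DRY_RUN` it runs that command with OMP_NUM_THREADS=16 and the KMP_AFFINITY setting from above.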

