
Monday, May 15, 2017

[NCCL] Build and run the NCCL tests


NCCL requires at least CUDA 7.0 and Kepler or newer GPUs. Best performance is achieved when all GPUs are located on a common PCIe root complex, but multi-socket configurations are also supported.

Note: NCCL may also work with CUDA 6.5, but this is an untested configuration.

Build & run

To build the library and tests:

$ cd nccl
$ make CUDA_HOME=<cuda install path> test
Test binaries are located in the subdirectories nccl/build/test/{single,mpi}.

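To run the single-process all-reduce test on 100000000 bytes, add the built library to LD_LIBRARY_PATH first:

~/git/nccl$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./build/lib
~/git/nccl$ ./build/test/single/all_reduce_test 100000000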
# Using devices
#   Rank  0 uses device  0 [0x04] GeForce GTX 1080 Ti
#   Rank  1 uses device  1 [0x05] GeForce GTX 1080 Ti
#   Rank  2 uses device  2 [0x08] GeForce GTX 1080 Ti
#   Rank  3 uses device  3 [0x09] GeForce GTX 1080 Ti
#   Rank  4 uses device  4 [0x83] GeForce GTX 1080 Ti
#   Rank  5 uses device  5 [0x84] GeForce GTX 1080 Ti
#   Rank  6 uses device  6 [0x87] GeForce GTX 1080 Ti
#   Rank  7 uses device  7 [0x88] GeForce GTX 1080 Ti

#                                                 out-of-place                    in-place
#      bytes             N    type      op     time  algbw  busbw      res     time  algbw  busbw      res
   100000000     100000000    char     sum   30.244   3.31   5.79    0e+00   29.892   3.35   5.85    0e+00
   100000000     100000000    char    prod   30.493   3.28   5.74    0e+00   30.524   3.28   5.73    0e+00
   100000000     100000000    char     max   29.745   3.36   5.88    0e+00   29.877   3.35   5.86    0e+00
   100000000     100000000    char     min   29.744   3.36   5.88    0e+00   29.868   3.35   5.86    0e+00
   100000000      25000000     int     sum   29.692   3.37   5.89    0e+00   29.754   3.36   5.88    0e+00
   100000000      25000000     int    prod   30.733   3.25   5.69    0e+00   30.697   3.26   5.70    0e+00
   100000000      25000000     int     max   29.871   3.35   5.86    0e+00   29.700   3.37   5.89    0e+00
   100000000      25000000     int     min   29.809   3.35   5.87    0e+00   29.852   3.35   5.86    0e+00
   100000000      50000000    half     sum   28.590   3.50   6.12    1e-02   27.545   3.63   6.35    1e-02
   100000000      50000000    half    prod   27.416   3.65   6.38    1e-03   27.375   3.65   6.39    1e-03
   100000000      50000000    half     max   30.811   3.25   5.68    0e+00   30.670   3.26   5.71    0e+00
   100000000      50000000    half     min   30.818   3.24   5.68    0e+00   30.931   3.23   5.66    0e+00
   100000000      25000000   float     sum   29.719   3.36   5.89    1e-06   29.750   3.36   5.88    1e-06
   100000000      25000000   float    prod   29.741   3.36   5.88    1e-07   30.029   3.33   5.83    1e-07
   100000000      25000000   float     max   28.400   3.52   6.16    0e+00   28.400   3.52   6.16    0e+00
   100000000      25000000   float     min   28.364   3.53   6.17    0e+00   28.434   3.52   6.15    0e+00
   100000000      12500000  double     sum   33.989   2.94   5.15    0e+00   34.104   2.93   5.13    0e+00
   100000000      12500000  double    prod   33.895   2.95   5.16    2e-16   33.833   2.96   5.17    2e-16
   100000000      12500000  double     max   30.228   3.31   5.79    0e+00   30.273   3.30   5.78    0e+00
   100000000      12500000  double     min   30.324   3.30   5.77    0e+00   30.341   3.30   5.77    0e+00
   100000000      12500000   int64     sum   29.914   3.34   5.85    0e+00   30.036   3.33   5.83    0e+00
   100000000      12500000   int64    prod   30.975   3.23   5.65    0e+00   31.083   3.22   5.63    0e+00
   100000000      12500000   int64     max   29.954   3.34   5.84    0e+00   29.949   3.34   5.84    0e+00
   100000000      12500000   int64     min   29.946   3.34   5.84    0e+00   29.952   3.34   5.84    0e+00
   100000000      12500000  uint64     sum   29.981   3.34   5.84    0e+00   30.100   3.32   5.81    0e+00
   100000000      12500000  uint64    prod   30.911   3.24   5.66    0e+00   30.800   3.25   5.68    0e+00
   100000000      12500000  uint64     max   29.890   3.35   5.85    0e+00   29.947   3.34   5.84    0e+00
   100000000      12500000  uint64     min   29.929   3.34   5.85    0e+00   29.964   3.34   5.84    0e+00

 Out of bounds values : 0 OK
 Avg bus bandwidth    : 5.81761
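
The N, algbw and busbw columns can be checked by hand. The following is a minimal sketch, not code from NCCL itself: it assumes that time is reported in milliseconds, that bandwidth is in GB/s (1e9 bytes per second), and that busbw applies the usual ring all-reduce factor 2*(n-1)/n to algbw. The function names and the TYPE_SIZES table are made up for illustration. Under those assumptions it reproduces the first row of the output above.

```python
# Sketch only: recompute N, algbw and busbw for one row of the all_reduce_test output.
# Assumptions (not stated in the post): time is in ms, bandwidth is in GB/s (1e9 B/s),
# and busbw = algbw * 2*(n-1)/n for an all-reduce over n ranks.

# Element sizes in bytes for the types listed in the table (hypothetical helper table).
TYPE_SIZES = {"char": 1, "int": 4, "half": 2, "float": 4,
              "double": 8, "int64": 8, "uint64": 8}


def element_count(nbytes: int, type_name: str) -> int:
    """Number of elements (the N column) for a buffer of nbytes."""
    return nbytes // TYPE_SIZES[type_name]


def algbw_gbps(nbytes: int, time_ms: float) -> float:
    """Algorithm bandwidth: buffer size divided by elapsed time."""
    return nbytes / (time_ms * 1e-3) / 1e9


def allreduce_busbw_gbps(algbw: float, nranks: int) -> float:
    """Bus bandwidth for an all-reduce, scaled by 2*(n-1)/n."""
    return algbw * 2 * (nranks - 1) / nranks


if __name__ == "__main__":
    # First row of the table: char / sum, out-of-place, 8 GPUs.
    nbytes, time_ms, nranks = 100_000_000, 30.244, 8
    algbw = algbw_gbps(nbytes, time_ms)
    busbw = allreduce_busbw_gbps(algbw, nranks)
    print(f"N(half) = {element_count(nbytes, 'half')}")
    print(f"algbw = {algbw:.2f} GB/s, busbw = {busbw:.2f} GB/s")
    # Prints:
    # N(half) = 50000000
    # algbw = 3.31 GB/s, busbw = 5.79 GB/s
```

With 8 ranks the factor is 2*7/8 = 1.75, which is why every busbw value in the table is about 1.75 times the algbw value next to it.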