Dear edwardlib users,
When running inference on a simple model with both the GPU (Quadro M1000M) and the CPU, the CPU beats the GPU! Why is this? (I have CUDA 9.0, TensorFlow 1.6, and Edward 1.3.5.)
nvidia-smi shows low GPU utilization (about 30%). Maybe I forgot to switch something on for the GPU…
Sat Jul 7 18:40:57 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1000M       Off  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P0    N/A /  N/A |   1446MiB /  4010MiB |     27%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       749      G   /usr/lib/xorg/Xorg                           107MiB |
|    0      1026      G   /usr/lib/xorg/Xorg                           376MiB |
|    0      1116      G   /usr/bin/gnome-shell                         279MiB |
|    0      1434      G   ...-token=1D0E45785BEA23F319116BEF8579CF02   125MiB |
|    0      1538      G   ...-token=B1B89C1FA11A41D60B93E42EE7B10455   197MiB |
|    0      4005      G   ...-token=499EFE838A1B3A99EB5FF5A5F1F95349   194MiB |
|    0      5336      C   python                                       106MiB |
+-----------------------------------------------------------------------------+
- Output when running inference on the GPU:
2018-07-07 18:40:35.359084: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-07 18:40:35.419953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-07 18:40:35.420395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: Quadro M1000M major: 5 minor: 0 memoryClockRate(GHz): 1.0715
pciBusID: 0000:01:00.0
totalMemory: 3.92GiB freeMemory: 2.58GiB
2018-07-07 18:40:35.420411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-07-07 18:40:35.869614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2294 MB memory) -> physical GPU (device: 0, name: Quadro M1000M, pci bus id: 0000:01:00.0, compute capability: 5.0)
10000/10000 [100%] ██████████████████████████████ Elapsed: 21s | Acceptance Rate: 0.977
- Output when running inference on the CPU:
2018-07-07 18:41:06.702931: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
10000/10000 [100%] ██████████████████████████████ Elapsed: 5s | Acceptance Rate: 0.977
- Here is the simple model used for the test:
"""Correlated normal posterior. Inference with Hamiltonian Monte Carlo.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
import time
from edward.models import Empirical, MultivariateNormalTriL
import edward as ed
ed.set_seed(42)
# MODEL
z = MultivariateNormalTriL(
    loc=tf.ones(2),
    scale_tril=tf.cholesky(tf.constant([[1.0, 0.8], [0.8, 1.0]])))
# INFERENCE
qz = Empirical(params=tf.Variable(tf.random_normal([10000, 2])))
inference = ed.HMC({z: qz})
inference.run()
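
For a like-for-like comparison, one option is to run the same script twice in child processes and hide the GPU in one of them through the CUDA_VISIBLE_DEVICES environment variable. The following is only a minimal sketch: timed_run is an illustrative helper (not part of Edward or TensorFlow), and hmc_normal.py is a hypothetical file name for the script above.

```python
import os
import subprocess
import sys
import time


def timed_run(cmd, visible_gpus):
    """Run cmd in a child process; return (elapsed_seconds, return_code).

    visible_gpus is written to CUDA_VISIBLE_DEVICES for the child only:
    "" hides every GPU (so TensorFlow runs on the CPU), "0" exposes GPU 0.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible_gpus)
    start = time.time()
    return_code = subprocess.call(cmd, env=env)
    return time.time() - start, return_code


def compare(script_path):
    """Time one script with the GPU hidden and with GPU 0 visible."""
    cmd = [sys.executable, script_path]
    for label, gpus in [("CPU only", ""), ("GPU 0", "0")]:
        elapsed, return_code = timed_run(cmd, gpus)
        print("%s: %.1fs (exit code %d)" % (label, elapsed, return_code))


# Example call (assumes the script above was saved under this
# hypothetical name):
# compare("hmc_normal.py")
```

The times measured this way are wall-clock times for the whole process, so they include TensorFlow start-up and CUDA initialization and will be somewhat larger than the "Elapsed" value in Edward's progress bar.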