gcloud ML engine - Keras not running on GPU
I'm new to Google Cloud ML Engine, and I'm trying to train a deep learning model for image classification, based on Keras, on gcloud. To enable the GPU on gcloud I have included 'tensorflow-gpu' in the install_requires of my setup.py.
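For reference, a minimal setup.py along these lines might look as follows. This is only a sketch: the package name trainer and the version are placeholders, not taken from the original post.

```python
from setuptools import find_packages, setup

setup(
    name='trainer',          # hypothetical package name
    version='0.1',
    packages=find_packages(),
    # Pull in the GPU build of TensorFlow on the training workers.
    install_requires=['tensorflow-gpu'],
)
```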
My cloud-gpu.yaml is the following:
trainingInput:
  scaleTier: BASIC_GPU
  runtimeVersion: "1.0"
In the code I've added
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
at the beginning, and
with tf.device('/gpu:0'):
before any Keras code.
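Put together, the setup described above might look like the following TF 1.x sketch (not runnable under TF 2.x). The keras.backend.set_session call and the one-layer model are assumptions added for illustration; they are not part of the original post.

```python
import tensorflow as tf
from tensorflow import keras

# Log the device each op is placed on, so placements show up in the job logs.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
keras.backend.set_session(sess)  # assumption: tell Keras to use this session

# Pin subsequent ops to the first GPU.
with tf.device('/gpu:0'):
    # Hypothetical stand-in for the actual image-classification model.
    model = keras.Sequential([keras.layers.Dense(1, input_shape=(10,))])
    model.compile(optimizer='sgd', loss='mse')
```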
The result is that gcloud recognizes the GPU but does not use it, as you can see from the actual cloud training log:
INFO 2018-11-18 12:19:59 -0600 master-replica-0 Epoch 1/20
INFO 2018-11-18 12:20:56 -0600 master-replica-0 1/219 [..............................] - ETA: 4:17:12 - loss: 0.8846 - acc: 0.5053 - f1_measure: 0.1043
INFO 2018-11-18 12:21:57 -0600 master-replica-0 2/219 [..............................] - ETA: 3:51:32 - loss: 0.8767 - acc: 0.5018 - f1_measure: 0.1013
INFO 2018-11-18 12:22:59 -0600 master-replica-0 3/219 [..............................] - ETA: 3:46:49 - loss: 0.8634 - acc: 0.5039 - f1_measure: 0.1010
INFO 2018-11-18 12:23:58 -0600 master-replica-0 4/219 [..............................] - ETA: 3:44:59 - loss: 0.8525 - acc: 0.5045 - f1_measure: 0.0991
INFO 2018-11-18 12:24:48 -0600 master-replica-0 5/219 [..............................] - ETA: 3:41:17 - loss: 0.8434 - acc: 0.5031 - f1_measure: 0.0992Sun Nov 18 18:24:48 2018
INFO 2018-11-18 12:24:48 -0600 master-replica-0 +-----------------------------------------------------------------------------+
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | NVIDIA-SMI 396.26 Driver Version: 396.26 |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 |-------------------------------+----------------------+----------------------+
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 |===============================+======================+======================|
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | N/A 32C P0 56W / 149W | 10955MiB / 11441MiB | 0% Default |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 +-------------------------------+----------------------+----------------------+
INFO 2018-11-18 12:24:48 -0600 master-replica-0
INFO 2018-11-18 12:24:48 -0600 master-replica-0 +-----------------------------------------------------------------------------+
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | Processes: GPU Memory |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 | GPU PID Type Process name Usage |
INFO 2018-11-18 12:24:48 -0600 master-replica-0 |=============================================================================|
INFO 2018-11-18 12:24:48 -0600 master-replica-0 +-----------------------------------------------------------------------------+
Basically, GPU utilization stays at 0% during training (nvidia-smi reports GPU-Util 0% even though GPU memory is almost fully allocated). How is this possible?
Tags: python, tensorflow, keras, google-cloud-platform, deep-learning
Please provide actual code/text instead of screenshots, since your image could disappear from the hosting service.
– LoneWanderer
Nov 18 at 3:56
Thank you, I've added the actual terminal output.
– Flavio Di Palo
Nov 18 at 18:29
edited Nov 18 at 18:28
asked Nov 18 at 3:12
Flavio Di Palo
164