PyTorch on Google Colaboratory GPU - Illegal memory access

I am using PyTorch 0.4.0 on Google Colaboratory (NVIDIA-SMI 396.44, Driver Version: 396.44).

When running my code outside any function, I am able to send PyTorch tensors and the model to the GPU:

    ...
    model.cuda()
    data_tensor = data_tensor.cuda()
    ...

My CNN model then trains successfully, reaching 98% accuracy.

But when I put the same code in a function,

    def main(...):
        ...
        model.cuda()
        data_tensor = data_tensor.cuda()
        ...

    if __name__ == "__main__":
        main(...)

I have the following error:

cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:20

UPDATE (18/11/21):

It turned out that whether the code is inside a function is irrelevant. Usually I first get a CUDNN_STATUS_EXECUTION_FAILED error, and on the following runs a cuda runtime error (77), as shown below. Sometimes, however, it works a few times before failing.

CUDNN_STATUS_EXECUTION_FAILED (first try):

RuntimeError                              Traceback (most recent call last)
<ipython-input-27-53476e08e017> in <module>()
      1 main('mnist', 'to', 'ndd', Xd=16, epo=5, bs=100, tXn=-1, vXn=300,
----> 2      lr=0.05, suf="s1", n_class=10, cuda=True)

<ipython-input-23-918584456207> in main(ds, framework, format, Xd, epo, bs, tXn, vXn, lr, suf, n_class, cuda)
     12     opt = torch.optim.SGD(net.parameters(), lr)
     13 
---> 14     train(net, opt, Xd, epo, bs, cuda, tXn, tX, tT, vX, vT,lr)
     15 

<ipython-input-26-6b574a9e8af6> in train(model, optimizer, Xd, epo, bs, cuda, Xn, tX, tT, vX, vT, lr)
     26             #t = t.cuda()
     27             optimizer.zero_grad()
---> 28             z = model(x)
     29             bat_loss = criterion(z, t)
     30             bat_loss.backward()

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

<ipython-input-22-b4bc2e0b39b8> in forward(self, X)
     10         H0 = torch.zeros(self.n_H, X.size(0), self.Wh)
     11         C0 = torch.zeros(self.n_H, X.size(0), self.Wh)
---> 12         O, (Hn, Cn), = self.lstm1(X, (H0, C0))
     13         O = self.linear1(O[:, -1, :])
     14         return O

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
    190             flat_weight=flat_weight
    191         )
--> 192         output, hidden = func(input, self.all_weights, hx, batch_sizes)
    193         if is_packed:
    194             output = PackedSequence(output, batch_sizes)

/usr/local/lib/python3.6/dist-packages/torch/nn/_functions/rnn.py in forward(input, *fargs, **fkwargs)
    321             func = decorator(func)
    322 
--> 323         return func(input, *fargs, **fkwargs)
    324 
    325     return forward

/usr/local/lib/python3.6/dist-packages/torch/nn/_functions/rnn.py in forward(input, weight, hx, batch_sizes)
    285             batch_first, dropout, train, bool(bidirectional),
    286             list(batch_sizes.data) if variable_length else (),
--> 287             dropout_ts)
    288 
    289         if cx is not None:

RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
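One detail is visible in the trace above: `H0` and `C0` are built with `torch.zeros(...)` and no device argument, so they are CPU tensors even when the model and `X` are on the GPU. Whether that contributed to the failure here is not established by this thread, but allocating the initial states on the input's device is a cheap thing to rule out. The sketch below is a hypothetical stand-in for the asker's model: the class name, constructor arguments, layer sizes and `batch_first=True` are all assumptions (the last one inferred from `X.size(0)` and `O[:, -1, :]`).

```python
import torch
import torch.nn as nn


class LstmClassifier(nn.Module):
    """Hypothetical stand-in for the model in the traceback."""

    def __init__(self, n_in, n_hidden, n_layers, n_class):
        super().__init__()
        self.n_H = n_layers   # number of stacked LSTM layers
        self.Wh = n_hidden    # hidden state width
        # batch_first=True is an assumption based on O[:, -1, :] in the trace.
        self.lstm1 = nn.LSTM(n_in, n_hidden, n_layers, batch_first=True)
        self.linear1 = nn.Linear(n_hidden, n_class)

    def forward(self, X):
        # Create the initial states on the same device and dtype as the input,
        # so they follow the model whether it runs on CPU or GPU.
        H0 = torch.zeros(self.n_H, X.size(0), self.Wh, device=X.device, dtype=X.dtype)
        C0 = torch.zeros(self.n_H, X.size(0), self.Wh, device=X.device, dtype=X.dtype)
        O, (Hn, Cn) = self.lstm1(X, (H0, C0))
        # Classify from the output at the last time step.
        return self.linear1(O[:, -1, :])


# Usage: falls back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = LstmClassifier(n_in=28, n_hidden=16, n_layers=1, n_class=10).to(device)
x = torch.randn(4, 28, 28, device=device)  # (batch, time steps, features)
z = net(x)
print(z.shape)
```

If zero initial states are all that is needed, calling `self.lstm1(X)` without the second argument has the same effect, since `nn.LSTM` defaults both states to zeros.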

cuda runtime error (77) (other tries):

RuntimeError                              Traceback (most recent call last)
<ipython-input-28-53476e08e017> in <module>()
      1 main('mnist', 'to', 'ndd', Xd=16, epo=5, bs=100, tXn=-1, vXn=300,
----> 2      lr=0.05, suf="s1", n_class=10, cuda=True)

<ipython-input-23-918584456207> in main(ds, framework, format, Xd, epo, bs, tXn, vXn, lr, suf, n_class, cuda)
     12     opt = torch.optim.SGD(net.parameters(), lr)
     13 
---> 14     train(net, opt, Xd, epo, bs, cuda, tXn, tX, tT, vX, vT,lr)
     15 

<ipython-input-26-6b574a9e8af6> in train(model, optimizer, Xd, epo, bs, cuda, Xn, tX, tT, vX, vT, lr)
      4     if cuda and torch.cuda.is_available():
      5         print("tX type (before):", tX.type())
----> 6         model.cuda()
      7         tX = tX.cuda()
      8         tT = tT.cuda()

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in cuda(self, device)
    247             Module: self
    248         """
--> 249         return self._apply(lambda t: t.cuda(device))
    250 
    251     def cpu(self):

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    174     def _apply(self, fn):
    175         for module in self.children():
--> 176             module._apply(fn)
    177 
    178         for param in self._parameters.values():

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in _apply(self, fn)
    109 
    110     def _apply(self, fn):
--> 111         ret = super(RNNBase, self)._apply(fn)
    112         self.flatten_parameters()
    113         return ret

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    180                 # Tensors stored in modules are graph leaves, and we don't
    181                 # want to create copy nodes, so we have to unpack the data.
--> 182                 param.data = fn(param.data)
    183                 if param._grad is not None:
    184                     param._grad.data = fn(param._grad.data)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in <lambda>(t)
    247             Module: self
    248         """
--> 249         return self._apply(lambda t: t.cuda(device))
    250 
    251     def cpu(self):

RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:20
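The pattern in the two traces (a cuDNN failure first, then error 77 on later calls, even at `model.cuda()`) is consistent with two general properties of CUDA rather than with anything specific to this code. Kernels are launched asynchronously, so the Python line named in a traceback may be a later call than the one that actually faulted. And after an illegal memory access the process's CUDA context is generally unusable, so further GPU calls in the same process keep failing until the runtime is restarted. A minimal sketch for getting a more trustworthy traceback, assuming it runs in a fresh runtime before `torch` is imported (the helper name is hypothetical):

```python
import os


def enable_sync_cuda_errors():
    """Ask the CUDA runtime to launch kernels synchronously.

    With CUDA_LAUNCH_BLOCKING=1 an error is reported at the call that
    caused it instead of at some later, unrelated call. The variable has
    to be set before the first CUDA call to take effect.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


enable_sync_cuda_errors()
# import torch  # import only after the variable is set
```

Synchronous launches slow training down, so this is a debugging setting only. Once error 77 has appeared, restarting the Colab runtime before the next attempt avoids inheriting the broken context.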

asked Nov 19 at 17:46 by u2gilles; edited Nov 21 at 15:50

Can you share a self-contained notebook that reproduces the problem?
– Bob Smith
Nov 19 at 17:51

Bob, I updated my post. To be honest, when I took the trace this morning, it worked once, then failed again. Strange. The trace shows that it failed to put the model on the GPU, but I also verified that it fails when moving PyTorch tensors.
– u2gilles
Nov 20 at 1:21

Can you share a complete, self-contained notebook? It will significantly simplify diagnosis.
– Bob Smith
Nov 20 at 1:27

Bob, here is a link to a standalone ipynb file that reproduces the problem (without mounting Google Drive): drive.google.com/open?id=1enYkRsAuotTGsoce93XP2gAIuK6-i9ub
– u2gilles
Nov 21 at 4:25

I see the same problem on Colab, but for me any attempt to move a tensor to the GPU results in this error. After looking online, I'm still unsure what is wrong. Maybe Colab GPUs are being quirky?
– Superman
Nov 25 at 18:41

Tags: gpu, pytorch, google-colaboratory

1 Answer

accepted

It now works with PyTorch 1.0, installed with:

!pip3 install https://download.pytorch.org/whl/cu80/torch-1.0.0-cp36-cp36m-linux_x86_64.whl
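After reinstalling, it can help to confirm that the notebook is really running the new build and that it sees the GPU. The helper below is a minimal sketch (its name and the dictionary keys are made up for illustration); it only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`, and reports cleanly if `torch` is not installed.

```python
import importlib.util


def describe_torch_env():
    """Return a small dict describing the installed PyTorch, if any."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if info["torch_installed"]:
        import torch

        info["torch_version"] = torch.__version__
        info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
        if info["cuda_available"]:
            info["device_name"] = torch.cuda.get_device_name(0)
    return info


print(describe_torch_env())
```

If `torch` had already been imported in the session before the `pip3 install`, the old version stays loaded until the runtime is restarted, so run the check in a fresh runtime.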

answered 2 days ago by u2gilles
