Pytorch on google-colaboratory GPU - Illegal memory access

I am using PyTorch 0.4.0 on Google Colaboratory (NVIDIA-SMI 396.44, driver version 396.44).

When I run my code outside any function, I can send PyTorch tensors and the model to the GPU:

...

model.cuda()

data_tensor = data_tensor.cuda()

...

And my CNN model trains successfully, with 98% accuracy.

But when I put the same code in a function,

def main(...):

    ....

    model.cuda()

    data_tensor= data_tensor.cuda()

    ...



if __name__ == "__main__":

    main('...')

I get the following error:

cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:20

UPDATE(18/11/21):

It turned out that whether the code is inside a function or not is irrelevant. Usually, I first get a CUDNN_STATUS_EXECUTION_FAILED error, and then on every later attempt a cuda runtime error (77), as shown below. But it sometimes works a few times before failing.

CUDNN_STATUS_EXECUTION_FAILED (first try) :

RuntimeError                              Traceback (most recent call last)

<ipython-input-27-53476e08e017> in <module>()

      1 main('mnist', 'to', 'ndd', Xd=16, epo=5, bs=100, tXn=-1, vXn=300,

----> 2      lr=0.05, suf="s1", n_class=10, cuda=True)



<ipython-input-23-918584456207> in main(ds, framework, format, Xd, epo, bs, tXn, vXn, lr, suf, n_class, cuda)

     12     opt = torch.optim.SGD(net.parameters(), lr)

     13 

---> 14     train(net, opt, Xd, epo, bs, cuda, tXn, tX, tT, vX, vT,lr)

     15 



<ipython-input-26-6b574a9e8af6> in train(model, optimizer, Xd, epo, bs, cuda, Xn, tX, tT, vX, vT, lr)

     26             #t = t.cuda()

     27             optimizer.zero_grad()

---> 28             z = model(x)

     29             bat_loss = criterion(z, t)

     30             bat_loss.backward()



/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)

    489             result = self._slow_forward(*input, **kwargs)

    490         else:

--> 491             result = self.forward(*input, **kwargs)

    492         for hook in self._forward_hooks.values():

    493             hook_result = hook(self, input, result)



<ipython-input-22-b4bc2e0b39b8> in forward(self, X)

     10         H0 = torch.zeros(self.n_H, X.size(0), self.Wh)

     11         C0 = torch.zeros(self.n_H, X.size(0), self.Wh)

---> 12         O, (Hn, Cn), = self.lstm1(X, (H0, C0))

     13         O = self.linear1(O[:, -1, :])

     14         return O



/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)

    489             result = self._slow_forward(*input, **kwargs)

    490         else:

--> 491             result = self.forward(*input, **kwargs)

    492         for hook in self._forward_hooks.values():

    493             hook_result = hook(self, input, result)



/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in forward(self, input, hx)

    190             flat_weight=flat_weight

    191         )

--> 192         output, hidden = func(input, self.all_weights, hx, batch_sizes)

    193         if is_packed:

    194             output = PackedSequence(output, batch_sizes)



/usr/local/lib/python3.6/dist-packages/torch/nn/_functions/rnn.py in forward(input, *fargs, **fkwargs)

    321             func = decorator(func)

    322 

--> 323         return func(input, *fargs, **fkwargs)

    324 

    325     return forward



/usr/local/lib/python3.6/dist-packages/torch/nn/_functions/rnn.py in forward(input, weight, hx, batch_sizes)

    285             batch_first, dropout, train, bool(bidirectional),

    286             list(batch_sizes.data) if variable_length else (),

--> 287             dropout_ts)

    288 

    289         if cx is not None:



RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
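
One detail in the `forward` frame of this trace may be worth checking: `H0` and `C0` are created with `torch.zeros(...)` and no device argument, so they live on the CPU even when `X` has been moved to the GPU. Whether that is what triggers the failure here cannot be told from the trace alone. The following is only a sketch of a device-aware version. The attribute names `n_H`, `Wh`, `lstm1` and `linear1` come from the trace; the class name `LstmClassifier`, the constructor arguments, their default values and `batch_first=True` are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class LstmClassifier(nn.Module):
    """Hypothetical stand-in for the model whose forward() appears in the trace."""

    def __init__(self, n_in=16, n_hidden=32, n_layers=1, n_class=10):
        super().__init__()
        self.n_H = n_layers   # number of stacked LSTM layers
        self.Wh = n_hidden    # hidden state width
        # batch_first=True is assumed because forward() indexes O[:, -1, :]
        self.lstm1 = nn.LSTM(n_in, n_hidden, n_layers, batch_first=True)
        self.linear1 = nn.Linear(n_hidden, n_class)

    def forward(self, X):
        # Create the initial states on the same device and dtype as the input,
        # so they follow X to the GPU instead of staying on the CPU.
        H0 = torch.zeros(self.n_H, X.size(0), self.Wh, device=X.device, dtype=X.dtype)
        C0 = torch.zeros(self.n_H, X.size(0), self.Wh, device=X.device, dtype=X.dtype)
        O, (Hn, Cn) = self.lstm1(X, (H0, C0))
        # Classify from the output at the last time step.
        return self.linear1(O[:, -1, :])
```

With this layout the same module runs unchanged on CPU or GPU, because the initial states are always allocated wherever the input already is.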

cuda runtime error (77) (other tries):

RuntimeError                              Traceback (most recent call last)

<ipython-input-28-53476e08e017> in <module>()

      1 main('mnist', 'to', 'ndd', Xd=16, epo=5, bs=100, tXn=-1, vXn=300,

----> 2      lr=0.05, suf="s1", n_class=10, cuda=True)



<ipython-input-23-918584456207> in main(ds, framework, format, Xd, epo, bs, tXn, vXn, lr, suf, n_class, cuda)

     12     opt = torch.optim.SGD(net.parameters(), lr)

     13 

---> 14     train(net, opt, Xd, epo, bs, cuda, tXn, tX, tT, vX, vT,lr)

     15 



<ipython-input-26-6b574a9e8af6> in train(model, optimizer, Xd, epo, bs, cuda, Xn, tX, tT, vX, vT, lr)

      4     if cuda and torch.cuda.is_available():

      5         print("tX type (before):", tX.type())

----> 6         model.cuda()

      7         tX = tX.cuda()

      8         tT = tT.cuda()



/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in cuda(self, device)

    247             Module: self

    248         """

--> 249         return self._apply(lambda t: t.cuda(device))

    250 

    251     def cpu(self):



/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _apply(self, fn)

    174     def _apply(self, fn):

    175         for module in self.children():

--> 176             module._apply(fn)

    177 

    178         for param in self._parameters.values():



/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in _apply(self, fn)

    109 

    110     def _apply(self, fn):

--> 111         ret = super(RNNBase, self)._apply(fn)

    112         self.flatten_parameters()

    113         return ret



/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _apply(self, fn)

    180                 # Tensors stored in modules are graph leaves, and we don't

    181                 # want to create copy nodes, so we have to unpack the data.

--> 182                 param.data = fn(param.data)

    183                 if param._grad is not None:

    184                     param._grad.data = fn(param._grad.data)



/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in <lambda>(t)

    247             Module: self

    248         """

--> 249         return self._apply(lambda t: t.cuda(device))

    250 

    251     def cpu(self):



RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:20
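
A note on reading this second trace: CUDA kernels are launched asynchronously, so an illegal memory access is often reported at a later, unrelated call, which may be why it surfaces here in a plain `model.cuda()` copy. Once such an error has occurred, further CUDA calls in the same process typically keep failing until the runtime is restarted. Below is a minimal sketch of one way to localise the real failure, assuming it runs in the first notebook cell before anything touches the GPU. `CUDA_LAUNCH_BLOCKING` is a standard CUDA environment variable; the helper name `enable_sync_cuda_errors` is made up for illustration.

```python
import os


def enable_sync_cuda_errors(env=os.environ):
    """Ask CUDA to run kernels synchronously so errors surface at the failing call."""
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Must run before CUDA is initialised, i.e. before the first .cuda() call.
enable_sync_cuda_errors()

# import torch  # import and use torch only after the variable has been set
```

Synchronous launches slow training down, so this setting is meant for debugging only.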

edited Nov 21 at 15:50

asked Nov 19 at 17:46

u2gilles


2

Can you share a self-contained notebook that reproduces the problem?
– Bob Smith
Nov 19 at 17:51

Bob, I updated my post. To be honest, when I took the trace this morning, it worked once, then failed again. Strange. The trace shows that it failed to move the model to the GPU, but I also checked that moving PyTorch tensors to the GPU fails too.
– u2gilles
Nov 20 at 1:21

1

Can you share a complete, self-contained notebook? It will significantly simplify diagnosis.
– Bob Smith
Nov 20 at 1:27

Bob, please find a link to a standalone ipynb file to reproduce the problem if you like (without mounting google-drive) : drive.google.com/open?id=1enYkRsAuotTGsoce93XP2gAIuK6-i9ub
– u2gilles
Nov 21 at 4:25

I find the same problem on colab, but for me any attempt to convert a tensor to the gpu results in this error. After looking online, I'm still unsure what is wrong. Maybe colab gpus are being quirky?
– Superman
Nov 25 at 18:41

gpu pytorch google-colaboratory

1 Answer
accepted

It now works with PyTorch 1.0, installed with:

!pip3 install https://download.pytorch.org/whl/cu80/torch-1.0.0-cp36-cp36m-linux_x86_64.whl
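
After reinstalling, it can help to confirm which build the notebook actually picked up. If `torch` was already imported in the session, the runtime usually has to be restarted before the new version is loaded. A minimal sketch of such a check follows; the helper name `describe_torch_install` and the dictionary keys are made up for illustration, while the `torch` attributes it reads are part of the public API.

```python
import importlib.util


def describe_torch_install():
    """Return basic facts about the installed torch package, or None if it is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    info = {
        "version": torch.__version__,          # the answer above installs 1.0.0
        "cuda_build": torch.version.cuda,      # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


print(describe_torch_install())
```

If `cuda_available` is false, the Colab runtime type may not be set to GPU, which is a separate issue from the error discussed here.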

answered 2 days ago

u2gilles
