Testing accuracy is less than half of training accuracy, and manual testing results are way off
First, I have to admit that I am a total beginner at PyTorch and CNN image classification.



I am making an app to classify cat breeds.



The image sets I gathered have around 300-500 samples per breed, for a total of 62 breeds, plus one extra set of 600 samples representing non-cats. I have split the samples into training and testing sets at a 4:1 ratio, roughly as in the sketch below.
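Concretely, the split looks something like this (a minimal sketch only; the folder path and the bare ToTensor() transform are placeholder assumptions, not the exact code from the repo):

    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    # Minimal sketch of the 4:1 train/test split described above; the folder
    # path and transform are placeholder assumptions, not the repo's code.
    full_set = datasets.ImageFolder("data/cats", transform=transforms.ToTensor())
    n_train = len(full_set) * 4 // 5
    train_set, test_set = random_split(full_set, [n_train, len(full_set) - n_train])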



The training results are quite disappointing. The training accuracy can reach as much as 90%, but the testing accuracy is only 39%.



Here are the hyperparameters:



The learning rate is 0.1, momentum is 0.1, and batch_size is 128; the WideResNet uses 40 layers with a widen factor of 10. In PyTorch terms, that corresponds to something like the sketch below.
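(Sketch only; `model` stands in for the WideResNet-40-10 constructed in the repo, and `train_set` for the split above.)

    from torch import nn, optim
    from torch.utils.data import DataLoader

    # Loader and optimizer mirroring the quoted hyperparameters; `model`
    # stands in for the WideResNet-40-10 built in the repo.
    train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.1)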



Please see the source code at:



https://github.com/silver-xu/wideresnet-trial



I have searched all over the internet, and pretty much 90% of the articles are about pre-built datasets like CIFAR or MNIST. As a result, a lot of the code I found is optimised for one type of dataset only.



Thanks for all the help! Criticism is also welcome!



Here is the training output for epoch 5:



Epoch: [5][0/170] Time 0.237 (0.237) Loss 3.3054 (3.3054) Prec@1 13.281 (13.281)
Epoch: [5][10/170] Time 0.229 (0.228) Loss 3.2665 (3.3118) Prec@1 14.844 (13.920)
Epoch: [5][20/170] Time 0.227 (0.227) Loss 3.0962 (3.2856) Prec@1 17.969 (14.695)
Epoch: [5][30/170] Time 0.228 (0.227) Loss 3.3670 (3.2853) Prec@1 10.938 (14.844)
Epoch: [5][40/170] Time 0.229 (0.227) Loss 3.3259 (3.2917) Prec@1 15.625 (15.282)
Epoch: [5][50/170] Time 0.228 (0.227) Loss 3.2016 (3.2931) Prec@1 14.844 (14.859)
Epoch: [5][60/170] Time 0.227 (0.227) Loss 3.3739 (3.3071) Prec@1 11.719 (14.677)
Epoch: [5][70/170] Time 0.227 (0.227) Loss 3.4417 (3.3042) Prec@1 15.625 (14.833)
Epoch: [5][80/170] Time 0.226 (0.227) Loss 3.2507 (3.2996) Prec@1 10.938 (14.911)
Epoch: [5][90/170] Time 0.224 (0.227) Loss 3.2627 (3.2978) Prec@1 14.844 (15.093)
Epoch: [5][100/170] Time 0.226 (0.227) Loss 3.3668 (3.2946) Prec@1 14.062 (15.060)
Epoch: [5][110/170] Time 0.225 (0.227) Loss 3.2839 (3.2915) Prec@1 10.156 (14.921)
Epoch: [5][120/170] Time 0.227 (0.227) Loss 3.3308 (3.2906) Prec@1 11.719 (14.837)
Epoch: [5][130/170] Time 0.224 (0.227) Loss 3.1656 (3.2885) Prec@1 21.875 (14.909)
Epoch: [5][140/170] Time 0.226 (0.227) Loss 3.2521 (3.2851) Prec@1 20.312 (14.966)
Epoch: [5][150/170] Time 0.227 (0.227) Loss 3.1261 (3.2825) Prec@1 14.844 (14.989)
Epoch: [5][160/170] Time 0.227 (0.227) Loss 3.4400 (3.2802) Prec@1 10.938 (15.018)
Test: [0/43] Time 0.262 (0.262) Loss 3.6978 (3.6978) Prec@1 8.594 (8.594)
Test: [10/43] Time 0.074 (0.091) Loss 3.3584 (3.3736) Prec@1 17.188 (13.139)
Test: [20/43] Time 0.074 (0.083) Loss 3.3834 (3.4058) Prec@1 12.500 (12.537)
Test: [30/43] Time 0.074 (0.080) Loss 3.4457 (3.3994) Prec@1 14.844 (12.802)
Test: [40/43] Time 0.074 (0.079) Loss 3.2851 (3.3946) Prec@1 16.406 (13.281)
* Prec@1 13.130









python pytorch

asked Nov 18 at 23:24, edited Nov 19 at 0:56
Silver Xu
  • Have you trained with a validation set? How does the validation loss compare to the training loss?
    – Charles Landau
    Nov 19 at 0:04










  • Great! It sounds like you may be overfitting and I would look for early divergence in validation loss (e.g. in epoch < 5) as another indicator of overfitting.
    – Charles Landau
    Nov 19 at 0:21










  • Here you go. I greatly appreciate your help.
    – Silver Xu
    Nov 19 at 0:33






  • You don't appear to be training with a validation set; see this discussion for some basic approaches to train/validation/test patterns in PyTorch: github.com/pytorch/pytorch/issues/1106
    – Charles Landau
    Nov 19 at 0:57










  • @Charles Landau I've done what you suggested above. The result improved to around 60%. Is there anything I can improve further?
    – Silver Xu
    Nov 19 at 10:52
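For reference, here is a minimal sketch of the train/validation/test pattern suggested in the comments above, assuming the same ImageFolder dataset as before (the 70/15/15 fractions are illustrative assumptions, not values from the repo):

    from torch.utils.data import random_split

    # Illustrative 70/15/15 split; validate on val_set each epoch and keep
    # test_set untouched until final evaluation.
    n = len(full_set)
    n_train, n_val = int(0.7 * n), int(0.15 * n)
    train_set, val_set, test_set = random_split(
        full_set, [n_train, n_val, n - n_train - n_val])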














