openmpi runtime error: Hello World run on hosts


























I'm trying to set up a cluster. So far I'm testing it with only 1 master and 1 slave. When I run the script from the master, it starts printing HelloWorld, but then I get the following error:

Primary job  terminated normally, but 1 process returned a non-zero exit code.. Per user-direction, the job has been aborted.

It keeps printing HelloWorld, and after a while:

mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was: 
Process name: [[62648,1],2]
Exit code: 2

Then the program stops. By chance I tried running the script from the slave, and it works. I can't figure out why.
I've set up passwordless SSH, and I'm running a file located in an NFS-mounted folder.
Can you help me?



Thanks
































  • Please post your code!
    – Gilles Gouaillardet
    Nov 20 at 0:00










  • It is a simple HelloWorld in Python: while True: print('HelloWorld'). Then I run: mpirun -np 4 -hostfile myhosts python3 helloworld.py. When I run it from the slave, mpirun works perfectly. I'm trying to figure out why the master isn't able to do the same.
    – Fabio Semeraro
    Nov 20 at 0:04












  • Can you simply run python3 helloworld.py on all your nodes?
    – Gilles Gouaillardet
    Nov 20 at 0:09










  • In serial and in local parallel it works on all nodes. The error arises only when I launch from the master and use both the master and the slave; launching the same command from the slave runs fine.
    – Fabio Semeraro
    Nov 20 at 0:18










  • This program basically overflows stdout, so I am not sure what you expect. What if you run mpirun ... hostname? If it works, then I suggest you try an mpi4py helloworld.
    – Gilles Gouaillardet
    Nov 20 at 0:24
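
Following the suggestion above, a bounded variant of the script makes it easier to see which ranks actually started. This is only a minimal sketch, not the asker's code: it uses the standard library alone and reads the OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE environment variables that Open MPI's mpirun exports to the processes it launches. The function name hello_line is hypothetical. With mpi4py installed, MPI.COMM_WORLD.Get_rank() would be the more usual way to obtain the rank.

```python
import os
import socket


def hello_line(env=None):
    """Build one greeting that names this process's rank and host."""
    env = os.environ if env is None else env
    # Open MPI's mpirun exports these variables to every process it launches;
    # the defaults cover a plain `python3 helloworld.py` run without mpirun.
    rank = int(env.get("OMPI_COMM_WORLD_RANK", "0"))
    size = int(env.get("OMPI_COMM_WORLD_SIZE", "1"))
    return f"HelloWorld from rank {rank} of {size} on {socket.gethostname()}"


if __name__ == "__main__":
    # Print a fixed number of lines and flush, instead of looping forever.
    for _ in range(3):
        print(hello_line(), flush=True)
```

Launched with the same command as before (mpirun -np 4 -hostfile myhosts python3 helloworld.py), each rank should print three lines naming its host and then exit normally, so a rank that never reports in is easy to spot.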
















parallel-processing cluster-computing openmpi
















asked Nov 19 at 22:01









Fabio Semeraro

























1 Answer














SOLVED: I went through all the configuration files I had modified and found a mistake in /etc/hosts. That explains why the program worked when launched from the node but not when launched from the master. As for the program stopping, it was related to the node not being able to find the file to run. I fixed this by setting up NFS again.
Thanks for your help; I hope this is useful to other users.
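
Both causes named above can be checked before launching mpirun. The following is a rough sketch under stated assumptions, not the procedure actually used here: the function names resolve_hosts and missing_script_exit_code are hypothetical, and the hostnames are taken from the command line. It prints what each name resolves to on the current node, reports whether the script path is visible there, and shows that CPython exits with status 2 when it cannot open the script file. That status matches the "Exit code: 2" in the question, which would be consistent with the NFS explanation, although the post does not confirm it.

```python
import os
import socket
import subprocess
import sys


def resolve_hosts(hostnames):
    """Map each hostname to its resolved IPv4 address, or None on failure."""
    resolved = {}
    for name in hostnames:
        try:
            resolved[name] = socket.gethostbyname(name)
        except OSError:
            # socket.gaierror is a subclass of OSError.
            resolved[name] = None
    return resolved


def missing_script_exit_code():
    """Return the exit status python gives for a script path that does not exist."""
    result = subprocess.run(
        [sys.executable, os.path.join(os.sep, "no", "such", "helloworld.py")],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    # Pass the names from your hostfile as arguments; "localhost" is only a
    # placeholder default so the script runs without any arguments.
    hosts = sys.argv[1:] or ["localhost"]
    for name, address in resolve_hosts(hosts).items():
        print(f"{name} -> {address or 'NOT RESOLVED'}")
    script = "helloworld.py"
    print(f"{script} visible on {socket.gethostname()}: {os.path.exists(script)}")
    print("exit code for a missing script:", missing_script_exit_code())
```

Running it on every node and comparing the output is one way to catch an /etc/hosts entry that differs between machines, or a shared folder that is mounted on one node but not on another.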






        answered Dec 16 at 20:17









        Fabio Semeraro






























