Is continually writing to a file detrimental to the performance of a program?












0















Imagine a parallel "high performance program" that reads in files. Each process performs a task on the input data, writes the output for that task to a single shared output file, and then repeats the procedure.



In terms of performance, is it inefficient to write the outputs to a file as each process finishes a task?



Would it be more efficient to store the results in an array and write the array to an output file at the end?






























  • You could try creating an MCVE for both scenarios and taking some measurements (e.g. with a performance profiler and/or a memory profiler and/or some Stopwatch instances).

    – Uwe Keim
    Nov 22 '18 at 22:18













  • Please do not add "Thank you" to questions.

    – Uwe Keim
    Nov 22 '18 at 22:18













  • The answer depends on context. It can be either way. Code it both ways and benchmark them to see.

    – Craig Estey
    Nov 22 '18 at 22:24











  • If your platform supports memory-mapped files, you don't have to choose.

    – EOF
    Nov 22 '18 at 23:28











  • I read the first paragraph and now I'm imagining 65,356 processes each taking its turn to write to the single shared output file. Even with far fewer processes in play this is going to look pretty much like a serial program and it's not a good outline design.

    – High Performance Mark
    Nov 23 '18 at 8:32























c parallel-processing io hpc














edited Nov 22 '18 at 22:39







HCF3301

















asked Nov 22 '18 at 22:17









HCF3301




















2 Answers


























          1







This is a problem where the full disk I/O bandwidth has to be exploited without stalling the client processes or threads. If standard C library calls are used, output goes through an in-memory buffer that is flushed when it fills, when fflush() is called, or when the stream is closed; a stream attached to a terminal is typically also flushed at each newline. If the data is not too large, it is efficient to collect the results in an array and write that array to the file at the end, so the performance-critical work does not suffer I/O delays.
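As a rough illustration of the two options, here is a minimal single-process sketch in C. The helper names `write_results_at_end` and `write_results_incrementally`, the `double` result type, and the buffer size are assumptions made for the example, not anything from the question. The sketch also does not address how several processes would coordinate access to one shared file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: keep all results in an array and write them
   with a single fwrite call at the end.
   Returns 0 on success, -1 on failure. */
int write_results_at_end(const char *path, const double *results, size_t n)
{
    assert(path != NULL && results != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(results, sizeof results[0], n, fp);
    int close_failed = (fclose(fp) != 0);
    return (written == n && !close_failed) ? 0 : -1;
}

/* Hypothetical helper: write each result as soon as it is available,
   but through a large, fully buffered stream, so most fwrite calls
   only copy into memory and the file is touched when the buffer fills.
   Returns 0 on success, -1 on failure. */
int write_results_incrementally(const char *path, const double *results,
                                size_t n, size_t buf_size)
{
    assert(path != NULL && results != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    /* setvbuf must be called before any other operation on the stream. */
    char *buf = malloc(buf_size);
    if (buf == NULL || setvbuf(fp, buf, _IOFBF, buf_size) != 0) {
        fclose(fp);
        free(buf);
        return -1;
    }

    int ok = 1;
    for (size_t i = 0; i < n; i++) {
        if (fwrite(&results[i], sizeof results[i], 1, fp) != 1) {
            ok = 0;
            break;
        }
    }
    if (fclose(fp) != 0)   /* fclose flushes whatever is still buffered */
        ok = 0;
    free(buf);             /* the buffer may only be freed after fclose */
    return ok ? 0 : -1;
}
```

Both functions should produce the same file contents; which one is faster depends on the data size and the file system, so it is worth timing them on the target machine.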
















          answered Nov 23 '18 at 2:43









anand

































                  -2







Files are usually slower than RAM, but how much slower? If it's less than a 1% slowdown, most people would not care. If it's a 50% slowdown, some people would still not care.

As always with performance, measure it one way and the other, and then decide whether the difference is significant. The decision will usually depend on factors which are highly application-specific.
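One possible way to take such a measurement in C is sketched below. The function names `time_write_each` and `time_write_once` and the `double` result type are assumptions made for the example. It uses `timespec_get` from C11 to read wall-clock time, because `clock()` reports CPU time and would largely miss time spent waiting on I/O.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Seconds elapsed between two wall-clock timestamps. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical strategy A: write and flush after every result.
   fflush hands the data to the operating system; it does not force
   it onto the disk. Returns elapsed seconds, or -1.0 on error. */
double time_write_each(const char *path, const double *results, size_t n)
{
    assert(path != NULL && results != NULL);
    struct timespec start, end;
    timespec_get(&start, TIME_UTC);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1.0;
    int ok = 1;
    for (size_t i = 0; i < n && ok; i++) {
        ok = fwrite(&results[i], sizeof results[i], 1, fp) == 1
             && fflush(fp) == 0;
    }
    if (fclose(fp) != 0)
        ok = 0;

    timespec_get(&end, TIME_UTC);
    return ok ? elapsed_seconds(start, end) : -1.0;
}

/* Hypothetical strategy B: write the whole array with one call.
   Returns elapsed seconds, or -1.0 on error. */
double time_write_once(const char *path, const double *results, size_t n)
{
    assert(path != NULL && results != NULL);
    struct timespec start, end;
    timespec_get(&start, TIME_UTC);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1.0;
    int ok = fwrite(results, sizeof results[0], n, fp) == n;
    if (fclose(fp) != 0)
        ok = 0;

    timespec_get(&end, TIME_UTC);
    return ok ? elapsed_seconds(start, end) : -1.0;
}
```

Calling both functions on the same array, several times and with realistic data sizes, gives numbers to compare. A single run can be misleading because of operating system caching.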
















                  answered Nov 22 '18 at 22:22









anatolyg





























