How can I best handle big data from a Scala mathematical simulation whose results I visualize in R?


























I have a mathematical simulation written in Scala (random numbers, small calculations, lots of iterating over collections, and a lot of output data). Currently I write the output to CSV files, then load them into R and plot the results. But CSV is probably not the best format for exchanging big data, and I don't know how to improve my current approach.
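For comparison, here is a minimal sketch of one alternative to CSV, assuming the output is a flat stream of `Double` values: write them as raw 8-byte binary with `java.io.DataOutputStream`, which always uses big-endian byte order. The object name `BinaryOutput` and both helpers are hypothetical, and the file layout (no header, doubles only) is an assumption made for this sketch.

```scala
import java.io.{BufferedInputStream, BufferedOutputStream, DataInputStream, DataOutputStream, File, FileInputStream, FileOutputStream}

object BinaryOutput {
  // Writes each value as an 8-byte IEEE 754 double; DataOutputStream is always big-endian.
  // Returns the number of values written.
  def writeDoubles(path: String, values: Iterator[Double]): Long = {
    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path)))
    var count = 0L
    try {
      values.foreach { v =>
        out.writeDouble(v)
        count += 1
      }
    } finally out.close()
    count
  }

  // Reads the whole file back, e.g. to check a round trip from Scala.
  def readDoubles(path: String): Vector[Double] = {
    val n = (new File(path).length / 8).toInt
    val in = new DataInputStream(new BufferedInputStream(new FileInputStream(path)))
    try Vector.fill(n)(in.readDouble())
    finally in.close()
  }
}
```

On the R side such a file could be read with `readBin(path, what = "double", n = count, size = 8, endian = "big")`. Raw doubles are usually smaller than full-precision CSV text and need no number parsing, but the file carries no column names or types, so a self-describing format may be preferable once the data has several columns.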



Should I use a database? If so, which one? MariaDB?



Should I compute the data to be plotted in Scala while the simulation is running? Without computing the plot data, my program needs 20 s for 500,000 simulation steps; with those computations it needs more than 3 min. I could move the computations to separate threads. Or should I give R the raw data and do the computations in R?
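A minimal sketch of the "compute while simulating, on another thread" option, assuming the plot data can be computed incrementally (here just a running mean and sample variance via Welford's online algorithm). The simulation hands batches of values to one worker thread through a bounded `java.util.concurrent.ArrayBlockingQueue`, and an empty batch signals the end. `RunningStats`, `StatsWorker`, and the batch and queue sizes are hypothetical, and `rng.nextGaussian()` only stands in for one real simulation step.

```scala
import java.util.concurrent.{ArrayBlockingQueue, BlockingQueue}
import scala.util.Random

// Welford's online algorithm: O(1) work and memory per value.
final class RunningStats {
  private var n = 0L
  private var mean = 0.0
  private var m2 = 0.0 // sum of squared deviations from the current mean

  def add(x: Double): Unit = {
    n += 1
    val delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
  }

  def count: Long = n
  def average: Double = mean
  def variance: Double = if (n > 1) m2 / (n - 1) else 0.0 // sample variance
}

object StatsWorker {
  // Consumes batches until it receives an empty one (end-of-stream marker).
  def start(queue: BlockingQueue[Array[Double]], stats: RunningStats): Thread = {
    val t = new Thread(() => {
      var done = false
      while (!done) {
        val batch = queue.take() // blocks until the simulation hands over data
        if (batch.isEmpty) done = true
        else batch.foreach(stats.add)
      }
    })
    t.start()
    t
  }

  def runSimulation(steps: Int, batchSize: Int, seed: Long): RunningStats = {
    require(batchSize > 0, "batchSize must be positive")
    val queue = new ArrayBlockingQueue[Array[Double]](16) // bounded: simulation waits if the worker lags
    val stats = new RunningStats
    val worker = start(queue, stats)
    val rng = new Random(seed)
    val buf = new Array[Double](batchSize)
    var filled = 0
    for (_ <- 0 until steps) {
      buf(filled) = rng.nextGaussian() // stand-in for the output of one simulation step
      filled += 1
      if (filled == batchSize) {
        queue.put(buf.clone()) // copy, because buf is reused for the next batch
        filled = 0
      }
    }
    if (filled > 0) queue.put(buf.take(filled))
    queue.put(Array.empty[Double]) // end-of-stream marker
    worker.join() // after join, the worker's updates to stats are visible here
    stats
  }
}
```

Batching keeps the per-value synchronization cost low, and the bounded queue makes the simulation wait if the worker falls behind. Whether this beats exporting raw data to R depends on whether the plots need individual points or only aggregates like these.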



Should I use Hadoop and Spark, perhaps together with a database?



I am quite confused and hope you can share some best practices with me.




























  • 3




    Hi @Temerita, even though I believe this is a great question to discuss, it may not be appropriate for Stack Overflow because it is "too broad" and "mainly opinion-based". I would suggest searching for articles about it and building small proofs of concept of what you find.
    – Luis Miguel Mejía Suárez
    Nov 20 '18 at 15:29










  • You should also address only one question.
    – Christoph
    Nov 20 '18 at 15:37






  • 2




    I agree! Great q's @Temerita. There's a Data Science SE datascience.stackexchange.com that's similar to here but doesn't have the code+data narrow focus we have here. I'd suggest posting over there (your login works there too) and give it a go!
    – hrbrmstr
    Nov 20 '18 at 15:37










  • @LuisMiguelMejíaSuárez You're right, it is mainly opinion-based, but I'm not looking for the overall best solution, just one. It is enough for me if one person thinks solution X is the best. I just need a decision not made by me; I have too little experience to make any decision on this topic myself.
    – Temerita
    Nov 20 '18 at 15:39










  • @hrbrmstr thanks for the advice, I'll give it a try :) .
    – Temerita
    Nov 20 '18 at 15:40
















r database scala apache-spark bigdata
















asked Nov 20 '18 at 15:25









Temerita

366











