How can I best handle large amounts of data from a Scala mathematical simulation that is visualized in R?
I have a mathematical simulation written in Scala (random numbers, small calculations, lots of iterating over collections, and a large amount of output data). Currently I write the output to CSV files, load them into R, and plot the information. But CSV is probably not the best format for exchanging large amounts of data, and I don't know how to improve my current approach.
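For purely numeric output, a raw binary file is one commonly used, more compact alternative to CSV. The sketch below assumes each simulation step produces a single `Double` (the step index is implied by the value's position in the file); `BinaryOutput` and `writeValues` are hypothetical names. `java.io.DataOutputStream` always writes big-endian, so in R the file could be read back with `readBin(path, what = "double", n = count, size = 8, endian = "big")`.

```scala
import java.io.{BufferedOutputStream, DataOutputStream, FileOutputStream}

// Hypothetical helper: writes simulation output as raw 8-byte doubles instead of CSV text.
object BinaryOutput {
  /** Writes one big-endian 8-byte double per simulation step and returns how many were written.
    * The step index is not stored; it is implied by each value's position in the file.
    */
  def writeValues(path: String, values: Iterator[Double]): Long = {
    // A 64 KiB buffer batches the small writes instead of hitting the file for every value.
    val out = new DataOutputStream(
      new BufferedOutputStream(new FileOutputStream(path), 1 << 16))
    var count = 0L
    try {
      values.foreach { v =>
        out.writeDouble(v) // DataOutputStream always uses big-endian byte order
        count += 1
      }
    } finally out.close()
    count
  }
}
```

Passing an `Iterator` lets values be written as they are produced instead of first collecting all 500,000 steps in memory. Whether this is noticeably faster than CSV for a given workload would need to be measured.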
Should I use a database? If so, which one? MariaDB?
Should I compute the data to be plotted in Scala while the simulation is running? Without computing the plotting data, my program needs 20 s for 500,000 simulation steps; with those computations it needs more than 3 min. I could move the computations to separate threads, though. Or should I pass the raw data to R and do the computations there?
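If the plotting data consists of summary statistics (for example means, variances, or histograms), one option is to update them incrementally inside the simulation loop instead of recomputing them over stored collections. A minimal sketch, assuming that kind of plotting data: `RunningStats` uses Welford's online algorithm for the mean and sample variance, and `Histogram` counts values in fixed-width bins. Both class names and the bin layout are hypothetical.

```scala
// Hypothetical accumulator: running mean and sample variance via Welford's online algorithm.
final class RunningStats {
  private var n = 0L
  private var mu = 0.0
  private var m2 = 0.0 // sum of squared deviations from the running mean

  def add(x: Double): Unit = {
    n += 1
    val delta = x - mu
    mu += delta / n
    m2 += delta * (x - mu) // uses the updated mean, as Welford's algorithm requires
  }

  def count: Long = n
  def mean: Double = mu
  /** Sample variance; NaN until at least two values have been added. */
  def variance: Double = if (n > 1) m2 / (n - 1) else Double.NaN
}

// Hypothetical fixed-width histogram over [lo, hi); values outside that range are only counted.
final class Histogram(lo: Double, hi: Double, bins: Int) {
  require(hi > lo && bins > 0, "need hi > lo and at least one bin")
  private val width = (hi - lo) / bins
  val counts: Array[Long] = new Array[Long](bins)
  var outOfRange: Long = 0L

  def add(x: Double): Unit =
    if (x >= lo && x < hi) {
      // min guards against floating-point rounding pushing a value just below hi into bin `bins`
      val i = math.min(((x - lo) / width).toInt, bins - 1)
      counts(i) += 1
    } else outOfRange += 1
}
```

Each `add` call takes constant time and memory, so only the final summaries would need to be handed to R. Whether this removes the slowdown from 20 s to over 3 min depends on what the current calculations actually do; profiling would show where that time goes before deciding whether separate threads are worth it.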
Should I use Hadoop and Spark, perhaps together with a database?
I am quite confused and would appreciate some best practices.
r database scala apache-spark bigdata
asked Nov 20 '18 at 15:25
Temerita
Hi @Temerita, even though I believe this is a great question to discuss, it may not be appropriate for Stack Overflow because it is "too broad" and "primarily opinion-based". I would suggest searching for articles about it and building small proofs of concept of what you find.
– Luis Miguel Mejía Suárez
Nov 20 '18 at 15:29
You should also address only one question.
– Christoph
Nov 20 '18 at 15:37
I agree! Great questions, @Temerita. There's a Data Science SE site, datascience.stackexchange.com, that is similar to this one but doesn't have the narrow code-and-data focus we have here. I'd suggest posting over there (your login works there too) and giving it a go!
– hrbrmstr
Nov 20 '18 at 15:37
@LuisMiguelMejíaSuárez You're right, it is mainly opinion-based, but I am not looking for the overall best solution, just for one solution. It is enough for me if one person thinks solution X is the best one. I just need a decision that isn't made by me; I have too little experience to make any decision on this topic myself.
– Temerita
Nov 20 '18 at 15:39
@hrbrmstr Thanks for the advice, I'll give it a try :)
– Temerita
Nov 20 '18 at 15:40