Schedule YARN application on active/standby nodes












I would like to have a cluster that is split into two sub-clusters: "active" nodes and "standby" nodes.
Normally, when an application is scheduled, I would like it to run on the "active" nodes. But if no "active" node is healthy, I would like it to run on the "standby" nodes.



Is there a way to achieve such behavior in YARN?



To give a bit more detail, the "active" nodes of the cluster will be located in a different zone than the "standby" nodes (but not far from them).
We are trying to achieve multi-zone high availability for our application: upon a disaster in the "active" zone, the application should be recovered and scheduled in the "standby" zone.










  • What version of Hadoop are you running?

    – tk421
    Nov 22 '18 at 1:31











  • Currently we are just checking our options. We are open to any version that gives us that functionality. Thanks.

    – Shay
    Nov 22 '18 at 8:32
















yarn






asked Nov 21 '18 at 21:28









Shay













1 Answer



To route jobs to specific nodes, you will need Node Labels. The Capacity Scheduler has supported them for a while (since 2.6 or earlier), but for the Fair Scheduler I think support was only planned for Hadoop 3.x.
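
As a rough illustration of the Node Labels setup, the sketch below builds the two `yarn rmadmin` invocations that register a pair of labels and attach NodeManager hosts to them. The label names `active` and `standby`, the host names, and the helper `node_label_commands` are assumptions made for this example. It also assumes node labels are already enabled in `yarn-site.xml` (`yarn.node-labels.enabled` and `yarn.node-labels.fs-store.root-dir`) and that the relevant Capacity Scheduler queues are granted access to the labels (`yarn.scheduler.capacity.<queue-path>.accessible-node-labels`).

```python
import shlex


def node_label_commands(active_hosts, standby_hosts):
    """Return the `yarn rmadmin` invocations that define and assign both labels.

    The label names "active" and "standby" are illustrative assumptions.
    """
    # One "host=label" pair per NodeManager, separated by spaces.
    mapping = " ".join(
        [f"{host}=active" for host in active_hosts]
        + [f"{host}=standby" for host in standby_hosts]
    )
    return [
        # Register the two labels as exclusive partitions.
        ["yarn", "rmadmin", "-addToClusterNodeLabels",
         "active(exclusive=true),standby(exclusive=true)"],
        # Attach each NodeManager host to its label.
        ["yarn", "rmadmin", "-replaceLabelsOnNode", mapping],
    ]


if __name__ == "__main__":
    # Hypothetical host names; the commands are printed, not executed.
    for cmd in node_label_commands(["zone1-nm1", "zone1-nm2"], ["zone2-nm1"]):
        print(shlex.join(cmd))
```

The helper only prints the commands so they can be reviewed before being run against a real ResourceManager.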



Another option to consider is YARN Federation, where you run more than one YARN cluster: the second cluster would be in zone 2, and you can re-route your job to zone 2 if zone 1 has issues.



References




  • YARN Node Labels

  • Hadoop: YARN Federation






  • Thanks @tk421. Using Node Labels, can I configure something like "prefer selecting nodes with 'active' labels, and if they are not healthy, select others"? As far as I understand, I can't (though in k8s it is possible).

    – Shay
    Nov 29 '18 at 17:48











  • You'd have to do this via scheduling queues. Node health is handled by YARN automatically, meaning that if your NodeManager is unusable, it will mark itself as unavailable. See hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/….

    – tk421
    Nov 29 '18 at 20:06
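
The "prefer active, otherwise standby" behavior discussed in these comments is not something the thread shows YARN doing on its own. One way to approximate it, offered purely as a sketch and not as anything proposed in the answer, is to decide at submission time which label to request. The function `pick_label_expression` and the node dictionary shape (a `state` string and a `nodeLabels` list) are assumptions for illustration; the caller would obtain the node report however suits their setup and pass the chosen label as the application's node label expression.

```python
def pick_label_expression(nodes, preferred="active", fallback="standby"):
    """Return the label to request, preferring `preferred` while any of its nodes run.

    `nodes` is a list of dicts with a "state" string and a "nodeLabels" list,
    a shape assumed here for illustration.
    """
    def has_running(label):
        # A label is usable if at least one node carrying it is RUNNING.
        return any(
            node.get("state") == "RUNNING" and label in node.get("nodeLabels", [])
            for node in nodes
        )

    if has_running(preferred):
        return preferred
    if has_running(fallback):
        return fallback
    return None  # no usable node in either zone


if __name__ == "__main__":
    # Hypothetical report: the only "active" node is unhealthy.
    report = [
        {"id": "zone1-nm1:8041", "state": "UNHEALTHY", "nodeLabels": ["active"]},
        {"id": "zone2-nm1:8041", "state": "RUNNING", "nodeLabels": ["standby"]},
    ]
    print(pick_label_expression(report))  # prints standby
```

This check happens once, before submission, so it does not move an application that is already running when the "active" zone fails; recovery in that case would still depend on resubmitting the application.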











answered Nov 28 '18 at 22:20









tk421












