What is the recommended approach towards multi-tenant databases in Cassandra?

I'm thinking of creating a multi-tenant app using Apache Cassandra.



I can think of three strategies:




  1. All tenants in the same keyspace using tenant-specific fields for security

  2. Table per tenant in a single shared keyspace

  3. Keyspace per tenant


The voice in my head is suggesting that I go with option 3.



Thoughts and implications, anyone?

cassandra cassandra-2.0 cassandra-3.0 spring-data-cassandra
edited Nov 21 '18 at 7:56 by Alex Ott
asked Nov 21 '18 at 7:33 by Jagan

  • Not sure why Spring-Data-Cassandra is tagged here, as the question has nothing to do with it. But I'll say that you should really use the DataStax Java Driver. The Spring-Data-Cassandra driver uses large batches and unbound queries to mimic some of the functionality from the relational world. So Spring-Data-Cassandra is a definite no in my book, especially in a multi-tenant cluster.

    – Aaron
    Nov 21 '18 at 14:09

  • Support regarding not using spring-data-cassandra :-)

    – Alex Ott
    Nov 21 '18 at 14:20

  • How many tenants?

    – phact
    Nov 21 '18 at 14:36

  • We are expecting 40+ tenants.

    – Jagan
    Nov 22 '18 at 7:26
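Picking up on Aaron's comment about preferring the plain DataStax Java Driver over Spring-Data-Cassandra: the practical difference is using prepared, bound statements (asynchronously, if you need throughput) rather than unbound queries and large batches. Here is a minimal sketch against driver 3.x; the contact point and the app.events table are hypothetical.

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;

    public class EventWriter {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")   // hypothetical contact point
                    .build();
                 Session session = cluster.connect()) {

                // Prepared once, bound per write: the driver routes each statement
                // to the replicas owning that partition - no oversized BATCH needed.
                PreparedStatement insert = session.prepare(
                    "INSERT INTO app.events (tenant_id, event_id, payload) VALUES (?, ?, ?)");

                List<ResultSetFuture> inFlight = new ArrayList<>();
                for (int i = 0; i < 100; i++) {
                    BoundStatement bound = insert.bind("tenant-a", UUID.randomUUID(), "payload-" + i);
                    inFlight.add(session.executeAsync(bound));   // non-blocking write
                }
                // Block until all writes complete before the session is closed.
                inFlight.forEach(ResultSetFuture::getUninterruptibly);
            }
        }
    }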

2 Answers

There are several considerations that you need to take into account:



Option 1: In pure Cassandra this option will work only if access to the database always goes through a "proxy" - for example, an API layer that enforces filtering on the tenant field. Otherwise, if you provide direct CQL access, everybody can read all the data. In this case you also need to design the data model carefully, with the tenant as part of the composite partition key. DataStax Enterprise (DSE) has additional functionality called row-level access control (RLAC) that lets you restrict which rows of a table each user can access.
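As an illustrative sketch of option 1 (the shared orders table, its columns, and the DAO below are hypothetical, built on the DataStax Java Driver 3.x): keep the tenant id as the leading component of the composite partition key, and let the API layer bind it on every query.

    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;

    import java.util.UUID;

    // Hypothetical "proxy" DAO. The tenant id is the first component of the
    // composite partition key, so each tenant's rows live in their own partitions:
    //
    //   CREATE TABLE shared.orders (
    //       tenant_id   text,
    //       customer_id uuid,
    //       order_id    timeuuid,
    //       total       decimal,
    //       PRIMARY KEY ((tenant_id, customer_id), order_id)
    //   );
    public class TenantScopedOrderDao {

        private final Session session;
        private final PreparedStatement selectOrders;

        public TenantScopedOrderDao(Session session) {
            this.session = session;
            // Prepared once; callers supply bind values only, never raw CQL,
            // so every read is forced to filter on the caller's tenant id.
            this.selectOrders = session.prepare(
                "SELECT order_id, total FROM shared.orders "
              + "WHERE tenant_id = ? AND customer_id = ?");
        }

        public ResultSet ordersFor(String tenantId, UUID customerId) {
            return session.execute(selectOrders.bind(tenantId, customerId));
        }
    }

Such a scheme only protects tenants as long as direct CQL access is limited to this API layer, as noted above.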



Options 2 & 3: are quite similar, except that when you have a keyspace per tenant, then you have flexibility to setup different replication strategy - this could be useful to store customer's data in different data centers bound to different geographic regions. But in both cases there are limitations on the number of tables in the cluster - reasonable number of tables is around 200, with "hard stop" on more than 500. The reason - you need an additional resources, such as memory, to keep auxiliary data structures (bloom filter, etc.) for every table, and this will consume both heap & off-heap memory.

answered Nov 21 '18 at 8:06 by Alex Ott

  • Thanks for the suggestion. Please share if you have any working examples of multi-tenancy in Cassandra using option 3.

    – Jagan
    Nov 22 '18 at 7:28

  • It's just standard Cassandra functionality - you create a keyspace and configure its replication across data centers.

    – Alex Ott
    Nov 22 '18 at 9:15

I've done this for a few years now at large scale in the retail space. So my belief is that the recommended way to handle multi-tenancy in Cassandra is not to. No matter how you do it, the tenants will be hit by the "noisy neighbor" problem. Just wait until one tenant runs a BATCH update with 60k writes batched to the same table, and everyone else's performance falls off.

But the bigger problem is that there's no way you can guarantee that each tenant will even have a similar ratio of reads to writes. In fact, they will likely be quite different. That's going to be a problem for options #1 and #2, as disk IOPS will be going to the same directory.

Option #3 is really the only way it realistically works. But again, all it takes is one ill-considered BATCH write to crush everyone. Also, want to upgrade your cluster? Now you have to coordinate it with multiple teams, instead of just one. Using SSL? Make sure multiple teams get the right certificate, instead of just one.

When we have new teams use Cassandra, each team gets their own cluster. That way, they can't hurt anyone else, and we can support them with fewer question marks about who is doing what.

answered Nov 21 '18 at 14:06 by Aaron