What is the recommended approach towards multi-tenant databases in Cassandra?
I'm thinking of creating a multi-tenant app using Apache Cassandra.
I can think of three strategies:
- All tenants in the same keyspace using tenant-specific fields for security
- Table per tenant in a single shared DB
- Keyspace per tenant
The voice in my head is suggesting that I go with option 3.
Thoughts and implications, anyone?
Tags: cassandra, cassandra-2.0, cassandra-3.0, spring-data-cassandra
asked Nov 21 '18 at 7:33 by Jagan, edited Nov 21 '18 at 7:56 by Alex Ott
Not sure why Spring-Data-Cassandra is tagged here, as the question has nothing to do with it. But I'll say that you should really use the DataStax Java Driver instead. Spring-Data-Cassandra uses large batches and unbounded queries to mimic some of the functionality of the relational world, so it's a definite no in my book, especially in a multi-tenant cluster.
– Aaron
Nov 21 '18 at 14:09
Agreed on not using spring-data-cassandra :-)
– Alex Ott
Nov 21 '18 at 14:20
How many tenants?
– phact
Nov 21 '18 at 14:36
We are expecting 40+ tenants.
– Jagan
Nov 22 '18 at 7:26
2 Answers
There are several considerations that you need to take into account:
Option 1: In pure Cassandra this option works only if access to the database always goes through a "proxy" (an API, for example) that enforces filtering on the tenant field. Otherwise, if you provide direct CQL access, everybody can read all of the data. In this case you also need to design the data model carefully, with the tenant as part of a composite partition key. DataStax Enterprise (DSE) has additional functionality called row-level access control (RLAC) that allows permissions to be set at the table level.
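As a minimal sketch of such a data model (the keyspace, table, and column names here are hypothetical, not from the answer), the tenant identifier becomes the first component of a composite partition key, so every read has to be scoped to a tenant:

```cql
-- Assumes a shared keyspace named 'shared_ks' already exists.
-- The partition key combines tenant and customer, so one tenant's data is
-- spread across many partitions, yet every query must name the tenant.
CREATE TABLE shared_ks.orders_by_customer (
    tenant_id   text,
    customer_id uuid,
    order_id    timeuuid,
    total       decimal,
    PRIMARY KEY ((tenant_id, customer_id), order_id)
) WITH CLUSTERING ORDER BY (order_id DESC);

-- The API "proxy" layer would prepare this statement and always bind
-- tenant_id from the caller's identity, never from user input:
SELECT order_id, total
FROM shared_ks.orders_by_customer
WHERE tenant_id = :tenant_id AND customer_id = :customer_id;
```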
Options 2 & 3: These are quite similar, except that a keyspace per tenant gives you the flexibility to set up a different replication strategy per tenant; this can be useful for storing a customer's data in data centers bound to specific geographic regions. But in both cases there are limits on the number of tables in the cluster: a reasonable number is around 200, with a "hard stop" above roughly 500. The reason is that every table needs additional resources, such as memory for auxiliary data structures (bloom filters, etc.), and this consumes both heap and off-heap memory.
answered Nov 21 '18 at 8:06 by Alex Ott
Thanks for the suggestion. Please share if you have any working examples of multi-tenancy with Cassandra using Option 3.
– Jagan
Nov 22 '18 at 7:28
It's just standard Cassandra functionality: you create a keyspace per tenant and configure the data centers it replicates to.
– Alex Ott
Nov 22 '18 at 9:15
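For illustration, a minimal sketch of that in plain CQL (the keyspace names, data center names, and replication factors below are hypothetical; the data center names must match what the cluster actually reports):

```cql
-- One keyspace per tenant, each with its own replication strategy.
CREATE KEYSPACE tenant_acme
  WITH replication = {'class': 'NetworkTopologyStrategy', 'eu_dc': 3};

-- Another tenant whose data stays in, and is replicated across, US data centers.
CREATE KEYSPACE tenant_globex
  WITH replication = {'class': 'NetworkTopologyStrategy', 'us_east_dc': 3, 'us_west_dc': 2};
```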
I've done this for a few years now at large scale in the retail space, so my belief is that the recommended way to handle multi-tenancy in Cassandra is not to. No matter how you do it, the tenants will be hit by the "noisy neighbor" problem. Just wait until one tenant runs a BATCH update with 60k writes batched to the same table, and everyone else's performance falls off.
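For illustration only, this is the kind of anti-pattern being described: one logged batch spanning many partitions of a single table (the table and values are hypothetical, reusing the sketch from the first answer). The coordinator node has to journal and replay the whole batch, and that pressure is felt by every tenant sharing the cluster:

```cql
-- Anti-pattern sketch: do NOT do this with thousands of statements.
BEGIN BATCH
  INSERT INTO shared_ks.orders_by_customer (tenant_id, customer_id, order_id, total)
  VALUES ('acme', 11111111-1111-1111-1111-111111111111, now(), 10.00);
  INSERT INTO shared_ks.orders_by_customer (tenant_id, customer_id, order_id, total)
  VALUES ('acme', 22222222-2222-2222-2222-222222222222, now(), 25.50);
  -- ...imagine tens of thousands more inserts here...
APPLY BATCH;
```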
But the bigger problem is that there's no way to guarantee that each tenant will even have a similar ratio of reads to writes; in fact, they will likely be quite different. That's going to be a problem for options #1 and #2, as disk IOPS for all tenants will be going to the same data directories.
Option #3 is really the only way it realistically works. But again, all it takes is one ill-considered BATCH write to crush everyone. Also, want to upgrade your cluster? Now you have to coordinate it with multiple teams, instead of just one. Using SSL? Make sure multiple teams get the right certificate, instead of just one.
When we have new teams use Cassandra, each team gets their own cluster. That way, they can't hurt anyone else, and we can support them with fewer question marks about who is doing what.
answered Nov 21 '18 at 14:06 by Aaron