Does Format Preserving Encryption have significant advantages over a randomly generated lookup table?
I need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets while complying with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:
1. A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6.
2. I should always transform the digits in the same way - abcd should always be mapped to efgh.
3. The transformation should be one-to-one - efgh should be the output for only one input.
4. I should only output digits.
If I then receive a further batch of events, condition 2 is removed - I can use a new transformation for a new batch.
I've considered two approaches to this requirement:
1. Randomly create a lookup table mapping each n-digit string to another n-digit string - I will do this for all n-digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for each new batch.
2. Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the Botan library.
In some cases I may need to reconstruct the original phone numbers in the future, in which case I will store the encryption key or lookup table in some secure fashion. In other cases I know I will not need to reconstruct them, in which case I will discard the key or table directly after encryption.
Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:
- Using a random number generator in approach 1 may be slightly less robust than using FPE
- The encryption key for FPE can be substantially smaller than the mapping table in approach 1 if there is a need to retain it
- A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation
Are there any other considerations I should have in mind?
substitution-cipher format-preserving
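For concreteness, approach 1 could look roughly like the sketch below. This is only an illustration of the idea described in the question, not the asker's actual code: the function names (`make_table`, `invert_table`, `transform`) and the sample number are made up, and it assumes the table is an exhaustive random permutation of all n-digit strings drawn from the operating system's CSPRNG via `secrets.SystemRandom`.

```python
import secrets


def make_table(n):
    """Return a random permutation of all n-digit strings.

    The suffix with integer value i is mapped to table[i].
    """
    rng = secrets.SystemRandom()  # OS CSPRNG, not the default Mersenne Twister
    values = list(range(10 ** n))
    rng.shuffle(values)  # uniform shuffle, so the mapping is one-to-one
    return [f"{v:0{n}d}" for v in values]


def invert_table(table):
    """Build the reverse table, needed only if numbers must be reconstructed."""
    n = len(table[0])
    inverse = [""] * len(table)
    for i, out in enumerate(table):
        inverse[int(out)] = f"{i:0{n}d}"
    return inverse


def transform(number, table):
    """Keep the prefix and replace the right-most n digits via the table."""
    n = len(table[0])
    suffix = number[-n:]
    if len(number) < n or not (suffix.isascii() and suffix.isdigit()):
        raise ValueError("number must end in at least n decimal digits")
    return number[:-n] + table[int(suffix)]


if __name__ == "__main__":
    table = make_table(4)                 # one table per batch
    masked = transform("447700900123", table)
    restored = transform(masked, invert_table(table))
    print(masked, restored)
```

With n = 6 the table holds a million short strings, which is small enough to build per batch; discarding `table` after the batch corresponds to the "no need to reconstruct" case.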
asked yesterday by sumidid (edited yesterday)
1 Answer
any other considerations?
Yes.
- In many common use cases the mapping table needs to be retained. That map changes each time a number is added, which is a backup / continuity-of-service headache.
- The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.
- The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.
- When the table grows large, there will be collisions between randomly assigned outputs in the mapping table; that's a special case which must be handled and tested.
- A timing attack on the map search code has the potential to leak information. That's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular, FPE won't leak whether an entry was created recently, which can happen with a map.
- FPE's performance is predictable, whereas some map implementations have no specified worst-case performance.
- FPE allows efficient implementation of both the direct and the reverse mapping, whereas not all map implementations allow that.
Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.
answered yesterday by fgrieu (edited yesterday)
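To make the key-size and direct/reverse-map points above more tangible, here is a toy keyed permutation on n-digit strings. It is emphatically not FFX, FF1, FF3-1 or Botan's FE1, and it should not be used on real data: it is a hypothetical alternating Feistel network (all names are invented for this sketch) that uses HMAC-SHA256 as the round function, and on domains this small a Feistel construction gives only weak guarantees. It merely shows that a short key can stand in for the whole table and that both directions cost the same.

```python
import hashlib
import hmac

ROUNDS = 10  # even, so both halves end up back at their original widths


def _round_value(key, n, rnd, value, modulus):
    """Keyed pseudo-random round function reduced into [0, modulus)."""
    msg = f"{n}:{rnd}:{value}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _split(digits):
    n = len(digits)
    if n < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("need a string of at least two decimal digits")
    u = n // 2
    return n, u, n - u, int(digits[:u]), int(digits[u:])


def toy_fpe_encrypt(key, digits):
    """Map an n-digit string to another n-digit string under `key`."""
    n, u, v, a, b = _split(digits)
    for rnd in range(ROUNDS):
        m = 10 ** (u if rnd % 2 == 0 else v)
        # Feistel step: the right half passes through, the left half is masked.
        a, b = b, (a + _round_value(key, n, rnd, b, m)) % m
    return f"{a:0{u}d}{b:0{v}d}"


def toy_fpe_decrypt(key, digits):
    """Inverse of toy_fpe_encrypt: run the rounds backwards."""
    n, u, v, a, b = _split(digits)
    for rnd in reversed(range(ROUNDS)):
        m = 10 ** (u if rnd % 2 == 0 else v)
        a, b = (b - _round_value(key, n, rnd, a, m)) % m, a
    return f"{a:0{u}d}{b:0{v}d}"


if __name__ == "__main__":
    import secrets

    key = secrets.token_bytes(32)  # 32 bytes to store, whatever n is
    number = "447700900123"        # made-up sample number
    masked = number[:-4] + toy_fpe_encrypt(key, number[-4:])
    print(masked, masked[:-4] + toy_fpe_decrypt(key, masked[-4:]))
```

For the 2 to 6 digit suffixes in the question, the whole domain can be enumerated in well under a second either way, so in practice the choice is mostly about what has to be stored and how well vetted the implementation is.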
Thanks fgrieu! I wasn't clear on a few points:
- It's not always necessary to retain the mapping table
- I will create an exhaustive map once prior to encryption, rather than in an online manner
- I will typically need to encrypt the last 2 to 6 digits of each phone number
I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be:
- Less vulnerability to timing attacks
- Predictable performance (maybe less important with small data)
- Efficient implementation - definitely worth considering
– sumidid, yesterday