Does Format Preserving Encryption have significant advantages over a randomly generated lookup table?












4












$begingroup$


I have a need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets and comply with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:




  1. A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6


  2. I should always transform the digits in the same way - abcd should always be mapped to efgh


  3. The transformation should be one-to-one - efgh should be that output for only one input


  4. I should only output digits



If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.



I've considered two approaches to this requirement:




  1. Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch


  2. Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library



In some cases I may need to reconstruct the original phone numbers in the future in which case I will store the encryption key or lookup table in some secure fashion. In some cases I know I will not need to reconstruct them in which case I will discard the key directly after encryption.



Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:




  1. Using a random number generator in 1 may be slightly less robust than using FPE


  2. The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it


  3. A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation



Are there any other considerations I should have in mind?










share|improve this question









New contributor




sumidid is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$

















    4












    $begingroup$


    I have a need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets and comply with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:




    1. A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6


    2. I should always transform the digits in the same way - abcd should always be mapped to efgh


    3. The transformation should be one-to-one - efgh should be that output for only one input


    4. I should only output digits



    If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.



    I've considered two approaches to this requirement:




    1. Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch


    2. Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library



    In some cases I may need to reconstruct the original phone numbers in the future in which case I will store the encryption key or lookup table in some secure fashion. In some cases I know I will not need to reconstruct them in which case I will discard the key directly after encryption.



    Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:




    1. Using a random number generator in 1 may be slightly less robust than using FPE


    2. The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it


    3. A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation



    Are there any other considerations I should have in mind?










    share|improve this question









    New contributor




    sumidid is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.







    $endgroup$















      4












      4








      4





      $begingroup$


      I have a need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets and comply with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:




      1. A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6


      2. I should always transform the digits in the same way - abcd should always be mapped to efgh


      3. The transformation should be one-to-one - efgh should be that output for only one input


      4. I should only output digits



      If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.



      I've considered two approaches to this requirement:




      1. Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch


      2. Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library



      In some cases I may need to reconstruct the original phone numbers in the future in which case I will store the encryption key or lookup table in some secure fashion. In some cases I know I will not need to reconstruct them in which case I will discard the key directly after encryption.



      Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:




      1. Using a random number generator in 1 may be slightly less robust than using FPE


      2. The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it


      3. A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation



      Are there any other considerations I should have in mind?










      share|improve this question









      New contributor




      sumidid is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.







      $endgroup$




      I have a need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets and comply with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:




      1. A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6


      2. I should always transform the digits in the same way - abcd should always be mapped to efgh


      3. The transformation should be one-to-one - efgh should be that output for only one input


      4. I should only output digits



      If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.



      I've considered two approaches to this requirement:




      1. Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch


      2. Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library



      In some cases I may need to reconstruct the original phone numbers in the future in which case I will store the encryption key or lookup table in some secure fashion. In some cases I know I will not need to reconstruct them in which case I will discard the key directly after encryption.



      Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:




      1. Using a random number generator in 1 may be slightly less robust than using FPE


      2. The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it


      3. A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation



      Are there any other considerations I should have in mind?







      substitution-cipher format-preserving






      share|improve this question









      New contributor




      sumidid is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question









      New contributor




      sumidid is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question








      edited yesterday







      sumidid













      New contributor




      sumidid is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked yesterday









      sumididsumidid

      212




      212




      New contributor




      sumidid is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      sumidid is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      sumidid is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






















          1 Answer
          1






          active

          oldest

          votes


















          4












          $begingroup$


          any other considerations?




          Yes.




          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.


          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.






          share|improve this answer











          $endgroup$









          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            yesterday











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "281"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });






          sumidid is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcrypto.stackexchange.com%2fquestions%2f67049%2fdoes-format-preserving-encryption-have-significant-advantages-over-a-randomly-ge%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          4












          $begingroup$


          any other considerations?




          Yes.




          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.


          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.






          share|improve this answer











          $endgroup$









          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            yesterday
















          4












          $begingroup$


          any other considerations?




          Yes.




          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.


          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.






          share|improve this answer











          $endgroup$









          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            yesterday














          4












          4








          4





          $begingroup$


          any other considerations?




          Yes.




          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.


          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.






          share|improve this answer











          $endgroup$




          any other considerations?




          Yes.




          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.


          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited yesterday

























          answered yesterday









          fgrieufgrieu

          79.3k7169336




          79.3k7169336








          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            yesterday














          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            yesterday








          1




          1




          $begingroup$
          Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
          $endgroup$
          – sumidid
          yesterday




          $begingroup$
          Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
          $endgroup$
          – sumidid
          yesterday










          sumidid is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          sumidid is a new contributor. Be nice, and check out our Code of Conduct.













          sumidid is a new contributor. Be nice, and check out our Code of Conduct.












          sumidid is a new contributor. Be nice, and check out our Code of Conduct.
















          Thanks for contributing an answer to Cryptography Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcrypto.stackexchange.com%2fquestions%2f67049%2fdoes-format-preserving-encryption-have-significant-advantages-over-a-randomly-ge%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Paul Cézanne

          UIScrollView CustomStickyHeader Resize height generates problems when scroll is too fast

          Angular material date-picker (MatDatepicker) auto completes the date on focus out