Parse all strings of specific length?












2














I've exported my email archive of 10 years which is very large.



I want to parse all the text for any string that is 64 characters long in search of a bitcoin private key.



How can I parse strings of a certain length in characters?


































  • What format are the emails in? Plain text? Maildir?
    – Sparhawk
    Jan 6 at 0:13










  • @Sparhawk I'm still downloading the files, I'm hoping they are in txt or something I can cat and parse with a pipe.
    – Philip Kirkbride
    Jan 6 at 0:14










  • @Sparhawk file type is mbox, hoping to convert it into txt for easy parsing
    – Philip Kirkbride
    Jan 6 at 0:34






  • What do you mean by "parse" here? Do you just want to find all strings of exactly 64 characters and then parse them? And how are strings defined? Can we assume you mean things that are delineated with whitespace? So anything with a space, a tab, a newline etc on either side of it? And what operating system are you using? Is it Linux? Can we assume you have access to GNU tools?
    – terdon
    Jan 6 at 0:43








  • The entire email file is a "string of certain length"; IMHO, the string you're looking for consists of certain characters and is delimited in some way. What can you say about the string besides it being 64 of some character?
    – Jeff Schaller
    Jan 6 at 3:31























text-processing files wildcards pattern-matching














edited Jan 6 at 1:09
terdon

asked Jan 6 at 0:06
Philip Kirkbride






















4 Answers


















3














If you have GNU grep (default on Linux), you can do:



grep -Po '(^|\s)\S{64}(\s|$)' file


The -P enables Perl Compatible Regular Expressions, which give us \b (word boundaries), \S (non-whitespace) and {N} (find exactly N characters), and the -o means "print only the matching part of the line". Then, we look for stretches of non-whitespace that are exactly 64 characters long and that either start at the beginning of the line (^) or follow whitespace (\s), and that end either at the end of the line ($) or with another whitespace character.



Note that the result will include any whitespace characters at the beginning and end of the string, so if you want to parse this further, you might want to use this instead:



grep -Po '(^|\s)\K\S{64}(?=\s|$)' file


That will look for a whitespace character or the beginning of the line (^|\s), then discard it with \K, and then look for 64 non-whitespace characters followed by either a whitespace character or the end of the line. The (?=foo) construct is called a "lookahead" and is not included in the match.
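A quick way to check the \K variant is to feed it a line containing a token of known length. The following is only a sketch: the function name find_tokens64 and the sample input are made up for illustration, and it assumes GNU grep built with PCRE support.

```shell
#!/bin/sh
# find_tokens64 (hypothetical name): print every whitespace-delimited token
# of exactly 64 characters found on standard input.
# Assumes GNU grep with PCRE support (-P).
find_tokens64() {
    grep -Po '(^|\s)\K\S{64}(?=\s|$)'
}

# Made-up sample data: a 64-character token on the first line and a
# 65-character token on the second; only the first should be printed.
token=$(printf '%064d' 0)
printf 'x %s y\n%s1\n' "$token" "$token" | find_tokens64
```

Because the leading delimiter is dropped by \K and the trailing one is only a lookahead, the output contains the token itself with no surrounding whitespace.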





























  • PCRE also gives us negative lookahead and lookbehind assertions: grep -Po '(?<!\S)\S{64}(?!\S)' is enough to find runs of 64 non-spaces; but please read my answer for why that's probably not what's intended.
    – pizdelect
    Jan 6 at 1:23










  • @pizdelect yes, I know negative lookbehinds, but I find the \K approach much more readable and elegant.
    – terdon
    Jan 6 at 3:09










  • elegance is not my strong point, but your 1st example will also include the delimiting spaces in the output, and the 2nd example will not match words which start at the beginning of the line.
    – pizdelect
    Jan 6 at 3:31










  • @pizdelect well yes, that's why I included the second example with the lookarounds. But you're right about the second, that's a typo (see the description, which includes the start), thanks for pointing it out. Fixed now.
    – terdon
    Jan 6 at 3:37



















7














If you mean to search for a 256-bit number in hexadecimal form (64 chars from the range 0-9 and A-F -- one of the formats in which a bitcoin private key could appear), this should do:



egrep -aro '\<[A-F0-9]{64}\>' files and dirs ...


Add the -i option or also include the a-f range if some of the keys are in lowercase.



For the general problem of finding runs of characters from the same class with a specified length, you are better off using PCRE regexps, which GNU grep supports through the -P option. For instance, to find runs of uppercase letters from any charset, with a minimum length of 2 and a maximum length of 4, delimited by characters that are not uppercase letters:



echo ÁRVÍZtűrő tükörFÚRÓgép |
LC_CTYPE=en_US.UTF-8 grep -Po '(?<!\p{Lu})\p{Lu}{2,4}(?!\p{Lu})'
FÚRÓ


Replace \p{Lu} with \p{Ll} for lowercase letters, \S for non-spaces, etc. See here and here for the full list.



(?<!...) and (?!...) are negative lookbehind and lookahead zero-width assertions; e.g. (?<!<)\w(?!>) will match a "word" character when not bracketed by < and >. The \< zero-width assertion from vi could be implemented by (?<!\w)(?=\w).
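When the same candidate key shows up in several messages, it may help to list each distinct match once. The sketch below wraps the hexadecimal search in a small function; the name unique_hex_keys is made up, and the choice of -h and sort -u is an assumption about what is convenient, not part of the command above. It relies on GNU grep for the \< and \> anchors.

```shell
#!/bin/sh
# unique_hex_keys (hypothetical name): list each distinct 64-digit
# hexadecimal word found in the given files, one per line.
#   -E  extended regex            -a  treat binary files as text
#   -o  print only the match      -h  omit file names from the output
#   -i  ignore case, so a-f and A-F are both accepted
unique_hex_keys() {
    grep -Eaohi '\<[0-9a-f]{64}\>' "$@" | sort -u
}
```

It would be called as unique_hex_keys followed by one or more file names; adding -r to the grep options would let it descend into directories as in the original command.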





































    6














    If you want to find all words of length 64 from /path/to/file, you can use



    tr -c '[:alnum:]' '\n' < /path/to/file | grep '^.\{64\}$'


    This replaces all non-alphanumeric characters by newlines, so each word is on its own line. Then it filters this result to include only the words of length 64.
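If the target is assumed to be a hexadecimal key, as the question suggests, the second stage of the pipeline can be narrowed to hex digits. This is a sketch under that assumption; the function name hex_words64 is made up, and it reads from standard input instead of a fixed path.

```shell
#!/bin/sh
# hex_words64 (hypothetical name): split standard input into alphanumeric
# words, one per line, then keep only the lines that consist of exactly
# 64 hexadecimal digits (-x makes grep match whole lines only).
hex_words64() {
    tr -c '[:alnum:]' '\n' | grep -Ex '[[:xdigit:]]{64}'
}
```

For example, hex_words64 < /path/to/file would print only the 64-digit hexadecimal words of that file.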





























    • What about dot (.), comma (,), colon (:), semicolon (;) and many other usual punctuation characters, shouldn't those also be converted to a newline?
      – Isaac
      Jan 6 at 1:52










    • @Isaac why? Why are you assuming they can't appear inside the target string?
      – terdon
      Jan 6 at 3:08






    • @terdon Because I assume that the target string is a bitcoin private key which is usually a 64 hex character string. Hmmm, ooops, sorry, the OP stated that: I am not assuming then.
      – Isaac
      Jan 6 at 5:05






    • @Isaac You're not wrong
      – Fox
      2 days ago



















    2














    grep seems to be the right tool to "search" for a string. What is left to do is to define that string with a regex. The first issue is to define the limits of a word. It is not as simple as "a space": in "a book, a lamp" the comma also acts as a word delimiter, and in the same way many other characters, or even the start or end of a line, can act as word delimiters. GNU grep provides some word delimiters:





    • \< word start.
    • \> word end.
    • \b word boundary.


    All of them assume that a word is a sequence of [a-zA-Z0-9_] characters. If that is ok for you, this regex could work:



     grep -o '\<.\{64\}\>' file


    If you can use extended regex, this can be shortened:



     grep -oE '\<.{64}\>' file


    That selects from a "word start" (\<), 64 ({64}) characters (.), to a "word end" (\>) and prints only the matching (-o) parts.



    However, the dot (.) will match any character, that may be too much.



    If you want to be more strict on the selection (hex digits), use:



     grep -oE '\<[0-9a-fA-F]{64}\>' file


    Which will allow hex digits in lowercase or uppercase. But if you really want to be strict, as some non-ASCII characters might be included, use:



     LC_ALL=C grep -oE '\<[0-9a-fA-F]{64}\>' file
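To see why the character class matters, the sketch below runs the dot pattern and the hex pattern over the same made-up line: two runs of letters separated by a space, 64 characters in total. The function names match_any64 and match_hex64 and the sample data are only illustrative; GNU grep is assumed for the \< and \> anchors.

```shell
#!/bin/sh
# match_any64 (hypothetical name): word start, any 64 characters, word end.
match_any64() {
    grep -oE '\<.{64}\>'
}

# match_hex64 (hypothetical name): same anchors, hexadecimal digits only.
match_hex64() {
    LC_ALL=C grep -oE '\<[0-9a-fA-F]{64}\>'
}

# Made-up sample line: 31 letters, one space, 32 letters (64 characters).
left=$(printf '%031d' 0 | tr 0 x)
right=$(printf '%032d' 0 | tr 0 y)

# The dot also matches the space, so the whole line is reported as a "key".
printf '%s %s\n' "$left" "$right" | match_any64

# The hex class rejects the line; grep prints nothing and returns non-zero.
printf '%s %s\n' "$left" "$right" | match_hex64 || echo 'no match'
```

The first call prints the full line even though it is plainly two words, which is the over-matching the dot causes; the second call finds nothing in the same input.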


    Some implementations of grep (such as grep -P) do not have "start of word" or "end of word" anchors (\< and \>) but do have a "word boundary" (\b):



    grep -oP '\b[0-9a-fA-F]{64}\b' file


    Some regex flavors accept the POSIX word boundaries [[:<:]] and [[:>:]]; Perl does not, and PCRE does only from version 8.34 on.



    And there are a lot more flavors of "word boundaries".






























































      edited Jan 6 at 3:37

      answered Jan 6 at 0:47
      terdon












      • pcre also gives us negative lookahead and lookbehind assertions: grep -Po '(?<!S)S{64}(?!S)' is enough to find runs of 64 non-spaces; but please read my answer for why that's probably not what's intended.
        – pizdelect
        Jan 6 at 1:23










      • @pizdelect yes, I know negative lookbehinds, but I find the K approach much more readable and elegant.
        – terdon
        Jan 6 at 3:09










      • elegance is not my strong point, but your 1st example will also include the delimiting spaces in the output, and the 2nd example will not match words which start at the beginning of the line.
        – pizdelect
        Jan 6 at 3:31










      • @pizdelect well yes, that's why I included the second example with the lookarounds. But you're right about the second, that's a typo (see the description which I ncludes the start), thanks for pointing it out. Fixed now.
        – terdon
        Jan 6 at 3:37


















      • pcre also gives us negative lookahead and lookbehind assertions: grep -Po '(?<!S)S{64}(?!S)' is enough to find runs of 64 non-spaces; but please read my answer for why that's probably not what's intended.
        – pizdelect
        Jan 6 at 1:23










      • @pizdelect yes, I know negative lookbehinds, but I find the K approach much more readable and elegant.
        – terdon
        Jan 6 at 3:09










      • elegance is not my strong point, but your 1st example will also include the delimiting spaces in the output, and the 2nd example will not match words which start at the beginning of the line.
        – pizdelect
        Jan 6 at 3:31










      • @pizdelect well yes, that's why I included the second example with the lookarounds. But you're right about the second, that's a typo (see the description which I ncludes the start), thanks for pointing it out. Fixed now.
        – terdon
        Jan 6 at 3:37
















      pcre also gives us negative lookahead and lookbehind assertions: grep -Po '(?<!S)S{64}(?!S)' is enough to find runs of 64 non-spaces; but please read my answer for why that's probably not what's intended.
      – pizdelect
      Jan 6 at 1:23




      pcre also gives us negative lookahead and lookbehind assertions: grep -Po '(?<!S)S{64}(?!S)' is enough to find runs of 64 non-spaces; but please read my answer for why that's probably not what's intended.
      – pizdelect
      Jan 6 at 1:23












      @pizdelect yes, I know negative lookbehinds, but I find the K approach much more readable and elegant.
      – terdon
      Jan 6 at 3:09




      @pizdelect yes, I know negative lookbehinds, but I find the K approach much more readable and elegant.
      – terdon
      Jan 6 at 3:09












      elegance is not my strong point, but your 1st example will also include the delimiting spaces in the output, and the 2nd example will not match words which start at the beginning of the line.
      – pizdelect
      Jan 6 at 3:31




      elegance is not my strong point, but your 1st example will also include the delimiting spaces in the output, and the 2nd example will not match words which start at the beginning of the line.
      – pizdelect
      Jan 6 at 3:31












      @pizdelect well yes, that's why I included the second example with the lookarounds. But you're right about the second, that's a typo (see the description which I ncludes the start), thanks for pointing it out. Fixed now.
      – terdon
      Jan 6 at 3:37




      @pizdelect well yes, that's why I included the second example with the lookarounds. But you're right about the second, that's a typo (see the description which I ncludes the start), thanks for pointing it out. Fixed now.
      – terdon
      Jan 6 at 3:37













      7














      If you mean to search for a 256-bit number in hexadecimal form (64 chars from the range 0-9 and A-F -- one of the formats in which a bitcoin private key could appear), this should do:



      egrep -aro '<[A-F0-9]{64}>' files and dirs ...


      Add the -i option or also include the a-f range if some of the keys are in lowercase.



      For the general problem of finding runs of characters from the same class having a specified length, you would better use pcre regexps, which could be used with GNU grep with the -P option. For instance, to find runs of uppercase letters from any charset, of min length of 2 and max length of 4, and which are delimited by chars which are not uppercase letters:



      echo ÁRVÍZtűrő tükörFÚRÓgép |
      LC_CTYPE=en_US.UTF-8 grep -Po '(?<!p{Lu})p{Lu}{2,4}(?!p{Lu})'
      FÚRÓ


      Replace p{Lu} with p{Ll} for lowercase letters, S for non-spaces, etc. See here and here for the full list.



      (?<!...) and (?!...) are negative lookbehind and lookahead zero-width assertions; e.g. (?<!<)w(?!>) will match a "word" character when not bracketed by < and >. The < zero-width assertion from vi could be implemented by (?<!w)(?=w).






      share|improve this answer




























        7














        If you mean to search for a 256-bit number in hexadecimal form (64 chars from the range 0-9 and A-F -- one of the formats in which a bitcoin private key could appear), this should do:



        egrep -aro '<[A-F0-9]{64}>' files and dirs ...


        Add the -i option or also include the a-f range if some of the keys are in lowercase.



        For the general problem of finding runs of characters from the same class having a specified length, you would better use pcre regexps, which could be used with GNU grep with the -P option. For instance, to find runs of uppercase letters from any charset, of min length of 2 and max length of 4, and which are delimited by chars which are not uppercase letters:



        echo ÁRVÍZtűrő tükörFÚRÓgép |
        LC_CTYPE=en_US.UTF-8 grep -Po '(?<!p{Lu})p{Lu}{2,4}(?!p{Lu})'
        FÚRÓ


        Replace p{Lu} with p{Ll} for lowercase letters, S for non-spaces, etc. See here and here for the full list.



        (?<!...) and (?!...) are negative lookbehind and lookahead zero-width assertions; e.g. (?<!<)w(?!>) will match a "word" character when not bracketed by < and >. The < zero-width assertion from vi could be implemented by (?<!w)(?=w).






        share|improve this answer


























          7












          7








          7






          If you mean to search for a 256-bit number in hexadecimal form (64 chars from the range 0-9 and A-F -- one of the formats in which a bitcoin private key could appear), this should do:



          egrep -aro '<[A-F0-9]{64}>' files and dirs ...


          Add the -i option or also include the a-f range if some of the keys are in lowercase.



          For the general problem of finding runs of characters from the same class having a specified length, you would better use pcre regexps, which could be used with GNU grep with the -P option. For instance, to find runs of uppercase letters from any charset, of min length of 2 and max length of 4, and which are delimited by chars which are not uppercase letters:



          echo ÁRVÍZtűrő tükörFÚRÓgép |
          LC_CTYPE=en_US.UTF-8 grep -Po '(?<!p{Lu})p{Lu}{2,4}(?!p{Lu})'
          FÚRÓ


          Replace p{Lu} with p{Ll} for lowercase letters, S for non-spaces, etc. See here and here for the full list.



          (?<!...) and (?!...) are negative lookbehind and lookahead zero-width assertions; e.g. (?<!<)w(?!>) will match a "word" character when not bracketed by < and >. The < zero-width assertion from vi could be implemented by (?<!w)(?=w).






          share|improve this answer














          If you mean to search for a 256-bit number in hexadecimal form (64 chars from the range 0-9 and A-F -- one of the formats in which a bitcoin private key could appear), this should do:



          egrep -aro '<[A-F0-9]{64}>' files and dirs ...


          Add the -i option or also include the a-f range if some of the keys are in lowercase.



          For the general problem of finding runs of characters from the same class having a specified length, you would better use pcre regexps, which could be used with GNU grep with the -P option. For instance, to find runs of uppercase letters from any charset, of min length of 2 and max length of 4, and which are delimited by chars which are not uppercase letters:



          echo ÁRVÍZtűrő tükörFÚRÓgép |
          LC_CTYPE=en_US.UTF-8 grep -Po '(?<!p{Lu})p{Lu}{2,4}(?!p{Lu})'
          FÚRÓ


          Replace p{Lu} with p{Ll} for lowercase letters, S for non-spaces, etc. See here and here for the full list.



          (?<!...) and (?!...) are negative lookbehind and lookahead zero-width assertions; e.g. (?<!<)w(?!>) will match a "word" character when not bracketed by < and >. The < zero-width assertion from vi could be implemented by (?<!w)(?=w).















          edited Jan 6 at 5:36

























          answered Jan 6 at 1:10









          pizdelect



































              6






















              If you want to find all words of length 64 in /path/to/file, you can use:

              tr -c '[:alnum:]' '\n' < /path/to/file | grep '^.\{64\}$'

              This replaces every non-alphanumeric character with a newline, so each word ends up on its own line. grep then keeps only the lines, that is the words, that are exactly 64 characters long.
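A minimal sketch of that pipeline on an in-memory sample, assuming ASCII input. The sample text and the variable names key, other, text, all64 and hex_only are invented for illustration. It also shows an optional second filter that narrows the 64-character words down to hexadecimal ones, since a mail archive may contain other 64-character alphanumeric tokens.

```shell
# Two 64-character words: one hexadecimal, one not.
key=$(printf 'DEADBEEF%.0s' 1 2 3 4 5 6 7 8)
other=$(printf 'zzzzzzzz%.0s' 1 2 3 4 5 6 7 8)

text="Here is the key: ${key}. Other: ${other}, ok?"

# Split on every non-alphanumeric character, keep words of length 64.
all64=$(printf '%s\n' "$text" | tr -c '[:alnum:]' '\n' | grep -E '^.{64}$')

# Optional narrowing: keep only words made entirely of hex digits.
hex_only=$(printf '%s\n' "$all64" | grep -xE '[0-9A-Fa-f]{64}')

printf 'all 64-char words:\n%s\n\nhex only:\n%s\n' "$all64" "$hex_only"
```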















              edited 2 days ago

























              answered Jan 6 at 0:42









              Fox












              • What about dot (.), comma (,), colon (:), semicolon (;) and many other usual punctuation characters, shouldn't those also be converted to a newline?
                – Isaac
                Jan 6 at 1:52










              • @Isaac why? Why are you assuming they can't appear inside the target string?
                – terdon
                Jan 6 at 3:08






              • 1




                @terdon Because I assume that the target string is a bitcoin private key which is usually a 64 hex character string. Hmmm, ooops, sorry, the OP stated that: I am not assuming then.
                – Isaac
                Jan 6 at 5:05






              • 1




                @Isaac You're not wrong
                – Fox
                2 days ago





































                  2




















                  grep seems to be the right tool to "search" for a string. What is left is to define that string with a regex. The first issue is to define the limits of a word. It is not as simple as "a space": in "a book, a lamp" the comma acts as a word delimiter, and in the same way many other characters, or even the start or end of a line, can act as word delimiters. GNU grep provides some word delimiters:

                  • \< word start.

                  • \> word end.

                  • \b word boundary.


                  All of them assume that a word is a sequence of [a-zA-Z0-9_] characters. If that is ok for you, this regex could work:



                   grep -o '\<.\{64\}\>' file


                  If you can use extended regex, the number of backslashes can be reduced:



                   grep -oE '\<.{64}\>' file


                  That selects from a "word start" (\<), 64 ({64}) characters (.), to a "word end" (\>) and prints only the matching (-o) parts.



                  However, the dot (.) matches any character, which may be too permissive.



                  If you want to be more strict on the selection (hex digits), use:



                   grep -oE '\<[0-9a-fA-F]{64}\>' file


                  Which will allow hex digits in lowercase or uppercase. But if you really want to be strict, as some non-ASCII characters might be included, use:



                   LC_ALL=C grep -oE '\<[0-9a-fA-F]{64}\>' file


                  Some implementations of grep (as grep -P) do not have a "start of word" or "end of word" (as \< and \>) but have "word boundary" (as \b):



                  grep -oP '\b[0-9a-fA-F]{64}\b' file


                  Some regex flavors accept the POSIX-style word boundaries [[:<:]] and [[:>:]]; Perl does not, and PCRE does only from version 8.34 on.



                  And there are a lot more flavors of "word boundaries".
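For a large archive it may help to know where each candidate was found. As a rough sketch, assuming a grep that supports -r (GNU and BSD grep do), the strict hex pattern above can be run recursively with -n so each hit is printed as file:line:match. The directory layout, file names and the variables dir, key and hits are made up for this example.

```shell
# Throwaway directory standing in for the exported archive.
dir=$(mktemp -d)

# 4 repetitions of 16 hex digits = 64 hex digits.
key=$(printf '0123456789ABCDEF%.0s' 1 2 3 4)

printf 'hello\nkey %s here\n' "$key" > "$dir/mail1.txt"
printf 'nothing to see\n' > "$dir/mail2.txt"

# -r: recurse into the directory, -n: prefix the line number,
# -o: print only the matching part, -E: extended regex.
hits=$(LC_ALL=C grep -rnoE '\<[0-9a-fA-F]{64}\>' "$dir")
printf '%s\n' "$hits"

rm -rf "$dir"
```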















                  edited 2 days ago

























                  answered Jan 6 at 1:50









                  Isaac





























