Split a string containing an alphabetical bullet list into a list












My string contains
text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

I want to split it into a list like
["Baghdad, Iraq", "United Arab Emirates (possibly)"]

The code I have used does not give me the desired result:

re.split('\s*([a-zA-Z\d][).]|•)\s*(?=[A-Z])', text)

Please help me with this.







python string python-3.x














edited Nov 22 '18 at 12:24
Patrick Artner

asked Nov 22 '18 at 12:12
Sharjeel Ali Shaukat













  • Is it possible to have a string like a) Baghdad, Iraq b) United Arab Emirates (possibly) c) Turkey if UAE is not in (b)?

    – lxop
    Nov 22 '18 at 12:16











  • You are missing the r at the start of your regex pattern string.

    – usr2564301
    Nov 22 '18 at 12:26






  • [s for s in re.split('\s*([a-zA-Z\d][).]|•)\s*(?=[A-Z])', text) if len(s) > 4]

    – iamklaus
    Nov 22 '18 at 12:30











  • @SarthakNegi that fails for c) A

    – planetmaker
    Nov 22 '18 at 12:46











  • @lxop yes, it can also contain c, d, e and so on.

    – Sharjeel Ali Shaukat
    Nov 23 '18 at 5:26
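A minimal sketch of what seems to be going wrong with the pattern from the question. The raw-string prefix suggested in the comments avoids invalid-escape warnings, but the extra entries appear to come from the capturing group: re.split also returns the text of every captured group. Switching to a non-capturing group and dropping empty strings gives the desired list without a length filter, so a short item such as "c) A" survives. The helper name split_bullets is hypothetical.

```python
import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

# The pattern from the question, written as a raw string.
CAPTURING = r'\s*([a-zA-Z\d][).]|•)\s*(?=[A-Z])'
# Same pattern with a non-capturing group, so the bullets are discarded.
NON_CAPTURING = r'\s*(?:[a-zA-Z\d][).]|•)\s*(?=[A-Z])'


def split_bullets(s):
    """Hypothetical helper: split on bullets and drop empty pieces."""
    return [part for part in re.split(NON_CAPTURING, s) if part]


print(re.split(CAPTURING, text))
# ['', 'a)', 'Baghdad, Iraq', 'b)', 'United Arab Emirates (possibly)']
print(split_bullets(text))
# ['Baghdad, Iraq', 'United Arab Emirates (possibly)']
print(split_bullets(text + " c) A"))
# ['Baghdad, Iraq', 'United Arab Emirates (possibly)', 'A']
```

Note that the lookahead (?=[A-Z]) still requires every item to start with an uppercase ASCII letter, so this sketch inherits that assumption from the question's pattern.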































2 Answers


















You could create the wanted data for your example using a list comprehension and a second regex:

import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

# The first pattern splits on the bullets and, because of the capturing
# group, also returns them; the second pattern filters the bullets out.
data = [x for x in re.split(r'((?:^\s*[a-zA-Z0-9]\))|(?:\s+[a-zA-Z0-9]\)))\s*', text)
        if x and not re.match(r"\s*[a-zA-Z]\)", x)]
print(data)

Output:

['Baghdad, Iraq', 'United Arab Emirates (possibly)']

See https://regex101.com/r/wxEEQW/1
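As a possible simplification of the approach in this answer (a sketch, not the answerer's code): if the bullet pattern uses a non-capturing group, re.split no longer returns the bullets, so the second regex is not needed. The names BULLET and items are hypothetical, and the sketch assumes a bullet is a single letter or digit followed by ")" at the start of the string or after whitespace.

```python
import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

# A bullet is one letter or digit plus ")", at the start or after whitespace.
# (?:...) is non-capturing, so re.split does not return the bullets.
BULLET = r'(?:^|\s+)[a-zA-Z0-9]\)\s*'


def items(s):
    """Hypothetical helper: list items without their bullets."""
    return [part for part in re.split(BULLET, s) if part]


print(items(text))
# ['Baghdad, Iraq', 'United Arab Emirates (possibly)']
print(items(text + " c) Turkey"))
# ['Baghdad, Iraq', 'United Arab Emirates (possibly)', 'Turkey']
```

Because the bullet must follow whitespace or the start of the string, the "y)" at the end of "(possibly)" is not treated as a bullet.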

















answered Nov 22 '18 at 12:33
Patrick Artner

























Instead of re.findall, you can simply use re.split:

import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"
# Split on "<word character>) ", strip trailing spaces, and drop empty strings.
countries = list(filter(None, map(str.rstrip, re.split(r'\w\)\s', text))))
print(countries)

Output:

['Baghdad, Iraq', 'United Arab Emirates (possibly)']
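One caveat, shown as a hedged sketch: with a third item, as in the example from the comments, the "y) " that ends "(possibly)" also matches \w\)\s, so the second item gets cut short. Adding a word boundary (\b) before the bullet character is one way to avoid that, assuming bullets are always a single character. The helper split_items is hypothetical and only wraps the chain from the answer.

```python
import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly) c) Turkey"


def split_items(pattern, s):
    """Hypothetical helper wrapping the split/rstrip/filter chain."""
    return list(filter(None, map(str.rstrip, re.split(pattern, s))))


# Pattern from the answer: "y) " inside "(possibly) " is treated as a bullet.
print(split_items(r'\w\)\s', text))
# ['Baghdad, Iraq', 'United Arab Emirates (possibl', 'Turkey']

# With \b the bullet character must start a word, so "y)" no longer matches.
print(split_items(r'\b\w\)\s', text))
# ['Baghdad, Iraq', 'United Arab Emirates (possibly)', 'Turkey']
```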
















answered Nov 22 '18 at 15:18
Ajax1234





























