What is the range of Unicode Printable Characters?




















Can anybody please tell me what the range of Unicode printable characters is? [e.g. the ASCII printable character range is \u0020 - \u007f]










  • \u0000 - \u0020 are also unprintable in Unicode

    – Andrey
    Sep 22 '10 at 14:22






  • More like \u0020 - \u007e

    – Desmond Hume
    Aug 26 '13 at 12:57






  • You sure got a lot of hate for this question. I like the idea.

    – jsejcksn
    Jan 7 '16 at 8:52











  • It's a bit odd to use a programming language notation for UTF-16 code units to give a range of ASCII codepoints (but numerically and character-wise, it does work out).

    – Tom Blodget
    Nov 24 '18 at 21:34


















edited Oct 22 '15 at 5:33 by abc
asked Sep 22 '10 at 14:14 by Anindya Chatterjee








5 Answers


















See http://en.wikipedia.org/wiki/Unicode_control_characters



In particular, you might want to look at the C0 and C1 control characters: http://en.wikipedia.org/wiki/C0_and_C1_control_codes



The wiki says the C0 control characters are in the range U+0000 to U+001F plus U+007F (the same as the ASCII control characters), and the C1 control characters are in the range U+0080 to U+009F.
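To check these ranges yourself, here is a minimal sketch in Python 3 using the standard `unicodedata` module. Python is only my choice for illustration; nothing above depends on it. The sketch collects every code point whose General Category is Cc (control) and merges neighbours into inclusive runs. DEL (U+007F) sits right next to the C1 block, so the two come out as one run: U+0000 to U+001F, then U+007F to U+009F.

```python
import unicodedata

def control_ranges():
    """Return the Cc (control) code points as a list of inclusive (start, end) runs."""
    runs = []
    for cp in range(0x110000):
        if unicodedata.category(chr(cp)) != "Cc":
            continue
        if runs and runs[-1][1] == cp - 1:
            runs[-1] = (runs[-1][0], cp)  # extend the current run
        else:
            runs.append((cp, cp))  # start a new run
    return runs

for start, end in control_ranges():
    print(f"U+{start:04X} to U+{end:04X}")
```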



Other than the C0/C1 control characters, Unicode also has more than a hundred format characters (General Category Cf). Examples are the zero-width non-joiner, which prevents adjacent characters from joining (into a ligature or a cursive connection), and the bidirectional text controls. These format characters are scattered throughout the code space.
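A similar sketch (again Python 3 with `unicodedata`, assumed for illustration) shows that the zero-width non-joiner, the zero-width joiner, a couple of bidirectional controls and the soft hyphen are all in category Cf. It also counts how many Cf code points are known to the Unicode version bundled with your interpreter, so the exact number varies between Python releases.

```python
import unicodedata

# ZWNJ, ZWJ, LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT OVERRIDE, SOFT HYPHEN
FORMAT_SAMPLES = ["\u200c", "\u200d", "\u200e", "\u202e", "\u00ad"]

for ch in FORMAT_SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# The count depends on the Unicode version this Python was built with.
format_count = sum(1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cf")
print(format_count, "Cf code points in Unicode", unicodedata.unidata_version)
```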



More importantly, what are you doing that requires you to know Unicode's non-printable characters? More likely than not, whatever you're trying to do is the wrong approach to your problem.






answered Sep 22 '10 at 14:27 by Lie Ryan



















  • I want to create a random Unicode string generator that generates printable characters.

    – Anindya Chatterjee
    Sep 22 '10 at 14:29






  • Printable by whom? Do you want to include, e.g., all the Chinese characters? Many users won't have fonts for them, so ‘printing’ them would give you nothing, a blank box, or some other useless replacement character.

    – bobince
    Sep 22 '10 at 20:29






  • One good reason is to avoid security exploits: bugzilla.mozilla.org/show_bug.cgi?id=968576

    – Neil McGuigan
    Apr 7 '15 at 20:46











  • @bobince My browser can display Chinese characters. Not sure if that was the case in 2010 though.

    – John
    May 16 '18 at 10:55



















First, you should remove the word 'UTF8' from your question; it's not pertinent (UTF-8 is just one of the encodings of Unicode, and it is orthogonal to your question).



Second, the meaning of "printable/non-printable" is less clear in Unicode. Perhaps you mean a "graphical character", and one could even dispute whether a space is printable/graphical. The non-graphical characters would consist basically of control characters: the range 0x00-0x1F, plus some others (such as 0x7F-0x9F and the format characters) that are scattered.



Anyway, the vast majority of Unicode characters (more than 100,000) are "graphical". But this certainly does not imply that they are printable in your environment.
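For a rough sense of scale, the sketch below (Python 3, assumed for illustration) applies Unicode's definition of a graphic character to the whole code space: General Category L, M, N, P, S or Zs. The count depends on the Unicode version bundled with your interpreter. It says nothing about whether a font on your system can actually draw those characters.

```python
import unicodedata

def is_graphic(ch):
    """Graphic per Unicode: letters, marks, numbers, punctuation, symbols, or Zs spaces."""
    cat = unicodedata.category(ch)
    return cat[0] in "LMNPS" or cat == "Zs"

graphic_count = sum(1 for cp in range(0x110000) if is_graphic(chr(cp)))
print(graphic_count, "graphic characters in Unicode", unicodedata.unidata_version)
```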



If you intend to generate a "random printable" Unicode string, trying to include all "printable" characters seems to me a bad idea.
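If you still want random "printable" strings, one way to follow this advice is to restrict the generator to a few ranges that you know your fonts cover. You can then filter those ranges with Python's `str.isprintable()`, which rejects the Other and Separator categories except the ASCII space. This is a minimal Python 3 sketch. The ranges and the helper names `build_pool` and `random_printable` are hypothetical choices for illustration, not a recommendation.

```python
import random

# Hypothetical choice: Basic Latin, Latin-1 Supplement, and Greek and Coptic.
# Replace these with ranges your fonts and your users can actually display.
ALLOWED_RANGES = [(0x0020, 0x007E), (0x00A0, 0x00FF), (0x0370, 0x03FF)]


def build_pool(ranges):
    """Collect the characters in the given inclusive ranges that str.isprintable() accepts."""
    return [chr(cp) for start, end in ranges
            for cp in range(start, end + 1)
            if chr(cp).isprintable()]


def random_printable(length, pool, rng=random):
    """Draw `length` characters uniformly from `pool`; pass random.Random(seed) for repeatability."""
    return "".join(rng.choice(pool) for _ in range(length))


POOL = build_pool(ALLOWED_RANGES)
```

For example, `random_printable(20, POOL, random.Random(42))` returns a reproducible 20-character string. The filter drops U+00A0 NO-BREAK SPACE, U+00AD SOFT HYPHEN and the unassigned slots in the Greek block.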






answered Sep 22 '10 at 14:55 by leonbloy































    This is an old question, but it is still valid, and I think more can usefully, but briefly, be said on the subject than the existing answers cover.



    Unicode



    Unicode defines properties for characters.



    One of these properties is "General Category", which has major classes and subclasses. The major classes are Letter, Mark, Number, Punctuation, Symbol, Separator, and Other.



    By knowing the properties of your characters, you can decide whether you consider them printable in your particular context.
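Here is an illustration of looking these properties up. It assumes Python 3 and its standard `unicodedata` module, and `describe` is a hypothetical helper. The sample covers every major class. U+00A0 NO-BREAK SPACE, U+2028 LINE SEPARATOR and U+200B ZERO WIDTH SPACE are typical of the characters where you have to make your own decision.

```python
import unicodedata

SAMPLES = ["A", "\u0301", "7", "!", "\u20ac", " ", "\u00a0", "\u2028", "\u200b", "\x07"]


def describe(ch):
    """Return (code point label, General Category, character name)."""
    cat = unicodedata.category(ch)
    # Control characters have no name in the database, so supply a fallback.
    return f"U+{ord(ch):04X}", cat, unicodedata.name(ch, "<unnamed>")


for ch in SAMPLES:
    print(*describe(ch))
```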



    Always remember that terms like "character" and "printable" are slippery and have interesting edge cases.





    Programming Language support



    Some programming languages assist with this problem.



    For example, the Go language has a "unicode" package which provides many useful Unicode-related functions including these two:



    func IsGraphic(r rune) bool

    IsGraphic reports whether the rune is defined as a Graphic by Unicode. Such
    characters include letters, marks, numbers, punctuation, symbols, and spaces,
    from categories L, M, N, P, S, Zs.

    func IsPrint(r rune) bool

    IsPrint reports whether the rune is defined as printable by Go. Such
    characters include letters, marks, numbers, punctuation, symbols, and
    the ASCII space character, from categories L, M, N, P, S and the ASCII
    space character. This categorization is the same as IsGraphic except
    that the only spacing character is ASCII space, U+0020.


    Notice that it says "defined as printable by Go", not "defined as printable by Unicode". It is almost as if there are some depths the wizards at Unicode dare not plumb.
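To see the difference concretely without a Go toolchain, here is a Python 3 sketch that re-implements the two documented Go rules with `unicodedata`. Python is an assumption for illustration, and the function names are mine, not part of either language's library. Python's built-in `str.isprintable()` is documented with the same rule as `IsPrint` (Other and Separator are non-printable, except the ASCII space), so those two columns should agree. Spaces other than U+0020, such as U+00A0 NO-BREAK SPACE and U+3000 IDEOGRAPHIC SPACE, are graphic but not printable under this rule. Real Go may give different answers wherever it ships a different Unicode version.

```python
import unicodedata


def go_is_graphic(ch):
    """Re-implementation of the documented rule for Go's unicode.IsGraphic: L, M, N, P, S, Zs."""
    cat = unicodedata.category(ch)
    return cat[0] in "LMNPS" or cat == "Zs"


def go_is_print(ch):
    """Re-implementation of the documented rule for Go's unicode.IsPrint: L, M, N, P, S and U+0020 only."""
    return unicodedata.category(ch)[0] in "LMNPS" or ch == " "


for ch in [" ", "\u00a0", "\u3000", "a", "\u200b"]:
    print(f"U+{ord(ch):04X} graphic={go_is_graphic(ch)} print={go_is_print(ch)} "
          f"isprintable={ch.isprintable()}")
```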





    Printable



    The more you learn about Unicode, the more you realise how unexpectedly diverse and unfathomably weird human writing systems are.



    In particular, whether a given "character" is printable is not always obvious.



    Is a zero-width space printable? When is a hyphenation point printable? Are there characters whose printability depends on their position in a word or on which characters are adjacent to them? Is a combining character always printable?
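Python 3 (assumed here for illustration) gives concrete, if debatable, answers to some of these questions. `str.isprintable()` calls the zero-width space and the soft hyphen non-printable, because both are format characters (Cf). The visible hyphenation point and a lone combining accent count as printable. The last lines show why the combining mark is still awkward: it attaches to the preceding letter, so two code points display as one character.

```python
import unicodedata

# ZERO WIDTH SPACE, SOFT HYPHEN, HYPHENATION POINT, COMBINING ACUTE ACCENT
EDGE_CASES = ["\u200b", "\u00ad", "\u2027", "\u0301"]

for ch in EDGE_CASES:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
          f"isprintable={ch.isprintable()} {unicodedata.name(ch)}")

# "e" followed by U+0301 is two code points; NFC composes them into U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), f"U+{ord(composed):04X}")
```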





    Footnotes




    ASCII printable character range is \u0020 - \u007f




    No, it isn't. \u007f is DEL, which is not normally considered a printable character. It is, for example, associated with the keyboard key labelled "DEL", whose earliest purpose was to command the deletion of a character from some medium (display, file, etc.).



    In fact, many 8-bit character sets have several non-contiguous ranges of non-printable characters. See, for example, the C0 and C1 controls.
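A quick check in Python 3 (used here only as an illustration) agrees with the footnote. Among the 128 ASCII code points, `str.isprintable()` accepts exactly U+0020 through U+007E, which is 95 characters, and it rejects DEL. Beware that Python's `string.printable` constant is a different set: it also includes the whitespace controls tab, newline, carriage return, vertical tab and form feed.

```python
import string

# Code points 0x00-0x7F that Python's str.isprintable() accepts.
ascii_printable = [cp for cp in range(0x80) if chr(cp).isprintable()]
print(f"U+{ascii_printable[0]:04X} to U+{ascii_printable[-1]:04X}: {len(ascii_printable)} characters")

# string.printable is not the same set: it adds whitespace control characters.
extra = sorted(set(string.printable) - {chr(cp) for cp in ascii_printable})
print([hex(ord(ch)) for ch in extra])
```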






      What you should do is pick a font, and then generate a list of the Unicode characters that have glyphs defined in that font. You can use a font library such as FreeType to test for glyphs: test for FT_Get_Char_Index(...) != 0, since a return value of 0 means the font has no glyph for that character.
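FreeType is a C library. As a rough Python 3 equivalent, the sketch below swaps in the third-party fontTools package, which is my substitution and not what the answer uses. It reads the font's Unicode cmap instead of calling FT_Get_Char_Index. `load_font_codepoints` and `renderable` are hypothetical helper names, and the font path in the usage line is a placeholder. A cmap entry only means the font claims a glyph; it does not guarantee that combining marks or complex scripts will shape correctly.

```python
def load_font_codepoints(font_path):
    """Return the set of code points mapped to a glyph by the font's best Unicode cmap.

    Needs the third-party fontTools package (pip install fonttools).
    """
    from fontTools.ttLib import TTFont  # imported lazily so renderable() works without it

    font = TTFont(font_path)
    try:
        cmap = font.getBestCmap() or {}  # None if the font has no Unicode cmap
        return set(cmap)
    finally:
        font.close()


def renderable(candidates, font_codepoints):
    """Keep the printable candidates that the font has a glyph for."""
    return [ch for ch in candidates if ch.isprintable() and ord(ch) in font_codepoints]
```

Usage would look like `renderable(map(chr, range(0x110000)), load_font_codepoints("/path/to/font.ttf"))`, which gives a pool you can feed to a random generator.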






        Unicode, strictly speaking, has no range. Numbers can go on infinitely.



        What you gave is not UTF-8, which uses 1 byte for ASCII characters.



        As for the range, I believe there is no fixed range of printable characters; it keeps evolving. Check the page I gave above.






        • AFAIK Unicode is only defined up to 0x10FFFF; no code points beyond that will be assigned.

          – Sebastian
          Dec 3 '10 at 6:42
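The comment is right that the code space is bounded. There are 17 planes of 65,536 code points each, U+0000 to U+10FFFF, for 1,114,112 code points in total, and only some of them are assigned. Here is a small Python 3 illustration; the helper name `in_code_space` is hypothetical.

```python
import sys

def in_code_space(cp):
    """True if cp is a Unicode code point at all (assigned or not)."""
    return 0 <= cp <= 0x10FFFF

print(hex(sys.maxunicode))  # Python 3.3+ always reports the top of the code space
print(in_code_space(0x10FFFF), in_code_space(0x110000))

try:
    chr(0x110000)  # one past the last code point
except ValueError as exc:
    print("chr() refuses it:", exc)
```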












        Your Answer






        StackExchange.ifUsing("editor", function () {
        StackExchange.using("externalEditor", function () {
        StackExchange.using("snippets", function () {
        StackExchange.snippets.init();
        });
        });
        }, "code-snippets");

        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "1"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });














        draft saved

        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f3770117%2fwhat-is-the-range-of-unicode-printable-characters%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        5 Answers
        5






        active

        oldest

        votes








        5 Answers
        5






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        18














        See, http://en.wikipedia.org/wiki/Unicode_control_characters



        You might want to look especially at C0 and C1 control character http://en.wikipedia.org/wiki/C0_and_C1_control_codes



        The wiki says, the C0 control character is in the range U+0000—U+001F and U+007F (which is the same range as ASCII) and C1 control character is in the range U+0080—U+009F



        other than C-control character, Unicode also has hundreds of formatting control characters, e.g. zero-width non-joiner, which makes character spacing closer, or bidirectional text control. This formatting control characters are rather scattered.



        More importantly, what are you doing that requires you to know Unicode's non-printable characters? More likely than not, whatever you're trying to do is the wrong approach to solve your problem.






        share|improve this answer



















        • 4





          I want to create a random unicode string generator which will generate printable characters.

          – Anindya Chatterjee
          Sep 22 '10 at 14:29






        • 5





          Printable by whom? Do you want to include eg. all the Chinese characters? Many users won't have fonts for them, so ‘printing’ them would give you nothing, a blank box, or some other useless replacement character.

          – bobince
          Sep 22 '10 at 20:29






        • 5





          One good reason is to avoid security exploits: bugzilla.mozilla.org/show_bug.cgi?id=968576

          – Neil McGuigan
          Apr 7 '15 at 20:46











        • @bobince My browser can display Chinese characters. Not sure if that was the case in 2010 though.

          – John
          May 16 '18 at 10:55
















        18














        See, http://en.wikipedia.org/wiki/Unicode_control_characters



        You might want to look especially at C0 and C1 control character http://en.wikipedia.org/wiki/C0_and_C1_control_codes



        The wiki says, the C0 control character is in the range U+0000—U+001F and U+007F (which is the same range as ASCII) and C1 control character is in the range U+0080—U+009F



        other than C-control character, Unicode also has hundreds of formatting control characters, e.g. zero-width non-joiner, which makes character spacing closer, or bidirectional text control. This formatting control characters are rather scattered.



        More importantly, what are you doing that requires you to know Unicode's non-printable characters? More likely than not, whatever you're trying to do is the wrong approach to solve your problem.






        share|improve this answer



















        • 4





          I want to create a random unicode string generator which will generate printable characters.

          – Anindya Chatterjee
          Sep 22 '10 at 14:29






        • 5





          Printable by whom? Do you want to include eg. all the Chinese characters? Many users won't have fonts for them, so ‘printing’ them would give you nothing, a blank box, or some other useless replacement character.

          – bobince
          Sep 22 '10 at 20:29






        • 5





          One good reason is to avoid security exploits: bugzilla.mozilla.org/show_bug.cgi?id=968576

          – Neil McGuigan
          Apr 7 '15 at 20:46











        • @bobince My browser can display Chinese characters. Not sure if that was the case in 2010 though.

          – John
          May 16 '18 at 10:55














        18












        18








        18







        See, http://en.wikipedia.org/wiki/Unicode_control_characters



        You might want to look especially at C0 and C1 control character http://en.wikipedia.org/wiki/C0_and_C1_control_codes



        The wiki says, the C0 control character is in the range U+0000—U+001F and U+007F (which is the same range as ASCII) and C1 control character is in the range U+0080—U+009F



        other than C-control character, Unicode also has hundreds of formatting control characters, e.g. zero-width non-joiner, which makes character spacing closer, or bidirectional text control. This formatting control characters are rather scattered.



        More importantly, what are you doing that requires you to know Unicode's non-printable characters? More likely than not, whatever you're trying to do is the wrong approach to solve your problem.






        share|improve this answer













        See, http://en.wikipedia.org/wiki/Unicode_control_characters



        You might want to look especially at C0 and C1 control character http://en.wikipedia.org/wiki/C0_and_C1_control_codes



        The wiki says, the C0 control character is in the range U+0000—U+001F and U+007F (which is the same range as ASCII) and C1 control character is in the range U+0080—U+009F



        other than C-control character, Unicode also has hundreds of formatting control characters, e.g. zero-width non-joiner, which makes character spacing closer, or bidirectional text control. This formatting control characters are rather scattered.



        More importantly, what are you doing that requires you to know Unicode's non-printable characters? More likely than not, whatever you're trying to do is the wrong approach to solve your problem.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Sep 22 '10 at 14:27









        Lie RyanLie Ryan

        45.9k1072124




        45.9k1072124








        • 4





          I want to create a random unicode string generator which will generate printable characters.

          – Anindya Chatterjee
          Sep 22 '10 at 14:29






        • 5





          Printable by whom? Do you want to include eg. all the Chinese characters? Many users won't have fonts for them, so ‘printing’ them would give you nothing, a blank box, or some other useless replacement character.

          – bobince
          Sep 22 '10 at 20:29






        • 5





          One good reason is to avoid security exploits: bugzilla.mozilla.org/show_bug.cgi?id=968576

          – Neil McGuigan
          Apr 7 '15 at 20:46











        • @bobince My browser can display Chinese characters. Not sure if that was the case in 2010 though.

          – John
          May 16 '18 at 10:55














        • 4





          I want to create a random unicode string generator which will generate printable characters.

          – Anindya Chatterjee
          Sep 22 '10 at 14:29






        • 5





          Printable by whom? Do you want to include eg. all the Chinese characters? Many users won't have fonts for them, so ‘printing’ them would give you nothing, a blank box, or some other useless replacement character.

          – bobince
          Sep 22 '10 at 20:29






        • 5





          One good reason is to avoid security exploits: bugzilla.mozilla.org/show_bug.cgi?id=968576

          – Neil McGuigan
          Apr 7 '15 at 20:46











        • @bobince My browser can display Chinese characters. Not sure if that was the case in 2010 though.

          – John
          May 16 '18 at 10:55








        4




        4





        I want to create a random unicode string generator which will generate printable characters.

        – Anindya Chatterjee
        Sep 22 '10 at 14:29





        I want to create a random unicode string generator which will generate printable characters.

        – Anindya Chatterjee
        Sep 22 '10 at 14:29




        5




        5





        Printable by whom? Do you want to include eg. all the Chinese characters? Many users won't have fonts for them, so ‘printing’ them would give you nothing, a blank box, or some other useless replacement character.

        – bobince
        Sep 22 '10 at 20:29





        Printable by whom? Do you want to include eg. all the Chinese characters? Many users won't have fonts for them, so ‘printing’ them would give you nothing, a blank box, or some other useless replacement character.

        – bobince
        Sep 22 '10 at 20:29




        5




        5





        One good reason is to avoid security exploits: bugzilla.mozilla.org/show_bug.cgi?id=968576

        – Neil McGuigan
        Apr 7 '15 at 20:46





        One good reason is to avoid security exploits: bugzilla.mozilla.org/show_bug.cgi?id=968576

        – Neil McGuigan
        Apr 7 '15 at 20:46













        @bobince My browser can display Chinese characters. Not sure if that was the case in 2010 though.

        – John
        May 16 '18 at 10:55





        @bobince My browser can display Chinese characters. Not sure if that was the case in 2010 though.

        – John
        May 16 '18 at 10:55













        13














        First, you should remove the word 'UTF8' in your question, it's not pertinent (UTF8 is just one of the encodings of Unicode, it's something orthogonal to your question).



        Second: the meaning of "printable/non printable" is less clear in Unicode. Perhaps you mean a "graphical character" ; and one can even dispute if a space is printable/graphical. The non-graphical characters would consist, basically, of control characters: the range 0x00-0x0f plus some others that are scattered.



        Anyway, the vast majority of Unicode characters (more than 200.000) are "graphical". But this certainly does not imply that they are printable in your environment.



        It seems to me a bad idea, if you intend to generate a "random printable" unicode string, to try to include all "printable" characters.






        share|improve this answer




























          13














          First, you should remove the word 'UTF8' in your question, it's not pertinent (UTF8 is just one of the encodings of Unicode, it's something orthogonal to your question).



          Second: the meaning of "printable/non printable" is less clear in Unicode. Perhaps you mean a "graphical character" ; and one can even dispute if a space is printable/graphical. The non-graphical characters would consist, basically, of control characters: the range 0x00-0x0f plus some others that are scattered.



          Anyway, the vast majority of Unicode characters (more than 200.000) are "graphical". But this certainly does not imply that they are printable in your environment.



          It seems to me a bad idea, if you intend to generate a "random printable" unicode string, to try to include all "printable" characters.






          share|improve this answer


























            13












            13








            13







            First, you should remove the word 'UTF8' in your question, it's not pertinent (UTF8 is just one of the encodings of Unicode, it's something orthogonal to your question).



            Second: the meaning of "printable/non printable" is less clear in Unicode. Perhaps you mean a "graphical character" ; and one can even dispute if a space is printable/graphical. The non-graphical characters would consist, basically, of control characters: the range 0x00-0x0f plus some others that are scattered.



            Anyway, the vast majority of Unicode characters (more than 200.000) are "graphical". But this certainly does not imply that they are printable in your environment.



            It seems to me a bad idea, if you intend to generate a "random printable" unicode string, to try to include all "printable" characters.






            share|improve this answer













            First, you should remove the word 'UTF8' in your question, it's not pertinent (UTF8 is just one of the encodings of Unicode, it's something orthogonal to your question).



            Second: the meaning of "printable/non printable" is less clear in Unicode. Perhaps you mean a "graphical character" ; and one can even dispute if a space is printable/graphical. The non-graphical characters would consist, basically, of control characters: the range 0x00-0x0f plus some others that are scattered.



            Anyway, the vast majority of Unicode characters (more than 200.000) are "graphical". But this certainly does not imply that they are printable in your environment.



            It seems to me a bad idea, if you intend to generate a "random printable" unicode string, to try to include all "printable" characters.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Sep 22 '10 at 14:55









            leonbloyleonbloy

            54.1k17107153




            54.1k17107153























                6














                This is an old question, but it is still valid and I think there is more to usefully, but briefly, say on the subject than is covered by existing answers.



                Unicode



                Unicode defines properties for characters.



                One of these properties is "General Category" which has Major classes and subclasses. The Major classes are Letter, Mark, Punctuation, Symbol, Separator, and Other.



                By knowing the properties of your characters, you can decide whether you consider them printable in your particular context.



                You must always remember that terms like "character" and "printable" are often difficult and have interesting edge-cases.





                Programming Language support



                Some programming languages assist with this problem.



                For example, the Go language has a "unicode" package which provides many useful Unicode-related functions including these two:



                func IsGraphic(r rune) bool

                IsGraphic reports whether the rune is defined as a Graphic by Unicode. Such
                characters include letters, marks, numbers, punctuation, symbols, and spaces,
                from categories L, M, N, P, S, Zs.

                func IsPrint(r rune) bool

                IsPrint reports whether the rune is defined as printable by Go. Such
                characters include letters, marks, numbers, punctuation, symbols, and
                the ASCII space character, from categories L, M, N, P, S and the ASCII
                space character. This categorization is the same as IsGraphic except
                that the only spacing character is ASCII space, U+0020.


                Notice that it says "defined as printable by Go" not by "defined as printable by Unicode". It is almost as if there are some depths the wizards at Unicode dare not plumb.





                Printable



                The more you learn about Unicode, the more you realise how unexpectedly diverse and unfathomably weird human writing systems are.



                In particular whether a particular "character" is printable is not always obvious.



                Is a zero-width space printable? When is a hyphenation point printable? Are there characters whose printability depends on their position in a word or on what characters are adjacent to them? Is a combining-character always printable?





                Footnotes




                ASCII printable character range is u0020 - u007f




                No it isn't. u007f is DEL which is not normally considered a printable character. It is, for example, associated with the keyboard key labelled "DEL" whose earliest purpose was to command the deletion of a character from some medium (display, file etc).



                In fact many 8-bit character sets have many non-consecutive ranges which are non-printable. See for example C0 and C1 controls.






                share|improve this answer






























                  6














                  This is an old question, but it is still valid and I think there is more to usefully, but briefly, say on the subject than is covered by existing answers.



                  Unicode



                  Unicode defines properties for characters.



                  One of these properties is "General Category" which has Major classes and subclasses. The Major classes are Letter, Mark, Punctuation, Symbol, Separator, and Other.



                  By knowing the properties of your characters, you can decide whether you consider them printable in your particular context.



                  You must always remember that terms like "character" and "printable" are often difficult and have interesting edge-cases.





                  Programming Language support



                  Some programming languages assist with this problem.



                  For example, the Go language has a "unicode" package which provides many useful Unicode-related functions including these two:



                  func IsGraphic(r rune) bool

                  IsGraphic reports whether the rune is defined as a Graphic by Unicode. Such
                  characters include letters, marks, numbers, punctuation, symbols, and spaces,
                  from categories L, M, N, P, S, Zs.

                  func IsPrint(r rune) bool

                  IsPrint reports whether the rune is defined as printable by Go. Such
                  characters include letters, marks, numbers, punctuation, symbols, and
                  the ASCII space character, from categories L, M, N, P, S and the ASCII
                  space character. This categorization is the same as IsGraphic except
                  that the only spacing character is ASCII space, U+0020.


                  Notice that it says "defined as printable by Go" not by "defined as printable by Unicode". It is almost as if there are some depths the wizards at Unicode dare not plumb.





                  Printable



                  The more you learn about Unicode, the more you realise how unexpectedly diverse and unfathomably weird human writing systems are.



                  In particular whether a particular "character" is printable is not always obvious.



                  Is a zero-width space printable? When is a hyphenation point printable? Are there characters whose printability depends on their position in a word or on what characters are adjacent to them? Is a combining-character always printable?





                  Footnotes




                  ASCII printable character range is u0020 - u007f




                  No it isn't. u007f is DEL which is not normally considered a printable character. It is, for example, associated with the keyboard key labelled "DEL" whose earliest purpose was to command the deletion of a character from some medium (display, file etc).



                  In fact, many 8-bit character sets have several non-contiguous ranges of non-printable characters. See, for example, the C0 and C1 control codes (U+0000 to U+001F and U+0080 to U+009F in Unicode).
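A small sketch that finds those runs in the Latin-1 range, assuming Go's `unicode.IsControl` as the definition of "control" (it only reports true for code points up to U+00FF). The `span` type and `controlSpans` function are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// span is a closed range of code points [lo, hi].
type span struct{ lo, hi rune }

// controlSpans collects the contiguous runs of code points in [0, limit]
// for which unicode.IsControl is true.
func controlSpans(limit rune) []span {
	var spans []span
	for r := rune(0); r <= limit; r++ {
		if !unicode.IsControl(r) {
			continue
		}
		if n := len(spans); n > 0 && spans[n-1].hi == r-1 {
			spans[n-1].hi = r // extend the current run
		} else {
			spans = append(spans, span{r, r}) // start a new run
		}
	}
	return spans
}

func main() {
	for _, s := range controlSpans(0xFF) {
		fmt.Printf("U+%04X..U+%04X\n", s.lo, s.hi)
	}
}
```

This should print two runs, U+0000..U+001F (C0) and U+007F..U+009F, where DEL sits directly in front of the C1 block.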














                    edited Nov 23 '18 at 16:55

























                    answered Nov 23 '18 at 16:19









                    RedGrittyBrick

                    2,46511937









































                        What you should do is pick a font and then generate a list of the Unicode characters that have glyphs defined in that font. You can use a font library such as FreeType to test for glyphs (check that FT_Get_Char_Index(...) != 0).
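The same idea can be sketched in Go without committing to a particular font library. The glyph lookup is passed in as a function: with FreeType it would wrap `FT_Get_Char_Index(face, charcode) != 0`, but the `hasGlyph` parameter and the toy `fakeFont` below are hypothetical stand-ins, not a real binding. As an extra design choice (not part of the answer) it also skips code points that `unicode.IsGraphic` rejects, because a font may map control or format characters to a glyph.

```go
package main

import (
	"fmt"
	"unicode"
)

// renderable returns the code points in [lo, hi] that Go considers graphic
// and for which hasGlyph reports a glyph in the font. hasGlyph is a
// hypothetical hook; a FreeType-backed version would test
// FT_Get_Char_Index(face, r) != 0.
func renderable(lo, hi rune, hasGlyph func(rune) bool) []rune {
	var out []rune
	for r := lo; r <= hi; r++ {
		if unicode.IsGraphic(r) && hasGlyph(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// fakeFont is a toy stand-in for a real font's character map: it
	// "covers" the ASCII digits plus U+0007 (BEL), a control code.
	fakeFont := func(r rune) bool {
		return (r >= '0' && r <= '9') || r == 0x07
	}
	fmt.Println(string(renderable(0x00, 0x7F, fakeFont))) // 0123456789
}
```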
















                            answered May 3 '11 at 17:48









                            jkl

                            391









































                                Strictly speaking, Unicode has no range. Numbers can go to infinity.



                                What you gave is not UTF-8, which uses 1 byte for ASCII characters.



                                As for the range, I believe there is no single range of printable characters, and the set keeps evolving. Check the page I gave above.
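To illustrate that printable code points do not form one range, this sketch counts the contiguous runs that Go's `unicode.IsPrint` accepts, scanning every code point up to `unicode.MaxRune`. The results depend on the Unicode version the Go toolchain ships with (reported by `unicode.Version`), which is also why the set keeps changing, so no expected numbers are given here. `printableRuns` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode"
)

// printableRuns counts the code points accepted by unicode.IsPrint and the
// number of separate contiguous runs they form in [0, unicode.MaxRune].
func printableRuns() (count, runs int) {
	prev := false
	for r := rune(0); r <= unicode.MaxRune; r++ {
		p := unicode.IsPrint(r)
		if p {
			count++
			if !prev {
				runs++ // a new run starts here
			}
		}
		prev = p
	}
	return count, runs
}

func main() {
	count, runs := printableRuns()
	fmt.Printf("Unicode %s: %d printable code points in %d separate runs\n",
		unicode.Version, count, runs)
}
```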

























                                • 8





                                  AFAIK Unicode is only defined up to 0x10FFFF; no code points beyond that will be assigned.

                                  – Sebastian
                                  Dec 3 '10 at 6:42


























                                answered Sep 22 '10 at 14:21









                                Wernight

                                22.5k17100119











