Extracting text information from a national ID












I'm trying to run Arabic OCR on the following ID card, but my preprocessing produces a very noisy image and I can't extract any information from it.

Here is my attempt:

    import cv2
    import numpy as np
    import pytesseract

    # Load the ID photo and convert it to grayscale
    image = cv2.imread(r'c:\ahmed\ahmed.jpg')
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Smooth the image: edge-preserving bilateral filter, then a Gaussian blur
    gray = cv2.bilateralFilter(gray, 11, 18, 18)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)

    # Binarize with a Gaussian-weighted adaptive threshold (block size 11, C = 2)
    gray = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)

    # Optional morphological clean-up (currently disabled)
    kernel = np.ones((2, 2), np.uint8)
    # img_dilation = cv2.erode(gray, kernel, iterations=1)
    # cv2.imshow("dilation", img_dilation)

    cv2.imshow("gray", gray)

    # Run Tesseract with the Arabic language model and save the recognized text
    text = pytesseract.image_to_string(gray, lang='ara')
    print(text)
    with open(r"c:\ahmed\file.txt", "w", encoding="utf-8") as myfile:
        myfile.write(text)

    cv2.waitKey(0)

Result: [image]

Sample: [image]







python opencv computer-vision tesseract
















asked Nov 22 '18 at 13:36









chris burgees













  • Are you only interested in the text inside the red box?

    – yapws87
    Nov 29 '18 at 3:25































2 Answers
This is the output I got with the ImageMagick TextCleaner script:

[image: out]

Command:

    textcleaner -g -e stretch -f 50 -o 30 -s 1 C:/Users/PC/Desktop/id.jpg C:/Users/PC/Desktop/out.png

Take a look here if you want to install and use the TextCleaner script on Windows. It's a tutorial I kept as simple as possible, based on the research I did when I was in the same situation.

With the cleaned image, detecting the text should be very easy. I'm not sure how easy recognizing it will be.















        edited Nov 28 '18 at 17:07

























        answered Nov 28 '18 at 16:58









Link

























The text on your ID is black, which makes extraction easy. All you need to do is threshold the dark pixels, and that should isolate the text.

Here is a snippet of the code:

    import cv2

    # Load the image directly in grayscale
    image = cv2.imread('AVXjv.jpg', cv2.IMREAD_GRAYSCALE)

    # Remove noise with a 3x3 box blur
    dst = cv2.blur(image, (3, 3))

    # Extract dark regions, which correspond to text:
    # pixels at or below 80 become 255, everything else becomes 0
    val, dst = cv2.threshold(dst, 80, 255, cv2.THRESH_BINARY_INV)

    # Morphological close (dilate, then erode) to connect separated blobs;
    # passing None as the kernel uses OpenCV's default 3x3 rectangle
    dst = cv2.dilate(dst, None)
    dst = cv2.erode(dst, None)

    cv2.imshow("dst", dst)
    cv2.waitKey(0)

And here is the result:

[image]
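To make the two steps concrete, here is a minimal NumPy-only sketch of what the inverse threshold and the 3x3 morphological close do to the pixels. It is an illustration rather than the answer's code: `dark_text_mask`, `close_3x3`, and the synthetic 7x9 image are hypothetical, and the border handling (zeros outside the image for dilation, 255 for erosion) is an assumption made for this sketch.

```python
import numpy as np


def dark_text_mask(gray, thresh=80):
    """Inverse binary threshold: 255 where the pixel is <= thresh, else 0."""
    gray = np.asarray(gray)
    return np.where(gray <= thresh, 255, 0).astype(np.uint8)


def _neighbourhood_stack(mask, pad_value):
    """Stack the nine shifted copies that make up each pixel's 3x3 window."""
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    return np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )


def close_3x3(mask):
    """Morphological close: 3x3 dilation (max) followed by 3x3 erosion (min)."""
    dilated = _neighbourhood_stack(mask, 0).max(axis=0)
    return _neighbourhood_stack(dilated, 255).min(axis=0)


# Synthetic example: light background with two dark strokes on row 3,
# separated by a one-pixel gap at column 4.
gray = np.full((7, 9), 200, dtype=np.uint8)
gray[3, 2:4] = 30
gray[3, 5:7] = 30

mask = dark_text_mask(gray)   # the gap at (3, 4) is still 0 here
closed = close_3x3(mask)      # the close fills the gap and joins the strokes
```

The close only bridges gaps narrower than the kernel, so wider breaks between strokes would need a larger kernel or more iterations.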

















            answered Nov 29 '18 at 14:52









yapws87













  • There is green text at the top that is missing from the thresholded picture.

    – chris burgees
    Nov 29 '18 at 22:26

  • You can try processing the color channels separately to extract the green text.

    – Qidi
    Nov 30 '18 at 8:36

  • @Qidi Can you post an answer with your approach?

    – chris burgees
    Nov 30 '18 at 10:09

  • Why would you need the green text? Isn't it the same on all the IDs?

    – yapws87
    Dec 2 '18 at 1:44
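No answer with the channel-based approach was posted, so the following is only one possible reading of that suggestion: mark a pixel as green text when its green channel exceeds both the blue and red channels by some margin. The function name `green_text_mask`, the margin of 40, and the tiny synthetic image are all assumptions for illustration. The channel order is assumed to be BGR, as returned by `cv2.imread`.

```python
import numpy as np


def green_text_mask(bgr, margin=40):
    """Return 255 where green exceeds both blue and red by at least `margin`."""
    # Widen to a signed type so the subtraction cannot wrap around in uint8.
    b, g, r = (bgr[..., i].astype(np.int16) for i in range(3))
    dominant = (g - np.maximum(b, r)) >= margin
    return np.where(dominant, 255, 0).astype(np.uint8)


# Synthetic 2x3 BGR image: light background, one black and one green pixel.
bgr = np.full((2, 3, 3), 220, dtype=np.uint8)
bgr[0, 0] = (20, 20, 20)     # black text pixel, not selected
bgr[0, 1] = (40, 150, 30)    # green text pixel, selected

mask = green_text_mask(bgr)
```

Black text fails the test because all three channels are equally low, so this mask would have to be combined with the dark-pixel threshold to capture both kinds of text. The margin would likely need tuning on the real photo.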


















