Extracting text information from a national ID

I'm trying to run Arabic OCR on the following ID, but I get a very noisy picture and can't extract any information from it.

Here is my attempt:

```python
import cv2
import numpy as np
import pytesseract

# Load the ID photo and convert it to grayscale
image = cv2.imread(r'c:\ahmed\ahmed.jpg')
if image is None:
    raise FileNotFoundError("could not read the ID image")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Denoise: edge-preserving bilateral filter, then a Gaussian blur
gray = cv2.bilateralFilter(gray, 11, 18, 18)
gray = cv2.GaussianBlur(gray, (5, 5), 0)

kernel = np.ones((2, 2), np.uint8)

# Binarize with a local (Gaussian-weighted) threshold: 11x11 neighbourhood, C=2
gray = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                             cv2.THRESH_BINARY, 11, 2)

#img_dilation = cv2.erode(gray, kernel, iterations=1)
#cv2.imshow("dilation", img_dilation)

cv2.imshow("gray", gray)

# Arabic OCR; the recognized text is saved as UTF-8
text = pytesseract.image_to_string(gray, lang='ara')
print(text)

with open(r"c:\ahmed\file.txt", "w", encoding="utf-8") as myfile:
    myfile.write(text)

cv2.waitKey(0)
cv2.destroyAllWindows()
```

[image: result]

[image: sample ID]

asked Nov 22 '18 at 13:36 by chris burgees

Are you only interested in the text that is in the red box?

– yapws87
Nov 29 '18 at 3:25

Tags: python opencv computer-vision tesseract


2 Answers

This is my output using the ImageMagick TextCleaner script:

[image: TextCleaner output]

Script: textcleaner -g -e stretch -f 50 -o 30 -s 1 C:/Users/PC/Desktop/id.jpg C:/Users/PC/Desktop/out.png

Take a look here if you want to install and use the TextCleaner script on Windows. It's a tutorial I wrote, kept as simple as possible, based on the research I did when I was in the same situation.

Now it should be much easier to detect the text and (though I'm not sure how easily) to recognize it.

edited Nov 28 '18 at 17:07, answered Nov 28 '18 at 16:58 by Link

The text on your ID is black, which makes extraction easy: threshold the dark pixels and you should be able to get the text out.

Here is a snippet of the code:

```python
import cv2

# load image in grayscale
image = cv2.imread('AVXjv.jpg', cv2.IMREAD_GRAYSCALE)

# remove noise
dst = cv2.blur(image, (3, 3))

# extract dark regions, which correspond to text
val, dst = cv2.threshold(dst, 80, 255, cv2.THRESH_BINARY_INV)

# morphological close (dilate, then erode) to connect separated blobs
dst = cv2.dilate(dst, None)
dst = cv2.erode(dst, None)

cv2.imshow("dst", dst)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

And here is the result:

[image: thresholded result]

answered Nov 29 '18 at 14:52 by yapws87

There is green text at the top that doesn't appear in the thresholded picture.

– chris burgees
Nov 29 '18 at 22:26

You can try processing the different channels separately to extract the green text.

– Qidi
Nov 30 '18 at 8:36

@Qidi Can you post an answer with your approach?

– chris burgees
Nov 30 '18 at 10:09

Why would you need the green text? Isn't it the same on every ID?

– yapws87
Dec 2 '18 at 1:44
