Splitting string ID code into various parts

I have a series of identification codes that I need to split out. The format of these codes is [region(letter)][district(number)] - [place(number)][subdistrict(letter)].

Some example codes are S22-201, TT100-12, and V6-1B. Often there is no subdistrict, and all points fall within the same larger district (so there is no A, C, or other letter at the end of the string).
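The format above can be captured in one step with a regular expression that matches the whole ID. This is a minimal sketch. It assumes that region and subdistrict are ASCII letters, that district and place are whole numbers, and that optional spaces around the hyphen should be tolerated. `parse_id` and `ID_PATTERN` are hypothetical names and are not part of arcpy.

```python
from __future__ import print_function  # same print output on Python 2 (ArcMap) and 3
import re

# Assumed pattern: letters (region), digits (district), '-', digits (place),
# then zero or more letters (optional subdistrict).
ID_PATTERN = re.compile(r'^([A-Za-z]+)(\d+)\s*-\s*(\d+)([A-Za-z]*)$')


def parse_id(code):
    """Split an ID such as 'V6-1B' into (region, district, place, subdistrict).

    The subdistrict is None when the ID has no trailing letters.
    Raises ValueError when the ID does not match the assumed format.
    """
    match = ID_PATTERN.match(code.strip())
    if match is None:
        raise ValueError('unexpected ID format: %r' % (code,))
    region, district, place, subdistrict = match.groups()
    return region, int(district), int(place), subdistrict or None


for code in ['S22-201', 'TT100-12', 'V6-1B']:
    print(code, parse_id(code))
# S22-201 ('S', 22, 201, None)
# TT100-12 ('TT', 100, 12, None)
# V6-1B ('V', 6, 1, 'B')
```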

I can do parts of it, like splitting at the hyphen:

```
!Original_ID!.split('-')[0]
```

and then extracting the district:

```
!Split_ID![1:3]
```

But doing this in two steps seems unnecessary, and the slice only works when I know the exact number of characters in the string, which isn't realistic for a large data set.
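Here is the problem with a fixed slice: `[1:3]` assumes a one-letter region and a two-digit district, so it breaks on `TT100` and `V6`. One length-independent alternative is to group consecutive digits and non-digits with `itertools.groupby`. This is a sketch, and `split_runs` is a hypothetical helper:

```python
from __future__ import print_function  # same print output on Python 2 (ArcMap) and 3
from itertools import groupby

# A fixed slice assumes a one-letter region and a two-digit district.
for left in ['S22', 'TT100', 'V6']:
    print(left, repr(left[1:3]))
# S22 '22'
# TT100 'T1'
# V6 '6'


def split_runs(text):
    """Split text into consecutive runs of digits and non-digits,
    whatever their lengths."""
    # c.isdigit() works on both str and unicode values (Python 2 and 3)
    return [''.join(run) for _, run in groupby(text, key=lambda c: c.isdigit())]


print(split_runs('TT100'))  # ['TT', '100']
print(split_runs('V6-1B'))  # ['V', '6', '-', '1', 'B']
```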

I'd like to be able to grab each piece at once:

- letters on the left of the hyphen
- numbers on the left of the hyphen
- numbers on the right of the hyphen
- letters (if any) on the right of the hyphen

I need the numeric fields to be integers (or possibly floats in some rare cases).
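If some values ever turn out not to be whole numbers, a small helper could try `int` first and fall back to `float`. This is only a sketch, and `to_number` is a hypothetical name. None of the example IDs contain decimals, so the float branch is an assumption about those rare cases. Converting to a number also drops leading zeros, so `06` and `6` would become the same value.

```python
def to_number(text):
    """Convert a numeric string to int, falling back to float
    for values such as '12.5' (assumed rare cases)."""
    try:
        return int(text)
    except ValueError:
        return float(text)


print(to_number('201'))   # 201
print(to_number('12.5'))  # 12.5
print(to_number('007'))   # 7 (leading zeros are not preserved)
```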

I am still doing something wrong. I may need to start smaller and brush up on my Python first; I just assumed this would be a good place to start learning. Here's where I am, in the Python window in ArcMap:

```python
with arcpy.da.UpdateCursor("Wet_Sub",['Flag_ID','District','Split_ID']) as uCur:
for sRow in uCur:
    OrigID  = sRow[0].split('-')[0] # first element in the Original_ID
    charRng = range(len(OrigID)) # a range to iterate over
    Chars   = ''
    Numbers = ''
    for Idx in charRng:
        if OrigID[Idx].isnumeric():
            Numbers += OrigID[Idx]
        else:
            chars += OrigID[Idx]
    sRow[1] = float(Numbers)
    sRow[2] = Chars
    uCur.updateRow(sRow)
```

"Wet_Sub" and 'Flag_ID' are the names of the feature class and the original ID field. I also tried to follow user2856's suggestion. It looks like I need to use both of those code blocks, one pasted into the other, but I wasn't sure how to fit them together or which parts to change or remove (e.g. "etc... from code block above").

edited 10 hours ago by PolyGeo♦; asked yesterday by vce500 (new contributor)

Tags: arcgis-desktop, arcmap, field-calculator, python-parser


2 Answers

You're not going to be able to calculate two fields in one go with the field calculator, though you can split it up into two separate calculations. I would do this with an update cursor instead:

```python
import arcpy

# YourFeatureClass is a placeholder for your feature class name or path
with arcpy.da.UpdateCursor(YourFeatureClass, ['Original_ID', 'District', 'Split_ID']) as uCur:
    for sRow in uCur:
        OrigID = sRow[0].split('-')[0]  # first element in the Original_ID
        Chars = ''
        Numbers = ''
        for ch in OrigID:  # iterate over the characters directly
            if ch.isdigit():  # isdigit() exists on both str and unicode in Python 2
                Numbers += ch
            else:
                Chars += ch  # same name as above; Python names are case-sensitive
        sRow[1] = float(Numbers)
        sRow[2] = Chars
        uCur.updateRow(sRow)
```

This shows how to break a string up into numbers and non-numbers and put the values into a row; it should give you some ideas of where to start.
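The loop above only splits the part before the hyphen. The sketch below extends the same letters-versus-digits idea to both halves. The function names are hypothetical, and the code is plain Python, so you can try it outside ArcGIS before putting it in a cursor:

```python
def split_part(part):
    """Separate letters from digits in one half of an ID,
    e.g. 'TT100' -> ('TT', '100') and '1B' -> ('B', '1')."""
    chars = ''
    numbers = ''
    for ch in part:
        if ch.isdigit():  # isdigit() exists on both str and unicode
            numbers += ch
        else:
            chars += ch
    return chars, numbers


def split_id(code):
    """Return (region, district, place, subdistrict) for an ID like 'V6-1B'.
    Assumes exactly one hyphen; subdistrict is None when absent."""
    left, right = code.split('-')
    region, district = split_part(left.strip())
    subdistrict, place = split_part(right.strip())
    return region, int(district), int(place), subdistrict or None


print(split_id('TT100-12'))  # ('TT', 100, 12, None)
print(split_id('V6-1B'))     # ('V', 6, 1, 'B')
```

Because this approach simply collects all letters and all digits, it does not check their order. An ID like `S2A2-1` would silently become region `SA`, district `22`. If malformed IDs are possible, a regular expression that matches the whole ID is stricter.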

answered yesterday by Michael Stimson

Assuming you have already added four fields (region, district, place, and subdistrict) and want to use the field calculator to populate them, you would have to run the calculator four times. Put this in the Code Block:

```python
import re

def parse(s):
    """The format of these codes is [region(letter)][district(number)] - [place(number)][subdistrict(letter)].
    Some example codes are S22-201, TT100-12, and V6-1B.
    Often there is no subdistrict, and all points fall within the same larger district
    (so there is no A, C, or other letter at the end of the string)."""

    letters = re.findall(r'[A-Za-z]+', s)  # runs of letters: region, then subdistrict (if any)
    numbers = re.findall(r'[0-9]+', s)     # runs of digits: district, then place

    region = letters[0]
    district, place = [int(n) for n in numbers]
    try:
        subdistrict = letters[1]
    except IndexError:
        subdistrict = None

    return region, district, place, subdistrict
```
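The letter class should be `[A-Za-z]` rather than `[a-z A-Z]`, because a space inside the brackets makes spaces count as letters. With IDs like the examples this makes no difference. If an ID had spaces around the hyphen, though, the extra matches would end up as a bogus subdistrict. The ID `S22 - 201` below is a hypothetical case used for comparison:

```python
from __future__ import print_function  # same print output on Python 2 (ArcMap) and 3
import re

# 'S22 - 201' is a hypothetical ID with spaces around the hyphen.
for s in ['S22-201', 'V6-1B', 'S22 - 201']:
    with_space = re.findall(r'[a-z A-Z]+', s)   # class includes a space
    letters_only = re.findall(r'[A-Za-z]+', s)  # ASCII letters only
    print(repr(s), with_space, letters_only)
# 'S22-201' ['S'] ['S']
# 'V6-1B' ['V', 'B'] ['V', 'B']
# 'S22 - 201' ['S', ' ', ' '] ['S']
```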

Then, for the region field, use:

```
parse(!Original_ID!)[0]
```

For district:

```
parse(!Original_ID!)[1]
```

For place:

```
parse(!Original_ID!)[2]
```

For subdistrict:

```
parse(!Original_ID!)[3]
```
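A possible variation is to return a `collections.namedtuple` instead of a plain tuple. It is still a tuple, so the index-based expressions keep working. It might also allow expressions such as `parse(!Original_ID!).district`, assuming your version's field calculator accepts attribute access in the expression (not verified here). `ParsedID` is a hypothetical name, and this is a sketch rather than a tested ArcGIS recipe:

```python
from __future__ import print_function  # same print output on Python 2 (ArcMap) and 3
import re
from collections import namedtuple

# Hypothetical named result type; indexing with [0] to [3] still works.
ParsedID = namedtuple('ParsedID', ['region', 'district', 'place', 'subdistrict'])


def parse(s):
    letters = re.findall(r'[A-Za-z]+', s)  # region, then optional subdistrict
    numbers = re.findall(r'[0-9]+', s)     # district, then place
    district, place = [int(n) for n in numbers]
    subdistrict = letters[1] if len(letters) > 1 else None
    return ParsedID(letters[0], district, place, subdistrict)


p = parse('V6-1B')
print(p.district, p[1])  # 6 6
```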

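However, I would use the update cursor approach suggested by Michael Stimson, so you can update all four fields in one pass. Use the following in the Python window of ArcMap/ArcGIS Pro:

```python
import re
import arcpy

def parse(s):
    # parse() from the code block above, repeated so this snippet runs on its own
    letters = re.findall(r'[A-Za-z]+', s)
    numbers = re.findall(r'[0-9]+', s)
    district, place = [int(n) for n in numbers]
    subdistrict = letters[1] if len(letters) > 1 else None
    return letters[0], district, place, subdistrict

# YourFeatureClass is a placeholder for your feature class name or path
with arcpy.da.UpdateCursor(YourFeatureClass, ['Original_ID', 'Region', 'District', 'Place', 'Subdistrict']) as rows:
    for row in rows:
        region, district, place, subdistrict = parse(row[0])
        rows.updateRow([row[0], region, district, place, subdistrict])
```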
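The question asks how to fit the two code blocks together. One way is to combine them into a single script for the ArcMap Python window. This sketch makes several assumptions. The feature class is `Wet_Sub` and the source field is `Flag_ID`, both taken from the question. The `Region`, `District`, `Place`, and `Subdistrict` fields already exist. `updated_row` and `update_feature_class` are hypothetical helper names. Rows whose ID is null or doesn't match the expected pattern are left unchanged and reported, instead of stopping the cursor with an error.

```python
import re

# Assumed format: letters, digits, '-', digits, optional letters.
ID_PATTERN = re.compile(r'^([A-Za-z]+)(\d+)\s*-\s*(\d+)([A-Za-z]*)$')


def parse(s):
    """Return (region, district, place, subdistrict), or None if s is
    null/empty or does not match the assumed format."""
    if not s:
        return None
    match = ID_PATTERN.match(s.strip())
    if match is None:
        return None
    region, district, place, subdistrict = match.groups()
    return region, int(district), int(place), subdistrict or None


def updated_row(row):
    """row is [ID, Region, District, Place, Subdistrict]. Return a new row
    with the parsed values, or the same row object if the ID can't be parsed."""
    parts = parse(row[0])
    if parts is None:
        return row
    return [row[0]] + list(parts)


def update_feature_class(feature_class, id_field='Flag_ID'):
    """Hypothetical driver; requires ArcMap/ArcGIS Pro. Returns unparsed IDs."""
    import arcpy  # imported here so the helpers above run without ArcGIS

    fields = [id_field, 'Region', 'District', 'Place', 'Subdistrict']
    skipped = []
    with arcpy.da.UpdateCursor(feature_class, fields) as rows:
        for row in rows:
            new_row = updated_row(row)
            if new_row is row:
                skipped.append(row[0])  # leave this row untouched
            else:
                rows.updateRow(new_row)
    return skipped
```

In the Python window you would then run something like `skipped = update_feature_class("Wet_Sub")` and check `skipped` for IDs that need manual attention.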

edited yesterday; answered yesterday by user2856
