Chemical Formula Parser












8












$begingroup$


As an exercise, I wrote a simple context free parser modeled from a pushdown automaton for chemical formulas such as NO2 or Fe(NO3)3.



To explain a bit about the decisions made:




  1. The state functions (pstart, pchem and pnum) are lambdas because they carry around a lot of context. They could be methods of a class, it doesn't really matter.


  2. The state functions have the input character injected because I realized halfway that I have to process some more after the line ended and I don't want to duplicate the state functions.



I am mostly concerned about the general messiness of the parser. There is no obvious way to know where variables are mutated, control flow is all over the place and consequently I find it difficult to reason about the program. Is there a better code structure?



Finally, I'd like to know if I caught all illegal syntax. Note repeating chemicals like NN is allowed.



#include<iostream>
#include<stack>
#include<unordered_map>
#include<string>
#include<iterator>

using namespace std::literals;

enum class state_t { start, num, chem, error };

double parse(std::string line, const std::unordered_map<std::string, double>& m)
{
auto b = line.begin(), e = line.end();
std::stack<double> stk;
int num = 0;
std::string chem;
state_t state = state_t::start;

auto pstart = [&](char c) {
if(std::isupper(c))
{
chem = ""s + c;
state = state_t::chem;
return true;
}
else if(std::isdigit(c))
{
if(stk.empty())
state = state_t::error;
else
{
num = c - '0';
state = state_t::num;
return true;
}
}
else if(c == '(')
{
stk.push(-1);
return true;
}
else if(c == ')')
{
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.empty())
state = state_t::error;
else
{
stk.pop();
stk.push(sum);
return true;
}
}
else
state = state_t::error;

return false;
};

auto pnum = [&](char c) {
if(std::isdigit(c))
{
num = 10 * num + c - '0';
return true;
}
else
{
stk.top() *= num;
state = state_t::start;
}
return false;
};

auto pchem = [&](char c){
if(std::islower(c))
{
chem += c;
return true;
}
else
{
if(auto it = m.find(chem); it != m.end())
{
stk.push(it->second);
state = state_t::start;
}
else
state = state_t::error;
}
return false;
};

while(b != e)
{
switch(state)
{
case state_t::start:
if(pstart(*b))
b++;
break;
case state_t::num:
if(pnum(*b))
b++;
break;
case state_t::chem:
if(pchem(*b))
b++;
break;
default:
return -1;
}
}

switch(state)
{
case state_t::num:
pnum('n');
break;
case state_t::chem:
pchem('n');
break;
}

if(state == state_t::error)
return -1;

double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.size() > 0) // expected ')'
return -1;

return sum;
}

int main()
{
std::unordered_map<std::string, double> m;
m["Na"] = 23.5;
m["Cl"] = 35.5;
m["O"] = 16;
m["N"] = 14;
m["Fe"] = 55.8;

std::string line;
while(getline(std::cin, line))
std::cout << parse(std::move(line), m) << 'n';
}









share|improve this question









New contributor




友人A is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$








  • 1




    $begingroup$
    I realized if(stk.size() > 1) should be if(stk.size() > 0), should I edit?
    $endgroup$
    – 友人A
    yesterday






  • 2




    $begingroup$
    Feel free to edit as long as there are no answers. But try to not edit as time goes by.
    $endgroup$
    – Incomputable
    yesterday






  • 2




    $begingroup$
    Note, the singular is “automaton”.
    $endgroup$
    – Konrad Rudolph
    yesterday






  • 1




    $begingroup$
    There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
    $endgroup$
    – Martin York
    yesterday








  • 1




    $begingroup$
    You said that you wrote a parser but the result of the function is double so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the function parse should be renamed appropriately.
    $endgroup$
    – Konrad Rudolph
    17 hours ago


















8












$begingroup$


As an exercise, I wrote a simple context free parser modeled from a pushdown automaton for chemical formulas such as NO2 or Fe(NO3)3.



To explain a bit about the decisions made:




  1. The state functions (pstart, pchem and pnum) are lambdas because they carry around a lot of context. They could be methods of a class, it doesn't really matter.


  2. The state functions have the input character injected because I realized halfway that I have to process some more after the line ended and I don't want to duplicate the state functions.



I am mostly concerned about the general messiness of the parser. There is no obvious way to know where variables are mutated, control flow is all over the place and consequently I find it difficult to reason about the program. Is there a better code structure?



Finally, I'd like to know if I caught all illegal syntax. Note repeating chemicals like NN is allowed.



#include<iostream>
#include<stack>
#include<unordered_map>
#include<string>
#include<iterator>

using namespace std::literals;

enum class state_t { start, num, chem, error };

double parse(std::string line, const std::unordered_map<std::string, double>& m)
{
auto b = line.begin(), e = line.end();
std::stack<double> stk;
int num = 0;
std::string chem;
state_t state = state_t::start;

auto pstart = [&](char c) {
if(std::isupper(c))
{
chem = ""s + c;
state = state_t::chem;
return true;
}
else if(std::isdigit(c))
{
if(stk.empty())
state = state_t::error;
else
{
num = c - '0';
state = state_t::num;
return true;
}
}
else if(c == '(')
{
stk.push(-1);
return true;
}
else if(c == ')')
{
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.empty())
state = state_t::error;
else
{
stk.pop();
stk.push(sum);
return true;
}
}
else
state = state_t::error;

return false;
};

auto pnum = [&](char c) {
if(std::isdigit(c))
{
num = 10 * num + c - '0';
return true;
}
else
{
stk.top() *= num;
state = state_t::start;
}
return false;
};

auto pchem = [&](char c){
if(std::islower(c))
{
chem += c;
return true;
}
else
{
if(auto it = m.find(chem); it != m.end())
{
stk.push(it->second);
state = state_t::start;
}
else
state = state_t::error;
}
return false;
};

while(b != e)
{
switch(state)
{
case state_t::start:
if(pstart(*b))
b++;
break;
case state_t::num:
if(pnum(*b))
b++;
break;
case state_t::chem:
if(pchem(*b))
b++;
break;
default:
return -1;
}
}

switch(state)
{
case state_t::num:
pnum('n');
break;
case state_t::chem:
pchem('n');
break;
}

if(state == state_t::error)
return -1;

double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.size() > 0) // expected ')'
return -1;

return sum;
}

int main()
{
std::unordered_map<std::string, double> m;
m["Na"] = 23.5;
m["Cl"] = 35.5;
m["O"] = 16;
m["N"] = 14;
m["Fe"] = 55.8;

std::string line;
while(getline(std::cin, line))
std::cout << parse(std::move(line), m) << 'n';
}









share|improve this question









New contributor




友人A is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$








  • 1




    $begingroup$
    I realized if(stk.size() > 1) should be if(stk.size() > 0), should I edit?
    $endgroup$
    – 友人A
    yesterday






  • 2




    $begingroup$
    Feel free to edit as long as there are no answers. But try to not edit as time goes by.
    $endgroup$
    – Incomputable
    yesterday






  • 2




    $begingroup$
    Note, the singular is “automaton”.
    $endgroup$
    – Konrad Rudolph
    yesterday






  • 1




    $begingroup$
    There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
    $endgroup$
    – Martin York
    yesterday








  • 1




    $begingroup$
    You said that you wrote a parser but the result of the function is double so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the function parse should be renamed appropriately.
    $endgroup$
    – Konrad Rudolph
    17 hours ago
















8












8








8


1



$begingroup$


As an exercise, I wrote a simple context free parser modeled from a pushdown automaton for chemical formulas such as NO2 or Fe(NO3)3.



To explain a bit about the decisions made:




  1. The state functions (pstart, pchem and pnum) are lambdas because they carry around a lot of context. They could be methods of a class, it doesn't really matter.


  2. The state functions have the input character injected because I realized halfway that I have to process some more after the line ended and I don't want to duplicate the state functions.



I am mostly concerned about the general messiness of the parser. There is no obvious way to know where variables are mutated, control flow is all over the place and consequently I find it difficult to reason about the program. Is there a better code structure?



Finally, I'd like to know if I caught all illegal syntax. Note repeating chemicals like NN is allowed.



#include<iostream>
#include<stack>
#include<unordered_map>
#include<string>
#include<iterator>

using namespace std::literals;

enum class state_t { start, num, chem, error };

double parse(std::string line, const std::unordered_map<std::string, double>& m)
{
auto b = line.begin(), e = line.end();
std::stack<double> stk;
int num = 0;
std::string chem;
state_t state = state_t::start;

auto pstart = [&](char c) {
if(std::isupper(c))
{
chem = ""s + c;
state = state_t::chem;
return true;
}
else if(std::isdigit(c))
{
if(stk.empty())
state = state_t::error;
else
{
num = c - '0';
state = state_t::num;
return true;
}
}
else if(c == '(')
{
stk.push(-1);
return true;
}
else if(c == ')')
{
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.empty())
state = state_t::error;
else
{
stk.pop();
stk.push(sum);
return true;
}
}
else
state = state_t::error;

return false;
};

auto pnum = [&](char c) {
if(std::isdigit(c))
{
num = 10 * num + c - '0';
return true;
}
else
{
stk.top() *= num;
state = state_t::start;
}
return false;
};

auto pchem = [&](char c){
if(std::islower(c))
{
chem += c;
return true;
}
else
{
if(auto it = m.find(chem); it != m.end())
{
stk.push(it->second);
state = state_t::start;
}
else
state = state_t::error;
}
return false;
};

while(b != e)
{
switch(state)
{
case state_t::start:
if(pstart(*b))
b++;
break;
case state_t::num:
if(pnum(*b))
b++;
break;
case state_t::chem:
if(pchem(*b))
b++;
break;
default:
return -1;
}
}

switch(state)
{
case state_t::num:
pnum('n');
break;
case state_t::chem:
pchem('n');
break;
}

if(state == state_t::error)
return -1;

double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.size() > 0) // expected ')'
return -1;

return sum;
}

int main()
{
std::unordered_map<std::string, double> m;
m["Na"] = 23.5;
m["Cl"] = 35.5;
m["O"] = 16;
m["N"] = 14;
m["Fe"] = 55.8;

std::string line;
while(getline(std::cin, line))
std::cout << parse(std::move(line), m) << 'n';
}









share|improve this question









New contributor




友人A is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$




As an exercise, I wrote a simple context free parser modeled from a pushdown automaton for chemical formulas such as NO2 or Fe(NO3)3.



To explain a bit about the decisions made:




  1. The state functions (pstart, pchem and pnum) are lambdas because they carry around a lot of context. They could be methods of a class, it doesn't really matter.


  2. The state functions have the input character injected because I realized halfway that I have to process some more after the line ended and I don't want to duplicate the state functions.



I am mostly concerned about the general messiness of the parser. There is no obvious way to know where variables are mutated, control flow is all over the place and consequently I find it difficult to reason about the program. Is there a better code structure?



Finally, I'd like to know if I caught all illegal syntax. Note repeating chemicals like NN is allowed.



#include<iostream>
#include<stack>
#include<unordered_map>
#include<string>
#include<iterator>

using namespace std::literals;

enum class state_t { start, num, chem, error };

double parse(std::string line, const std::unordered_map<std::string, double>& m)
{
auto b = line.begin(), e = line.end();
std::stack<double> stk;
int num = 0;
std::string chem;
state_t state = state_t::start;

auto pstart = [&](char c) {
if(std::isupper(c))
{
chem = ""s + c;
state = state_t::chem;
return true;
}
else if(std::isdigit(c))
{
if(stk.empty())
state = state_t::error;
else
{
num = c - '0';
state = state_t::num;
return true;
}
}
else if(c == '(')
{
stk.push(-1);
return true;
}
else if(c == ')')
{
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.empty())
state = state_t::error;
else
{
stk.pop();
stk.push(sum);
return true;
}
}
else
state = state_t::error;

return false;
};

auto pnum = [&](char c) {
if(std::isdigit(c))
{
num = 10 * num + c - '0';
return true;
}
else
{
stk.top() *= num;
state = state_t::start;
}
return false;
};

auto pchem = [&](char c){
if(std::islower(c))
{
chem += c;
return true;
}
else
{
if(auto it = m.find(chem); it != m.end())
{
stk.push(it->second);
state = state_t::start;
}
else
state = state_t::error;
}
return false;
};

while(b != e)
{
switch(state)
{
case state_t::start:
if(pstart(*b))
b++;
break;
case state_t::num:
if(pnum(*b))
b++;
break;
case state_t::chem:
if(pchem(*b))
b++;
break;
default:
return -1;
}
}

switch(state)
{
case state_t::num:
pnum('n');
break;
case state_t::chem:
pchem('n');
break;
}

if(state == state_t::error)
return -1;

double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.size() > 0) // expected ')'
return -1;

return sum;
}

int main()
{
std::unordered_map<std::string, double> m;
m["Na"] = 23.5;
m["Cl"] = 35.5;
m["O"] = 16;
m["N"] = 14;
m["Fe"] = 55.8;

std::string line;
while(getline(std::cin, line))
std::cout << parse(std::move(line), m) << 'n';
}






c++ parsing






share|improve this question









New contributor




友人A is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




友人A is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited yesterday







友人A













New contributor




友人A is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked yesterday









友人A友人A

436




436




New contributor




友人A is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





友人A is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






友人A is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








  • 1




    $begingroup$
    I realized if(stk.size() > 1) should be if(stk.size() > 0), should I edit?
    $endgroup$
    – 友人A
    yesterday






  • 2




    $begingroup$
    Feel free to edit as long as there are no answers. But try to not edit as time goes by.
    $endgroup$
    – Incomputable
    yesterday






  • 2




    $begingroup$
    Note, the singular is “automaton”.
    $endgroup$
    – Konrad Rudolph
    yesterday






  • 1




    $begingroup$
    There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
    $endgroup$
    – Martin York
    yesterday








  • 1




    $begingroup$
    You said that you wrote a parser but the result of the function is double so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the function parse should be renamed appropriately.
    $endgroup$
    – Konrad Rudolph
    17 hours ago
















  • 1




    $begingroup$
    I realized if(stk.size() > 1) should be if(stk.size() > 0), should I edit?
    $endgroup$
    – 友人A
    yesterday






  • 2




    $begingroup$
    Feel free to edit as long as there are no answers. But try to not edit as time goes by.
    $endgroup$
    – Incomputable
    yesterday






  • 2




    $begingroup$
    Note, the singular is “automaton”.
    $endgroup$
    – Konrad Rudolph
    yesterday






  • 1




    $begingroup$
    There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
    $endgroup$
    – Martin York
    yesterday








  • 1




    $begingroup$
    You said that you wrote a parser but the result of the function is double so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the function parse should be renamed appropriately.
    $endgroup$
    – Konrad Rudolph
    17 hours ago










1




1




$begingroup$
I realized if(stk.size() > 1) should be if(stk.size() > 0), should I edit?
$endgroup$
– 友人A
yesterday




$begingroup$
I realized if(stk.size() > 1) should be if(stk.size() > 0), should I edit?
$endgroup$
– 友人A
yesterday




2




2




$begingroup$
Feel free to edit as long as there are no answers. But try to not edit as time goes by.
$endgroup$
– Incomputable
yesterday




$begingroup$
Feel free to edit as long as there are no answers. But try to not edit as time goes by.
$endgroup$
– Incomputable
yesterday




2




2




$begingroup$
Note, the singular is “automaton”.
$endgroup$
– Konrad Rudolph
yesterday




$begingroup$
Note, the singular is “automaton”.
$endgroup$
– Konrad Rudolph
yesterday




1




1




$begingroup$
There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
$endgroup$
– Martin York
yesterday






$begingroup$
There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
$endgroup$
– Martin York
yesterday






1




1




$begingroup$
You said that you wrote a parser but the result of the function is double so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the function parse should be renamed appropriately.
$endgroup$
– Konrad Rudolph
17 hours ago






$begingroup$
You said that you wrote a parser but the result of the function is double so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the function parse should be renamed appropriately.
$endgroup$
– Konrad Rudolph
17 hours ago












2 Answers
2






active

oldest

votes


















6












$begingroup$

Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:



formula = *(group)
group = element [ count ]
element = uppercase [ lowercase ]
count = "0" .. "9"


You can now give a function for each non-terminal:




// formula = *(group) ; * becomes while (...) ...
std::list<group> parse_formula(std::stringstream& s)
{
std::list<group> rv;

while (!s.eof())
{
rv.push_back(parse_group(s));
}

return rv;
}

// group = element [ count ]
group parse_group(std::stringstream& s)
{
group g;
group.element = parse_element(s);
try
{
group.count = parse_count(s);
}
catch (const parse_failed&)
{
group = 1;
}
}

// element = uppercase [lowercase]
std::string parse_element(std::stringstream& s)
{
if (!std::isupper(s.peek()))
{
throw parse_failed(...);
}

std::string element = s.get();;

if (std::islower(s.peek()))
{
s.get(ch);
element += ch;
}

return element;
}

// count = [0-9]
unsigned parse_count(std::stringstream& s)
{
if (!std::isdigit(s.peek()))
{
throw parse_failed(...);
}

unsigned rv;
s >> rv; // this actually violates the grammar as it reads multiple digits
return rv;
}


You can then iterate over the list of groups and compute the sum.






share|improve this answer










New contributor




Cornholio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$





















    6












    $begingroup$

    Lambdas Too Complex



    You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.



    Methods Should Calculate Or Mutate, Not Both



    Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).



    Variable Names Unclear



    Names like pstart, num and m do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing.






    share|improve this answer









    $endgroup$













      Your Answer





      StackExchange.ifUsing("editor", function () {
      return StackExchange.using("mathjaxEditing", function () {
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
      });
      });
      }, "mathjax-editing");

      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "196"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });






      友人A is a new contributor. Be nice, and check out our Code of Conduct.










      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f212312%2fchemical-formula-parser%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      6












      $begingroup$

      Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
      I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:



      formula = *(group)
      group = element [ count ]
      element = uppercase [ lowercase ]
      count = "0" .. "9"


      You can now give a function for each non-terminal:




      // formula = *(group) ; * becomes while (...) ...
      std::list<group> parse_formula(std::stringstream& s)
      {
      std::list<group> rv;

      while (!s.eof())
      {
      rv.push_back(parse_group(s));
      }

      return rv;
      }

      // group = element [ count ]
      group parse_group(std::stringstream& s)
      {
      group g;
      group.element = parse_element(s);
      try
      {
      group.count = parse_count(s);
      }
      catch (const parse_failed&)
      {
      group = 1;
      }
      }

      // element = uppercase [lowercase]
      std::string parse_element(std::stringstream& s)
      {
      if (!std::isupper(s.peek()))
      {
      throw parse_failed(...);
      }

      std::string element = s.get();;

      if (std::islower(s.peek()))
      {
      s.get(ch);
      element += ch;
      }

      return element;
      }

      // count = [0-9]
      unsigned parse_count(std::stringstream& s)
      {
      if (!std::isdigit(s.peek()))
      {
      throw parse_failed(...);
      }

      unsigned rv;
      s >> rv; // this actually violates the grammar as it reads multiple digits
      return rv;
      }


      You can then iterate over the list of groups and compute the sum.






      share|improve this answer










      New contributor




      Cornholio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      $endgroup$


















        6












        $begingroup$

        Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
        I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:



        formula = *(group)
        group = element [ count ]
        element = uppercase [ lowercase ]
        count = "0" .. "9"


        You can now give a function for each non-terminal:




        // formula = *(group) ; * becomes while (...) ...
        std::list<group> parse_formula(std::stringstream& s)
        {
        std::list<group> rv;

        while (!s.eof())
        {
        rv.push_back(parse_group(s));
        }

        return rv;
        }

        // group = element [ count ]
        group parse_group(std::stringstream& s)
        {
        group g;
        group.element = parse_element(s);
        try
        {
        group.count = parse_count(s);
        }
        catch (const parse_failed&)
        {
        group = 1;
        }
        }

        // element = uppercase [lowercase]
        std::string parse_element(std::stringstream& s)
        {
        if (!std::isupper(s.peek()))
        {
        throw parse_failed(...);
        }

        std::string element = s.get();;

        if (std::islower(s.peek()))
        {
        s.get(ch);
        element += ch;
        }

        return element;
        }

        // count = [0-9]
        unsigned parse_count(std::stringstream& s)
        {
        if (!std::isdigit(s.peek()))
        {
        throw parse_failed(...);
        }

        unsigned rv;
        s >> rv; // this actually violates the grammar as it reads multiple digits
        return rv;
        }


        You can then iterate over the list of groups and compute the sum.






        share|improve this answer










        New contributor




        Cornholio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
        Check out our Code of Conduct.






        $endgroup$
















          6












          6








          6





          $begingroup$

          Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
          I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:



          formula = *(group)
          group = element [ count ]
          element = uppercase [ lowercase ]
          count = "0" .. "9"


          You can now give a function for each non-terminal:




          // formula = *(group) ; * becomes while (...) ...
          std::list<group> parse_formula(std::stringstream& s)
          {
          std::list<group> rv;

          while (!s.eof())
          {
          rv.push_back(parse_group(s));
          }

          return rv;
          }

          // group = element [ count ]
          group parse_group(std::stringstream& s)
          {
          group g;
          group.element = parse_element(s);
          try
          {
          group.count = parse_count(s);
          }
          catch (const parse_failed&)
          {
          group = 1;
          }
          }

          // element = uppercase [lowercase]
          std::string parse_element(std::stringstream& s)
          {
          if (!std::isupper(s.peek()))
          {
          throw parse_failed(...);
          }

          std::string element = s.get();;

          if (std::islower(s.peek()))
          {
          s.get(ch);
          element += ch;
          }

          return element;
          }

          // count = [0-9]
          unsigned parse_count(std::stringstream& s)
          {
          if (!std::isdigit(s.peek()))
          {
          throw parse_failed(...);
          }

          unsigned rv;
          s >> rv; // this actually violates the grammar as it reads multiple digits
          return rv;
          }


          You can then iterate over the list of groups and compute the sum.






          share|improve this answer










          New contributor




          Cornholio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.






          $endgroup$



          Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
          I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:



          formula = *(group)
          group = element [ count ]
          element = uppercase [ lowercase ]
          count = "0" .. "9"


          You can now give a function for each non-terminal:




          // formula = *(group) ; * becomes while (...) ...
          std::list<group> parse_formula(std::stringstream& s)
          {
          std::list<group> rv;

          while (!s.eof())
          {
          rv.push_back(parse_group(s));
          }

          return rv;
          }

          // group = element [ count ]
          group parse_group(std::stringstream& s)
          {
          group g;
          group.element = parse_element(s);
          try
          {
          group.count = parse_count(s);
          }
          catch (const parse_failed&)
          {
          group = 1;
          }
          }

          // element = uppercase [lowercase]
          std::string parse_element(std::stringstream& s)
          {
          if (!std::isupper(s.peek()))
          {
          throw parse_failed(...);
          }

          std::string element = s.get();;

          if (std::islower(s.peek()))
          {
          s.get(ch);
          element += ch;
          }

          return element;
          }

          // count = [0-9]
          unsigned parse_count(std::stringstream& s)
          {
          if (!std::isdigit(s.peek()))
          {
          throw parse_failed(...);
          }

          unsigned rv;
          s >> rv; // this actually violates the grammar as it reads multiple digits
          return rv;
          }


          You can then iterate over the list of groups and compute the sum.







          share|improve this answer










          New contributor




          Cornholio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.









          share|improve this answer



          share|improve this answer








          edited yesterday





















          New contributor




          Cornholio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.









          answered yesterday









          CornholioCornholio

          1365




          1365




          New contributor




          Cornholio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.





          New contributor





          Cornholio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.






          Cornholio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.

























              6












              $begingroup$

              Lambdas Too Complex



              You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.



              Methods Should Calculate Or Mutate, Not Both



              Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).



              Variable Names Unclear



              Names like pstart, num and m do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing.






              share|improve this answer









              $endgroup$


















                6












                $begingroup$

                Lambdas Too Complex



                You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.



                Methods Should Calculate Or Mutate, Not Both



                Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).



                Variable Names Unclear



                Names like pstart, num and m do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing.






                share|improve this answer









                $endgroup$
















                  6












                  6








                  6





                  $begingroup$

                  Lambdas Too Complex



                  You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.



                  Methods Should Calculate Or Mutate, Not Both



                  Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).



                  Variable Names Unclear



                  Names like pstart, num and m do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing.






                  share|improve this answer









                  $endgroup$



                  Lambdas Too Complex



                  You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.



                  Methods Should Calculate Or Mutate, Not Both



                  Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).



                  Variable Names Unclear



                  Names like pstart, num and m do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered yesterday









                  Joe CJoe C

                  960211




                  960211






















                      友人A is a new contributor. Be nice, and check out our Code of Conduct.










                      draft saved

                      draft discarded


















                      友人A is a new contributor. Be nice, and check out our Code of Conduct.













                      友人A is a new contributor. Be nice, and check out our Code of Conduct.












                      友人A is a new contributor. Be nice, and check out our Code of Conduct.
















                      Thanks for contributing an answer to Code Review Stack Exchange!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      Use MathJax to format equations. MathJax reference.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f212312%2fchemical-formula-parser%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      "Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

                      Alcedinidae

                      Origin of the phrase “under your belt”?