How do you debug a binary format?












11















I would like to be able to debug a program that builds a binary format. Right now I basically print out the input data to the binary parser, then go deep into the code and print out the mapping from input to output, then take the output mapping (integers) and use that to locate the corresponding integers in the binary. This is pretty clunky, and it requires that I modify the source code deeply just to get at the mapping between input and output.



What it seems like you could do is view the binary in different ways (in my case I'd like to view it in 8-bit chunks as decimal numbers, because that's pretty close to the input). Actually, some numbers are 16-bit, some 8-bit, some 32-bit, etc. So maybe there would be a way to view the binary with each of these different-sized numbers highlighted in some way.



The only way I can see that being possible is if you build a visualizer specific to the actual binary format/layout, so that it knows where in the sequence the 32-bit numbers should be, where the 8-bit numbers should be, etc. That is a lot of work, and kind of tricky in some situations, so I'm wondering if there's a general way to do it.
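To illustrate what I mean: if the layout were described as a table, the visualizer could be quite small. A rough Python sketch (the field names and layout here are made up, just to show the idea):

```python
import struct

# Hypothetical layout: (field name, struct format code).
# "B" = unsigned 8-bit, "<H" = little-endian 16-bit, "<I" = little-endian 32-bit.
LAYOUT = [("version", "B"), ("flags", "B"), ("count", "<H"), ("checksum", "<I")]

def annotated_dump(data: bytes) -> list[str]:
    """Decode each field at its offset and show name, width, and value."""
    lines, offset = [], 0
    for name, fmt in LAYOUT:
        size = struct.calcsize(fmt)
        (value,) = struct.unpack_from(fmt, data, offset)
        lines.append(f"{offset:04x}  {name:<8} {8 * size:>2}-bit  {value}")
        offset += size
    return lines

for line in annotated_dump(bytes([2, 1, 0x34, 0x12, 0x78, 0x56, 0x34, 0x12])):
    print(line)
```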



I'm also wondering what the general way of debugging this type of thing currently is, so maybe I can get some ideas on what to try.




























  • 71





    You got one answer saying "use the hexdump directly, and do this and that additionally"- and that answer got a lot of upvotes. And a second answer, 5 hours later(!), saying only "use a hexdump". Then you accepted the second one in favor of the first? Seriously?

    – Doc Brown
    2 days ago








  • 4





    While you might have a good reason to use a binary format, do consider whether you can just use an existing text format like JSON instead. Human readability counts a lot, and machines and networks are typically fast enough that using a custom format to reduce size is unnecessary nowadays.

    – jpmc26
    2 days ago








  • 2





    The tool you want is a hex editor. If you're using Windows, HxD is pretty good.

    – immibis
    2 days ago






  • 3





    @jpmc26 there's still a lot of use for binary formats and always will be. Human readability is usually secondary to performance, storage requirements, and network performance. And there are still many areas where especially network performance is poor and storage limited. Plus don't forget all the systems having to interface with legacy systems (both hardware and software) and having to support their data formats.

    – jwenting
    yesterday






  • 3





    @jwenting No, actually, developer time is usually the most expensive piece of an application. Sure, that may not be the case if you're working at Google or Facebook, but most apps don't operate at that scale. And when your developers' time is the most expensive resource, human readability counts for a lot more than the extra 100 milliseconds the program takes to parse it.

    – jpmc26
    yesterday


















Tags: debugging, binary






asked 2 days ago









Lance Pollard









5 Answers


















74














For ad-hoc checks, just use a standard hexdump and learn to eyeball it.



If you want to tool up for a proper investigation, I usually write a separate decoder in something like Python - ideally this will be driven directly from a message spec document or IDL, and be as automated as possible (so there's no chance of manually introducing the same bug in both decoders).



Lastly, don't forget you should be writing unit tests for your decoder, using known-correct canned input.
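To make that concrete, here is a minimal sketch of what such a spec-driven decoder might look like in Python; the spec format and field names are invented for illustration, not taken from any actual IDL:

```python
import struct

# Hypothetical message spec, as it might be extracted from a spec document:
# one "name:format" pair per line, using struct format codes.
SPEC = """
msg_type:B
length:<H
payload_id:<I
"""

def parse_spec(spec: str) -> list[tuple[str, str]]:
    """Turn the textual spec into (name, format) pairs."""
    return [tuple(line.split(":")) for line in spec.split() if line]

def decode(data: bytes, spec: str) -> dict:
    """Decode each field in order at its running offset."""
    fields, offset = {}, 0
    for name, fmt in parse_spec(spec):
        (fields[name],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
    return fields

# Unit test with known-correct canned input, as suggested above.
canned = bytes([7, 0x05, 0x00, 0x2A, 0x00, 0x00, 0x00])
assert decode(canned, SPEC) == {"msg_type": 7, "length": 5, "payload_id": 42}
```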






answered 2 days ago
Useless



















  • 1





    "just use a standard hexdump and learn to eyeball it." Yup. In my experience, multiple sections of anything up to 200 bits can be written down on a whiteboard for grouped comparison, which sometimes helps with this kind of thing to get started.

    – Mast
    2 days ago











  • I find a separate decoder well worth the effort if the binary data plays an important part in the application (or system, in general). This is especially true if the data format is variable: data in fixed layouts can be spotted in a hexdump with a little practice but hits a practicability wall fast. We debugged USB and CAN traffic with commercial packet decoders, and I have written a PROFIBus decoder (where variables spread across bytes, completely unreadable in a hex dump), and found all three of them immensely helpful.

    – Peter A. Schneider
    yesterday



















9














The first step is that you need a way to find or define a grammar that describes the structure of the data, i.e. a schema.



An example of this is a COBOL language feature informally known as a copybook. In COBOL programs you define the structure of the data in memory, and that structure maps directly to the way the bytes are stored. This is common in languages of that era, as opposed to common contemporary languages, where the physical layout of memory is an implementation concern abstracted away from the developer.



A Google search for binary data schema language turns up a number of tools; an example is Apache DFDL. There may already be UIs for this as well.
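For a small taste of the copybook idea in a contemporary language, Python's ctypes lets you declare a layout once and have named fields map directly onto the bytes. The record below is hypothetical:

```python
import ctypes

class Record(ctypes.LittleEndianStructure):
    """Declarative layout: field order and widths define the byte format."""
    _pack_ = 1  # no padding, like a packed wire format
    _fields_ = [
        ("version", ctypes.c_uint8),
        ("count", ctypes.c_uint16),
        ("checksum", ctypes.c_uint32),
    ]

# Map raw bytes onto the declared structure.
raw = bytes([1, 0x0A, 0x00, 0x78, 0x56, 0x34, 0x12])
rec = Record.from_buffer_copy(raw)
print(rec.version, rec.count, hex(rec.checksum))
```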



























  • 2





    This feature is not reserved to 'ancient' era languages. C and C++ structs and unions can be memory-aligned, and C# has StructLayoutAttribute, which I have used to transmit binary data.

    – Kasper van den Berg
    2 days ago






  • 1





    @KaspervandenBerg Unless you are saying that C and C++ added these recently, I consider that the same era. The point is that these formats were not simply for data transmission, though they were used for that, they mapped directly to how the code worked with data in memory and on disk. That's not, in general, how newer languages tend to work though they may have such features.

    – JimmyJames
    2 days ago











  • @KaspervandenBerg C++ does not do that as much as you think it does. It is possible using implementation-specific tooling to align and eliminate padding (and, admittedly, increasingly the standard is adding features for this kind of thing) and member order is deterministic (but not necessarily the same as in memory!).

    – Lightness Races in Orbit
    yesterday



















5














ASN.1, Abstract Syntax Notation One, provides a way of specifying a binary format.




  • DDT - develop using sample data and unit tests.

  • A textual dump can be helpful. If it's in XML, you can collapse/expand sub-hierarchies.

  • ASN.1 is not strictly needed, but a grammar-based, more declarative file specification is easier to work with.

























  • 5





    If the never-ending parade of security vulnerabilities in ASN.1 parsers is any indication, adopting it would certainly provide good exercise in debugging binary formats.

    – Mark
    2 days ago






  • 1





    @Mark many small byte arrays (in varying hierarchy trees) are often not handled correctly (securely) in C (for instance, there are no exceptions). Never underestimate the low-level, inherently unsafe nature of C. ASN.1 in, for instance, Java does not expose this problem. Since ASN.1 grammar-directed parsing can be done safely, even a C implementation could be built on a small and safe code base. And some of the vulnerabilities are inherent in the binary format itself: one can exploit "legal" constructs of the format's grammar that have disastrous semantics.

    – Joop Eggen
    yesterday



















1














I'm not sure I fully understand, but it sounds like you have a parser for this binary format, and you control the code for it. So this answer is built on that assumption.



A parser will in some way be filling up structs, classes, or whatever data structure your language has. If you implement a ToString for everything that gets parsed, then you end up with a very easy to use and easily maintained method of displaying that binary data in a human readable format.



You would essentially have:



byte[] arrayOfBytes; // initialized somehow
Object obj = Parser.parse(arrayOfBytes);
Logger.log(obj.ToString());


And that's it, from the standpoint of using it. Of course this requires that you implement/override the ToString function for your object class/struct/whatever, and you would also have to do so for any nested classes/structs/whatevers.



You can additionally use a conditional statement to prevent the ToString function from being called in release code so that you don't waste time on something that won't be logged outside of debug mode.



Your ToString might look like this:



return String.format("%d,%d,%d,%d", int32var, int16var, int8var, int32var2);

// OR

return String.format("%s:%d,%s:%d,%s:%d,%s:%d", varName1, int32var, varName2, int16var, varName3, int8var, varName4, int32var2);


Your original question makes it sound like you've somewhat attempted to do this, and that you think this method is burdensome, but you have also at some point implemented parsing a binary format and created variables to store that data. So all you have to do is print those existing variables at the appropriate level of abstraction (the class/struct the variable is in).



This is something you should only have to do once, and you can do it while building the parser. And it will only change when the binary format changes (which will already prompt a change to your parser anyways).



In a similar vein: some languages have robust features for turning classes into XML or JSON. C# is particularly good at this. You don't have to give up your binary format, you just do the XML or JSON in a debug logging statement and leave your release code alone.
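In languages with less built-in support, the same debug-logging trick is still only a few lines. For example, in Python with dataclasses (all names here are hypothetical):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Header:
    version: int
    flags: int

@dataclass
class Message:
    header: Header
    payload_id: int

# After parsing, dump the whole object tree as one debug log line.
msg = Message(header=Header(version=2, flags=1), payload_id=42)
print(json.dumps(asdict(msg), indent=2))
```

Since asdict recurses into nested dataclasses, nested structures come along for free.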



I'd personally recommend not going the hex dump route, because it's prone to errors (did you start on the right byte, are you sure when you're reading left to right that you're "seeing" the correct endianness, etc).



Example: Say your ToStrings spit out variables a,b,c,d,e,f,g,h. You run your program and notice a bug with g, but the problem really started with c (but you're debugging, so you haven't figured that out yet). If you know the input values (and you should) you'll instantly see that c is where problems start.



Compare that to a hex dump that just tells you 338E 8455 0000 FF76 0000 E444 ...: if your fields vary in size, where does c begin, and what is its value? A hex editor will tell you, but my point is that this is error-prone and time-consuming. Not only that, but you can't easily or quickly automate a test via a hex viewer. Printing out a string after parsing the data will tell you exactly what your program is 'thinking', and is one step along the path to automated testing.





































    1














    Other answers have described viewing a hex dump, or writing out object structures in e.g. JSON. I think combining both of these is very helpful.

    Using a tool that can render the JSON on top of the hex dump is really useful; I wrote an open-source tool called dotNetBytes that parses .NET binaries. Here's a view of an example DLL.



    dotNetBytes Example




























      protected by gnat yesterday



      Thank you for your interest in this question.
      Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).



      Would you like to answer one of these unanswered questions instead?














      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      74














      For ad-hoc checks, just use a standard hexdump and learn to eyeball it.



      If you want to tool up for a proper investigation, I usually write a separate decoder in something like Python - ideally this will be driven directly from a message spec document or IDL, and be as automated as possible (so there's no chance of manually introducing the same bug in both decoders).



      Lastly, don't forget you should be writing unit tests for your decoder, using known-correct canned input.






      share|improve this answer



















      • 1





        "just use a standard hexdump and learn to eyeball it." Yup. In my experience, multiple sections of anything up to 200 bits can be written down on a whiteboard for grouped comparison, which sometimes helps with this kind of thing to get started.

        – Mast
        2 days ago











      • I find a separate decoder well worth the effort if the binary data plays an important part in the application (or system, in general). This is especially true if the data format is variable: data in fixed layouts can be spotted in a hexdump with a little practice but hits a practicability wall fast. We debugged USB and CAN traffic with commercial packet decoders, and I have written a PROFIBus decoder (where variables spread across bytes, completely unreadable in a hex dump), and found all three of them immensely helpful.

        – Peter A. Schneider
        yesterday
















      74














      For ad-hoc checks, just use a standard hexdump and learn to eyeball it.



      If you want to tool up for a proper investigation, I usually write a separate decoder in something like Python - ideally this will be driven directly from a message spec document or IDL, and be as automated as possible (so there's no chance of manually introducing the same bug in both decoders).



      Lastly, don't forget you should be writing unit tests for your decoder, using known-correct canned input.






      share|improve this answer



















      • 1





        "just use a standard hexdump and learn to eyeball it." Yup. In my experience, multiple sections of anything up to 200 bits can be written down on a whiteboard for grouped comparison, which sometimes helps with this kind of thing to get started.

        – Mast
        2 days ago











      • I find a separate decoder well worth the effort if the binary data plays an important part in the application (or system, in general). This is especially true if the data format is variable: data in fixed layouts can be spotted in a hexdump with a little practice but hits a practicability wall fast. We debugged USB and CAN traffic with commercial packet decoders, and I have written a PROFIBus decoder (where variables spread across bytes, completely unreadable in a hex dump), and found all three of them immensely helpful.

        – Peter A. Schneider
        yesterday














      74












      74








      74







      For ad-hoc checks, just use a standard hexdump and learn to eyeball it.



      If you want to tool up for a proper investigation, I usually write a separate decoder in something like Python - ideally this will be driven directly from a message spec document or IDL, and be as automated as possible (so there's no chance of manually introducing the same bug in both decoders).



      Lastly, don't forget you should be writing unit tests for your decoder, using known-correct canned input.






      share|improve this answer













      For ad-hoc checks, just use a standard hexdump and learn to eyeball it.



      If you want to tool up for a proper investigation, I usually write a separate decoder in something like Python - ideally this will be driven directly from a message spec document or IDL, and be as automated as possible (so there's no chance of manually introducing the same bug in both decoders).



      Lastly, don't forget you should be writing unit tests for your decoder, using known-correct canned input.







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered 2 days ago









      UselessUseless

      8,98922036




      8,98922036








      • 1





        "just use a standard hexdump and learn to eyeball it." Yup. In my experience, multiple sections of anything up to 200 bits can be written down on a whiteboard for grouped comparison, which sometimes helps with this kind of thing to get started.

        – Mast
        2 days ago











      • I find a separate decoder well worth the effort if the binary data plays an important part in the application (or system, in general). This is especially true if the data format is variable: data in fixed layouts can be spotted in a hexdump with a little practice but hits a practicability wall fast. We debugged USB and CAN traffic with commercial packet decoders, and I have written a PROFIBus decoder (where variables spread across bytes, completely unreadable in a hex dump), and found all three of them immensely helpful.

        – Peter A. Schneider
        yesterday














      • 1





        "just use a standard hexdump and learn to eyeball it." Yup. In my experience, multiple sections of anything up to 200 bits can be written down on a whiteboard for grouped comparison, which sometimes helps with this kind of thing to get started.

        – Mast
        2 days ago











      • I find a separate decoder well worth the effort if the binary data plays an important part in the application (or system, in general). This is especially true if the data format is variable: data in fixed layouts can be spotted in a hexdump with a little practice but hits a practicability wall fast. We debugged USB and CAN traffic with commercial packet decoders, and I have written a PROFIBus decoder (where variables spread across bytes, completely unreadable in a hex dump), and found all three of them immensely helpful.

        – Peter A. Schneider
        yesterday








      1




      1





      "just use a standard hexdump and learn to eyeball it." Yup. In my experience, multiple sections of anything up to 200 bits can be written down on a whiteboard for grouped comparison, which sometimes helps with this kind of thing to get started.

      – Mast
      2 days ago
      9














      The first step is to find or define a grammar that describes the structure of the data, i.e. a schema.



      An example of this is a language feature of COBOL informally known as the copybook. In COBOL programs you would define the structure of the data in memory, and that structure mapped directly to the way the bytes were stored. This is common to languages of that era, as opposed to most contemporary languages, where the physical layout of memory is an implementation concern abstracted away from the developer.



      A Google search for "binary data schema language" turns up a number of tools. An example is DFDL (Data Format Description Language), implemented by Apache Daffodil. There may already be UIs for this as well.
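      The schema idea above can be sketched in a few lines of Python (the field names and layout below are hypothetical, standing in for whatever your real format defines): a declarative list of (name, type) pairs drives both the decoding and an annotated dump, so you see each 8/16/32-bit value next to its offset and raw bytes.

```python
import struct

# Hypothetical schema: each entry is (field name, struct format code).
# ">" means big-endian; B = 8-bit, H = 16-bit, I = 32-bit unsigned.
SCHEMA = [
    ("version",  ">B"),
    ("msg_type", ">B"),
    ("length",   ">H"),
    ("payload",  ">I"),
]

def annotate(data: bytes):
    """Decode `data` against SCHEMA, yielding (offset, name, raw hex, value)."""
    offset = 0
    for name, fmt in SCHEMA:
        size = struct.calcsize(fmt)
        raw = data[offset:offset + size]
        (value,) = struct.unpack(fmt, raw)
        yield offset, name, raw.hex(), value
        offset += size

blob = bytes([0x01, 0x02, 0x00, 0x05, 0x00, 0x00, 0x03, 0xE8])
for offset, name, raw_hex, value in annotate(blob):
    print(f"{offset:04x}  {name:<8} {raw_hex:<8} = {value}")
```

      When the format changes, only the SCHEMA list changes; the viewer itself stays untouched.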
      • 2





        This feature is not exclusive to languages of that era. C and C++ structs and unions can be memory-aligned. C# has StructLayoutAttribute, which I have used to transmit binary data.

        – Kasper van den Berg
        2 days ago






      • 1





        @KaspervandenBerg Unless you are saying that C and C++ added these recently, I consider that the same era. The point is that these formats were not simply for data transmission (though they were used for that); they mapped directly to how the code worked with data in memory and on disk. That's not, in general, how newer languages work, though they may have such features.

        – JimmyJames
        2 days ago











      • @KaspervandenBerg C++ does not do that as much as you think it does. It is possible, using implementation-specific tooling, to align and eliminate padding (and, admittedly, the standard is increasingly adding features for this kind of thing), and member order is deterministic (but not necessarily the same as in memory!).

        – Lightness Races in Orbit
        yesterday

      – JimmyJames
      answered 2 days ago, edited 2 days ago

      5














      ASN.1, Abstract Syntax Notation One, provides a way of specifying a binary format.




      • DDT: develop using sample data and unit tests.

      • A textual dump can be helpful. If it is XML, you can collapse/expand sub-hierarchies.

      • ASN.1 is not strictly needed, but a grammar-based, more declarative file specification makes things easier.
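      The "sample data plus unit tests" point can be sketched as a round-trip test (the record layout here is hypothetical; substitute your own format): encode a known record, assert the exact bytes produced against bytes written out by hand from the spec, then assert that decoding them yields the original record. This catches drift in either direction.

```python
import struct

# Hypothetical record: 8-bit id, 16-bit count, 32-bit total (big-endian).
FMT = ">BHI"

def encode(record):
    return struct.pack(FMT, *record)

def decode(data):
    return struct.unpack(FMT, data)

# Known sample; expected bytes derived by hand from the (hypothetical) spec:
# 7 -> 07, 258 -> 01 02, 70000 -> 00 01 11 70.
sample = (7, 258, 70000)
expected = bytes.fromhex("07010200011170")

assert encode(sample) == expected, encode(sample).hex()
assert decode(expected) == sample
print("round-trip OK")
```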
      • 5





        If the never-ending parade of security vulnerabilities in ASN.1 parsers is any indication, adopting it would certainly provide good exercise in debugging binary formats.

        – Mark
        2 days ago






      • 1





        @Mark many small byte arrays (in varying hierarchy trees) are often not handled securely in C (for instance, without exceptions). Never underestimate the low-level, inherently unsafe nature of C. ASN.1 in, for instance, Java does not expose this problem. As ASN.1 grammar-directed parsing can be done safely, even a C implementation could have a small and safe code base. And part of the vulnerabilities are inherent to the binary format itself: one can exploit "legal" constructs of the format's grammar that have disastrous semantics.

        – Joop Eggen
        yesterday

      – Joop Eggen
      answered 2 days ago











      1














      I'm not sure I fully understand, but it sounds like you have a parser for this binary format, and you control the code for it. So this answer is built on that assumption.



      A parser will, in some way, be filling up structs, classes, or whatever data structure your language has. If you implement a ToString for everything that gets parsed, you end up with an easy-to-use, easily maintained way of displaying that binary data in a human-readable format.



      You would essentially have:



      byte[] arrayOfBytes; // initialized somehow
      Object obj = Parser.parse(arrayOfBytes);
      Logger.log(obj.ToString());


      And that's it, from the standpoint of using it. Of course, this requires you to implement/override the ToString function for your Object class/struct/whatever, and you would also have to do so for any nested classes/structs/whatevers.



      You can additionally use a conditional statement to prevent the ToString function from being called in release code so that you don't waste time on something that won't be logged outside of debug mode.



      Your ToString might look like this:



      return String.Format("%d,%d,%d,%d", int32var, int16var, int8var, int32var2);

      // OR

      return String.Format("%s:%d,%s:%d,%s:%d,%s:%d", varName1, int32var, varName2, int16var, varName3, int8var, varName4, int32var2);


      Your original question makes it sound like you've somewhat attempted to do this, and that you think this method is burdensome, but you have also at some point implemented parsing a binary format and created variables to store that data. So all you have to do is print those existing variables at the appropriate level of abstraction (the class/struct the variable is in).



      This is something you should only have to do once, and you can do it while building the parser. And it will only change when the binary format changes (which will already prompt a change to your parser anyway).



      In a similar vein: some languages have robust features for turning classes into XML or JSON. C# is particularly good at this. You don't have to give up your binary format, you just do the XML or JSON in a debug logging statement and leave your release code alone.
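      The same idea can be sketched in Python (the field names below are hypothetical, standing in for whatever your parser fills in): dataclasses plus the standard json module give you a structured debug dump almost for free.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical parsed structures -- stand-ins for your parser's output types.
@dataclass
class Header:
    version: int
    msg_type: int

@dataclass
class Packet:
    header: Header
    length: int
    payload: int

pkt = Packet(header=Header(version=1, msg_type=2), length=5, payload=1000)

# One line turns the whole parsed tree into a readable debug dump,
# nested objects included.
print(json.dumps(asdict(pkt), indent=2))
```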



      I'd personally recommend not going the hex-dump route, because it's prone to errors (did you start on the right byte? are you sure, reading left to right, that you're "seeing" the correct endianness? etc.).



      Example: Say your ToStrings spit out variables a,b,c,d,e,f,g,h. You run your program and notice a bug with g, but the problem really started with c (but you're debugging, so you haven't figured that out yet). If you know the input values (and you should) you'll instantly see that c is where problems start.



      Compared to that, a hex dump just tells you 338E 8455 0000 FF76 0000 E444 ...; if your fields vary in size, where does c begin, and what's its value? A hex editor will tell you, but my point is that this is error-prone and time-consuming. Not only that, but you can't easily or quickly automate a test via a hex viewer. Printing out a string after parsing the data will tell you exactly what your program is 'thinking', and is one step along the path to automated testing.

      – Shaz
      answered yesterday

















              1














              Other answers have described viewing a hex dump, or writing out object structures in, e.g., JSON. I think combining both of these is very helpful.



              Using a tool that can render the JSON on top of the hex dump is really useful. I wrote an open-source tool that parses .NET binaries, called dotNetBytes; here's a view of an example DLL.



              dotNetBytes Example
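              A crude version of that overlay idea can be sketched in Python (the three-field layout is hypothetical): print the hex bytes once, then one underline row per field marking which bytes it spans.

```python
import struct

# Hypothetical layout driving the overlay, schema-style.
SCHEMA = [("version", ">B"), ("length", ">H"), ("payload", ">I")]

def overlay_dump(data: bytes) -> str:
    """Render a hex dump with one caret row per field, marking its byte span."""
    hex_row = " ".join(f"{b:02x}" for b in data)
    out = [hex_row]
    offset = 0
    for name, fmt in SCHEMA:
        size = struct.calcsize(fmt)
        start = offset * 3           # each byte occupies "xx " = 3 columns
        width = size * 3 - 1         # carets cover the field, minus trailing space
        out.append(" " * start + "^" * width + f" {name}")
        offset += size
    return "\n".join(out)

blob = bytes([0x01, 0x00, 0x05, 0x00, 0x00, 0x03, 0xE8])
print(overlay_dump(blob))
```

              For the sample bytes this prints the hex row with "version", "length", and "payload" each underlined beneath their own bytes.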

                  – Carl Walsh
                  answered yesterday

                      protected by gnat yesterday