Embedding binary data in a script efficiently

I have seen some installation files for Unix-like systems (huge ones, such as install.sh for Matlab or Mathematica) that must embed quite a lot of binary data, such as icons, sounds, and graphics, in the script itself. I am wondering how that can be done, since it could simplify the file structure.



I am particularly interested in doing this with Python and/or Bash.



Existing methods that I know of in Python:




  1. Just use a byte string: x = b'\x23\xa3\xef'... This is terribly inefficient: every byte becomes a four-character \xNN escape, so a 100 KB WAV file takes roughly 400 KB, close to half a MB.

  2. base64: better than option 1, but it still enlarges the data by a factor of 4/3.
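To put numbers on both options, here is a small sketch. It assumes random bytes as a stand-in for the WAV file and that every byte is written as a \xNN escape (in practice printable bytes can be shorter in a repr); the helper names are hypothetical.

```python
import base64
import os


def escaped_literal_size(data: bytes) -> int:
    """Length of a b'...' literal that spells every byte as a four-character \\xNN escape."""
    return len("b'" + "".join(f"\\x{b:02x}" for b in data) + "'")


def base64_size(data: bytes) -> int:
    """Length of the base64 text for the same bytes (4 characters per 3-byte group)."""
    return len(base64.b64encode(data))


payload = os.urandom(100_000)  # random stand-in for a 100 KB WAV file
print(escaped_literal_size(payload))  # 400003
print(base64_size(payload))           # 133336
```

The escaped literal costs about 4 characters per byte, while base64 costs 4 characters per 3 bytes.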


Are there other (better) ways to do this?

python bash binaryfiles

edited Dec 23 '14 at 0:45 by Jonathan Leffler
asked Dec 22 '14 at 11:37 by qed

  • I very much doubt they are embedding the entire payload into the script.

    – Burhan Khalid
    Dec 22 '14 at 11:45

  • I think they probably did; judging by the size, it's several GB.

    – qed
    Dec 22 '14 at 11:48

  • Your intro makes this question "too broad". If you're interested in an efficient method for integrating binary data into script files, ask that.

    – Korem
    Dec 22 '14 at 11:48

  • This may help stackoverflow.com/questions/955460/…

    – ρss
    Dec 22 '14 at 12:21

  • Some languages, such as Bash, are forgiving enough to allow arbitrary binary data to be appended to the end of a script.

    – Rufflewind
    Dec 23 '14 at 0:48

2 Answers

You can use base64 + compression (using bz2 for instance) if that suits your data (e.g., if you're not embedding already compressed data).



For instance, to create your data (say your data consist of 100 null bytes followed by 200 bytes with value 0x01):



>>> import base64, bz2
>>> base64.b64encode(bz2.compress(b'\x00' * 100 + b'\x01' * 200)).decode('ascii')
'QlpoOTFBWSZTWcl9Q1UAAABBBGAAQAAEACAAIZpoM00SrccXckU4UJDJfUNV'


And to use it (in your script) to write the data to a file:



import base64
import bz2

data = 'QlpoOTFBWSZTWcl9Q1UAAABBBGAAQAAEACAAIZpoM00SrccXckU4UJDJfUNV'
# Binary mode ('wb') so the decompressed bytes are written unchanged
with open('/tmp/testfile', 'wb') as fdesc:
    fdesc.write(bz2.decompress(base64.b64decode(data)))
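For a real file you would usually generate the embedded string rather than paste it by hand. A minimal sketch, assuming Python 3; make_embed_snippet and extract are hypothetical helper names, and the 76-character line width is an arbitrary choice. It emits Python source for the compressed, base64-encoded data and checks the round trip.

```python
import base64
import bz2


def make_embed_snippet(data: bytes, var_name: str = "DATA", width: int = 76) -> str:
    """Return Python source that assigns the bz2-compressed, base64-encoded data to var_name.

    The base64 text is split into lines of `width` characters; adjacent string
    literals inside parentheses are concatenated by the Python compiler.
    """
    encoded = base64.b64encode(bz2.compress(data)).decode("ascii")
    chunks = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    body = "\n".join(f"    {chunk!r}" for chunk in chunks)
    return f"{var_name} = (\n{body}\n)\n"


def extract(encoded: str) -> bytes:
    """Reverse the encoding: base64-decode, then bz2-decompress."""
    return bz2.decompress(base64.b64decode(encoded))


if __name__ == "__main__":
    payload = b"\x00" * 100 + b"\x01" * 200
    snippet = make_embed_snippet(payload)
    print(snippet)
    namespace = {}
    exec(snippet, namespace)  # simulate running the generated code
    assert extract(namespace["DATA"]) == payload
```

The generated text can be pasted into (or written into) the installer script; as the answer notes, this only pays off when the data is not already compressed.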

edited Dec 23 '14 at 0:44
answered Dec 23 '14 at 0:38 by Pierre

  • Nice, could you give a small example?

    – qed
    Dec 23 '14 at 0:39

Here's a quick and dirty way. Create the following script called myInstaller:



#!/bin/bash

dd if="$0" of=payload bs=1 skip=54

exit


Then append your binary to the script, and make it executable:



cat myBinary >> myInstaller
chmod +x myInstaller


When you run the script, dd copies everything after the script itself into the file named by of= (here, payload), and exit stops Bash before it reaches the binary data. The payload could be a tar file or anything else, so you can do additional processing (unarchiving, setting execute permissions, etc.) after the dd command. If you change the script, adjust the skip value to the exact byte length of the script before the binary data starts (54 bytes for the script above, including the newline after exit).
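Rather than counting the script's bytes by hand, the installer can be assembled by a small build script that computes the skip value. A sketch in Python, assuming the stub is exactly the Bash script above; build_installer and STUB_TEMPLATE are hypothetical names, and the payload stands in for whatever you would otherwise append with cat.

```python
import os
import stat
import tempfile

# The stub from the answer above, with the skip value left as a placeholder.
STUB_TEMPLATE = '#!/bin/bash\n\ndd if="$0" of=payload bs=1 skip={skip}\n\nexit\n'


def build_installer(path: str, payload: bytes) -> int:
    """Write the Bash stub followed by the payload to path and return the skip value used."""
    # The stub's length depends on how many digits the skip value has, so keep
    # recomputing until the value equals the stub's own length in bytes.
    skip = 0
    while True:
        stub = STUB_TEMPLATE.format(skip=skip).encode("ascii")
        if len(stub) == skip:
            break
        skip = len(stub)
    with open(path, "wb") as f:
        f.write(stub)
        f.write(payload)
    # Roughly what `chmod +x` does: add execute permission for user, group and others.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return skip


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "myInstaller")
        payload = b"\x00\x01 stand-in for myBinary"
        skip = build_installer(target, payload)
        print(skip)  # 54
        with open(target, "rb") as f:
            assert f.read()[skip:] == payload
```

For the stub above the loop settles on 54, the value hard-coded in the answer. Note that dd with bs=1 reads and writes one byte at a time, which can be slow for multi-gigabyte payloads.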

answered Dec 22 '14 at 12:36 by Ivan X

  • A frequent use is a shell script that unpacks the tarball in the right place and performs some additional checks. Java packages for Linux were built like that.

    – mcoolive
    Dec 22 '14 at 13:59