If I parse an encrypted HTML file to a string, can I somehow obtain the text from it?












    import java.net.*;
    import java.io.*;
    import org.jsoup.Jsoup;
    import org.jsoup.helper.Validate;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class UrlReaderTest {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://www.amazon.com/");
            String s = null;
            StringBuilder contentBuilder = new StringBuilder();
            try {
                // Reads the raw response bytes as characters, line by line.
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream()));
                String str;
                while ((str = in.readLine()) != null) {
                    contentBuilder.append(str);
                }
                in.close();
            } catch (IOException e) {
                System.err.println("Error");
            }

            s = contentBuilder.toString();
            Document document = Jsoup.parse(s);

            System.out.println(document.text());
        }
    }


What I am getting consists mainly of symbols like these: Η1?0 Π??0ή=tθ Jr?/β@Q? l?r{ΪεI/ ΉΟ~νJ?j?Ά-??ΙiLs?YdHλ²ύ?α?η?ογV"ηw[:?0??νSQψyθ?*²?γpI? ??²ρνl???2JμΚ?ΣS?Αl4ςRΛKR545υ?SK

Is there anything I can do to transform that into a form that I can use?
I can't find anything specific online.

Edit: What I want specifically is to decrypt that information. For example, I want to be able to take the text from a Facebook event page, search it for the keywords I want, and use those somewhere else.










  • 2

    Are you looking for an answer other than "decrypt the file"? Those symbols are the encrypted file (bits in memory) being read in as text. They look like nonsense because they are the text representation of encrypted data, which is basically random 1's and 0's. You cannot get prettier text because that prettier text would not be the text representation of the same data. If you are looking for something other than "decrypt the file", please specify what "a form that I can use" means.

    – MyStackRunnethOver
    Nov 21 '18 at 22:50

  • 1

    The reason you're getting back nonsense is that you're opening a raw stream to an HTTPS URL. Since it's HTTPS, the contents of the stream are encrypted. Consider using HttpsURLConnection, which handles the communication for you and just gives you back the decrypted content. Here's an example: mkyong.com/java/java-https-client-httpsurlconnection-example

    – ethan.roday
    Nov 21 '18 at 23:37

  • 1

    That looks like a zipped response. Why don't you use Jsoup to request the page? I think it decodes the response data by default.

    – t.m.adam
    Nov 22 '18 at 1:34

  • 1

    @err1100: No, openStream gives the decrypted data after SSL record processing.

    – James K Polk
    Nov 22 '18 at 1:59

  • 3

    This is all wrong. As @t.m.adam notes, the page is gzipped, so it can't be read using any Reader. Reader is meant for character streams, not the binary data that you get from gzipping text.

    – James K Polk
    Nov 22 '18 at 2:35
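
The gzip-versus-encryption point raised in these comments can be checked directly, because a gzip stream always begins with the two bytes 0x1f 0x8b. The following is only a minimal sketch of that check: the class name `GzipSniffer` and its helper methods are made up for illustration and do not come from the thread, and the sample HTML is invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipSniffer {

    // Returns true when the data starts with the gzip magic number 0x1f 0x8b.
    public static boolean looksGzipped(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    // Hypothetical helper for the demo: compresses a string in memory.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>Hello</body></html>";
        byte[] plain = html.getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(html);

        System.out.println(looksGzipped(plain));      // false
        System.out.println(looksGzipped(compressed)); // true
    }
}
```

Bytes that pass this check are compressed binary data, so feeding them to a Reader produces the kind of symbols shown in the question rather than HTML.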


















java html encryption






edited Nov 22 '18 at 15:01 by treyBake

asked Nov 21 '18 at 22:28 by Thodoris Ydraios






















1 Answer
As @t.m.adam noted in his comment, the problem is that the response coming back from the stream is gzipped (compressed), not encrypted. So, if you want to read it from the URL stream, you need to pass it through a GZIPInputStream before the InputStreamReader (see this answer). Alternatively, as @t.m.adam suggests, you can use Jsoup's built-in connect() method:



    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class UrlReaderTest {
        public static void main(String[] args) {
            // Debug aid: print the classpath to confirm the jsoup jar is on it.
            System.out.println(System.getProperty("java.class.path"));
            try {
                // connect().get() fetches the page and parses it in one step.
                Document doc = Jsoup.connect("https://www.amazon.com").get();
                System.out.print(doc.text());
            } catch (IOException e) {
                System.err.println("Error");
            }
        }
    }
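
The first option mentioned above, decompressing the URL stream by hand, might look roughly like the sketch below. It rests on a few assumptions that are not in the original answer: the class and method names (`GzipPageReader`, `readBody`, `fetch`, `gzip`) are hypothetical, the page is assumed to be UTF-8, and the stream is only wrapped in a GZIPInputStream when the server reports `Content-Encoding: gzip`, since wrapping an uncompressed stream would fail.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipPageReader {

    // Reads a response body as text, decompressing it first when the
    // Content-Encoding value says it is gzipped. UTF-8 is an assumption.
    public static String readBody(InputStream raw, String contentEncoding) throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Opens the URL and passes its stream and Content-Encoding to readBody.
    public static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        return readBody(conn.getInputStream(), conn.getContentEncoding());
    }

    // Helper for the offline demo only: gzips a string in memory.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Network mode: pass a URL as the first argument.
            System.out.println(fetch(args[0]));
        } else {
            // Offline demo: round-trip a small gzipped document.
            byte[] compressed = gzip("<p>hello</p>");
            System.out.print(readBody(new ByteArrayInputStream(compressed), "gzip"));
        }
    }
}
```

The string returned by `readBody` can then be handed to `Jsoup.parse(...)` exactly as in the question's code. Using `Jsoup.connect()` is usually less work, but the manual version makes it visible where the decompression step belongs: between the raw InputStream and the InputStreamReader.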





answered Nov 23 '18 at 15:07 by ethan.roday































