If I parse an encrypted HTML file to a string, can I somehow obtain the text from it?
    import java.net.*;
    import java.io.*;
    import org.jsoup.Jsoup;
    import org.jsoup.helper.Validate;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class UrlReaderTest {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://www.amazon.com/");
            String s = null;
            StringBuilder contentBuilder = new StringBuilder();
            try {
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream()));
                String str;
                while ((str = in.readLine()) != null) {
                    contentBuilder.append(str);
                }
                in.close();
            } catch (IOException e) {
                System.err.println("Error");
            }
            s = contentBuilder.toString();
            Document document = Jsoup.parse(s);
            System.out.println(document.text());
        }
    }
What I am getting consists mainly of symbols like these: Η1?0 Π??0ή=tθ Jr?/β@Q? l?r{ΪεI/ ΉΟ~νJ?j?Ά-??ΙiLs?YdHλ²ύ?α?η?ογV"ηw[:?0??νSQψyθ?*²?γpI? ??²ρνl???2JμΚ?ΣS?Αl4ςRΛKR545υ?SK
Is there anything I can do to transform that into a form that I can use?
I can't find anything specific online.
Edit: What I want specifically is to decrypt that information. For example, I want to be able to take the text from a Facebook event page, search it for the keywords I want, and use those somewhere else.
java html encryption
2
Are you looking for an answer other than "decrypt the file"? Those symbols are the encrypted file (bits in memory) being read in as text. They look like nonsense because they are the text representation of encrypted data which is basically random 1's and 0's. You cannot get prettier text because that prettier text would not be the text representation of the same data. If you are looking for something other than "decrypt the file" please specify what "a form that I can use" means
– MyStackRunnethOver
Nov 21 '18 at 22:50
1
The reason you're getting back nonsense is that you're opening a raw stream to an HTTPS URL. Since it's HTTPS, the contents of the stream are encrypted. Consider using HttpsURLConnection, which handles the communication for you and just gives you back the decrypted content. Here's an example: mkyong.com/java/java-https-client-httpsurlconnection-example
– ethan.roday
Nov 21 '18 at 23:37
1
That looks like a zipped response. Why don't you use Jsoup to request the page? I think it decodes the response data by default.
– t.m.adam
Nov 22 '18 at 1:34
1
@err1100: No, openStream gives the decrypted data after SSL record processing.
– James K Polk
Nov 22 '18 at 1:59
3
This is all wrong. As @t.m.adam notes, the page is gzipped, so it can't be read using any Reader. Reader is meant for character streams, not the binary data that you get from gzipping text.
– James K Polk
Nov 22 '18 at 2:35
edited Nov 22 '18 at 15:01 by treyBake
asked Nov 21 '18 at 22:28 by Thodoris Ydraios
1 Answer
1
As @t.m.adam noted in his comment, the problem is that the response from the stream is gzipped (compressed). So, if you want to read it from the URL stream, you need to pass it through a GZIPInputStream before the InputStreamReader (see this answer). Alternatively, as @t.m.adam suggests, you can use Jsoup's built-in connect() method:
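    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class UrlReaderTest {
        public static void main(String[] args) {
            // Debug aid: print the classpath to confirm the Jsoup jar is on it.
            System.out.println(System.getProperty("java.class.path"));
            try {
                // Jsoup performs the HTTP request itself and handles the
                // compressed response before parsing it into a Document.
                Document doc = Jsoup.connect("https://www.amazon.com").get();
                System.out.print(doc.text());
            } catch (IOException e) {
                System.err.println("Error");
            }
        }
    }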
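
For the first option mentioned above (passing the stream through a GZIPInputStream before the InputStreamReader), here is a minimal sketch. The class name GzipBodyReader and the helpers readBody and gzip are hypothetical names made up for illustration, and the sketch assumes the page is UTF-8. To stay runnable without a network connection, main compresses a small HTML string in memory instead of fetching a URL.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBodyReader {

    // Hypothetical helper: reads a response body as text. If the
    // Content-Encoding value is "gzip", the raw bytes are decompressed
    // BEFORE any Reader turns them into characters.
    // Assumption: the decompressed body is UTF-8.
    static String readBody(InputStream raw, String contentEncoding) throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    // Hypothetical helper used only to simulate a gzipped response.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<html><body>Hello</body></html>");
        System.out.println(readBody(new ByteArrayInputStream(compressed), "gzip"));
    }
}
```

With a real connection, the same helper could presumably be called as readBody(conn.getInputStream(), conn.getContentEncoding()), where conn is the URLConnection returned by url.openConnection(). Checking the Content-Encoding header first avoids the exception GZIPInputStream throws when the server sends an uncompressed body.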
answered Nov 23 '18 at 15:07 by ethan.roday