Create words' stream using scanner
I need to return a stream of all words from a file that have three or more letters. Is there a better way than the following, maybe using Stream.iterate?
private Stream<String> getWordsStream(String path){
    Stream.Builder<String> wordsStream = Stream.builder();
    FileInputStream inputStream = null;
    try {
        inputStream = new FileInputStream(path);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    Scanner s = new Scanner(inputStream);
    s.useDelimiter("([^a-zA-Z])");
    Pattern pattern = Pattern.compile("([a-zA-Z]{3,})");
    while ((s.hasNext())){
        if(s.hasNext(pattern)){
            wordsStream.add(s.next().toUpperCase());
        }
        else {
            s.next();
        }
    }
    s.close();
    return wordsStream.build();
}
java loops java-stream builder word
edited Nov 20 at 15:49
asked Nov 19 at 21:43
a_chubenko
Which Java version?
– shmosel
Nov 19 at 21:48
Did you mean to call s.next(pattern)?
– shmosel
Nov 19 at 21:53
Maybe read the entire stream as a string, then split it on spaces (or whatever delimiter you're using), then check the length of each piece.
– PhaseRush
Nov 19 at 22:04
Java 9. I mean: is it possible to write this method in a more stream-like style, without the while loop at all?
– a_chubenko
Nov 19 at 22:06
3 Answers
You can use Files.lines() and a Pattern:
private static final Pattern SPACES = Pattern.compile("[^a-zA-Z]+");

public static Stream<String> getWordStream(String path) throws IOException{
    return Files.lines(Paths.get(path))
        .flatMap(SPACES::splitAsStream)
        .filter(word -> word.length() >= 3);
}
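As a rough, self-contained sketch of the same idea (not the answerer's exact code): the class name LinesWordStream, the helper names words and wordsInFile, and the temporary file are assumptions made only for illustration. It adds the uppercase conversion from the question and closes the Files.lines() stream with try-with-resources, since that stream keeps the file open until it is closed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LinesWordStream {
    // Splits on runs of non-letters, so no empty tokens appear between words.
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    // Hypothetical helper: turns a stream of lines into upper-cased words of 3+ letters.
    static Stream<String> words(Stream<String> lines) {
        return lines.flatMap(NON_LETTERS::splitAsStream)
                    .filter(word -> word.length() >= 3)
                    .map(String::toUpperCase);
    }

    // Files.lines keeps the file open, so the stream is closed via try-with-resources.
    static List<String> wordsInFile(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            return words(lines).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used only to make the sketch runnable.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, List.of("I have 100 dollars", "and no cents"));
        System.out.println(wordsInFile(tmp)); // prints [HAVE, DOLLARS, AND, CENTS]
        Files.delete(tmp);
    }
}
```

Collecting inside wordsInFile trades laziness for safety; a caller who wants a lazy stream can return words(Files.lines(path)) instead and take over responsibility for closing it.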
edited Nov 20 at 12:43
answered Nov 20 at 7:23
Lino
I tested this on a book with 105K words. This method was the fastest, taking 0.29 s.
– a_chubenko
Nov 20 at 12:05
@Alex well, this method also skips the conversion to uppercase. Further, your original code was designed to process words consisting of ASCII letters only, whereas this code treats everything separated by a single space character as a word.
– Holger
Nov 20 at 12:07
Pattern was changed to Pattern SPACES = Pattern.compile("([^a-zA-Z])");
– a_chubenko
Nov 20 at 12:10
@Alex You should probably use Pattern.compile("[^a-zA-Z]+") (notice the + at the end), so you won't get "empty" words. E.g. a text like "I have 100 dollars" would produce an array ["I", "have", "", "", "", "", "dollars"] with the current pattern.
– Lino
Nov 20 at 12:19
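The difference the + makes can be checked with a small sketch; the class name SplitPatternDemo and the helper split are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SplitPatternDemo {
    // Hypothetical helper: splits text with the given regex and returns the pieces as a list.
    static List<String> split(String regex, String text) {
        return Arrays.asList(Pattern.compile(regex).split(text));
    }

    public static void main(String[] args) {
        String text = "I have 100 dollars";
        // Each single non-letter is its own delimiter, so " 100 " leaves four empty strings.
        System.out.println(split("[^a-zA-Z]", text));
        // A run of non-letters counts as one delimiter, so no empty strings are produced.
        System.out.println(split("[^a-zA-Z]+", text));
    }
}
```

With a length filter of 3 or more the empty strings are dropped either way, so the + mainly avoids creating and discarding them.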
The worst part of your code is the following part:
FileInputStream inputStream = null;
try {
    inputStream = new FileInputStream(path);
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
Scanner s = new Scanner(inputStream);
So when the file is absent, you will print the FileNotFoundException stack trace and proceed with a null input stream, leading to a NullPointerException. Instead of requiring the caller to deal with a spurious NullPointerException, you should declare the FileNotFoundException in the method signature. Otherwise, return an empty stream in the erroneous case.
But you don’t need to construct a FileInputStream at all, as Scanner offers constructors accepting a File or Path. Combine this with the capability of returning a stream of matches (since Java 9) and you get:
private Stream<String> getWordsStream(String path) {
    try {
        Scanner s = new Scanner(Paths.get(path));
        return s.findAll("([a-zA-Z]{3,})").map(mr -> mr.group().toUpperCase());
    } catch(IOException ex) {
        Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        return Stream.empty();
    }
}
or preferably
private Stream<String> getWordsStream(String path) throws IOException {
    Scanner s = new Scanner(Paths.get(path));
    return s.findAll("([a-zA-Z]{3,})").map(mr -> mr.group().toUpperCase());
}
You don’t even need .useDelimiter("([^a-zA-Z])") here, as skipping all nonmatching stuff is the default behavior. Closing the returned Stream will also close the Scanner.
So the caller should use it like this:
try(Stream<String> s = getWordsStream("path/to/file")) {
    s.forEach(System.out::println);
}
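To try the findAll behavior without a file, here is a minimal sketch that scans a String instead; the class name FindAllDemo and the helper wordsOf are assumptions for illustration, and the String source is used only to keep the example self-contained (Java 9 or later is required for Scanner.findAll).

```java
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FindAllDemo {
    // Hypothetical helper: returns the upper-cased words of 3+ ASCII letters found in text.
    static List<String> wordsOf(String text) {
        // The stream is opened in try-with-resources so the Scanner behind it is released.
        try (Stream<String> words = new Scanner(text)
                .findAll("[a-zA-Z]{3,}")
                .map(match -> match.group().toUpperCase())) {
            return words.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // Digits, punctuation and short words are skipped without any delimiter setup.
        System.out.println(wordsOf("I have 100 dollars, ok?")); // prints [HAVE, DOLLARS]
    }
}
```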
edited Nov 20 at 8:19
answered Nov 20 at 8:03
Holger
I tested this on a book with 105K words. This method took about 0.6 s.
– a_chubenko
Nov 20 at 12:06
There is a much easier approach: read the file into a Stream and filter it by the required condition (e.g. length >= 3). Files.lines() loads lazily, so it does not read all words from the file up front; it reads on demand each time the next word is required. The code below uses the same lazy approach with a Scanner:
public static void main(String... args) throws IOException {
    // Close the stream when done so the underlying Scanner (and file) is released.
    try (Stream<String> words = getWordsStream(Paths.get("d:/words.txt"))) {
        words.forEach(System.out::println);
    }
}

public static Stream<String> getWordsStream(Path path) throws IOException {
    final Scanner scan = new Scanner(path);
    // DISTINCT is not declared: a text can contain the same word more than once.
    return StreamSupport.stream(new Spliterators.AbstractSpliterator<String>(Long.MAX_VALUE,
            Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            while (scan.hasNext()) {
                String word = scan.next();
                // you can use RegExp if you have more complicated condition
                if (word.length() < 3)
                    continue;
                action.accept(word);
                return true;
            }
            return false;
        }
    }, false).onClose(scan::close);
}
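On Java 9 or later, Scanner.tokens() may offer the same lazy token stream without a hand-written spliterator. The following is only a sketch under that assumption; the class name TokensDemo and the helper longTokens are hypothetical, and a String source replaces the file to keep it self-contained.

```java
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TokensDemo {
    // Hypothetical helper: returns whitespace-delimited tokens of length 3 or more.
    static List<String> longTokens(String text) {
        // tokens() yields the same tokens as repeated hasNext()/next() calls, lazily.
        try (Stream<String> tokens = new Scanner(text).tokens()) {
            return tokens.filter(word -> word.length() >= 3)
                         .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // With the default whitespace delimiter, punctuation stays attached to the token.
        System.out.println(longTokens("a bb ccc dddd, e")); // prints [ccc, dddd,]
    }
}
```

As the output shows, a length check on whitespace-delimited tokens is not the same as matching words of three or more letters, which is what the question asks for.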
edited Nov 20 at 8:34
answered Nov 19 at 22:04
oleg.cherednik
And lines() will return a stream of words? That would be a very bad name for such a method. Maybe in Java 9, but for sure in Java 11 it returns a stream of lines.
– Carlos Heuberger
Nov 19 at 22:17
It returns lines that have more than 3 letter symbols in any Java. I have unprepared text, so it needs a loop again to find all matches in a line.
– a_chubenko
Nov 19 at 22:29
Well, actually lines() returns all lines, not only the ones with more than 3 letters. It's getWordsStream() that returns the lines with 3 or more letters. But the question is about words with 3 or more letters, not about lines.
– Carlos Heuberger
Nov 20 at 1:05
With java-11, you need not change the signature of that method and can use Files.lines(Path.of(path)) instead.
– nullpointer
Nov 20 at 2:12
Fixed. Same lazy load approach. Piece of cake.
– oleg.cherednik
Nov 20 at 5:22