Create a stream of words using Scanner


























I need to return a stream of all words with three or more letters from a file. Is there a better way than the following, maybe using Stream.iterate?



private Stream<String> getWordsStream(String path){
    Stream.Builder<String> wordsStream = Stream.builder();
    FileInputStream inputStream = null;
    try {
        inputStream = new FileInputStream(path);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    Scanner s = new Scanner(inputStream);
    s.useDelimiter("([^a-zA-Z])");
    Pattern pattern = Pattern.compile("([a-zA-Z]{3,})");
    while ((s.hasNext())){
        if(s.hasNext(pattern)){
            wordsStream.add(s.next().toUpperCase());
        }
        else {
            s.next();
        }
    }
    s.close();
    return wordsStream.build();
}

































  • Which Java version?
    – shmosel
    Nov 19 at 21:48










  • Did you mean to call s.next(pattern)?
    – shmosel
    Nov 19 at 21:53










  • Maybe reading the entire stream as a string, then splitting it with a space (or whatever you're using), then checking each for their length.
    – PhaseRush
    Nov 19 at 22:04










  • Java 9. I mean: is it possible to write this method in a more stream-like style, without a while loop at all?
    – a_chubenko
    Nov 19 at 22:06
















java loops java-stream builder word






asked Nov 19 at 21:43 by a_chubenko, edited Nov 20 at 15:49

3 Answers
































You can use Files.lines() and a Pattern:



private static final Pattern SPACES = Pattern.compile("[^a-zA-Z]+");

public static Stream<String> getWordStream(String path) throws IOException{
    return Files.lines(Paths.get(path))
            .flatMap(SPACES::splitAsStream)
            .filter(word -> word.length() >= 3);
}




























  • I tested this on a book with 105K words. This method is the fastest; it took 0.29 s.
    – a_chubenko
    Nov 20 at 12:05






  • @Alex well, this method also skips the conversion to uppercase. Further, your original code was designed to process words consisting of ASCII letters only, whereas this code treats everything separated by a single space character as a word.
    – Holger
    Nov 20 at 12:07












  • Pattern was changed to Pattern SPACES = Pattern.compile("([^a-zA-Z])");
    – a_chubenko
    Nov 20 at 12:10










  • @Alex You should probably use Pattern.compile("[^a-zA-Z]+") (notice the + at the end), so you won't get "empty" words. E.g. a text like "I have 100 dollars" would produce ["I", "have", "", "", "", "", "dollars"] with the current pattern.
    – Lino
    Nov 20 at 12:19
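
The points raised in the comments above (the missing uppercase conversion, ASCII-only words, and the trailing + in the separator pattern) can be combined in one sketch. The class name LineSplitWords, the constant NON_LETTERS, and the demo() helper with its sample text are assumptions made for illustration; they are not part of the original answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LineSplitWords {
    // A run of one or more non-letters counts as a single separator, so digits
    // and punctuation do not produce empty "words" in the middle of a line.
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    // The caller should close the returned stream: Files.lines keeps the file
    // open until the stream is closed.
    static Stream<String> words(Path path) throws IOException {
        return Files.lines(path)
                .flatMap(NON_LETTERS::splitAsStream)
                .filter(word -> word.length() >= 3)
                .map(String::toUpperCase);
    }

    // Hypothetical helper: writes sample text to a temporary file and collects the words.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("words", ".txt");
            try {
                Files.write(file, List.of("I have 100 dollars", "and, no cents!"));
                try (Stream<String> words = words(file)) {
                    return words.collect(Collectors.toList());
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [HAVE, DOLLARS, AND, CENTS]
    }
}
```

The length filter also discards the empty string that splitting yields when a line starts with a non-letter.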



































The worst part of your code is the following:



FileInputStream inputStream = null;
try {
    inputStream = new FileInputStream(path);
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
Scanner s = new Scanner(inputStream);


So when the file is absent, you will print the FileNotFoundException stack trace and proceed with a null input stream, leading to a NullPointerException. Instead of requiring the caller to deal with a spurious NullPointerException, you should declare the FileNotFoundException in the method signature. Otherwise, return an empty stream in the erroneous case.
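
The difference between the two failure modes can be seen in a small sketch. The class MissingFileDemo, its method names, and the generated path of a non-existent file are assumptions made for illustration only.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.nio.file.Paths;
import java.util.List;
import java.util.Scanner;

final class MissingFileDemo {
    // Mirrors the pattern criticized above: the exception is swallowed and
    // the code carries on with a null stream.
    static String swallowed(String path) {
        InputStream in = null;
        try {
            in = new FileInputStream(path);
        } catch (FileNotFoundException e) {
            // deliberately ignored, as in the question's code
        }
        try (Scanner s = new Scanner(in)) {
            return "no exception";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    // The alternative: declare the exception and let the caller decide.
    static Scanner open(String path) throws FileNotFoundException {
        return new Scanner(new FileInputStream(path));
    }

    static String declared(String path) {
        try (Scanner s = open(path)) {
            return "no exception";
        } catch (FileNotFoundException e) {
            return "FileNotFoundException";
        }
    }

    // Hypothetical helper: runs both variants against a path that does not exist.
    static List<String> demo() {
        String missing = Paths.get(System.getProperty("java.io.tmpdir"),
                "no-such-dir-" + System.nanoTime(), "missing.txt").toString();
        return List.of(swallowed(missing), declared(missing));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [NullPointerException, FileNotFoundException]
    }
}
```

With the declared exception, the caller sees the real cause instead of a NullPointerException raised one step later.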



But you don’t need to construct a FileInputStream at all, as Scanner offers constructors accepting a File or Path. Combine this with the capability of returning a stream of matches (since Java 9) and you get:



private Stream<String> getWordsStream(String path) {
    try {
        Scanner s = new Scanner(Paths.get(path));
        return s.findAll("([a-zA-Z]{3,})").map(mr -> mr.group().toUpperCase());
    } catch(IOException ex) {
        Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        return Stream.empty();
    }
}


or preferably



private Stream<String> getWordsStream(String path) throws IOException {
    Scanner s = new Scanner(Paths.get(path));
    return s.findAll("([a-zA-Z]{3,})").map(mr -> mr.group().toUpperCase());
}


You don’t even need .useDelimiter("([^a-zA-Z])") here, as skipping all nonmatching stuff is the default behavior.



Closing the returned Stream will also close the Scanner.



So the caller should use it like this:



try(Stream<String> s = getWordsStream("path/to/file")) {
    s.forEach(System.out::println);
}
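
For readers who want to try this end to end, here is a minimal self-contained sketch of the same idea. The class name FindAllWords, the precompiled WORD constant, and the demo() helper with its sample sentence are assumptions added for illustration; the technique itself (Scanner.findAll, available since Java 9) is the one described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FindAllWords {
    // Three or more ASCII letters; everything else is skipped by findAll.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]{3,}");

    // Closing the returned stream also closes the underlying Scanner.
    static Stream<String> words(Path path) throws IOException {
        Scanner scanner = new Scanner(path);
        return scanner.findAll(WORD).map(match -> match.group().toUpperCase());
    }

    // Hypothetical helper: writes a sample sentence to a temporary file and collects the words.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("words", ".txt");
            try {
                Files.write(file, List.of("to be, or not to be: that is the question"));
                try (Stream<String> words = words(file)) {
                    return words.collect(Collectors.toList());
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [NOT, THAT, THE, QUESTION]
    }
}
```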




























  • I tested this on a book with 105K words. This method took about 0.6 s.
    – a_chubenko
    Nov 20 at 12:06

































There is a much easier approach: read the lines from the file into a Stream and filter it with the required condition (e.g. length >= 3). Files.lines() loads lazily, so it does not read all words from the file up front; it reads more each time the next word is required.



public static void main(String... args) throws IOException {
    // Close the stream when done so that the onClose handler closes the Scanner.
    try (Stream<String> words = getWordsStream(Paths.get("d:/words.txt"))) {
        words.forEach(System.out::println);
    }
}

public static Stream<String> getWordsStream(Path path) throws IOException {
    final Scanner scan = new Scanner(path);

    // DISTINCT is not reported: the same word may occur more than once in a file.
    return StreamSupport.stream(new Spliterators.AbstractSpliterator<String>(Long.MAX_VALUE,
            Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            while (scan.hasNext()) {
                String word = scan.next();

                // you can use RegExp if you have more complicated condition
                if (word.length() < 3)
                    continue;

                action.accept(word);
                return true;
            }

            return false;
        }
    }, false).onClose(scan::close);
}
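
On Java 9 and later, the hand-written spliterator may not be needed: Scanner.tokens() already returns a lazy stream of tokens and closes the scanner when the stream is closed. The sketch below swaps in that method as an alternative, not as this answer's approach. The class name ScannerTokensWords, the non-letter delimiter, the uppercase step, and the demo() helper are assumptions chosen to match the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ScannerTokensWords {
    // Tokens are the letter runs between non-letter separators; closing the
    // returned stream closes the Scanner as well.
    static Stream<String> words(Path path) throws IOException {
        Scanner scanner = new Scanner(path).useDelimiter("[^a-zA-Z]+");
        return scanner.tokens()
                .filter(word -> word.length() >= 3)
                .map(String::toUpperCase);
    }

    // Hypothetical helper: writes sample text to a temporary file and collects the words.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("words", ".txt");
            try {
                Files.write(file, List.of("1 fish, 2 fish; red fish, blue fish"));
                try (Stream<String> words = words(file)) {
                    return words.collect(Collectors.toList());
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [FISH, FISH, RED, FISH, BLUE, FISH]
    }
}
```

The repeated FISH in the output shows why a stream of words from a file should not be advertised as distinct.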
























  • And lines() will return a stream of words? That would be a very bad name for such a method. Maybe in Java 9, but in Java 11 it certainly returns a stream of lines.
    – Carlos Heuberger
    Nov 19 at 22:17












  • It returns lines that have more than 3 letter symbols in any Java version. My text is unprepared, so a loop is needed again to find all matches in a line.
    – a_chubenko
    Nov 19 at 22:29








  • Well, actually lines() returns all lines, not only the ones with more than 3 letters. It's getWordsStream() that returns the lines with 3 or more letters. But the question is about words with 3 or more letters, not about lines.
    – Carlos Heuberger
    Nov 20 at 1:05










  • With java-11, you need not change the signature of that method and can use Files.lines(Path.of(path)) instead.
    – nullpointer
    Nov 20 at 2:12










  • Fixed. Same lazy load approach. Piece of cake.
    – oleg.cherednik
    Nov 20 at 5:22











answered Nov 20 at 7:23 by Lino, edited Nov 20 at 12:43

answered Nov 20 at 8:03 by Holger, edited Nov 20 at 8:19

edited Nov 20 at 8:34

answered Nov 19 at 22:04
oleg.cherednik







