ggplot2 and CSV “inventing” data that isn't in my input











I'm attempting to produce an attractive graph of bandwidth data across a number of machines and tests. My attempts seem to work for small manually entered amounts of data, but when I feed the "full" 1773 entries, I get results in my graph that don't seem to exist in the input data.



I believe this is likely because the different tests are each of different duration, but I can't seem to prove this. If I use the following input data as csv (sorry, off-site because of size) I end up with a strange upwards-curve on my geom_smooth line, and additional data points that I can't actually see in my .csv input data. (I have much more data in real life, this is a subset that produces the strange behaviour)



I would expect the first four tries (try01-try04) to flat-line at zero, and try05 to carry on at around 1 Gbit/sec. Here's my code:



library("ggplot2")
library("RColorBrewer")

speed = read.csv(file="data.csv")

svg("all_results.svg",width=24)
ggplot(speed,
       aes(x = Second, y = Bandwidth, group=Test, colour=Test)) +
  scale_fill_brewer(palette="Paired") +
  geom_point() +
  geom_smooth()
dev.off()


Here's the image produced



@Gregor seems to be exactly right in that the seconds are interpreted as text, when they should represent the number of the seconds since the start of that test.
Here's some example input data - please note the times are not always on a .00 second boundary due to the output of iperf.



structure(list(Machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "valhalla", class = "factor"),
User = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "alice", class = "factor"),
Test = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "try01", class = "factor"),
Second = structure(c(1L, 2L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("0.00-1.00",
"1.00-2.00", "10.00-11.00", "11.00-12.00", "12.00-13.00",
"13.00-14.00", "14.00-15.00", "15.00-16.00", "16.00-17.00",
"17.00-18.00", "18.00-19.00", "19.00-20.00", "2.00-3.00",
"3.00-4.00", "4.00-5.00", "5.00-6.00", "6.00-7.00", "7.00-8.00",
"8.00-9.00", "9.00-10.00"), class = "factor"), Bandwidth = c(937,
943, 944, 943, 943, 943, 943, 944, 658, 943, 944, 943, 944,
644, 943, 943, 943, 944, 943, 943)), row.names = c(NA, 20L
), class = "data.frame")


I'll try casting (or whatever R calls it) those to a float now.










  • Your x-axis looks categorical, which means it is probably a factor and is ordered alphabetically. You don't share any data, and we can't read any values off your chart, but I would guess that your Second column should be treated as numeric and you should convert it. If you share some sample data we can help with that.
    – Gregor
    Nov 19 at 16:04










  • (And by "don't share any data", I mean in the question itself.) dput(droplevels(head(speed, 20))) would be a great way to share the top 20 rows of your data, in a copy/pasteable way that shows the object structure and classes. And it doesn't require asking people to download and import a large file.
    – Gregor
    Nov 19 at 16:06












  • Ah, you're exactly correct @Gregor - my time is being treated as text. It's of the form "9.01-10.00" and "12.00-13.00" (i.e. approximately one second per sample). I'll update my question to include the dput as it's too large for the comment
    – user9793038
    Nov 19 at 16:41
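
To illustrate the point made in these comments, here is a minimal sketch of how interval labels get ordered when they are treated as text. The vector `second` is a hand-picked subset of the labels in the question, and `starts` is a name made up for this example. Note that `read.csv()` converted string columns to factors by default before R 4.0.0; a plain character column is still treated as discrete by ggplot2, so the x-axis is not in time order either way.

```r
# Interval labels as they appear in the Second column (a small subset).
second <- c("0.00-1.00", "1.00-2.00", "2.00-3.00", "9.00-10.00", "10.00-11.00")

# As a factor, the levels are sorted as text, not by time:
# "10.00-11.00" is placed before "2.00-3.00".
second_factor <- factor(second)
levels(second_factor)

# For comparison, the numeric start of each interval sorts in time order.
starts <- as.numeric(sub("-.*$", "", second))
sort(starts)
```

Because the x positions come from the level order rather than from the time values, points end up in places that do not correspond to their real time, which can look like data that is not in the input.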







































r csv ggplot2














edited Nov 19 at 16:46

























asked Nov 19 at 13:45









user9793038

104






























1 Answer



accepted










Points have a single x value, not a range of x-values, so we'll separate your Second column into the beginning and end of each interval and plot the points at the beginning. Calling your data dd:



library(tidyr)
library(dplyr)

# Split "0.00-1.00" into start and end columns, then convert both to numbers
dd = dd %>%
  separate(Second, into = c("sec_start", "sec_end"), sep = "-", remove = FALSE) %>%
  mutate(sec_start = as.numeric(sec_start),
         sec_end = as.numeric(sec_end))


After that the plotting should go just fine if you put sec_start or sec_end on the x-axis. (Or calculate the middle, whatever you want...)
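
As a rough sketch of what that plot could look like, the example below uses the first five rows of the sample data from the question, typed in by hand with the start and end times already numeric. The column name `sec_mid` and the use of `geom_line()` are choices made for this illustration only, not something from the original code.

```r
library(ggplot2)

# First five rows of the question's sample data, with the interval
# already split into numeric start and end times.
dd <- data.frame(
  Test      = "try01",
  sec_start = c(0, 1, 2, 3, 4),
  sec_end   = c(1, 2, 3, 4, 5),
  Bandwidth = c(937, 943, 944, 943, 943)
)

# Optional: the middle of each interval (sec_mid is a made-up name).
dd$sec_mid <- (dd$sec_start + dd$sec_end) / 2

# With a numeric x, the axis is continuous and the points are in time order.
p <- ggplot(dd, aes(x = sec_start, y = Bandwidth, group = Test, colour = Test)) +
  geom_point() +
  geom_line()
```

Calling `print(p)` draws the plot; swapping `sec_start` for `sec_end` or `sec_mid` only shifts the points along the x-axis.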



If you want to visualize the durations, you could use geom_segment and aes(x = sec_start, xend = sec_end, y = Bandwidth, yend = Bandwidth), but since everything is just about the same duration, it doesn't seem like this would add much value.
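
A minimal sketch of the geom_segment idea, assuming a data frame that already has numeric `sec_start` and `sec_end` columns (the three rows here are copied by hand from the question's sample data):

```r
library(ggplot2)

# Three sample intervals with numeric start and end times.
dd <- data.frame(
  sec_start = c(0, 1, 2),
  sec_end   = c(1, 2, 3),
  Bandwidth = c(937, 943, 944)
)

# One horizontal segment per measurement, spanning its interval.
p <- ggplot(dd) +
  geom_segment(aes(x = sec_start, xend = sec_end,
                   y = Bandwidth, yend = Bandwidth))
```

Each measurement is drawn as a flat bar from the start to the end of its interval, at the height of its bandwidth.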






  • Thanks Gregor, your answer worked more or less verbatim. I now need to read up on tidyr and dplyr, and where that "%>%" syntax comes from. I'm not yet deep enough into R for the landscape to make complete sense, and I'm still stuck in my more Perl-aware thinking...
    – user9793038
    Nov 20 at 9:04










  • Glad it helped. I'd strongly recommend the package vignette An Introduction to dplyr. Covers all the dplyr basics, including %>%.
    – Gregor
    Nov 20 at 14:10
















































answered Nov 19 at 17:52









Gregor

62.3k988163



































