Drawing equally-sized samples from differently-sized substrata of a dataframe in R [duplicate]
























This question already has an answer here:




  • Sample n random rows per group in a dataframe

    5 answers



  • Stratified random sampling from data frame

    4 answers




I have a dataframe with multiple columns containing, inter alia, words and their position in sentences. For some positions, there are more rows than for others. Here's a mock example:



df <- data.frame(
word = sample(LETTERS, 100, replace = T),
position = sample(1:5, 100, replace = T)
)
head(df)
  word position
1    K        1
2    R        5
3    J        2
4    Y        5
5    Z        5
6    U        4


Obviously, the tranches of 'position' are differently sized:



table(df$position)
 1  2  3  4  5
15 15 17 28 25


To make the different tranches more easily comparable I'd like to draw equally sized samples on the variable 'position' within one dataframe. This can theoretically be done in steps, such as these:



df_pos1 <- df[df$position==1,]
df_pos1_sample <- df_pos1[sample(1:nrow(df_pos1), 3),]

df_pos2 <- df[df$position==2,]
df_pos2_sample <- df_pos2[sample(1:nrow(df_pos2), 3),]

df_pos3 <- df[df$position==3,]
df_pos3_sample <- df_pos3[sample(1:nrow(df_pos3), 3),]

df_pos4 <- df[df$position==4,]
df_pos4_sample <- df_pos4[sample(1:nrow(df_pos4), 3),]

df_pos5 <- df[df$position==5,]
df_pos5_sample <- df_pos5[sample(1:nrow(df_pos5), 3),]


and so on, to finally combine the individual samples in a single dataframe:



df_samples <- rbind(df_pos1_sample, df_pos2_sample, df_pos3_sample, df_pos4_sample, df_pos5_sample)


but this procedure is cumbersome and error-prone. A more economical solution might be a for loop. The code I have tried so far, however, does not return a combination of the individual samples for each position value but a single sample drawn from all values of 'position':



df_samples <-c()
for(i in unique(df$position)){
df_samples <- rbind(df[sample(1:nrow(df[df$position==i,]), 3),])
}
df_samples
   word position
13    D        2
2     R        5
12    G        3
4     Y        5
16    Z        3
11    S        3
6     U        4
14    J        3
9     O        5
1     K        1


What's wrong with this code and how can it be improved?























marked as duplicate by Henrik r
11 hours ago


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



























      r for-loop sample
















      asked 11 hours ago









      Chris Ruehlemann



























          3 Answers



              accepted




















              Consider by() to split the data frame by position and draw the sample within each subset. Then rbind all the resulting data frames together in a single call with do.call(), rather than inside a loop.



              df_list <- by(df, df$position, function(sub) sub[sample(1:nrow(sub), 3),])

              final_df <- do.call(rbind, df_list)


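              Currently you index the entire (not subsetted) data frame in each iteration: the indices are drawn from 1:nrow() of the subset but are then applied to df itself. In addition, rbind is called with a single argument, so each iteration overwrites df_samples instead of appending to it. Growing a data frame with rbind inside a for loop is also memory-intensive and not advised.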



              Specifically,





              • by is the object-oriented wrapper to tapply for data frames: it splits a data frame into subsets by one or more factors and passes each subset to the supplied function. Here sub is just the name of the function argument that receives each subset (it can be named anything). The result is a list-like object of class "by" holding one data frame per position.


              • do.call calls a function with its arguments supplied as a list, so do.call(rbind, list(df1, df2, df3)) is equivalent to rbind(df1, df2, df3). The key point is that rbind is called once, after all subsets have been sampled, and not inside a loop (which avoids the cost of repeatedly growing a complex object like a data frame).
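As a minimal sketch of the approach described above, the following rebuilds the question's mock data and checks that every position contributes exactly three rows. The seed value, the helper name sample_rows and the row-name reset are illustrative assumptions, not part of the original answer.

```r
# Mock data as in the question; the seed is an arbitrary choice for reproducibility
set.seed(42)
df <- data.frame(
  word = sample(LETTERS, 100, replace = TRUE),
  position = sample(1:5, 100, replace = TRUE)
)

# Hypothetical helper: draw n rows from one subset.
# sample(nrow(sub), n) picks row numbers of the subset itself, not of df.
sample_rows <- function(sub, n = 3) sub[sample(nrow(sub), n), ]

# by() applies the helper to the rows belonging to each position
df_list <- by(df, df$position, sample_rows)

# A single rbind() call over all sampled subsets
final_df <- do.call(rbind, df_list)
rownames(final_df) <- NULL  # drop the composite row names that rbind() builds

# Count of sampled rows per position
table(final_df$position)
```

The helper assumes every position has at least n rows; sample() stops with an error for a smaller group unless replace = TRUE is added.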















              edited 7 hours ago

























              answered 11 hours ago









              Parfait













              • Could you maybe comment on the key elements of the code such as 'by', 'sub', and 'do-call'? Much appreciated!
                – Chris Ruehlemann
                9 hours ago







































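                  Each time you run the loop you are overwriting the previous result, because rbind() receives only the new sample. The sampled indices also run from 1 to the size of the subset but are applied to the whole data frame. Assuming every position has at least 3 rows, try:



                  df_samples <- data.frame()
                  for (i in unique(df$position)) {
                    df_samples <- rbind(df_samples, df[sample(which(df$position == i), 3), ])
                  }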
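The loop's behaviour can also be checked directly. The sketch below is an illustration under assumed names (n_per_group, positions, samples) and an arbitrary seed: it first shows that indices drawn from 1:nrow() of a subset point into the first rows of the full data frame, then samples within each subset, stores the pieces in a list and binds them once after the loop.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
df <- data.frame(
  word = sample(LETTERS, 100, replace = TRUE),
  position = sample(1:5, 100, replace = TRUE)
)
n_per_group <- 3  # assumed sample size, as in the question

# Indexing problem: these indices run from 1 to the size of the subset,
# so applied to df they pick from its first rows, whatever their position
i <- 5
idx <- sample(1:nrow(df[df$position == i, ]), n_per_group)
df[idx, ]

# Loop that samples within each subset and stores the pieces in a list
positions <- sort(unique(df$position))
samples <- vector("list", length(positions))
for (k in seq_along(positions)) {
  sub <- df[df$position == positions[k], ]
  samples[[k]] <- sub[sample(nrow(sub), n_per_group), ]
}

# Bind all pieces once, after the loop, so nothing is overwritten
df_samples <- do.call(rbind, samples)
table(df_samples$position)
```

Pre-allocating the list keeps the for loop the asker wanted while avoiding repeated growth of a data frame.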















                  answered 11 hours ago









                  xsabatox















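                          We can use data.table: sample the row index .I within each position group and use the resulting indices to subset the dataset. This tends to be very efficient on large data.



                          library(data.table)
                          # setDT() converts df to a data.table by reference
                          i1 <- setDT(df)[, sample(.I, 3), by = position]$V1
                          df[i1]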




                          Or use sample_n from dplyr (loaded as part of the tidyverse):



                          library(tidyverse)
                          df %>%
                            group_by(position) %>%
                            sample_n(3)




                          Or wrap the data.table approach in a function:



                          f1 <- function(data) {
                            data <- as.data.table(data)  # work on a data.table copy of the input
                            i1 <- data[, sample(.I, 3), by = position]$V1
                            data[i1]
                          }
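In more recent dplyr releases, sample_n() is documented as superseded by slice_sample(). The following is a hedged sketch of the same grouped sampling with that function, assuming dplyr 1.0.0 or later is installed; the seed and the object name df_samples are illustrative choices.

```r
library(dplyr)

set.seed(123)  # arbitrary seed, for reproducibility only
df <- data.frame(
  word = sample(LETTERS, 100, replace = TRUE),
  position = sample(1:5, 100, replace = TRUE)
)

df_samples <- df %>%
  group_by(position) %>%   # one group per position value
  slice_sample(n = 3) %>%  # draw 3 rows within each group
  ungroup()

# Number of sampled rows per position
count(df_samples, position)
```

Calling ungroup() at the end returns an ordinary tibble, so later operations are not silently applied per position.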














                          edited 11 hours ago

























                          answered 11 hours ago









                          akrun















