Creating a prediction function for kmeans in R











I want to create a predict function that determines which cluster a new observation belongs to.



data(iris)
mydata=iris
m=mydata[1:4]
train=head(m,100)
xNew=head(m,10)


rownames(train)<-1:nrow(train)

norm_eucl=function(train)
train/apply(train,1,function(x)sum(x^2)^.5)
m_norm=norm_eucl(train)


result=kmeans(m_norm,3,30)

predict.kmean <- function(cluster, newdata)
{
simMat <- m_norm(rbind(cluster, newdata),
sel=(1:nrow(newdata)) + nrow(cluster))[1:nrow(cluster), ]
unname(apply(simMat, 2, which.max))
}

## assign new data samples to exemplars
predict.kmean(m_norm, x[result$cluster, ], xNew)


When I run this, I get the error:



Error in predict.kmean(m_norm, x[result$cluster, ], xNew) : 
unused argument (xNew)


I understand that I am doing something wrong in the function, because I'm just learning how to write one, but I can't work out where exactly.



What I actually want is to adapt a similar function written for apcluster (I have seen a similar topic, but it was about apcluster):



predict.apcluster <- function(s, exemplars, newdata)
{
simMat <- s(rbind(exemplars, newdata),
sel=(1:nrow(newdata)) + nrow(exemplars))[1:nrow(exemplars), ]
unname(apply(simMat, 2, which.max))
}

## assign new data samples to exemplars
predict.apcluster(negDistMat(r=2), x[apres@exemplars, ], xNew)


How can I do this for kmeans?










      r k-means
















      asked Nov 17 at 15:00









      d-max

























          1 Answer










          Rather than trying to replicate something, let's come up with our own function. For a given vector x, we want to assign a cluster using some prior k-means output. Given how the k-means algorithm works, what we want is to find the cluster whose center is closest to x. That can be done as



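          # S3 method for objects of class "kmeans": assign each row of newdata
          # to the cluster whose center is nearest in squared Euclidean distance
          predict.kmeans <- function(object, newdata, ...)
          apply(newdata, 1, function(r) which.min(colSums((t(object$centers) - r)^2)))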


          That is, we go over newdata row by row, compute each row's squared Euclidean distance to every center, and return the index of the nearest center. Then, e.g.,



          head(predict(result, train / sqrt(rowSums(train^2))), 3)
          # 1 2 3
          # 2 2 2
          all.equal(predict(result, train / sqrt(rowSums(train^2))), result$cluster)
          # [1] TRUE


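          which confirms that our prediction function assigns the training observations to the same clusters that kmeans did. (The cluster numbers themselves depend on the random start of kmeans, so yours may differ.) Then also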



          predict(result, xNew / sqrt(rowSums(xNew^2)))
          # 1 2 3 4 5 6 7 8 9 10
          # 2 2 2 2 2 2 2 2 2 2
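
The normalization in the call above divides each row by its Euclidean norm. As a minimal sketch (the names m, norm_apply and norm_vec are illustrative only, not from the question), the following compares the apply-based approach of norm_eucl with the rowSums form and checks that every normalized row has unit length:

```r
# First ten iris rows, numeric columns only (same columns as in the question).
m <- as.matrix(iris[1:10, 1:4])

# Row norms computed with apply, as in the question's norm_eucl.
norm_apply <- m / apply(m, 1, function(x) sum(x^2)^.5)

# Row norms computed with rowSums, as in the predict() calls above.
norm_vec <- m / sqrt(rowSums(m^2))

# Both give the same matrix. Dividing a matrix by a vector of length nrow(m)
# recycles down the columns, so row i is divided by the i-th norm.
print(all.equal(norm_apply, norm_vec))

# Every normalized row should have squared length 1.
print(rowSums(norm_vec^2))
```

Both forms rely on the same recycling rule; the rowSums version simply avoids calling an R function once per row.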


          Notice also that I call simply predict rather than predict.kmeans. That is because result is of class kmeans, so the right method is chosen automatically. Also notice how I normalize the data in a vectorized manner, without using apply.
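
The dispatch mechanism can be seen in isolation with a small sketch. The class name toy_centers and the objects obj and new_points are hypothetical, made up for illustration; only the generic predict comes from R itself. Calling predict on an object whose class is toy_centers makes R look for predict.toy_centers, just as it looks for predict.kmeans when given the output of kmeans:

```r
# Hypothetical toy object: two 2-D centers, tagged with a made-up class name.
obj <- structure(
  list(centers = rbind(c(0, 0), c(10, 10))),
  class = "toy_centers"
)

# S3 method: the name is <generic>.<class>, so predict() can find it.
predict.toy_centers <- function(object, newdata, ...) {
  # For each row r of newdata: squared distance to every center,
  # then the index of the smallest one.
  apply(newdata, 1, function(r) which.min(colSums((t(object$centers) - r)^2)))
}

new_points <- rbind(c(1, 2), c(8, 9))

# Calling the generic dispatches on class(obj).
assigned <- predict(obj, new_points)
print(assigned)
```

The point (1, 2) is at squared distance 5 from the first center and 145 from the second, and (8, 9) is the other way round, so the two points should be assigned to centers 1 and 2 respectively.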



























          • I am ashamed to ask you to help, because you have already helped me two times. But can you help in this topic? stackoverflow.com/questions/53359595/…
            – d-max
            Nov 18 at 9:49











          answered Nov 17 at 16:00









          Julius Vainora
