Import cosine similarities into a clustering method [Python 3]























I am struggling with how to feed cosine similarities into a clustering method. I took the public firms in the US for a given year, collected their business descriptions, and computed the cosine similarity between each pair of documents. For example, this is what a portion of the data looks like:



enter image description here
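Since the screenshot is not reproduced here, the sketch below assumes the data is a long table with one row per company pair and its similarity score. The column names (`company_1`, `company_2`, `cosine_sim`) and the values are made up for illustration. It shows one possible way to turn such a table into a square, symmetric similarity matrix, which is the shape most clustering routines expect when given pairwise scores.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per company pair (names and scores are made up).
pairs = pd.DataFrame({
    "company_1": ["A", "A", "B"],
    "company_2": ["B", "C", "C"],
    "cosine_sim": [0.9, 0.1, 0.2],
})

# Collect every company that appears on either side of a pair.
companies = sorted(set(pairs["company_1"]) | set(pairs["company_2"]))
index = {name: i for i, name in enumerate(companies)}

# Start from the identity: a document is assumed to have similarity 1 with itself.
sim = np.eye(len(companies))

# Write each score into both (i, j) and (j, i) so the matrix is symmetric.
for c1, c2, s in pairs.itertuples(index=False):
    sim[index[c1], index[c2]] = s
    sim[index[c2], index[c1]] = s

sim_df = pd.DataFrame(sim, index=companies, columns=companies)
print(sim_df)
```

Pairs that never appear in the table stay at 0 in this sketch, which may or may not be the right default for real data.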



The next step is to group the companies based on the similarity between them, but I don't know how to put the similarity scores to use. The tutorials I found on clustering in Python with scikit-learn all start from data points with X and Y variables (features), whereas I have a single measure between two items (company 1 and company 2). If anyone can point me in the right direction or has some insight into what I should do, please share. Much appreciated, and thank you for your time!
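For reference, a minimal sketch of one way pairwise scores can drive a clustering without any X/Y features: convert each similarity to a distance (here `1 - similarity`) and pass the precomputed distances to SciPy's hierarchical clustering. The four companies, the similarity values, the average linkage and the 0.5 cut-off are all assumptions chosen for illustration, not values from the actual data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Made-up symmetric similarity matrix for four hypothetical companies.
companies = ["A", "B", "C", "D"]
sim = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])

# Turn similarity into distance: similar companies end up close together.
dist = 1.0 - sim
np.fill_diagonal(dist, 0.0)  # a company is at distance 0 from itself

# linkage() expects the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(dist)
Z = linkage(condensed, method="average")

# Cut the tree: merges above a distance of 0.5 are not allowed (assumed threshold).
labels = fcluster(Z, t=0.5, criterion="distance")

clusters = dict(zip(companies, labels))
print(clusters)
```

With these toy values A and B end up in one group and C and D in another. Some scikit-learn estimators can also take a precomputed distance matrix directly, for example `DBSCAN(metric="precomputed")`.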




If you have any questions, please let me know.



















      python-3.x cluster-analysis hierarchical-clustering cosine-similarity
















      asked Nov 18 at 13:45









      Adrian

      678







































