find elements by xpath selenium phantomjs





I'm using RSelenium for scraping. For this, I have installed Java and the JDK, chromedriver, the Selenium standalone server and the headless browser PhantomJS on my Google Cloud VM instance.

I need to get the text of the first rating:

    library(RSelenium)

    remDr <- remoteDriver(browserName = 'chrome', port = 4444L)
    remDr$open()
    remDr$setWindowSize(1280L, 1024L)
    remDr$navigate("https://www.ratebeer.com/reviews/sullerica-1561/294423")
    text_post <- remDr$findElements("xpath",'//*[@id="root"]/div/div[2]/div/div[2]/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div/div[2]/div/div[1]/div/div/div[1]')

    text_post
    ## list()

In the end, text_post is empty.

However, if I run the same script on my local laptop with RSelenium, the Chrome browser and the same XPath, it works!

What's going on? Is it due to using PhantomJS?

Thanks in advance.

    sessionInfo()

    R version 3.4.4 (2018-03-15)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.5 LTS









Tags: r, selenium, selenium-webdriver, phantomjs, rselenium






edited Nov 23 '18 at 13:04 by hrbrmstr
asked Nov 23 '18 at 12:23 by Mario M.
























2 Answers






Based on the page's HTML, you can use this XPath:

    //div[@id="root"]//span[contains(.,'20')]//following::div[contains(@class,'LinesEllipsis')]

Note: because the elements are generated dynamically, you have to wait until they are visible (for example with WebDriverWait) before looking them up.
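WebDriverWait is a class from Selenium's Java and Python bindings; as far as I know RSelenium has no direct equivalent, so one option is a small polling loop. The following is only a sketch under that assumption: `wait_for_elements()` and `fake_find()` are hypothetical helpers written for illustration, and the shortened XPath in the commented usage is a placeholder, not a tested selector.

```r
# Poll `find` until it returns a non-empty list or the timeout (seconds) expires.
wait_for_elements <- function(find, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat lookup errors as "nothing found yet"
    found <- tryCatch(find(), error = function(e) list())
    if (length(found) > 0 || Sys.time() >= deadline) return(found)
    Sys.sleep(interval)
  }
}

# Intended use with an open RSelenium session `remDr` (not run here):
# text_post <- wait_for_elements(function() {
#   remDr$findElements("xpath", "//div[contains(@class, 'LinesEllipsis')]")
# })
# if (length(text_post) > 0) text_post[[1]]$getElementText()

# Self-contained demonstration: a stand-in finder that succeeds on its third call
counter <- new.env()
counter$n <- 0
fake_find <- function() {
  counter$n <- counter$n + 1
  if (counter$n < 3) list() else list("element")
}

result <- wait_for_elements(fake_find, timeout = 5, interval = 0.1)
length(result)  # 1
```

If the list is still empty after the timeout, the element was probably never rendered in that browser, which points at the page or the driver rather than at timing.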







answered Nov 23 '18 at 12:42 by DebanjanB













• @MarioM. Upvote the answer if this or any answer was helpful to you, for the benefit of future readers. – DebanjanB, Nov 26 '18 at 11:15



















You don't need a heavyweight, third-party dependency. That site uses GraphQL POST requests under the hood, sent as asynchronous XHR requests, to retrieve the data. You can see them if you open Developer Tools and watch the network requests. [image]

I did a "Copy POST Data" (usually the same or a very similar context menu item in all browsers) and un-minified the GraphQL query in the Response tab, just to show you what it is and, perhaps, to make it easier for you to read the query and extend it on your own. (That last part is out of scope for "but what about…" follow-on questions in comments; please file a new question if you want help with that.)



    '[
      {
        "operationName": "beer",
        "query": "query beer($beerId: ID!) {\n info: beer(id: $beerId) {\n id\n name\n __typename\n }\n}\n",
        "variables": {
          "beerId": "294423"
        }
      },
      {
        "operationName": "beer",
        "query": "query beer($beerId: ID!) {\n info: beer(id: $beerId) {\n id\n name\n styleScore\n overallScore\n averageRating\n ratingCount\n __typename\n }\n}\n",
        "variables": {
          "beerId": "294423"
        }
      },
      {
        "operationName": "beerReviews",
        "query": "query beerReviews($beerId: ID!, $authorId: ID, $order: ReviewOrder, $after: ID) {\n beerReviewsArr: beerReviews(beerId: $beerId, authorId: $authorId, order: $order, after: $after) {\n items {\n ...ReviewItem\n __typename\n }\n totalCount\n last\n __typename\n }\n}\n\nfragment ReviewItem on Review {\n id\n comment\n score\n scores {\n appearance\n aroma\n flavor\n mouthfeel\n overall\n __typename\n }\n author {\n id\n username\n reviewCount\n __typename\n }\n checkin {\n id\n place {\n id\n name\n city\n state {\n id\n name\n __typename\n }\n country {\n id\n name\n __typename\n }\n __typename\n }\n __typename\n }\n servedIn\n likeCount\n likedByMe\n createdAt\n updatedAt\n __typename\n}\n",
        "variables": {
          "beerId": "294423",
          "first": 7,
          "order": "RECENT"
        }
      }
    ]' -> graphql_query

We will need to scrunch that back into one line for the API call (which I do with the gsub() below). We also need to manually specify the content type and ensure httr does not try to mangle the body data by setting the encoding to raw:

    httr::POST(
      url = "https://beta.ratebeer.com/v1/api/graphql/",
      httr::content_type("application/json"),
      encode = "raw",
      # R turns each \n in the string into a real newline, which JSON
      # strings may not contain, so replace every newline with a space
      body = gsub("\n", " ", graphql_query),
      httr::verbose()
    ) -> res
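To make the "scrunch into one line" step concrete, here is a minimal base-R sketch on a shortened payload. The `payload` string is a made-up, trimmed-down stand-in with the same shape as graphql_query, and squeezing repeated spaces is an extra cosmetic step that the answer itself does not perform.

```r
# A shortened, hypothetical payload in the same shape as graphql_query.
# R converts each \n escape in the string literal into a real newline.
payload <- '[
  {
    "operationName": "beer",
    "query": "query beer($beerId: ID!) {\n  info: beer(id: $beerId) {\n    id\n    name\n  }\n}\n",
    "variables": { "beerId": "294423" }
  }
]'

# Replace every newline with a space, then squeeze runs of spaces into one
one_line <- gsub(" +", " ", gsub("\n", " ", payload, fixed = TRUE))

grepl("\n", payload, fixed = TRUE)   # TRUE: newlines present before collapsing
grepl("\n", one_line, fixed = TRUE)  # FALSE: none left afterwards
```

Replacing newlines with spaces is safe here because GraphQL treats whitespace between tokens as insignificant, so the query still means the same thing on one line.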


Now we have a structured, but heavily nested, list with your info in it. Pretty sure you're after the items element below:

    str(httr::content(res), 4)
    ## List of 3
    ## $ :List of 1
    ## ..$ data:List of 1
    ## .. ..$ info:List of 3
    ## .. .. ..$ id : chr "294423"
    ## .. .. ..$ name : chr "Sullerica 1561"
    ## .. .. ..$ __typename: chr "Beer"
    ## $ :List of 1
    ## ..$ data:List of 1
    ## .. ..$ info:List of 7
    ## .. .. ..$ id : chr "294423"
    ## .. .. ..$ name : chr "Sullerica 1561"
    ## .. .. ..$ styleScore : num 35.1
    ## .. .. ..$ overallScore : num 51.8
    ## .. .. ..$ averageRating: num 3.25
    ## .. .. ..$ ratingCount : int 21
    ## .. .. ..$ __typename : chr "Beer"
    ## $ :List of 1
    ## ..$ data:List of 1
    ## .. ..$ beerReviewsArr:List of 4
    ## .. .. ..$ items :List of 10
    ## .. .. ..$ totalCount: int 21
    ## .. .. ..$ last : chr "7177326"
    ## .. .. ..$ __typename: chr "ReviewList"
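As a rough sketch of how the review text could be pulled out of that nested list: the helper below assumes the third element of the parsed response holds beerReviewsArr, as in the str() output, and that each item carries the id, score and comment fields requested in the query. `extract_reviews()`, `field()` and the `mock` data are hypothetical and exist only for illustration; the mock stands in for httr::content(res) so the example runs without network access.

```r
# Return x[[name]], or `na` when the field is missing or NULL
field <- function(x, name, na) {
  if (is.null(x[[name]])) na else x[[name]]
}

# Flatten the review items of the parsed response into a data frame
extract_reviews <- function(parsed) {
  items <- parsed[[3]]$data$beerReviewsArr$items
  data.frame(
    id = vapply(items, function(x) as.character(field(x, "id", NA_character_)), character(1)),
    score = vapply(items, function(x) as.numeric(field(x, "score", NA_real_)), numeric(1)),
    comment = vapply(items, function(x) as.character(field(x, "comment", NA_character_)), character(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up data with the same nesting as the real response
mock <- list(
  list(data = list(info = list(id = "294423"))),
  list(data = list(info = list(id = "294423"))),
  list(data = list(beerReviewsArr = list(
    items = list(
      list(id = "1", comment = "First made-up review", score = 3.5),
      list(id = "2", comment = "Second made-up review", score = 3.0)
    ),
    totalCount = 2L
  )))
)

reviews <- extract_reviews(mock)
reviews$comment[1]  # "First made-up review"
```

With a live response, `extract_reviews(httr::content(res))$comment[1]` would then give the text of the first rating, assuming the response keeps this shape.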


It only has 10 of the 21 reviews, so scroll down in your browser window with Developer Tools open and look at the second POST request that gets made. See which parameters changed, and you will have an even better idea of how to access the site's back-end API instead of having to scrape for content.







answered Nov 23 '18 at 13:02 by hrbrmstr





























