Create Tuple out of Array[Array[String]] of Varying Sizes using Scala












I am new to Scala and I am trying to make tuple pairs out of an RDD of type Array[Array[String]] that looks like:



(122abc,223cde,334vbn,445das),(221bca,321dsa),(231dsa,653asd,698poq,897qwa)


I am trying to create tuple pairs out of these arrays so that the first element of each array is the key and every other element of the array is a value. For example, the output would look like:



122abc 223cde
122abc 334vbn
122abc 445das
221bca 321dsa
231dsa 653asd
231dsa 698poq
231dsa 897qwa


I can't figure out how to separate the first element from each array and then map it to every other element.










  • Why do you have two of 221bca 321dsa?
    – smac89
    Nov 20 at 1:43










  • @smac89 that was a typo sorry. Changed now.
    – AntarianCoder
    Nov 20 at 1:45












  • Are you trying to map an RDD[Array[Array[String]]] to an RDD[(String,String)]?
    – Jack Leow
    Nov 20 at 2:45






  • @JackLeow yes I am trying to map RDD[Array[Array[String]]] to an RDD[(String,String)]. Sorry if I was not being clear enough.
    – AntarianCoder
    Nov 20 at 2:50
















arrays scala apache-spark rdd






asked Nov 20 at 1:39 by AntarianCoder (edited Nov 20 at 1:44)




4 Answers
If I'm reading it correctly, the core of your question is how to separate the head (first element) of each inner array from its tail (the remaining elements), which you can do with the head and tail methods. RDDs offer many of the same combinators as Scala collections (map, flatMap, and so on), so you can do this all with what looks like pure Scala code.



Given the following input RDD:



    import org.apache.spark.rdd.RDD

    // `sc` is the SparkContext (predefined in spark-shell).
    val input: RDD[Array[Array[String]]] = sc.parallelize(
      Seq(
        Array(
          Array("122abc", "223cde", "334vbn", "445das"),
          Array("221bca", "321dsa"),
          Array("231dsa", "653asd", "698poq", "897qwa")
        )
      )
    )


The following should do what you want:



    val output: RDD[(String, String)] =
      input.flatMap { arrArrStr: Array[Array[String]] =>
        arrArrStr.flatMap { arrStrs: Array[String] =>
          // Pair the first element (key) with each remaining element (value).
          arrStrs.tail.map { value => arrStrs.head -> value }
        }
      }


And in fact, because of how the flatMap/map calls are composed, you could rewrite it as a for-comprehension:



    val output: RDD[(String, String)] =
      for {
        // Plain variable patterns, so this desugars to flatMap/flatMap/map only.
        arrArrStr <- input
        arrStr    <- arrArrStr
        str       <- arrStr.tail
      } yield (arrStr.head -> str)


Which one you go with is ultimately a matter of personal preference (though in this case, I prefer the latter, as you don't have to indent code as much).
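To see that the two styles are interchangeable, here is a minimal sketch on plain Scala collections, with no Spark involved. It assumes a `Seq` can stand in for the RDD; the object and method names (`ForVsFlatMap`, `viaFlatMap`, `viaFor`) are made up for illustration.

```scala
object ForVsFlatMap {
  // Sample data copied from the question; a plain Seq stands in for the RDD.
  val input: Seq[Array[Array[String]]] = Seq(
    Array(
      Array("122abc", "223cde", "334vbn", "445das"),
      Array("221bca", "321dsa"),
      Array("231dsa", "653asd", "698poq", "897qwa")
    )
  )

  // Explicit flatMap / flatMap / map chain.
  def viaFlatMap: List[(String, String)] =
    input.flatMap { outer =>
      outer.toList.flatMap { inner =>
        inner.tail.toList.map(value => inner.head -> value)
      }
    }.toList

  // The same pipeline written as a for-comprehension.
  def viaFor: List[(String, String)] =
    (for {
      outer <- input
      inner <- outer.toList
      value <- inner.tail.toList
    } yield inner.head -> value).toList

  def main(args: Array[String]): Unit = {
    // Both versions produce the same pairs in the same order.
    println(viaFlatMap == viaFor)
    viaFor.foreach(println)
  }
}
```

The `toList` calls are only there to keep type inference simple in this sketch; they are not part of the original answer.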



For verification:



output.collect().foreach(println)


Should print out:



(122abc,223cde)
(122abc,334vbn)
(122abc,445das)
(221bca,321dsa)
(231dsa,653asd)
(231dsa,698poq)
(231dsa,897qwa)
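One caveat: head and tail throw on an empty array. If your data might contain empty inner arrays, a pattern match avoids that. The following is a minimal sketch on plain Scala collections, not Spark code; `SafeHeadTail` and `pairs` are hypothetical names, and the empty and single-element rows are invented to exercise the edge cases.

```scala
object SafeHeadTail {
  // Pairs the first element with every later element. Rows with fewer than
  // two elements contribute no pairs instead of throwing.
  def pairs(row: Array[String]): List[(String, String)] =
    row.toList match {
      case key :: values => values.map(value => key -> value)
      case Nil           => Nil
    }

  def main(args: Array[String]): Unit = {
    val rows = List(
      Array("122abc", "223cde", "334vbn", "445das"),
      Array.empty[String], // hypothetical empty row, not in the question's data
      Array("221bca")      // hypothetical key with no values
    )
    rows.flatMap(pairs).foreach(println)
  }
}
```

On an RDD of rows, the same function could presumably be passed to flatMap unchanged.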





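This is a classic fold operation; in Spark, the RDD counterpart is aggregate, which takes a function that folds rows within a partition and a second function that merges the per-partition results. Note that aggregate is an action, so the result is a local Array on the driver rather than an RDD:



    // `data` is assumed to be an RDD[Array[String]].
    // Start with an empty array.
    data.aggregate(Array.empty[(String, String)])(
      // seqOp: pair the first element of each row with every other element
      // and append those tuples to the partition's accumulator.
      (acc, arr) => acc ++ arr.drop(1).map(e => (arr.head, e)),
      // combOp: concatenate the accumulators of two partitions.
      (left, right) => left ++ right
    )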


The same solution in a non-Spark environment:



    scala> val data = Array(Array("122abc","223cde","334vbn","445das"),Array("221bca","321dsa"),Array("231dsa","653asd","698poq","897qwa"))
    scala> data.foldLeft(Array.empty[(String, String)]) { case (acc, arr) =>
    | acc ++ arr.drop(1).map(e => (arr.head, e))
    | }
    res0: Array[(String, String)] = Array((122abc,223cde), (122abc,334vbn), (122abc,445das), (221bca,321dsa), (231dsa,653asd), (231dsa,698poq), (231dsa,897qwa))
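To make the two roles in an aggregate-style fold concrete, here is a rough local simulation in plain Scala. It is a sketch under assumptions, not Spark's implementation: the partitions are hand-split lists, and `AggregateSketch`, `seqOp`, `combOp` and `aggregateLocally` are illustrative names.

```scala
object AggregateSketch {
  type Pair = (String, String)

  // seqOp: fold one row into the accumulator of a single partition.
  def seqOp(acc: Vector[Pair], row: Array[String]): Vector[Pair] =
    acc ++ row.drop(1).toVector.map(e => (row.head, e))

  // combOp: merge the accumulators of two partitions.
  def combOp(left: Vector[Pair], right: Vector[Pair]): Vector[Pair] =
    left ++ right

  // Hypothetical stand-in for an aggregate over pre-split partitions:
  // fold each partition with seqOp, then merge the results with combOp.
  def aggregateLocally(partitions: Seq[Seq[Array[String]]]): Vector[Pair] =
    partitions
      .map(_.foldLeft(Vector.empty[Pair])(seqOp))
      .foldLeft(Vector.empty[Pair])(combOp)

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq(Array("122abc", "223cde", "334vbn", "445das"), Array("221bca", "321dsa")),
      Seq(Array("231dsa", "653asd", "698poq", "897qwa"))
    )
    aggregateLocally(partitions).foreach(println)
  }
}
```

This simulation merges partitions in list order; on a real cluster the order in which partition results are combined may differ, so do not rely on the ordering of the final array.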





Convert your input elements to a Seq, and then write a wrapper that gives you List((item1,item2), (item1,item2), ...).



Try the code below:



      val seqs = Seq("122abc","223cde","334vbn","445das")++
      Seq("221bca","321dsa")++
      Seq("231dsa","653asd","698poq","897qwa")


Write a wrapper that pairs each element of the seq with the element that follows it:



      def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)


Now pass your seq as the parameter and it will give you the pairs:



      toPairs(seqs).mkString(" ")


After converting the result to a string, you will get output like:



      res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)


Now you can convert the string however you want.






      • I'm not sure, but your output doesn't really look like OP's.
        – erip
        Nov 20 at 2:42










      • toPairs(seqs) will give you List((item1,item2),(item1,item2)...), so it is pretty much what is supposed to come out, and then you can convert it however you want.
        – Amit Prasad
        Nov 20 at 2:47










      • No, that's not what OP wants. OP wants to create a single array of tuples, where each tuple combines a subarray's first element with one of the remaining elements of that subarray, for each subarray in the original RDD.
        – erip
        Nov 20 at 2:48
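The difference raised in these comments can be seen in a small plain-Scala sketch. Pairing each element with its successor over the flattened data crosses row boundaries, while keying by each row's first element does not. `ZipVsHeadKeyed` and its method names are hypothetical, and only the first two rows of the question's data are used.

```scala
object ZipVsHeadKeyed {
  val rows: List[List[String]] = List(
    List("122abc", "223cde", "334vbn", "445das"),
    List("221bca", "321dsa")
  )

  // Neighbour pairs over the flattened data: each element is paired with
  // its successor, including across the boundary between two rows.
  def neighbourPairs: List[(String, String)] = {
    val flat = rows.flatten
    flat.zip(flat.tail)
  }

  // Head-keyed pairs computed per row: the first element of a row is the
  // key for every other element of that same row.
  def headKeyedPairs: List[(String, String)] =
    rows.flatMap {
      case key :: values => values.map(value => key -> value)
      case Nil           => Nil
    }

  def main(args: Array[String]): Unit = {
    println(neighbourPairs.mkString(" "))
    println(headKeyedPairs.mkString(" "))
  }
}
```

The neighbour version includes a pair such as (445das,221bca) that spans two rows, which is not in the output the question asks for.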





















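Using a DataFrame and explode:



    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.{explode, struct}
    import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

    val df = Seq(
      Array("122abc", "223cde", "334vbn", "445das"),
      Array("221bca", "321dsa"),
      Array("231dsa", "653asd", "698poq", "897qwa")
    ).toDF("arr")

    val df2 = df
      .withColumn("key", 'arr(0))           // first element is the key
      .withColumn("values", explode('arr))  // one row per array element
      .filter('key =!= 'values)             // drops every value equal to the key
      .drop('arr)
      .withColumn("tuple", struct('key, 'values))

    df2.show(false)
    df2.rdd.map(x => Row((x(0), x(1)))).collect.foreach(println)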


Output:



    +------+------+---------------+
    |key   |values|tuple          |
    +------+------+---------------+
    |122abc|223cde|[122abc,223cde]|
    |122abc|334vbn|[122abc,334vbn]|
    |122abc|445das|[122abc,445das]|
    |221bca|321dsa|[221bca,321dsa]|
    |231dsa|653asd|[231dsa,653asd]|
    |231dsa|698poq|[231dsa,698poq]|
    |231dsa|897qwa|[231dsa,897qwa]|
    +------+------+---------------+


      [(122abc,223cde)]
      [(122abc,334vbn)]
      [(122abc,445das)]
      [(221bca,321dsa)]
      [(231dsa,653asd)]
      [(231dsa,698poq)]
      [(231dsa,897qwa)]


Update 1:



Using a pair RDD:



    import scala.collection.mutable
    import org.apache.spark.rdd.PairRDDFunctions
    import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

    val df = Seq(
      Array("122abc", "223cde", "334vbn", "445das"),
      Array("221bca", "321dsa"),
      Array("231dsa", "653asd", "698poq", "897qwa")
    ).toDF("arr")

    // Key each row by the first element of its array.
    val rdd1 = df.rdd.map { x =>
      val y = x.getAs[mutable.WrappedArray[String]]("arr")(0)
      (y, x)
    }
    val pair = new PairRDDFunctions(rdd1)
    pair.flatMapValues(x => x.getAs[mutable.WrappedArray[String]]("arr"))
      .filter(x => x._1 != x._2) // drops every value equal to the key
      .collect.foreach(println)


Results:



      (122abc,223cde)
      (122abc,334vbn)
      (122abc,445das)
      (221bca,321dsa)
      (231dsa,653asd)
      (231dsa,698poq)
      (231dsa,897qwa)
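One thing to be aware of with both variants above: they remove the key by comparing values, not by position, so a later element that happens to equal the key is dropped as well. The following plain-Scala sketch illustrates the difference without Spark. `DropByPosition` is a hypothetical name, and the row with a repeated key is invented for the example.

```scala
object DropByPosition {
  // Hypothetical row in which the key also appears later as a value.
  val row: List[String] = List("k1", "v1", "k1", "v2")

  // Mirrors the explode-then-filter idea: pair the key with every element,
  // then remove every pair whose value equals the key.
  def byValue: List[(String, String)] =
    row.map(v => row.head -> v).filter { case (k, v) => k != v }

  // Drops only the first position, so a repeated key survives as a value.
  def byPosition: List[(String, String)] =
    row.tail.map(v => row.head -> v)

  def main(args: Array[String]): Unit = {
    println(byValue)
    println(byPosition)
  }
}
```

If repeated keys can occur in your data, Spark's `posexplode` function exposes each element's position, so filtering on position rather than on value may be the safer choice.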





The head/tail flatMap answer: answered Nov 20 at 3:05 by Jack Leow, edited Nov 30 at 20:54

























The aggregate/foldLeft answer: answered Nov 20 at 2:14 by erip, edited Nov 20 at 2:23























                        1














                        Convert your input element to seq and all and then try to write the wrapper which will give you List(List(item1,item2), List(item1,item2),...)



                        Try below code



                        val seqs = Seq("122abc","223cde","334vbn","445das")++
                        Seq("221bca","321dsa")++
                        Seq("231dsa","653asd","698poq","897qwa")


                        Write a wrapper to convert seq into a pair of two



                        def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)


                        Now send your seq as params and it it will give your pair of two



                        toPairs(seqs).mkString(" ")


                        After making it to string you will get the output like



                        res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)


                        Now you can convert your string, however, you want.






                        share|improve this answer























                        • I'm not sure, but your output doesn't really look like OP's.
                          – erip
                          Nov 20 at 2:42










                        • toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...) so it is pretty much which are supposed to come and then you can convert into however you want.
                          – Amit Prasad
                          Nov 20 at 2:47










                        • No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.
                          – erip
                          Nov 20 at 2:48


















                        1














                        Convert your input element to seq and all and then try to write the wrapper which will give you List(List(item1,item2), List(item1,item2),...)



                        Try below code



                        val seqs = Seq("122abc","223cde","334vbn","445das")++
                        Seq("221bca","321dsa")++
                        Seq("231dsa","653asd","698poq","897qwa")


                        Write a wrapper to convert seq into a pair of two



                        def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)


                        Now send your seq as params and it it will give your pair of two



                        toPairs(seqs).mkString(" ")


                        After making it to string you will get the output like



                        res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)


                        Now you can convert your string, however, you want.






                        share|improve this answer























                        • I'm not sure, but your output doesn't really look like OP's.
                          – erip
                          Nov 20 at 2:42










                        • toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...) so it is pretty much which are supposed to come and then you can convert into however you want.
                          – Amit Prasad
                          Nov 20 at 2:47










                        • No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.
                          – erip
                          Nov 20 at 2:48
















                        1












                        1








                        1






                        Convert your input elements to a Seq, then write a wrapper that pairs each element with the one after it, giving you a sequence of tuples such as List((item1,item2), (item2,item3), ...).

                        Try the code below:

                        val seqs = Seq("122abc","223cde","334vbn","445das") ++
                          Seq("221bca","321dsa") ++
                          Seq("231dsa","653asd","698poq","897qwa")

                        Write a wrapper that converts the seq into pairs:

                        // Pairs each element with its successor; note that xs.tail throws on an empty Seq
                        def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)

                        Now pass your seq as the argument and it will give you the pairs:

                        toPairs(seqs).mkString(" ")

                        As a string, the output looks like:

                        res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)

                        Now you can convert the string however you want.
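To make the behaviour of `toPairs` concrete, here is a minimal sketch on a tiny input. Because the three sequences above are concatenated into one, the neighbour pairs also cross group boundaries, which is where `(445das,221bca)` in the output comes from. `toPairsSafe` is a hypothetical variant (not part of the answer) that uses `drop(1)` so that empty input gives an empty result instead of an exception.

```scala
// Definition from the answer: pairs each element with its successor
def toPairs[A](xs: Seq[A]): Seq[(A, A)] = xs.zip(xs.tail)

// Hypothetical variant: drop(1) returns an empty Seq for empty input instead of throwing
def toPairsSafe[A](xs: Seq[A]): Seq[(A, A)] = xs.zip(xs.drop(1))

val sample = Seq("a", "b", "c")
println(toPairs(sample))                  // List((a,b), (b,c))
println(toPairsSafe(Seq.empty[String]))   // List()
```

Neither version produces the key-to-rest pairs the question asks for; both only pair adjacent elements.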














                        answered Nov 20 at 2:24 (edited Nov 20 at 2:29)
                        – Amit Prasad












                        • I'm not sure, but your output doesn't really look like OP's.
                          – erip
                          Nov 20 at 2:42










                        • toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...), which is pretty much what is supposed to come out, and then you can convert it however you want.
                          – Amit Prasad
                          Nov 20 at 2:47










                        • No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.
                          – erip
                          Nov 20 at 2:48
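The pairing described in the last comment (first element of each subarray combined with every other element of that subarray) can be sketched on plain Scala collections as follows. `keyWithRest` and `groups` are hypothetical names used only for illustration, and the data is the sample from the question. On an RDD, a function of the same shape could presumably be passed to `flatMap`, but that is not tested here.

```scala
// Hypothetical helper: pair the first element of a sequence with each remaining element
def keyWithRest[A](xs: Seq[A]): Seq[(A, A)] = xs match {
  case key +: rest => rest.map(v => (key, v))
  case _           => Seq.empty // empty input yields no pairs
}

val groups: Seq[Seq[String]] = Seq(
  Seq("122abc", "223cde", "334vbn", "445das"),
  Seq("221bca", "321dsa"),
  Seq("231dsa", "653asd", "698poq", "897qwa")
)

// Flatten the per-group pairs into a single sequence of tuples
val pairs: Seq[(String, String)] = groups.flatMap(g => keyWithRest(g))
pairs.foreach { case (k, v) => println(s"$k $v") }
```

Matching on `key +: rest` avoids calling `head` or `tail` on an empty sequence, so groups with zero or one element simply contribute nothing.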


































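                        Using a DataFrame and explode:

                        import spark.implicits._   // assumes a SparkSession named spark, as in spark-shell
                        import org.apache.spark.sql.Row
                        import org.apache.spark.sql.functions.{explode, struct}

                        val df = Seq(
                          Array("122abc","223cde","334vbn","445das"),
                          Array("221bca","321dsa"),
                          Array("231dsa","653asd","698poq","897qwa")
                        ).toDF("arr")
                        val df2 = df
                          .withColumn("key", 'arr(0))            // first element of each array is the key
                          .withColumn("values", explode('arr))   // one row per array element
                          .filter('key =!= 'values)              // drop the row that pairs the key with itself
                          .drop('arr)
                          .withColumn("tuple", struct('key, 'values))
                        df2.show(false)
                        df2.rdd.map(x => Row((x(0), x(1)))).collect.foreach(println)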


                        Output:



                        +------+------+---------------+
                        |key   |values|tuple          |
                        +------+------+---------------+
                        |122abc|223cde|[122abc,223cde]|
                        |122abc|334vbn|[122abc,334vbn]|
                        |122abc|445das|[122abc,445das]|
                        |221bca|321dsa|[221bca,321dsa]|
                        |231dsa|653asd|[231dsa,653asd]|
                        |231dsa|698poq|[231dsa,698poq]|
                        |231dsa|897qwa|[231dsa,897qwa]|
                        +------+------+---------------+


                        [(122abc,223cde)]
                        [(122abc,334vbn)]
                        [(122abc,445das)]
                        [(221bca,321dsa)]
                        [(231dsa,653asd)]
                        [(231dsa,698poq)]
                        [(231dsa,897qwa)]
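One thing to be aware of with the `filter('key =!= 'values)` step: it removes every value equal to the key, not only the element at position 0. The plain-Scala sketch below shows the difference on a hypothetical array in which the key appears a second time. If such repeats should be kept, a position-based approach would be needed (in Spark, `posexplode` exposes the element position, though that variant is not shown or tested here).

```scala
// Hypothetical input where the key "a" appears again later in the same array
val arr = Seq("a", "b", "a")
val key = arr.head

// Value-based filtering, analogous to filter('key =!= 'values)
val byValue = arr.filter(_ != key).map(v => (key, v))

// Position-based: skip only the element at index 0
val byPosition = arr.drop(1).map(v => (key, v))

println(byValue)     // List((a,b))
println(byPosition)  // List((a,b), (a,a))
```

For the sample data in the question the two approaches give the same result, because no key is repeated within its own array.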


                        Update1:



                        Using a pair RDD:

                        import org.apache.spark.rdd.PairRDDFunctions
                        import scala.collection.mutable

                        val df = Seq(
                          Array("122abc","223cde","334vbn","445das"),
                          Array("221bca","321dsa"),
                          Array("231dsa","653asd","698poq","897qwa")
                        ).toDF("arr")
                        // Key each row by the first element of its array
                        val rdd1 = df.rdd.map(x => { val y = x.getAs[mutable.WrappedArray[String]]("arr")(0); (y, x) })
                        val pair = new PairRDDFunctions(rdd1)
                        // Emit one (key, element) pair per array element, then drop the key paired with itself
                        pair.flatMapValues(x => x.getAs[mutable.WrappedArray[String]]("arr"))
                          .filter(x => x._1 != x._2)
                          .collect.foreach(println)


                        Results:



                        (122abc,223cde)
                        (122abc,334vbn)
                        (122abc,445das)
                        (221bca,321dsa)
                        (231dsa,653asd)
                        (231dsa,698poq)
                        (231dsa,897qwa)
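For readers unfamiliar with `flatMapValues`, here is a rough local model of what the pair-RDD version does: key each record, then expand each value into several values while keeping the key. `flatMapValuesLocal` is a hypothetical stand-in written for plain Scala collections, not Spark's implementation, and it uses `drop(1)` rather than the filter from the answer.

```scala
// Hypothetical local stand-in for flatMapValues on a Seq of (key, value) pairs
def flatMapValuesLocal[K, V, U](pairs: Seq[(K, V)])(f: V => Iterable[U]): Seq[(K, U)] =
  pairs.flatMap { case (k, v) => f(v).map(u => (k, u)) }

val rows = Seq(
  Seq("122abc", "223cde", "334vbn", "445das"),
  Seq("221bca", "321dsa")
)

// Key each row by its first element (assumes every row is non-empty)
val keyed = rows.map(r => (r.head, r))

// Expand each row into its remaining elements, keeping the key
val out = flatMapValuesLocal(keyed)(r => r.drop(1))
out.foreach(println)
```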













                            answered Nov 20 at 10:25 (edited Nov 20 at 15:51)
                            – stack0114106





























