Create Tuples out of Array[Array[String]] of Varying Sizes using Scala

I am new to Scala and I am trying to make tuple pairs out of an RDD whose elements are of type Array[Array[String]], which looks like:

(122abc,223cde,334vbn,445das),(221bca,321dsa),(231dsa,653asd,698poq,897qwa)

I am trying to create tuple pairs out of these arrays so that the first element of each array is the key and every other element of that array is a value. For example, the output would look like:

122abc    223cde
122abc    334vbn
122abc    445das
221bca    321dsa
231dsa    653asd
231dsa    698poq
231dsa    897qwa

I can't figure out how to separate the first element from each array and then map it to every other element.

edited Nov 20 at 1:44

asked Nov 20 at 1:39

AntarianCoder

204

Why do you have two of 221bca 321dsa?
– smac89
Nov 20 at 1:43

@smac89 that was a typo, sorry. Changed now.
– AntarianCoder
Nov 20 at 1:45

Are you trying to map an RDD[Array[Array[String]]] to an RDD[(String,String)]?
– Jack Leow
Nov 20 at 2:45

@JackLeow yes I am trying to map RDD[Array[Array[String]]] to an RDD[(String,String)]. Sorry if I was not being clear enough.
– AntarianCoder
Nov 20 at 2:50

arrays scala apache-spark rdd

4 Answers
If I'm reading it correctly, the core of your question is separating the head (first element) of each inner array from its tail (the remaining elements), which you can do with the head and tail methods. RDDs support many of the same operations as Scala collections (map, flatMap, filter), so you can do all of this with what looks like pure Scala code.

Given the following input RDD:

import org.apache.spark.rdd.RDD

// `sc` is the SparkContext (predefined in spark-shell)
val input: RDD[Array[Array[String]]] = sc.parallelize(
  Seq(
    Array(
      Array("122abc","223cde","334vbn","445das"),
      Array("221bca","321dsa"),
      Array("231dsa","653asd","698poq","897qwa")
    )
  )
)

The following should do what you want:

val output: RDD[(String,String)] =
  input.flatMap { arrArrStr: Array[Array[String]] =>
    arrArrStr.flatMap { arrStrs: Array[String] =>
      arrStrs.tail.map { value => arrStrs.head -> value }
    }
  }

And in fact, because this is just nested flatMap/map calls, you could rewrite it as a for-comprehension:

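val output: RDD[(String,String)] =
  for {
    // No type ascriptions on the generators: in Scala 2 a typed pattern such as
    // `x: T <- rdd` makes the compiler insert a withFilter call, which RDD does not define
    arrArrStr <- input
    arrStr <- arrArrStr
    str <- arrStr.tail
  } yield (arrStr.head -> str)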

Which one you go with is ultimately a matter of personal preference (though in this case, I prefer the latter, as you don't have to indent code as much).
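Since the for-comprehension is only sugar for the nested flatMap/map, both forms should yield the same pairs. A quick way to check this without a cluster is to run both on plain Scala arrays; in this sketch `DesugarCheck` and `nested` are hypothetical names, and `nested` stands in for a single element of the RDD:

```scala
// Local, Spark-free check that the two forms build the same pairs
object DesugarCheck {
  // Stand-in for one element of the RDD: an array of inner arrays
  val nested: Array[Array[String]] = Array(
    Array("122abc", "223cde", "334vbn", "445das"),
    Array("221bca", "321dsa"),
    Array("231dsa", "653asd", "698poq", "897qwa")
  )

  // Explicit flatMap/map chain
  val viaFlatMap: Seq[(String, String)] =
    nested.toSeq.flatMap(arrStr => arrStr.tail.toSeq.map(str => arrStr.head -> str))

  // The same logic as a for-comprehension, which desugars to the chain above
  val viaFor: Seq[(String, String)] =
    for {
      arrStr <- nested.toSeq
      str    <- arrStr.tail.toSeq
    } yield arrStr.head -> str
}
```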

For verification:

output.collect().foreach(println)

Should print out:

(122abc,223cde)
(122abc,334vbn)
(122abc,445das)
(221bca,321dsa)
(231dsa,653asd)
(231dsa,698poq)
(231dsa,897qwa)
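One caveat: `arrStrs.tail` (and `arrStr.tail` in the for-comprehension) throws an exception if an inner array is empty. If that can happen in your data, a variant that skips empty arrays could look like the sketch below; `SafePairs.keyValuePairs` is a hypothetical helper, shown on local collections rather than an RDD:

```scala
object SafePairs {
  // Pairs the first element of each inner array with every other element.
  // Empty inner arrays yield no pairs instead of throwing on .head/.tail.
  def keyValuePairs(rows: Seq[Array[String]]): Seq[(String, String)] =
    rows.flatMap { arr =>
      arr.headOption match {
        case Some(key) => arr.drop(1).toSeq.map(value => key -> value)
        case None      => Seq.empty[(String, String)]
      }
    }
}
```

Inside Spark, the same function could then be applied per element, for example `input.flatMap(arrArrStr => SafePairs.keyValuePairs(arrArrStr.toSeq))` (not tested against a cluster here).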

edited Nov 30 at 20:54

answered Nov 20 at 3:05

Jack Leow

18.1k34446


This is a classic fold operation; in Spark, folding into a result of a different type is done with aggregate:

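// `data` is assumed to be an RDD[Array[String]] (one inner array per element);
// an RDD[Array[Array[String]]] can be flattened first with flatMap(_.toSeq).
// Start with an empty array; aggregate takes both a seqOp and a combOp.
data.aggregate(Array.empty[(String, String)])(
  // seqOp: `arr.drop(1).map(e => (arr.head, e))` pairs the first element of a row
  // with each of the other elements; append these pairs to the accumulator.
  (acc, arr) => acc ++ arr.drop(1).map(e => (arr.head, e)),
  // combOp: concatenate the accumulators built on different partitions.
  (acc1, acc2) => acc1 ++ acc2
)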

The same solution in a non-Spark environment, using foldLeft:

scala> val data = Array(Array("122abc","223cde","334vbn","445das"),Array("221bca","321dsa"),Array("231dsa","653asd","698poq","897qwa"))

scala> data.foldLeft(Array.empty[(String, String)]) { case (acc, arr) =>
     |     acc ++ arr.drop(1).map(e => (arr.head, e))
     | }
res0: Array[(String, String)] = Array((122abc,223cde), (122abc,334vbn), (122abc,445das), (221bca,321dsa), (231dsa,653asd), (231dsa,698poq), (231dsa,897qwa))
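Because `RDD.aggregate` runs `seqOp` inside each partition and then merges the per-partition results with `combOp`, it needs both functions. The plain-Scala sketch below imitates that two-level fold over hypothetical in-memory "partitions" (nested Seqs); it illustrates the semantics only and is not how Spark implements it. Also keep in mind that `aggregate` is an action, so the result is a local collection on the driver rather than an `RDD[(String, String)]`.

```scala
object AggregateSketch {
  type Pairs = Vector[(String, String)]

  // seqOp: fold one row (an inner array) into a partition's accumulator
  def seqOp(acc: Pairs, arr: Array[String]): Pairs =
    acc ++ arr.drop(1).toVector.map(e => (arr.head, e))

  // combOp: merge the accumulators of two partitions
  def combOp(left: Pairs, right: Pairs): Pairs = left ++ right

  // Imitates aggregate(zero)(seqOp, combOp) over in-memory "partitions":
  // fold each partition with seqOp, then combine the partial results with combOp
  def aggregateLocally(partitions: Seq[Seq[Array[String]]]): Pairs = {
    val zero: Pairs = Vector.empty
    partitions.map(_.foldLeft(zero)(seqOp)).foldLeft(zero)(combOp)
  }
}
```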

edited Nov 20 at 2:23

answered Nov 20 at 2:14

erip

10.2k43774


Convert your input to a Seq and then write a helper that turns it into a sequence of pairs, like List((item1,item2), (item2,item3), ...).

Try the code below:

val seqs = Seq("122abc","223cde","334vbn","445das") ++
  Seq("221bca","321dsa") ++
  Seq("231dsa","653asd","698poq","897qwa")

Write a helper that converts a Seq into pairs of neighbouring elements:

def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)

Now pass your Seq to it and it will give you the pairs:

toPairs(seqs).mkString(" ")

After converting the result to a string, you will get output like:

res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)

You can then convert the string however you want.
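As erip points out in the comments, `xs.zip(xs.tail)` pairs each element with its neighbour, and on the concatenated sequence it also produces pairs that cross array boundaries, such as (445das,221bca). The sketch below keeps the inner sequences separate and contrasts neighbour pairing with pairing the first element against every other element, which is what the question asks for; `PairingStyles`, `headPairs` and `groups` are hypothetical names:

```scala
object PairingStyles {
  // Neighbour pairing, as in toPairs above: (x0,x1), (x1,x2), ...
  // (xs.tail throws on an empty Seq, so guard against that if needed)
  def toPairs[A](xs: Seq[A]): Seq[(A, A)] = xs.zip(xs.tail)

  // Head pairing: (x0,x1), (x0,x2), ...; an empty Seq yields no pairs
  def headPairs[A](xs: Seq[A]): Seq[(A, A)] = xs match {
    case head +: rest => rest.map(head -> _)
    case _            => Seq.empty
  }

  // The three inner sequences from the question, kept separate
  val groups: Seq[Seq[String]] = Seq(
    Seq("122abc", "223cde", "334vbn", "445das"),
    Seq("221bca", "321dsa"),
    Seq("231dsa", "653asd", "698poq", "897qwa")
  )
}
```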

edited Nov 20 at 2:29

answered Nov 20 at 2:24

Amit Prasad

538315

I'm not sure, but your output doesn't really look like OP's.
– erip
Nov 20 at 2:42

toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...), which is pretty much what is supposed to come out, and then you can convert it however you want.
– Amit Prasad
Nov 20 at 2:47

No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.
– erip
Nov 20 at 2:48


Using a DataFrame and explode:

import org.apache.spark.sql.functions.{explode, struct}
import spark.implicits._   // toDF and the 'column symbol syntax

val df = Seq(
  Array("122abc","223cde","334vbn","445das"),
  Array("221bca","321dsa"),
  Array("231dsa","653asd","698poq","897qwa")
).toDF("arr")

// key = first element; explode emits one row per array element, and the filter drops
// the row whose value equals the key (any later repeat of the key is dropped too)
val df2 = df.withColumn("key", 'arr(0)).withColumn("values", explode('arr)).filter('key =!= 'values).drop('arr).withColumn("tuple", struct('key, 'values))

df2.show(false)

// Map each Row to a (String, String) tuple to get an RDD[(String, String)]
df2.rdd.map(x => (x.getString(0), x.getString(1))).collect.foreach(println)

Output:

+------+------+---------------+
|key   |values|tuple          |
+------+------+---------------+
|122abc|223cde|[122abc,223cde]|
|122abc|334vbn|[122abc,334vbn]|
|122abc|445das|[122abc,445das]|
|221bca|321dsa|[221bca,321dsa]|
|231dsa|653asd|[231dsa,653asd]|
|231dsa|698poq|[231dsa,698poq]|
|231dsa|897qwa|[231dsa,897qwa]|
+------+------+---------------+

(122abc,223cde)
(122abc,334vbn)
(122abc,445das)
(221bca,321dsa)
(231dsa,653asd)
(231dsa,698poq)
(231dsa,897qwa)
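Note that `filter('key =!= 'values)` compares values, not positions, so it would also drop any later element that happens to equal the key. If the inner arrays can contain repeats, filtering on position is safer: `posexplode` in `org.apache.spark.sql.functions` emits each element together with its position, so something like `df.select('arr(0).as("key"), posexplode('arr)).filter('pos > 0)` should keep everything except the first element (untested). The plain-Scala sketch below shows the difference on a hypothetical row, with `zipWithIndex` playing the role of `posexplode`:

```scala
object FilterByPosition {
  // Mirrors explode + filter(key =!= value): drops every element equal to the key
  def byValue(arr: Seq[String]): Seq[(String, String)] = arr match {
    case key +: _ => arr.filter(_ != key).map(key -> _)
    case _        => Seq.empty
  }

  // Mirrors posexplode + filter(pos > 0): drops only the element at position 0
  def byPosition(arr: Seq[String]): Seq[(String, String)] = arr match {
    case key +: _ =>
      arr.zipWithIndex.collect { case (value, pos) if pos > 0 => key -> value }
    case _ => Seq.empty
  }
}
```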

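Update 1:

Using a pair RDD:

import org.apache.spark.rdd.PairRDDFunctions
import scala.collection.mutable   // so that mutable.WrappedArray resolves
import spark.implicits._

val df = Seq(
  Array("122abc","223cde","334vbn","445das"),
  Array("221bca","321dsa"),
  Array("231dsa","653asd","698poq","897qwa")
).toDF("arr")

// Key each Row by the first element of its "arr" column
// (on Spark 2.x with Scala 2.11/2.12, array columns typically come back as WrappedArray)
val rdd1 = df.rdd.map( x => { val y = x.getAs[mutable.WrappedArray[String]]("arr")(0); (y,x)} )

val pair = new PairRDDFunctions(rdd1)

// Emit (key, element) for every array element, then drop the (key, key) pair
pair.flatMapValues( x => x.getAs[mutable.WrappedArray[String]]("arr") )
    .filter( x => x._1 != x._2)
    .collect.foreach(println)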

Results:

(122abc,223cde)
(122abc,334vbn)
(122abc,445das)
(221bca,321dsa)
(231dsa,653asd)
(231dsa,698poq)
(231dsa,897qwa)
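For reference, `flatMapValues` on a pair RDD keeps each key and emits one (key, u) pair for every u produced from that key's value. Keying each array by its head with its tail as the value would make the `filter` step unnecessary; in Spark this might look like `df.rdd.map(r => { val a = r.getSeq[String](0); (a.head, a.tail) }).flatMapValues(vs => vs)` (untested). The plain-Scala sketch below imitates those semantics on a local collection; `flatMapValuesLocal` and `keyedByHead` are hypothetical helpers, not Spark APIs:

```scala
object PairRddSketch {
  // Hypothetical local stand-in for PairRDDFunctions.flatMapValues:
  // keep the key, emit (key, u) for every u produced from the value
  def flatMapValuesLocal[K, V, U](pairs: Seq[(K, V)])(f: V => Seq[U]): Seq[(K, U)] =
    pairs.flatMap { case (k, v) => f(v).map(u => (k, u)) }

  // Key each non-empty row by its head, with its tail as the value;
  // empty rows are skipped because the pattern does not match them
  def keyedByHead(rows: Seq[Seq[String]]): Seq[(String, Seq[String])] =
    rows.collect { case head +: tail => (head, tail) }
}
```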

edited Nov 20 at 15:51

answered Nov 20 at 10:25

stack0114106

1,9751416

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53385051%2fcreate-tuple-out-of-arrayarraystring-of-varying-sizes-using-scala%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Given the following input RDD:

val input: RDD[Array[Array[String]]] = sc.parallelize(

  Seq(

    Array(

      Array("122abc","223cde","334vbn","445das"),

      Array("221bca","321dsa"),

      Array("231dsa","653asd","698poq","897qwa")

    )

  )

)

The following should do what you want:

val output: RDD[(String,String)] =

  input.flatMap { arrArrStr: Array[Array[String]] =>

    arrArrStr.flatMap { arrStrs: Array[String] =>

      arrStrs.tail.map { value => arrStrs.head -> value }

    }

  }

And in fact, because of how the flatMap/map is composed, you could re-write it as a for-comprehension.:

val output: RDD[(String,String)] =

  for {

    arrArrStr: Array[Array[String]] <- input

    arrStr: Array[String] <- arrArrStr

    str: String <- arrStr.tail

  } yield (arrStr.head -> str)

Which one you go with is ultimately a matter of personal preference (though in this case, I prefer the latter, as you don't have to indent code as much).

For verification:

output.collect().foreach(println)

Should print out:

(122abc,223cde)

(122abc,334vbn)

(122abc,445das)

(221bca,321dsa)

(231dsa,653asd)

(231dsa,698poq)

(231dsa,897qwa)

edited Nov 30 at 20:54

answered Nov 20 at 3:05

Jack Leow

18.1k34446

add a comment |

Given the following input RDD:

val input: RDD[Array[Array[String]]] = sc.parallelize(

  Seq(

    Array(

      Array("122abc","223cde","334vbn","445das"),

      Array("221bca","321dsa"),

      Array("231dsa","653asd","698poq","897qwa")

    )

  )

)

The following should do what you want:

val output: RDD[(String,String)] =

  input.flatMap { arrArrStr: Array[Array[String]] =>

    arrArrStr.flatMap { arrStrs: Array[String] =>

      arrStrs.tail.map { value => arrStrs.head -> value }

    }

  }

And in fact, because of how the flatMap/map is composed, you could re-write it as a for-comprehension.:

val output: RDD[(String,String)] =

  for {

    arrArrStr: Array[Array[String]] <- input

    arrStr: Array[String] <- arrArrStr

    str: String <- arrStr.tail

  } yield (arrStr.head -> str)

Which one you go with is ultimately a matter of personal preference (though in this case, I prefer the latter, as you don't have to indent code as much).

For verification:

output.collect().foreach(println)

Should print out:

(122abc,223cde)

(122abc,334vbn)

(122abc,445das)

(221bca,321dsa)

(231dsa,653asd)

(231dsa,698poq)

(231dsa,897qwa)

edited Nov 30 at 20:54

answered Nov 20 at 3:05

Jack Leow

18.1k34446

add a comment |

Given the following input RDD:

val input: RDD[Array[Array[String]]] = sc.parallelize(

  Seq(

    Array(

      Array("122abc","223cde","334vbn","445das"),

      Array("221bca","321dsa"),

      Array("231dsa","653asd","698poq","897qwa")

    )

  )

)

The following should do what you want:

val output: RDD[(String,String)] =

  input.flatMap { arrArrStr: Array[Array[String]] =>

    arrArrStr.flatMap { arrStrs: Array[String] =>

      arrStrs.tail.map { value => arrStrs.head -> value }

    }

  }

And in fact, because of how the flatMap/map is composed, you could re-write it as a for-comprehension.:

val output: RDD[(String,String)] =

  for {

    arrArrStr: Array[Array[String]] <- input

    arrStr: Array[String] <- arrArrStr

    str: String <- arrStr.tail

  } yield (arrStr.head -> str)

Which one you go with is ultimately a matter of personal preference (though in this case, I prefer the latter, as you don't have to indent code as much).

For verification:

output.collect().foreach(println)

Should print out:

(122abc,223cde)

(122abc,334vbn)

(122abc,445das)

(221bca,321dsa)

(231dsa,653asd)

(231dsa,698poq)

(231dsa,897qwa)

edited Nov 30 at 20:54

answered Nov 20 at 3:05

Jack Leow

18.1k34446

Given the following input RDD:

val input: RDD[Array[Array[String]]] = sc.parallelize(

  Seq(

    Array(

      Array("122abc","223cde","334vbn","445das"),

      Array("221bca","321dsa"),

      Array("231dsa","653asd","698poq","897qwa")

    )

  )

)

The following should do what you want:

val output: RDD[(String,String)] =

  input.flatMap { arrArrStr: Array[Array[String]] =>

    arrArrStr.flatMap { arrStrs: Array[String] =>

      arrStrs.tail.map { value => arrStrs.head -> value }

    }

  }

And in fact, because of how the flatMap/map is composed, you could re-write it as a for-comprehension.:

val output: RDD[(String,String)] =

  for {

    arrArrStr: Array[Array[String]] <- input

    arrStr: Array[String] <- arrArrStr

    str: String <- arrStr.tail

  } yield (arrStr.head -> str)

Which one you go with is ultimately a matter of personal preference (though in this case, I prefer the latter, as you don't have to indent code as much).

For verification:

output.collect().foreach(println)

Should print out:

(122abc,223cde)

(122abc,334vbn)

(122abc,445das)

(221bca,321dsa)

(231dsa,653asd)

(231dsa,698poq)

(231dsa,897qwa)

edited Nov 30 at 20:54

answered Nov 20 at 3:05

Jack Leow

18.1k34446

edited Nov 30 at 20:54

answered Nov 20 at 3:05

Jack Leow

18.1k34446

answered Nov 20 at 3:05

Jack Leow

18.1k34446

answered Nov 20 at 3:05

Jack Leow

18.1k34446

add a comment |

This is a classic fold operation; but folding in Spark is calling aggregate:

// Start with an empty array

data.aggregate(Array.empty[(String, String)]) { 

  // `arr.drop(1).map(e => (arr.head, e))` will create tuples of 

  // all elements in each row and the first element.

  // Append this to the aggregate array.

  case (acc, arr) => acc ++ arr.drop(1).map(e => (arr.head, e))

}

The solution is a non-Spark environment:

scala> val data = Array(Array("122abc","223cde","334vbn","445das"),Array("221bca","321dsa"),Array("231dsa","653asd","698poq","897qwa"))

scala> data.foldLeft(Array.empty[(String, String)]) { case (acc, arr) =>

     |     acc ++ arr.drop(1).map(e => (arr.head, e))

     | }

res0: Array[(String, String)] = Array((122abc,223cde), (122abc,334vbn), (122abc,445das), (221bca,321dsa), (231dsa,653asd), (231dsa,698poq), (231dsa,897qwa))

edited Nov 20 at 2:23

answered Nov 20 at 2:14

erip

10.2k43774

add a comment |

This is a classic fold operation; but folding in Spark is calling aggregate:

// Start with an empty array

data.aggregate(Array.empty[(String, String)]) { 

  // `arr.drop(1).map(e => (arr.head, e))` will create tuples of 

  // all elements in each row and the first element.

  // Append this to the aggregate array.

  case (acc, arr) => acc ++ arr.drop(1).map(e => (arr.head, e))

}

The solution is a non-Spark environment:

scala> val data = Array(Array("122abc","223cde","334vbn","445das"),Array("221bca","321dsa"),Array("231dsa","653asd","698poq","897qwa"))

scala> data.foldLeft(Array.empty[(String, String)]) { case (acc, arr) =>

     |     acc ++ arr.drop(1).map(e => (arr.head, e))

     | }

res0: Array[(String, String)] = Array((122abc,223cde), (122abc,334vbn), (122abc,445das), (221bca,321dsa), (231dsa,653asd), (231dsa,698poq), (231dsa,897qwa))

edited Nov 20 at 2:23

answered Nov 20 at 2:14

erip

10.2k43774

add a comment |

This is a classic fold operation; but folding in Spark is calling aggregate:

// Start with an empty array

data.aggregate(Array.empty[(String, String)]) { 

  // `arr.drop(1).map(e => (arr.head, e))` will create tuples of 

  // all elements in each row and the first element.

  // Append this to the aggregate array.

  case (acc, arr) => acc ++ arr.drop(1).map(e => (arr.head, e))

}

The solution is a non-Spark environment:

scala> val data = Array(Array("122abc","223cde","334vbn","445das"),Array("221bca","321dsa"),Array("231dsa","653asd","698poq","897qwa"))

scala> data.foldLeft(Array.empty[(String, String)]) { case (acc, arr) =>

     |     acc ++ arr.drop(1).map(e => (arr.head, e))

     | }

res0: Array[(String, String)] = Array((122abc,223cde), (122abc,334vbn), (122abc,445das), (221bca,321dsa), (231dsa,653asd), (231dsa,698poq), (231dsa,897qwa))

edited Nov 20 at 2:23

answered Nov 20 at 2:14

erip

10.2k43774

This is a classic fold operation; but folding in Spark is calling aggregate:

// Start with an empty array

data.aggregate(Array.empty[(String, String)]) { 

  // `arr.drop(1).map(e => (arr.head, e))` will create tuples of 

  // all elements in each row and the first element.

  // Append this to the aggregate array.

  case (acc, arr) => acc ++ arr.drop(1).map(e => (arr.head, e))

}

The solution is a non-Spark environment:

scala> val data = Array(Array("122abc","223cde","334vbn","445das"),Array("221bca","321dsa"),Array("231dsa","653asd","698poq","897qwa"))

scala> data.foldLeft(Array.empty[(String, String)]) { case (acc, arr) =>

     |     acc ++ arr.drop(1).map(e => (arr.head, e))

     | }

res0: Array[(String, String)] = Array((122abc,223cde), (122abc,334vbn), (122abc,445das), (221bca,321dsa), (231dsa,653asd), (231dsa,698poq), (231dsa,897qwa))

edited Nov 20 at 2:23

answered Nov 20 at 2:14

erip

10.2k43774

edited Nov 20 at 2:23

answered Nov 20 at 2:14

erip

10.2k43774

answered Nov 20 at 2:14

erip

10.2k43774

answered Nov 20 at 2:14

erip

10.2k43774

add a comment |

Convert your input element to seq and all and then try to write the wrapper which will give you List(List(item1,item2), List(item1,item2),...)

Try below code

val seqs = Seq("122abc","223cde","334vbn","445das")++

Seq("221bca","321dsa")++

Seq("231dsa","653asd","698poq","897qwa")

Write a wrapper to convert seq into a pair of two

def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)

Now send your seq as params and it it will give your pair of two

toPairs(seqs).mkString(" ")

After making it to string you will get the output like

res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)

Now you can convert your string, however, you want.

edited Nov 20 at 2:29

answered Nov 20 at 2:24

Amit Prasad

538315

I'm not sure, but your output doesn't really look like OP's.
– erip
Nov 20 at 2:42

toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...) so it is pretty much which are supposed to come and then you can convert into however you want.
– Amit Prasad
Nov 20 at 2:47

No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.
– erip
Nov 20 at 2:48

add a comment |

Convert your input element to seq and all and then try to write the wrapper which will give you List(List(item1,item2), List(item1,item2),...)

Try below code

val seqs = Seq("122abc","223cde","334vbn","445das")++

Seq("221bca","321dsa")++

Seq("231dsa","653asd","698poq","897qwa")

Write a wrapper to convert seq into a pair of two

def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)

Now send your seq as params and it it will give your pair of two

toPairs(seqs).mkString(" ")

After making it to string you will get the output like

res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)

Now you can convert your string, however, you want.

edited Nov 20 at 2:29

answered Nov 20 at 2:24

Amit Prasad

538315

I'm not sure, but your output doesn't really look like OP's.
– erip
Nov 20 at 2:42

toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...) so it is pretty much which are supposed to come and then you can convert into however you want.
– Amit Prasad
Nov 20 at 2:47

No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.
– erip
Nov 20 at 2:48

add a comment |

Convert your input element to seq and all and then try to write the wrapper which will give you List(List(item1,item2), List(item1,item2),...)

Try below code

val seqs = Seq("122abc","223cde","334vbn","445das")++

Seq("221bca","321dsa")++

Seq("231dsa","653asd","698poq","897qwa")

Write a wrapper to convert seq into a pair of two

def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)

Now send your seq as params and it it will give your pair of two

toPairs(seqs).mkString(" ")

After making it to string you will get the output like

res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)

Now you can convert your string, however, you want.

edited Nov 20 at 2:29

answered Nov 20 at 2:24

Amit Prasad

538315

Convert your input element to seq and all and then try to write the wrapper which will give you List(List(item1,item2), List(item1,item2),...)

Try below code

val seqs = Seq("122abc","223cde","334vbn","445das")++

Seq("221bca","321dsa")++

Seq("231dsa","653asd","698poq","897qwa")

Write a wrapper to convert seq into a pair of two

def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)

Now send your seq as params and it it will give your pair of two

toPairs(seqs).mkString(" ")

After making it to string you will get the output like

res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)

Now you can convert your string, however, you want.

edited Nov 20 at 2:29

answered Nov 20 at 2:24

Amit Prasad

538315

edited Nov 20 at 2:29

answered Nov 20 at 2:24

Amit Prasad

538315

answered Nov 20 at 2:24

Amit Prasad

538315

answered Nov 20 at 2:24

Amit Prasad

538315

I'm not sure, but your output doesn't really look like OP's.
– erip
Nov 20 at 2:42

toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...) so it is pretty much which are supposed to come and then you can convert into however you want.
– Amit Prasad
Nov 20 at 2:47

No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.
– erip
Nov 20 at 2:48

add a comment |

I'm not sure, but your output doesn't really look like OP's.
– erip
Nov 20 at 2:42

toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...) so it is pretty much which are supposed to come and then you can convert into however you want.
– Amit Prasad
Nov 20 at 2:47

No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.
– erip
Nov 20 at 2:48

I'm not sure, but your output doesn't really look like OP's.
– erip
Nov 20 at 2:42

toPairs(seqs) will give you List(List(item1,item2),List(item1,item2)...) so it is pretty much which are supposed to come and then you can convert into however you want.
– Amit Prasad
Nov 20 at 2:47

No, that's not what OP wants. OP wants to create a single array of tuples where the tuples came from each subarray's first element combined with the rest of elements of the subarray for each subarray in the original RDD.
– erip
Nov 20 at 2:48

add a comment |

Using df and explode.

  val df =   Seq(

      Array("122abc","223cde","334vbn","445das"),

      Array("221bca","321dsa"),

      Array("231dsa","653asd","698poq","897qwa")

    ).toDF("arr")

    val df2 = df.withColumn("key", 'arr(0)).withColumn("values",explode('arr)).filter('key =!= 'values).drop('arr).withColumn("tuple",struct('key,'values))

    df2.show(false)

    df2.rdd.map( x => Row( (x(0),x(1)) )).collect.foreach(println)

Output:

+------+------+---------------+

|key   |values|tuple          |

+------+------+---------------+

|122abc|223cde|[122abc,223cde]|

|122abc|334vbn|[122abc,334vbn]|

|122abc|445das|[122abc,445das]|

|221bca|321dsa|[221bca,321dsa]|

|231dsa|653asd|[231dsa,653asd]|

|231dsa|698poq|[231dsa,698poq]|

|231dsa|897qwa|[231dsa,897qwa]|

+------+------+---------------+





[(122abc,223cde)]

[(122abc,334vbn)]

[(122abc,445das)]

[(221bca,321dsa)]

[(231dsa,653asd)]

[(231dsa,698poq)]

[(231dsa,897qwa)]

Update1:

Using paired rdd

val df =   Seq(

  Array("122abc","223cde","334vbn","445das"),

  Array("221bca","321dsa"),

  Array("231dsa","653asd","698poq","897qwa")

).toDF("arr")

import scala.collection.mutable._

val rdd1 = df.rdd.map( x => { val y = x.getAs[mutable.WrappedArray[String]]("arr")(0); (y,x)} )

val pair = new PairRDDFunctions(rdd1)

pair.flatMapValues( x => x.getAs[mutable.WrappedArray[String]]("arr") )

    .filter( x=> x._1 != x._2)

    .collect.foreach(println)

Results:

(122abc,223cde)

(122abc,334vbn)

(122abc,445das)

(221bca,321dsa)

(231dsa,653asd)

(231dsa,698poq)

(231dsa,897qwa)

edited Nov 20 at 15:51

answered Nov 20 at 10:25

stack0114106

1,9751416

add a comment |

Using df and explode.

  val df =   Seq(

      Array("122abc","223cde","334vbn","445das"),

      Array("221bca","321dsa"),

      Array("231dsa","653asd","698poq","897qwa")

    ).toDF("arr")

    val df2 = df.withColumn("key", 'arr(0)).withColumn("values",explode('arr)).filter('key =!= 'values).drop('arr).withColumn("tuple",struct('key,'values))

    df2.show(false)

    df2.rdd.map( x => Row( (x(0),x(1)) )).collect.foreach(println)

Output:

+------+------+---------------+

|key   |values|tuple          |

+------+------+---------------+

|122abc|223cde|[122abc,223cde]|

|122abc|334vbn|[122abc,334vbn]|

|122abc|445das|[122abc,445das]|

|221bca|321dsa|[221bca,321dsa]|

|231dsa|653asd|[231dsa,653asd]|

|231dsa|698poq|[231dsa,698poq]|

|231dsa|897qwa|[231dsa,897qwa]|

+------+------+---------------+





[(122abc,223cde)]

[(122abc,334vbn)]

[(122abc,445das)]

[(221bca,321dsa)]

[(231dsa,653asd)]

[(231dsa,698poq)]

[(231dsa,897qwa)]

Update1:

Using paired rdd

val df =   Seq(

  Array("122abc","223cde","334vbn","445das"),

  Array("221bca","321dsa"),

  Array("231dsa","653asd","698poq","897qwa")

).toDF("arr")

import scala.collection.mutable._

val rdd1 = df.rdd.map( x => { val y = x.getAs[mutable.WrappedArray[String]]("arr")(0); (y,x)} )

val pair = new PairRDDFunctions(rdd1)

pair.flatMapValues( x => x.getAs[mutable.WrappedArray[String]]("arr") )

    .filter( x=> x._1 != x._2)

    .collect.foreach(println)

Results:

(122abc,223cde)

(122abc,334vbn)

(122abc,445das)

(221bca,321dsa)

(231dsa,653asd)

(231dsa,698poq)

(231dsa,897qwa)

edited Nov 20 at 15:51

answered Nov 20 at 10:25

stack0114106

1,9751416

