..........
map (Scala function)
val l = List(1,2,3,4,5)
l.map( x => x*2 ) // List(2, 4, 6, 8, 10)
.................
flatten (Scala function)
val x = sc.parallelize(List("Venu", "satya", "goutami", "sumati", "supraja"))
val y = x.map(x => x.toUpperCase())
y.flatten // error: x and y are RDDs, and RDD has no flatten method (use flatMap)
...................
flatMap() (Scala function, also available on RDDs)
map + flatten
val flat = x.flatMap(x => x.toUpperCase()) // each String is flattened into its characters
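The "map + flatten" rule can be checked on an ordinary Scala list, without Spark. The following is a minimal sketch; the object name FlatMapSketch and the shortened names list are illustrative and not part of the original notes.

```scala
object FlatMapSketch {
  val names = List("Venu", "satya")

  // map keeps one output element per input element: List[String]
  val upper: List[String] = names.map(_.toUpperCase)

  // flatten treats each String as a sequence of Char and concatenates them
  val viaMapFlatten: List[Char] = upper.flatten

  // flatMap performs both steps in a single call
  val viaFlatMap: List[Char] = names.flatMap(_.toUpperCase)
}
```

Both viaMapFlatten and viaFlatMap hold the characters of "VENUSATYA", which is why flatMap on a collection of strings yields characters rather than strings.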
..........
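filter (Scala function, also available on RDDs)
Keeps only the elements for which the predicate returns true.
val data = List("Venu", "satya", "goutami", "sumati", "supraja")
val x = sc.parallelize(data)
val y = x.filter(x => x.contains("venu"))    // empty: contains is case-sensitive
val y = x.filter(x => x.contains("Venu"))    // Venu
val y = x.filter(x => x.startsWith("s"))     // satya, sumati, supraja
val y = x.filter(x => !x.startsWith("s"))    // Venu, goutami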
................
foreach
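foreach applies a function to every element purely for its side effects and returns Unit. A minimal plain-Scala sketch follows; the object name ForeachSketch and the seen buffer are illustrative and not part of the original notes.

```scala
import scala.collection.mutable.ListBuffer

object ForeachSketch {
  val names = List("Venu", "satya", "goutami")

  // Record the side effects in a buffer so they can be inspected afterwards.
  val seen: ListBuffer[String] = ListBuffer.empty[String]

  // foreach runs the function once per element, in order, and returns Unit.
  val result: Unit = names.foreach(name => seen += name.toUpperCase)
}
```

On an RDD the function runs on the executors, so output from println inside foreach may appear in executor logs rather than in the driver console.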
/////////////////////////////
groupBy (Scala function)
val z = (1 to 9).toArray
z.groupBy(x => { if (x % 2 == 0) "even" else "odd" }) // "even" -> Array(2, 4, 6, 8), "odd" -> Array(1, 3, 5, 7, 9)
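groupByKey (Spark function)
............... (tuple)
val a = sc.parallelize(Array("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
val b = a.map(x => (x, x.length)) // pair RDD of (word, length) tuples
b.groupByKey().collect() // groups the values of each key; here every word is its own key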
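groupByKey becomes more visible when several values share a key. The sketch below reuses the same words but, as an assumption made only for illustration, uses the word length as the key. It creates its own local SparkContext so it can run outside spark-shell; the object name and the app name are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GroupByKeySketch {
  // Runs a local Spark job and returns the words grouped by their length.
  def run(): Map[Int, List[String]] = {
    val conf = new SparkConf().setAppName("groupByKey-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(Seq("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
      // Assumption: the length is the key, so that several words share one key.
      val byLength = a.map(word => (word.length, word))
      // groupByKey gathers all values of one key into a single Iterable.
      byLength.groupByKey()
        .mapValues(_.toList.sorted) // sort so the result does not depend on partition order
        .collect()
        .toMap
    } finally {
      sc.stop()
    }
  }
}
```

With this data, key 3 holds cat and dog, key 5 holds eagle and tiger, and keys 4 and 6 hold one word each.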
reduceByKey (Spark function)
.....................
val words = Array("one", "two", "two", "three", "three", "three")
val wordPairsRDD = sc.parallelize(words).map(word => (word, 1))
val wordCountsWithReduce = wordPairsRDD.reduceByKey((a, b) => a + b).collect() // (one,1), (two,2), (three,3); order is not guaranteed
distinct (Scala function)
..............................
val x = Array(3,44,3,44,33,4,66)
x.distinct // Array(3, 44, 33, 4, 66)
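union (Scala function)
val x = Array(1,3,44,44,3,33,9)
val y = Array(22,3,8)
val z = x ++ y     // concatenation; duplicates are kept
val z = x.union(y) // same result as x ++ y
sortByKey (Spark function)
// c is assumed to be a pair RDD of (key, value), such as b in the groupByKey example
c.sortByKey(true).collect  // ascending by key
c.sortByKey(false).collect // descending by key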
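The notes do not show how c is built, so the following sketch substitutes a hypothetical pair RDD of (word, length) to show both sort directions. The object name and the app name are illustrative, and the local SparkContext is created only so the example runs outside spark-shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SortByKeySketch {
  // Returns the keys in ascending order and in descending order.
  def run(): (List[String], List[String]) = {
    val conf = new SparkConf().setAppName("sortByKey-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val words = Seq("dog", "tiger", "lion", "cat", "spider", "eagle")
      // Hypothetical stand-in for c: a pair RDD of (word, length).
      val c = sc.parallelize(words, 2).map(word => (word, word.length))
      val ascending = c.sortByKey(true).keys.collect().toList
      val descending = c.sortByKey(false).keys.collect().toList
      (ascending, descending)
    } finally {
      sc.stop()
    }
  }
}
```

sortByKey needs an ordering on the key type; String keys sort lexicographically here.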
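join (Spark function)
// join is defined on pair RDDs of (key, value), not on plain arrays
val rdd1 = sc.parallelize(Seq(("key1", 1),("key2", 2),("key1", 3)))
val rdd2 = sc.parallelize(Seq(("key1", 5),("key2", 4)))
val joined = rdd1.join(rdd2)
joined.collect()
val leftJoined = rdd1.leftOuterJoin(rdd2)
leftJoined.collect()
val rightJoined = rdd1.rightOuterJoin(rdd2)
rightJoined.collect()
val fullJoined = rdd1.fullOuterJoin(rdd2).collect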
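The four join variants differ only in how they treat keys missing on one side. The sketch below uses hypothetical data (key2 exists only in rdd1, key3 only in rdd2) so that the difference is visible; the object name, the Results case class and the app name are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JoinSketch {
  final case class Results(
    inner: Set[(String, (Int, Int))],
    leftOuter: Set[(String, (Int, Option[Int]))],
    rightOuter: Set[(String, (Option[Int], Int))],
    fullOuter: Set[(String, (Option[Int], Option[Int]))]
  )

  def run(): Results = {
    val conf = new SparkConf().setAppName("join-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical data: key2 is only on the left, key3 only on the right.
      val rdd1 = sc.parallelize(Seq(("key1", 1), ("key2", 2)))
      val rdd2 = sc.parallelize(Seq(("key1", 5), ("key3", 4)))
      Results(
        inner = rdd1.join(rdd2).collect().toSet,                // only keys present on both sides
        leftOuter = rdd1.leftOuterJoin(rdd2).collect().toSet,   // every left key; right value is an Option
        rightOuter = rdd1.rightOuterJoin(rdd2).collect().toSet, // every right key; left value is an Option
        fullOuter = rdd1.fullOuterJoin(rdd2).collect().toSet    // every key; both values are Options
      )
    } finally {
      sc.stop()
    }
  }
}
```

A missing side shows up as None, a present one as Some(value); the inner join simply drops key2 and key3.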
.........................
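drop and take (Scala functions)
val x = (1 to 9).toArray
x.drop(2).take(2) // Array(3, 4): skip the first two elements, then keep the next two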
cogroup (Spark function)
val rdd1 = sc.parallelize(Seq(("key1", 1),("key2", 2),("key1", 3)))
val rdd2 = sc.parallelize(Seq(("key1", 5),("key2", 4)))
rdd1.cogroup(rdd2).collect() // per key: (all values from rdd1, all values from rdd2)
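For the two pair RDDs above, cogroup produces one entry per key holding the values from each side as separate collections. A minimal sketch with the same data; the object name and the app name are illustrative, and the Iterables are converted to sorted lists only to make the result easy to compare.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CogroupSketch {
  // Returns, per key, the values found in rdd1 and the values found in rdd2.
  def run(): Map[String, (List[Int], List[Int])] = {
    val conf = new SparkConf().setAppName("cogroup-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(Seq(("key1", 1), ("key2", 2), ("key1", 3)))
      val rdd2 = sc.parallelize(Seq(("key1", 5), ("key2", 4)))
      rdd1.cogroup(rdd2)
        .mapValues { case (fromRdd1, fromRdd2) =>
          (fromRdd1.toList.sorted, fromRdd2.toList.sorted)
        }
        .collect()
        .toMap
    } finally {
      sc.stop()
    }
  }
}
```

Unlike join, which emits one row per matching pair of values, cogroup emits a single row per key.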
Actions:
..............
take(n) (Scala function, also a Spark action)
Returns an array with the first n elements of the dataset.
..............
val rdd = Array(2,5,88,7,5,54,3) // a plain Scala array, despite the name
rdd.take(3) // Array(2, 5, 88)
val rdd1 = sc.parallelize(Seq(("Tendulkar", 55), ("Dravid", 56), ("Gambeer", 57),
  ("Dhoni", 58), ("Yuvaraj", 59), ("Pathan", 54)))
rdd1.take(2) // Array((Tendulkar,55), (Dravid,56))
reduce (Scala function, also a Spark action)
..............
val x = 5 to 20
x.reduce(_+_) // 200, the sum of 5 through 20
collect() (Spark action)
Returns all the elements of the dataset as an array at the driver program. This is
usually useful after a filter or other operation that returns a sufficiently small
subset of the data.
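The filter-then-collect pattern described above can be sketched with the names list used in the filter section. The object name and the app name are illustrative; the local SparkContext is there only so the example runs outside spark-shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectSketch {
  // Filters on the cluster, then brings the small result back to the driver.
  def run(): List[String] = {
    val conf = new SparkConf().setAppName("collect-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val names = sc.parallelize(Seq("Venu", "satya", "goutami", "sumati", "supraja"))
      // filter is a lazy transformation; collect() is the action that runs the job
      // and copies the remaining elements into an Array on the driver.
      val small: Array[String] = names.filter(_.startsWith("s")).collect()
      small.toList
    } finally {
      sc.stop()
    }
  }
}
```

Calling collect() on a large RDD can exhaust driver memory, which is why it is usually placed after a step that shrinks the data.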