Main Collections Traits

Raffi Khatchadourian (based on “Scala for the Impatient” 2nd Edition by Cay Horstmann)
October 20, 2020

IterablesAn Iterable is any collection that can yield an Iterator, which allows you to systematically access each element of the collection (see the Iterator Pattern).
SequencesSeq is an ordered sequence of values like an array or a List.IndexedSeq allows fast random access through an integer index.An ArrayBuffer is an IndexedSeq but a LinkedList is not.
SetsSet is an unordered collection of values.SortedSet maintains an sorted visitation order.
MapsMap is a set of (key, value) pairs.SortedMap maintains a sorted visitation order by the keys.Each Scala collection trait or class has a companion object with an apply method that constructs instances of the collection.
toSeq, toSet, toMap, etc. to convert between collection types.to[C] for type C (the target collection type).scala.collection.mutable.Mapscala.collection.immutable.MapThe preference is to immutability.
Compute the set of all digits of an integer:

Vector is the immutable equivalent of an ArrayBuffer.Range is an integer sequence, e.g., 1 to 10 or 10.to(30, 10).Nil (empty) or an object with a head and a tail.tail is also a List.car and cdr operations in Lisp.val lst = List(4, 2) // : List[Int] = List(4, 2)
lst.head // : Int = 4
lst.tail // : List[Int] = List(2)
lst.tail.head // : Int = 2
lst.tail.tail // : List[Int] = List()ListsYou can use :: to create a List with a given head and tail:
List(9, 4, 2).cons operator in Lisp for constructing lists.ListsNatural fit for recursion:
def sum(lst: List[Int]): Int =
if (lst == Nil) 0 else lst.head + sum(lst.tail)
sum(List(9, 4, 2)) // : Int = 15Use pattern matching:
def sum(lst: List[Int]): Int = lst match {
case Nil => 0
case h :: t => h + sum(t) // h is lst.head, t is lst.tail
}
sum(List(9, 4, 2)) // : Int = 15This is just for demonstration purposes. Should really just use the built-in method:
SetsSet is an unordered collection of unique elements.Set that already exists in the Set has no effect.Set(2, 0, 1) + 1 == Set(2, 0, 1) // : Boolean = true
Set(2, 0, 1) + 4 == Set(2, 0, 1, 4) // : Boolean = trueLinkedHashSetsElements are traversed in the order for which they were inserted:
val weekdays = scala.collection.mutable.LinkedHashSet("Mo", "Tu", "We", "Th", "Fr")
weekdays.mkString(", ") // : String = Mo, Tu, We, Th, FSortedSetsElements are traversed in the sorted order:
Set Operations+ is used for adding elements to unordered collections.+: and :+ are for pre-pending and appending elements to ordered collections, respectively.Vector(1, 2, 3) :+ 5 // : Vector[Int] = Vector(1, 2, 3, 5)
1 +: Vector(1, 2, 3) // : Vector[Int] = Vector(1, 1, 2, 3)val numbers = ArrayBuffer(1, 2, 3) // : ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
numbers += 5 // : ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 5)map applies a function to each element of a collection and results in another collection containing the mapped elements.Same as:
However, the map version is preferred as it is more easily parallelizable (more on this later.)
If a function returns a collection, you can “flatten” each of these into a single result using flatMap.
Consider the following function that returns a collection:
Now, let’s map each of the names to the value produced by this function:
names.map(ulcase) // : List[Vector[String]] = Vector("PETER", "peter"), Vector("PAUL", "paul"), Vector("MARY", "mary")Suppose we just want a List of names rather than a List of Vectors of names:
transform instead of map.collect works with partial functions, i.e., those that may not be defined for all inputs.To apply a function where the return value is not important (e.g., it returns Unit), use foreach:
Outputs:
Peter
Paul
Mary
c.reduceLeft(op) applies op to successive elements.
reduceRightDoes the same but starts at the end of the collection:
Expands to: 1 - (7 - (2 - 9)) = 1 - 7 + 2 - 9 = -13
coll.foldLeft(init)(op).QUESTION: What kind of method is this?
QUESTION: Why use this kind of method?

Count the frequencies of letters in a String.
Map:val freq = scala.collection.mutable.Map[Char, Int]()
for (c <- "Mississippi") freq(c) = freq.getOrElse(c, 0) + 1After this code, freq becomes:
collection.mutable.Map[Char,Int] = Map(M -> 1, s -> 4, p -> 2, i -> 4)

op is the partially filled map.op is the new letter.Map[Char, Int]() is the initial, empty map.scanLeft and scanRight combine folding and mapping.Suppose we want to combine the following two collections:
Specifically, we want a list of pairs of corresponding elements from each list: (5.0, 10), (20.0, 2), ....
Apply a function to each pair:
Total price of all items:
zipAll:
List(5.0, 20.0, 9.95).zipAll(List(10, 2), 0.0, 1) // : List[(Double, Int)] = List((5.0,10), (20.0,2), (9.95,1))zipWithIndex:"Scala".zipWithIndex // : collection.immutable.IndexedSeq[(Char, Int)] = Vector((S,0), (c,1), (a,2), (l,3), (a,4))iterator method.Source.fromFile returns an iterator to avoid reading an entire file into memory.
IteratorIteratorOr:
Vector(1, 2, 3)
Vector(2, 3, 4)
Vector(3, 4, 5)
Vector(4, 5, 6)
Vector(5, 6, 7)
Vector(6, 7, 8)
Vector(7, 8, 9)
Vector(8, 9, 10)
Obtaining the number of elements “exhausts” the iterator:
Iterator to a CollectionIteratorIterator without consuming it.LazyListsStreams in Scala version 2.12.x.
LazyList is fully lazy.Stream, in Scala 2.12.x, was only tail lazy.Iterators are “lazy” alternatives to Collections.
Collections, useful for large or computationally intensive collections to construct.Iterators are fragile.
next (and even length) change the iterator.LazyList implements an immutable linked list.
LazyLists compute their elements on-demand, they can be infinite collections!
count, sum, max or min) will not terminate.LazyList Examples#:: operator is similar to :: for Lists.tail is unevaluated:head to compute values:LazyListsLazyLists are deferred:val squares = numsFrom(1).map(x => x * x)
// : scala.collection.immutable.LazyList[scala.math.BigInt] = LazyList(<not computed>)head to force the evaluation of the next element:To get multiple elements, you need to invoke operations that force the “realization” of the list.
Outputs:
1
4
9
16
25
Alternatively, squares.take(5).force results in LazyList(1, 4, 9, 16, 25).
LazyLists from IteratorsLazyList from an Iterator.Source.getLines returns Iterator[String], which allows you to only visit each line once.LazyLists, on the other hand, elements are memoized; that is, the value of each element is computed at most once.
import scala.io.Source
val words = Source.fromFile("/usr/share/dict/words").getLines().to(LazyList)
words // : scala.collection.immutable.LazyList[String] = LazyList(<not computed>)
words.head // : String = A
words(5) // : String = AOL's
words // : scala.collection.immutable.LazyList[String] = LazyList(A, A's, AMD, AMD's, AOL, AOL's, <not computed>)Lists, other kinds of Collections can also be made “lazy” using the view method.view method returns a Collection whose methods’ execution is deferred.import scala.math._
val palindromicSquares = (1 to 1000000).view
.map(x => x * x)
.filter(x => x.toString == x.toString.reverse) // : scala.collection.View[Int] = View(<not computed>)LazyLists, this View also is not yet computed.LazyLists, the results are not memoized.
LazyLists, the force method will force the computation of elements.apply method forces evaluation of the entire collection.Collection.Collection.The following code will set the elements in the given slice to 0 but leave the remaining elements unchanged:
To sum a large collection coll in parallel, simply write coll.par.sum.
par Methodpar method of Collection results a parallel implementation of the Collection.Collection implementation will execute parallel versions of its operations.Count the even numbers in coll by evaluating the predicate on sub-collections in parallel and combine the results:
You can even parallelize for loops using .par:
This prints the numbers in the order that the threads working on the task process them:
25 26 27 28 29 30 37 12 13 14 15 16 17 43 44 45 46 47 48 49 41 42 6 7 8 9 10 11 3 4 5 1 2 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 62 63 64 65 66 67 68 69 70 71 72 73 74 56 57 58 59 60 61 53 54 55 51 52 50 0 40 18 19 20 21 38 39 31 22 23 24 32 33 34 35 36
Compare this to the output of the sequential version, (0 until 100).foreach(i => print(s"$i ")):
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
You should never try to mutate shared (mutable) variables in parallel operations as the results are unpredictable.
In one run, count may be 186119. But, in another run of the same code, count may be 337736.
build.sbt:scalaVersion := "2.13.3"
libraryDependencies += "org.scala-lang.modules" %% "scala-parallel-collections" % "0.2.0".par methods, as well as the parallel collection hierarchy..par have types include the ParSeq, ParSet, and ParMap traits.Seq, Set, or Map.You can convert a parallel collection to a sequential one using seq:
As previously mentioned, not all operations are parallelizable, such as those maintaining data ordering.
Examples include reduceLeft and reduceRight.
Alternatively, you may use reduce, which operates on portions of the collection, combining the results.
However, the given operation must be associative, i.e., fulfilling (a op b) op c = a op (b op c).
Folding has a similar problem, and there is a fold method for that.
Unfortunately, it is not as flexible as both arguments must be elements of the collection.
Produces:
val str: String = 12345678910111213141516171819202122232425262728...aggregate to solve the problem of complex parallel folding.is equivalent to:
str.aggregate is deprecated for sequential collections.Write a function that receives a collection of strings and a map from strings to integers. Return a collection of integers that are values of the map corresponding to one of the strings in the collection. For example, given Array("Tom", "Fred", "Harry") and Map("Tom" -> 3, "Dick" -> 4, "Harry" -> 5), return Array(3, 5). Hint: Use flatMap to combine the Option values returned by get.
The method java.util.TimeZone.getAvailableIDs yields time zones such as Africa/Cairo and Asia/Chungking. Which continent has the most time zones? Hint: groupBy.
Harry Hacker reads a file into a String and wants to use a parallel collection to update the letter frequencies concurrently on portions of the String. He uses the following code:
val frequencies = new scala.collection.mutable.HashMap[Char, Int]
for (c <- str.par) frequencies(c) = frequencies.getOrElse(c, 0) + 1Why is this a terrible idea? How can he really parallelize the computation? (Hint: Use aggregate.)