How to use Collections in Scala

Last updated on May 30, 2022
Shakuntala Deskmukh


Scala – Collections

Scala has a rich collections library. Collections are containers of things. These containers may be sequenced, linear sets of items such as List, Tuple, Option, and Map. A collection may hold an arbitrary number of elements or be bounded to zero or one element (e.g., Option).

Collections may be strict or lazy. Lazy collections have elements that may not consume memory until they are accessed, such as Ranges. Additionally, collections may be mutable (the contents the reference points to can change) or immutable (the thing a reference refers to never changes). Note that immutable collections may still contain mutable items.

For some problems mutable collections work better, and for others immutable collections work better. When in doubt, it is better to start with an immutable collection and switch to a mutable one later only if you need to.
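As a minimal sketch of the difference (the values are illustrative):

val nums = List(1, 2, 3)               // immutable: "adding" returns a new list
val more = nums :+ 4                   // nums is unchanged; more is List(1, 2, 3, 4)

import scala.collection.mutable.ArrayBuffer
val buf = ArrayBuffer(1, 2, 3)         // mutable: updates happen in place
buf += 4                               // buf is now ArrayBuffer(1, 2, 3, 4)

val big = 1 to 1000000                 // a Range computes its elements on demand
println(big.take(3).toList)            // List(1, 2, 3)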

This blog covers the most commonly used collection types and the most frequently used operations on them.

1. Scala Lists – Scala’s List[T] is a linked list of type T.
2. Scala Sets – A set is a collection of pairwise different elements of the same type.
3. Scala Maps – A Map is a collection of key/value pairs. Any value can be retrieved based on its key.
4. Scala Tuples – Unlike an array or list, a tuple can hold objects of different types.
5. Scala Options – Option[T] provides a container for zero or one element of a given type.
6. Scala Iterators – An iterator is not a collection, but rather a way to access the elements of a collection one by one.
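The following sketch (names and values are illustrative, not taken from the text above) shows each of these collection types in use:

object CollectionsDemo {
   def main(args: Array[String]): Unit = {
      val fruits = List("apple", "banana", "mango")               // List
      val ids = Set(1, 2, 3, 3)                                   // Set: the duplicate 3 is dropped
      val capitals = Map("India" -> "Delhi", "Japan" -> "Tokyo")  // Map of key/value pairs
      val person = ("Asha", 29, true)                             // Tuple holding different types
      val maybeCapital: Option[String] = capitals.get("India")    // Option: zero or one element
      val it = fruits.iterator                                    // Iterator over a collection

      println(fruits.head)                       // apple
      println(ids.size)                          // 3
      println(capitals("Japan"))                 // Tokyo
      println(person._1)                         // Asha
      println(maybeCapital.getOrElse("unknown")) // Delhi
      while (it.hasNext) println(it.next())      // apple, banana, mango
   }
}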

Scala – Traits

A trait encapsulates method and field definitions, which can then be reused by mixing them into classes. Unlike class inheritance, in which each class must inherit from just one superclass, a class can mix in any number of traits.

Traits are used to define object types by specifying the signatures of the supported methods. Scala also allows traits to be partially implemented, but traits may not have constructor parameters.

A trait definition looks just like a class definition except that it uses the keyword trait. The following is the basic syntax of a trait.

Syntax

trait Equal {
   def isEqual(x: Any): Boolean
   def isNotEqual(x: Any): Boolean = !isEqual(x)
}

This trait consists of two methods, isEqual and isNotEqual. Here we have not given any implementation for isEqual, whereas isNotEqual has an implementation. Child classes extending the trait can provide implementations for the unimplemented methods. In this respect a trait is very similar to an abstract class in Java.

Consider the trait Equal above, which contains the two methods isEqual() and isNotEqual(). Only isNotEqual() is implemented, so when the user-defined class Point extends the trait Equal, it must provide an implementation for the isEqual() method.

Two important Scala methods, used in the following example, need to be understood here:

  • obj.isInstanceOf[Point] checks whether the type of obj is Point or not.
  • obj.asInstanceOf[Point] casts obj and returns the same object with type Point.
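As a quick sketch of these two methods (the String value here is just for illustration):

val x: Any = "Scala"
println(x.isInstanceOf[String])   // true: the runtime type of x is String
println(x.isInstanceOf[Int])      // false: x does not hold an Int
val s = x.asInstanceOf[String]    // cast: s is statically typed as String from here on
println(s.length)                 // 5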

Try the following example program, which implements the trait.

Example

trait Equal {
   def isEqual(x: Any): Boolean
   def isNotEqual(x: Any): Boolean = !isEqual(x)
}

class Point(xc: Int, yc: Int) extends Equal {
   var x: Int = xc
   var y: Int = yc

   // equality here is defined as: the other object is a Point whose x equals this point's y
   def isEqual(obj: Any) = obj.isInstanceOf[Point] && obj.asInstanceOf[Point].x == y
}

object Demo {
   def main(args: Array[String]): Unit = {
      val p1 = new Point(2, 3)
      val p2 = new Point(2, 4)
      val p3 = new Point(3, 3)

      println(p1.isNotEqual(p2))   // true
      println(p1.isNotEqual(p3))   // false
      println(p1.isNotEqual(2))    // true: 2 is not a Point
   }
}

Save the above program as Demo.scala. The following commands are used to compile and execute it.

Command

>scalac Demo.scala
>scala Demo

Output

true
false
true

Value classes and Universal Traits

Value classes are a mechanism in Scala to avoid allocating runtime objects. A value class has a primary constructor with exactly one val parameter and may contain only methods (defs); it may not contain vars, additional vals, nested classes, traits, or objects. A value class cannot be extended by another class. You define a value class by making it extend AnyVal. This gives you the type safety of custom data types without the runtime overhead.

Typical examples of value classes are Weight, Height, Email, Age, etc. For all of these, no object needs to be allocated in the application.
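As a minimal sketch (the Age class and its isAdult method are hypothetical, shown only to illustrate the idea):

class Age(val value: Int) extends AnyVal {
   // the method works on the underlying Int; no Age object is allocated at runtime
   def isAdult: Boolean = value >= 18
}

object AgeDemo {
   def main(args: Array[String]): Unit = {
      val a = new Age(25)
      println(a.isAdult)   // true
   }
}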

A value class is not allowed to extend regular traits. To permit value classes to extend traits, universal traits are introduced; a universal trait is a trait that extends Any.

Example

trait Printable extends Any {
   def print(): Unit = println(this)
}

class Wrapper(val underlying: Int) extends AnyVal with Printable

object Demo {
   def main(args: Array[String]): Unit = {
      val w = new Wrapper(3)
      w.print() // actually requires instantiating a Wrapper instance
   }
}

Save the above program as Demo.scala. The following commands are used to compile and execute it.

Command

>scalac Demo.scala
>scala Demo

Output

It prints the Wrapper instance's default toString, which is the class name followed by its hash code, for example:

Wrapper@13

When to Use Traits?

There is no firm rule, but here are a few guidelines to consider:

  • If the behavior will not be reused, then make it a concrete class. It is not reusable behavior after all.
  • If it might be reused in multiple, unrelated classes, make it a trait. Only traits can be mixed into different parts of the class hierarchy (see the sketch after this list).
  • If you want to inherit from it in Java code, use an abstract class.
  • If you plan to distribute it in compiled form, and you expect outside groups to write classes inheriting from it, you might lean towards using an abstract class.
  • If efficiency is very important, lean towards using a class.
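To illustrate the point above that a class can mix in several traits, here is a minimal sketch (the trait and class names are made up for illustration):

trait Walks {
   def walk(): Unit = println("walking")
}

trait Swims {
   def swim(): Unit = println("swimming")
}

// a class extends at most one superclass but can mix in any number of traits
class Duck extends Walks with Swims

object MixinDemo {
   def main(args: Array[String]): Unit = {
      val d = new Duck
      d.walk()   // walking
      d.swim()   // swimming
   }
}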

So, this brings us to the end of this blog. This Tecklearn ‘How to use Collections in Scala’ blog helps you with commonly asked questions if you are looking for a job as an Apache Spark and Scala or Big Data Developer. If you wish to learn Apache Spark and Scala and build a career in the Big Data Hadoop domain, then check out our interactive Apache Spark and Scala Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details below:

https://www.tecklearn.com/course/apache-spark-and-scala-certification/

Apache Spark and Scala Training

About the Course

Tecklearn Spark training lets you master real-time data processing using Spark streaming, Spark SQL, Spark RDD and Spark Machine Learning libraries (Spark MLlib). This Spark certification training helps you master the essential skills of the Apache Spark open-source framework and Scala programming language, including Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Shell Scripting Spark. You will also understand the role of Spark in overcoming the limitations of MapReduce. Upon completion of this online training, you will hold a solid understanding and hands-on experience with Apache Spark.

Why Should you take Apache Spark and Scala Training?

  • The average salary for an Apache Spark developer ranges from approximately $93,486 per year for a Developer to $128,313 per year for a Data Engineer. – Indeed.com
  • Wells Fargo, Microsoft, Capital One, Apple, JPMorgan Chase & many other MNC’s worldwide use Apache Spark across industries.
  • Global Spark market revenue will grow to $4.2 billion by 2022 with a CAGR of 67%. – Marketanalysis.com

What you will Learn in this Course?

Introduction to Scala for Apache Spark

  • What is Scala
  • Why Scala for Spark
  • Scala in other Frameworks
  • Scala REPL
  • Basic Scala Operations
  • Variable Types in Scala
  • Control Structures in Scala
  • Loop, Functions and Procedures
  • Collections in Scala
  • Array Buffer, Map, Tuples, Lists

Functional Programming and OOPs Concepts in Scala

  • Functional Programming
  • Higher Order Functions
  • Anonymous Functions
  • Class in Scala
  • Getters and Setters
  • Custom Getters and Setters
  • Constructors in Scala
  • Singletons
  • Extending a Class using Method Overriding

Introduction to Spark

  • Introduction to Spark
  • How Spark overcomes the drawbacks of MapReduce
  • Concept of In Memory MapReduce
  • Interactive operations on MapReduce
  • Understanding Spark Stack
  • HDFS Revision and Spark Hadoop YARN
  • Overview of Spark and Why it is better than Hadoop
  • Deployment of Spark without Hadoop
  • Cloudera distribution and Spark history server

Basics of Spark

  • Spark Installation guide
  • Spark configuration and memory management
  • Driver Memory Versus Executor Memory
  • Working with Spark Shell
  • Resilient distributed datasets (RDD)
  • Functional programming in Spark and Understanding Architecture of Spark

Playing with Spark RDDs

  • Challenges in Existing Computing Methods
  • Probable Solution and How RDD Solves the Problem
  • What is RDD, Its Operations, Transformations & Actions
  • Data Loading and Saving Through RDDs
  • Key-Value Pair RDDs
  • Other Pair RDDs and Two Pair RDDs
  • RDD Lineage
  • RDD Persistence
  • Using RDD Concepts Write a Wordcount Program
  • Concept of RDD Partitioning and How It Helps Achieve Parallelization
  • Passing Functions to Spark

Writing and Deploying Spark Applications

  • Creating a Spark application using Scala or Java
  • Deploying a Spark application
  • Scala built application
  • Creating application using SBT
  • Deploying application using Maven
  • Web user interface of Spark application
  • A real-world example of Spark and configuring of Spark

Parallel Processing

  • Concept of Spark parallel processing
  • Overview of Spark partitions
  • File Based partitioning of RDDs
  • Concept of HDFS and data locality
  • Technique of parallel operations
  • Comparing coalesce and Repartition and RDD actions

Machine Learning using Spark MLlib

  • Why Machine Learning
  • What is Machine Learning
  • Applications of Machine Learning
  • Face Detection: USE CASE
  • Machine Learning Techniques
  • Introduction to MLlib
  • Features of MLlib and MLlib Tools
  • Various ML algorithms supported by MLlib

Integrating Apache Flume and Apache Kafka

  • Why Kafka, what is Kafka and Kafka architecture
  • Kafka workflow and Configuring Kafka cluster
  • Basic operations and Kafka monitoring tools
  • Integrating Apache Flume and Apache Kafka

Apache Spark Streaming

  • Why Streaming is Necessary
  • What is Spark Streaming
  • Spark Streaming Features
  • Spark Streaming Workflow
  • Streaming Context and DStreams
  • Transformations on DStreams
  • Describe Windowed Operators and Why it is Useful
  • Important Windowed Operators
  • Slice, Window and ReduceByWindow Operators
  • Stateful Operators

Improving Spark Performance

  • Learning about accumulators
  • The common performance issues and troubleshooting the performance problems

DataFrames and Spark SQL

  • Need for Spark SQL
  • What is Spark SQL
  • Spark SQL Architecture
  • SQL Context in Spark SQL
  • User Defined Functions
  • Data Frames and Datasets
  • Interoperating with RDDs
  • JSON and Parquet File Formats
  • Loading Data through Different Sources

Scheduling and Partitioning in Apache Spark

  • Concept of Scheduling and Partitioning in Spark
  • Hash partition and range partition
  • Scheduling applications
  • Static partitioning and dynamic sharing
  • Concept of Fair scheduling
  • Map partition with index and Zip
  • High Availability
  • Single-node Recovery with Local File System and High Order Functions

Got a question for us? Please mention it in the comments section and we will get back to you.

 

 
