Getting Started with Spark Shell

In this first example, we will get you started with the Spark Shell. There are two different ways to interact with the Spark engine:

  1. Write a Scala, Python, or Java program using the Spark libraries and APIs.
  2. Use the Spark Shell.

The Spark Shell is very useful when you are trying out Spark for the first time. It also makes it easy to test your scenarios, interact with a data set, and do some quick data manipulation.

To start the Spark Shell, just type spark-shell on your UNIX box and it will open the scala> prompt.
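Launching it looks roughly like this (a sketch; the startup banner varies by installation):

```
$ spark-shell
...startup messages...
scala>
```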

I am using Spark version 1.6.

Now let us start with a simple program: access a sample file in the UNIX file system and count the number of lines in it. You can use any file on your UNIX box; I have used the Spark LICENSE file for this example.
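A minimal sketch of this in the spark-shell (the LICENSE path is an assumption; adjust it to wherever the file lives on your box):

```scala
// sc is the SparkContext that spark-shell creates for you
val licenseFile = sc.textFile("file:///opt/spark/LICENSE") // path is an assumption
licenseFile.count() // number of lines in the file
```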

Please note that to access a file on the local UNIX file system, you have to add the prefix file:// before the actual path.

If you are referencing a path in the Hadoop file system, you don't have to add the prefix. By default Spark will prepend hdfs://nameservice1 to the path you pass inside the parentheses.
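For example (both paths here are assumptions; nameservice1 is the HDFS name service mentioned above):

```scala
// Local file: the file:// prefix is required
val localFile = sc.textFile("file:///opt/spark/LICENSE")

// HDFS file: no prefix needed; hdfs://nameservice1 is prepended by default
val hdfsFile = sc.textFile("/user/hadoop/LICENSE")
```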

Now we have loaded the file and counted the number of lines in it. Let's try to count the number of lines containing a particular word.
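A sketch of that filter, using the word "Source" (the file path and variable names are assumptions):

```scala
val licenseFile = sc.textFile("file:///opt/spark/LICENSE") // path is an assumption
val linesWithSource = licenseFile.filter(line => line.contains("Source"))
linesWithSource.count() // number of lines containing "Source"
```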

If you are new to Scala, you might wonder what the fat arrow (=>) is for. It introduces a Scala function literal: here it defines a function that takes a string and returns true or false, depending on whether line contains the "Source" substring.
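You can see the same idea outside Spark; here is a plain-Scala sketch of such a function literal (the names are illustrative):

```scala
// A function literal: takes a String, returns a Boolean
val containsSource: String => Boolean = line => line.contains("Source")

println(containsSource("Open Source code")) // true
println(containsSource("Binary form"))      // false
```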

filter is not a keyword but a method: it keeps only the values for which the function returns true.

You can shorten the code by using the underscore placeholder.
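A plain-Scala sketch of the underscore shorthand, with no Spark needed (the sample data is made up for illustration):

```scala
val lines = List("Source code", "Binary form", "Open Source")

// The explicit function literal and the underscore form are equivalent
val explicit  = lines.filter(line => line.contains("Source"))
val shorthand = lines.filter(_.contains("Source"))

println(shorthand)             // List(Source code, Open Source)
println(explicit == shorthand) // true
```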

To print the output of the filter, you can collect the matching lines back to the driver and print them. Note that calling foreach(println) directly on the RDD may send the output to the executors rather than to your shell, so collecting first is the more reliable option.
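A sketch of both options (the file path is an assumption):

```scala
val licenseFile = sc.textFile("file:///opt/spark/LICENSE") // path is an assumption
val linesWithSource = licenseFile.filter(_.contains("Source"))

// Collect to the driver, then print each matching line in the shell
linesWithSource.collect().foreach(println)

// If the result is large, print only a small sample instead
linesWithSource.take(10).foreach(println)
```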
