How to create an AWS EMR cluster with Hadoop, Hive and Spark on it

Today we will learn how to create an AWS EMR Hadoop cluster with Spark on it.
Steps:
  • Go to EMR and create a cluster
  • Select the Core Hadoop option under Applications

  • Click "Go to advanced options"
  • Select Spark under Software Configuration
    • Additionally, select Zeppelin
      • Zeppelin lets you use notebooks to write Spark queries and scripts
  • Click Next
  • Under General cluster settings, tick EMRFS consistent view
    • Consistent view provides consistency checking for list and read-after-write (for new put requests) operations on objects in Amazon S3
  • Create a key pair
    • To create an Amazon EC2 key pair:
      • On the Key Pairs page of the Amazon EC2 console, click Create Key Pair
      • In the Create Key Pair dialog box, enter a name for your key pair, such as mykeypair
      • Click Create
      • Save the resulting PEM file in a safe location
  • Specify the key pair in the cluster settings
  • Click Next and create the cluster
  • Wait for the cluster to start
  • Once the cluster is up and running, you can SSH to the master node using the PEM file you saved.
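
If you would rather script these steps than click through the console, the same choices can be written as a request for boto3's EMR client. This is only a rough sketch, not the one true way to do it. The cluster name, release label, instance type, instance count and region are assumptions I picked for illustration, and the two role names are the EMR defaults, which must already exist in your account. The helper function names are made up for this post.

```python
import json


def build_cluster_request(key_name, release_label="emr-5.36.0",
                          instance_type="m5.xlarge", instance_count=3):
    """Build keyword arguments for the boto3 EMR client's run_job_flow call.

    Mirrors the console steps: applications, EMRFS consistent view, key pair.
    """
    return {
        "Name": "spark-cluster",        # hypothetical cluster name
        "ReleaseLabel": release_label,  # assumed release; use one available to you
        # Applications from the steps above (Zeppelin is the optional extra)
        "Applications": [{"Name": name}
                         for name in ("Hadoop", "Hive", "Spark", "Zeppelin")],
        # Equivalent of ticking "EMRFS consistent view"
        "Configurations": [
            {"Classification": "emrfs-site",
             "Properties": {"fs.s3.consistent": "true"}},
        ],
        "Instances": {
            "MasterInstanceType": instance_type,  # assumed instance type
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,      # 1 master + 2 core nodes here
            "Ec2KeyName": key_name,               # the EC2 key pair created earlier
            "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster up for SSH
        },
        # Default EMR role names; assumed to already exist in the account
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_and_wait(request, region="us-east-1"):
    """Create the cluster, wait until it is running, return (id, master DNS).

    Needs boto3 and AWS credentials, so it is not called automatically.
    """
    import boto3  # imported here so the rest of the sketch runs without it

    emr = boto3.client("emr", region_name=region)  # region is an assumption
    cluster_id = emr.run_job_flow(**request)["JobFlowId"]
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    return cluster_id, cluster["MasterPublicDnsName"]


def ssh_command(pem_path, master_dns):
    """Return the SSH command for the master node; EMR logs in as 'hadoop'."""
    return f"ssh -i {pem_path} hadoop@{master_dns}"


if __name__ == "__main__":
    # Only prints the request; call launch_and_wait(request) to really create it.
    request = build_cluster_request("mykeypair")
    print(json.dumps(request, indent=2))
```

Running the file only prints the request, so you can review it before anything is created. Passing that request to launch_and_wait starts a real, billable cluster, and the DNS name it returns can be fed to ssh_command together with the path to your PEM file. SSH will only work if the master node's security group allows inbound traffic on port 22 from your IP.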
