How to create an AWS EMR cluster with Hadoop, Hive and Spark on it

Today we will learn how to create an AWS EMR Hadoop cluster with Spark on it.
Steps:
  • Go to EMR and create a cluster
  • Select the Core Hadoop option under Applications

  • Click "Go to advanced options"
  • Select Spark under software configuration
    • Additionally, select Zeppelin
      • Zeppelin lets you use a notebook to write Spark queries and scripts
  • Click Next
  • Under General cluster settings, tick the EMRFS consistent view checkbox
    • Consistent view provides consistency checking for list and read-after-write (for new put requests) for objects in Amazon S3
  • Create an EC2 key pair
    • To create an Amazon EC2 key pair:
      • On the Key Pairs page of the EC2 console, click Create Key Pair
      • In the Create Key Pair dialog box, enter a name for your key pair, such as mykeypair
      • Click Create
      • Save the resulting PEM file in a safe location
  • Specify the key pair in the cluster settings
  • Click Next and create the cluster
  • Wait for the cluster to start
  • Once the cluster is up and running, you can SSH to it
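
The same console steps can also be scripted. Below is a minimal sketch using boto3, not a definitive setup. The release label, instance types, instance count, region, cluster name and the default EMR roles are all assumptions that do not come from the steps above, so adjust them to your account. The helper names build_cluster_request and create_cluster are hypothetical.

```python
"""Sketch: the console steps expressed as a boto3 request.

Assumptions (not from the original steps): release label, instance types,
instance count, region, cluster name, and the default EMR roles.
"""


def build_cluster_request(key_name, name="spark-cluster",
                          release_label="emr-5.30.0"):
    """Return keyword arguments for the EMR client's run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # assumed; choose the release you need
        # Applications picked under "Software configuration".
        "Applications": [
            {"Name": app} for app in ("Hadoop", "Hive", "Spark", "Zeppelin")
        ],
        # Equivalent of ticking "EMRFS consistent view".
        "Configurations": [
            {
                "Classification": "emrfs-site",
                "Properties": {"fs.s3.consistent": "true"},
            }
        ],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",    # assumed
            "SlaveInstanceType": "m5.xlarge",     # assumed
            "InstanceCount": 3,                   # assumed: 1 master + 2 core
            "Ec2KeyName": key_name,               # name of the EC2 key pair
            "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster up for SSH
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(key_name, region="us-east-1"):
    """Create the cluster, wait until it is running, return the master DNS."""
    import boto3  # imported lazily so the request builder has no dependencies

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_cluster_request(key_name))
    cluster_id = response["JobFlowId"]
    # Blocks until the cluster reaches the RUNNING or WAITING state.
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    return cluster["MasterPublicDnsName"]
```

Calling create_cluster("mykeypair") should return the master node's public DNS name, assuming AWS credentials are configured and the default EMR roles exist in the account. You can then connect with ssh -i mykeypair.pem hadoop@<master-public-dns>, provided the master security group allows inbound SSH from your IP address.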
