Today we will learn how to create an AWS EMR Hadoop cluster with Spark on it.
Steps:
- Go to EMR and create a cluster
- Select the Core Hadoop option under Applications
- Click "Go to advanced options"
- Select Spark under Software Configuration
- Additionally, select Zeppelin
  - Zeppelin lets you use notebooks to write Spark queries and scripts
- Click Next
- Under General cluster settings, tick the EMRFS consistent view checkbox
  - Consistent view provides consistency checking for list and read-after-write (for new put requests) operations on objects in Amazon S3
- Create an Amazon EC2 key pair:
  - Go to the Amazon EC2 console
  - In the navigation pane, click Key Pairs
  - On the Key Pairs page, click Create Key Pair
  - In the Create Key Pair dialog box, enter a name for your key pair, such as mykeypair
  - Click Create
  - Save the resulting PEM file in a safe location
- Specify the key pair in the cluster settings
- Click Next and create the cluster
- Wait for the cluster to start
- Once the cluster is up and running, you can SSH to its master node using the PEM file
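
The console steps above can also be scripted. Below is a minimal sketch in Python using boto3, not a definitive setup. The helper names (`build_cluster_request`, `ssh_command`, `create_cluster`), the cluster name, the release label `emr-5.36.0`, the instance type `m5.xlarge`, the core node count, and the region are all assumptions made for illustration. It also assumes the default EMR roles (`EMR_DefaultRole`, `EMR_EC2_DefaultRole`) already exist in your account.

```python
import os


def build_cluster_request(key_name, release_label="emr-5.36.0",
                          instance_type="m5.xlarge", core_count=2):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The release label, instance type and node count are assumed defaults.
    """
    return {
        "Name": "spark-cluster",  # hypothetical cluster name
        "ReleaseLabel": release_label,
        # Same applications as selected in the console steps.
        "Applications": [{"Name": n} for n in ("Hadoop", "Spark", "Zeppelin")],
        # Equivalent of ticking "EMRFS consistent view".
        "Configurations": [
            {"Classification": "emrfs-site",
             "Properties": {"fs.s3.consistent": "true"}}
        ],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            "Ec2KeyName": key_name,  # the EC2 key pair used for SSH
            "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster running
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def ssh_command(pem_path, master_dns):
    """Return the SSH command for the master node (EMR's login user is hadoop)."""
    return f"ssh -i {pem_path} hadoop@{master_dns}"


def create_cluster(key_name, pem_path, region="us-east-1"):
    """Create the key pair and cluster, wait for startup, return the SSH command."""
    import boto3  # third-party: pip install boto3

    ec2 = boto3.client("ec2", region_name=region)
    emr = boto3.client("emr", region_name=region)

    # Create the key pair and save the private key (PEM) locally.
    key = ec2.create_key_pair(KeyName=key_name)
    with open(pem_path, "w") as f:
        f.write(key["KeyMaterial"])
    os.chmod(pem_path, 0o400)  # ssh refuses keys that are too permissive

    cluster_id = emr.run_job_flow(**build_cluster_request(key_name))["JobFlowId"]
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    return ssh_command(pem_path, cluster["MasterPublicDnsName"])
```

Calling `create_cluster("mykeypair", "mykeypair.pem")` with valid AWS credentials would start a billable cluster and return the SSH command to run. The connection only succeeds if the master node's security group allows inbound traffic on port 22 from your IP address.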