Today we will learn how to create a Hive table (inside an EMR cluster) from a CSV file stored in an S3 bucket
Steps:
- Go to your EMR cluster and copy the "Master Public DNS"
- This is the public DNS name (hostname) of your master node
- If you are using a Windows machine, download and install PuTTY to SSH into the master node
- Open PuTTY and log in with your AWS key pair (pem file; PuTTY needs it converted to a ppk file with PuTTYgen first)
- At the "login as:" prompt, type hadoop
- You are now logged in to the master node
- Create an S3 bucket and place a CSV file in the bucket.
- For this test, I am using a CSV file from:
- Now, create a script that creates a Hive database and a table
- Type vi <scriptname>
- This opens the vi editor
- Press "i" to enter insert mode in the vi editor.
- Copy and paste your script
- Press Esc
- Type :wq
- Hit Enter
- This writes the script to the file and exits the vi editor
- Run the script using the "hive -f <scriptname>" command
- You are done.
- A Hive database and a table have been created
- To verify the results, go to Hue
- The first time, it will ask you to enter a username and password
- These will become your Hue login credentials
- Write a SELECT statement to verify the data in the table that you created above
- You will see the data inside your table
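For reference, here is a minimal sketch of what the script and the run step above could look like as shell commands on the master node. The database name demo_db, the table name demo_table, the three columns, the file name create_table.hql, and the S3 path are all placeholders I made up for illustration, so replace them with your own. LOCATION should point to the S3 folder that holds the CSV file, not to the file itself, and the skip.header.line.count property assumes the CSV has one header row.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical script name; use whatever you passed to vi above.
SCRIPT_NAME="create_table.hql"

# Write the HiveQL script to a file (same result as pasting it in vi).
# All names, columns, and the S3 path below are placeholders.
cat > "$SCRIPT_NAME" <<'EOF'
CREATE DATABASE IF NOT EXISTS demo_db;

CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.demo_table (
  id     INT,
  name   STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://your-bucket-name/path/to/csv-folder/'
TBLPROPERTIES ('skip.header.line.count'='1');
EOF

# Run the script only where the hive CLI is installed (the EMR master node).
if command -v hive >/dev/null 2>&1; then
  hive -f "$SCRIPT_NAME"
  # Quick check from the command line instead of Hue.
  hive -e "SELECT * FROM demo_db.demo_table LIMIT 10;"
else
  echo "hive not found; wrote $SCRIPT_NAME only"
fi
```

Because the table is declared EXTERNAL, dropping it later removes only the Hive metadata and leaves the CSV file in S3 untouched. The same SELECT statement can be pasted into the Hue query editor.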