AWS EMR: Create a Hive table from a CSV file stored in an S3 bucket

Today we will learn how to create a Hive table (inside an EMR cluster) from a CSV file stored in an S3 bucket.

Steps:
  • Go to your EMR cluster and copy the "Master Public DNS"
    • This is the public DNS name of your master node (it resolves to the node's public IP address)
  • If you are using a Windows machine, download and install PuTTY to SSH into the master node
  • Open PuTTY and log in with your EC2 key pair (PuTTY cannot read the .pem file directly, so convert it to a .ppk file with PuTTYgen first)
  • At the "login as:" prompt, type hadoop
  • You are now logged in to the master node
  • Type vi <scriptname>
    • This opens the vi editor
  • Press "i" to enter insert mode
  • Copy and paste your Hive script
  • Press Esc
    • Type :wq
    • Hit Enter
    • This writes the script to the file and exits the vi editor
  • Run the script with the "hive -f <scriptname>" command
  • You are done.
    • A Hive database and a table (defined over the CSV file in S3) have been created
  • To verify the results, open Hue (on EMR it is served from the master node on port 8888)
    • The first time you open it, Hue asks you to choose a username and password
    • These become your Hue login credentials
  • Write a SELECT statement against the table you created above
  • You will see the data from your CSV file in the table
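PuTTY is only needed on Windows. On macOS or Linux, the built-in OpenSSH client can log in with the .pem file directly. The small Python sketch below just assembles that command; the key file name and DNS name in it are hypothetical placeholders. Note that ssh refuses a private key that other users can read, so restrict it first (for example with chmod 400 mykey.pem).

```python
import shlex
from typing import List


def emr_ssh_command(pem_path: str, master_public_dns: str, user: str = "hadoop") -> List[str]:
    """Return the OpenSSH argv for logging in to the EMR master node as `user`."""
    # -i selects the private key file (the .pem downloaded when the EC2 key pair was created).
    return ["ssh", "-i", pem_path, f"{user}@{master_public_dns}"]


if __name__ == "__main__":
    # Hypothetical values: substitute your own key file and "Master Public DNS".
    cmd = emr_ssh_command("mykey.pem", "ec2-203-0-113-10.compute-1.amazonaws.com")
    print(shlex.join(cmd))
    # prints: ssh -i mykey.pem hadoop@ec2-203-0-113-10.compute-1.amazonaws.com
```

Paste the printed command into a terminal; the remaining steps (vi, hive -f) are the same as on Windows.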
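The steps above refer to "your script" without showing it. As a minimal sketch, assuming a comma-delimited CSV with a header row, the Python helper below generates a HiveQL script of the kind you would paste into vi: it creates the database, then an EXTERNAL table whose LOCATION is the S3 folder holding the CSV (EXTERNAL means dropping the table later leaves the S3 data in place). The database, table, column, and bucket names (demo_db, sales, my-bucket, data/sales/) are hypothetical. Two details matter: LOCATION must point at a folder (prefix), not at the .csv file itself, because Hive reads every file under it; and plain FIELDS TERMINATED BY ',' does not handle quoted fields that contain commas (Hive's OpenCSVSerde does, but it reads every column as STRING).

```python
from typing import List, Tuple


def build_hive_script(
    database: str,
    table: str,
    columns: List[Tuple[str, str]],
    s3_location: str,
    skip_header: bool = True,
) -> str:
    """Return a HiveQL script that creates a database and an external CSV table over S3."""
    if not s3_location.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    # LOCATION is a directory (prefix); every file under it becomes table data.
    if not s3_location.endswith("/"):
        s3_location += "/"
    column_sql = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    lines = [
        f"CREATE DATABASE IF NOT EXISTS {database};",
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (",
        f"  {column_sql}",
        ")",
        "ROW FORMAT DELIMITED",
        "FIELDS TERMINATED BY ','",
        "STORED AS TEXTFILE",
        f"LOCATION '{s3_location}'",
    ]
    if skip_header:
        # Ask Hive to ignore the first line of each file (the CSV header row).
        lines.append("TBLPROPERTIES ('skip.header.line.count'='1')")
    # Terminate the CREATE TABLE statement.
    lines[-1] += ";"
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical database, table, columns, and bucket: replace with your own.
    print(build_hive_script(
        "demo_db",
        "sales",
        [("id", "INT"), ("product", "STRING"), ("amount", "DOUBLE")],
        "s3://my-bucket/data/sales",
    ))
```

One way to use it: save the printed script as create_sales.hql (a hypothetical file name) on the master node, run hive -f create_sales.hql, and then in Hue run a query such as SELECT * FROM demo_db.sales LIMIT 10; to see the CSV rows.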
