Ingest data from an external REST API into S3 using AWS Glue

Today we will learn how to ingest weather API data into S3 using AWS Glue.
Steps:
  • Create an S3 bucket with the folder structure below:
    • S3BucketName
      • Libraries
        • Response.whl
  • Sign up on the OpenWeatherMap website and get an API key to fetch the weather data
  • Create a new Glue ETL job
    • Type: Python Shell
    • Python version: <select latest python version>
    • Python Library Path: <select the Response.whl library path>
    • This Job runs: <A new script to be authored by you>
  • Click Next
  • Click "Save job and edit Script"
  • Import the Response library
  • Import the boto3 library to save the data to the S3 bucket
  • Write the code to ingest the data
  • Run the Glue job
  • View the Glue job results
    • Job run status = Succeeded
  • Verify that the data is saved in the S3 bucket
  • Download the saved JSON file from S3 and check that it is correct
  • You are done. Cheers!
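
The "write the code to ingest data" step could look roughly like the sketch below. It is a minimal example under stated assumptions, not the original script: the endpoint URL, the city, the key layout (`weather/<city>/<timestamp>.json`) and the function names are all illustrative. The HTTP call and the S3 client can be passed in, so the logic can be tried locally without AWS credentials; inside Glue they default to `requests.get` and `boto3.client("s3")`.

```python
import json
from datetime import datetime, timezone

# Assumed OpenWeatherMap current-weather endpoint; check the API docs for your plan.
API_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_s3_key(city, now, prefix="weather"):
    """Return an object key such as weather/London/20240101T120000Z.json."""
    return f"{prefix}/{city}/{now.strftime('%Y%m%dT%H%M%SZ')}.json"


def fetch_weather(city, api_key, http_get=None):
    """Call the weather API and return the decoded JSON response."""
    if http_get is None:
        import requests  # imported lazily so the sketch loads without it
        http_get = requests.get
    response = http_get(API_URL, params={"q": city, "appid": api_key}, timeout=30)
    response.raise_for_status()  # fail the job on HTTP errors
    return response.json()


def save_to_s3(payload, bucket, key, s3_client=None):
    """Serialize the payload as JSON and upload it to the bucket."""
    if s3_client is None:
        import boto3  # available in the Glue Python shell runtime
        s3_client = boto3.client("s3")
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json",
    )


def run(city, api_key, bucket, http_get=None, s3_client=None):
    """Fetch the weather for one city, store it in S3, and return the object key."""
    payload = fetch_weather(city, api_key, http_get=http_get)
    key = build_s3_key(city, datetime.now(timezone.utc))
    save_to_s3(payload, bucket, key, s3_client=s3_client)
    return key


# In the Glue script, call run() at module level, for example:
# run("London", "<your-api-key>", "S3BucketName")
```

The call at the bottom is left commented out and sits at module level rather than behind a main guard, because a reader comment on this post reports that a guarded main invocation does not run in the Glue Python shell. Replace the placeholder API key and bucket name with your own values.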

Comments

  1. Please bear in mind that this tutorial needs to be adjusted for the current state of the AWS Glue Python script runner.

    1) The Python libraries requests and json do not need to be bundled as dependencies, so the WHL library import can be dropped.

    2) A guarded main invocation doesn't work. Remove the guard to get the script to run.
