Today we will learn how to ingest weather API data into S3 using AWS Glue.
Steps:
- Create an S3 bucket with the folder structure below:
  - S3BucketName
    - Libraries
      - Response.whl
- Download the Python requests library (in .whl format; referred to in this post as Response.whl) and save it in the Libraries folder within S3
  - Download link: https://pypi.org/project/requests/#files
- Sign up on the OpenWeatherMap website and get the API key to fetch the weather data
- Create a new Glue ETL job
  - Type: Python Shell
  - Python version: <select latest python version>
  - Python Library Path: <select the Response.whl library path>
  - This Job runs: <A new script to be authored by you>
- Click Next
- Click "Save job and edit Script"
- Import the requests library
- Import the boto3 library for saving to the S3 bucket
- Write the code to ingest the data
- Run the Glue job
- View the Glue job results
  - Job run status = Succeeded
- Verify that the data is saved in the S3 bucket
  - Download the saved JSON file from S3 and check that it is correct
- You are done. Cheers!
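
The "write the code to ingest data" step is not spelled out above, so here is a minimal sketch of what the job script could look like. Treat it as one possible shape, not the original script: the OpenWeatherMap "current weather" endpoint, the S3 key layout, the city, and the function names (build_s3_key, ingest_weather, main) are all assumptions made for illustration, and YOUR_API_KEY is a placeholder.

```python
import json
from datetime import datetime, timezone

# Assumption: the OpenWeatherMap "current weather" endpoint.
# The post does not say which endpoint was used.
API_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_s3_key(city, now):
    """Build a hypothetical object key, e.g. weather/london/20240102T030405Z.json."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f"weather/{city.lower().replace(' ', '_')}/{stamp}.json"


def ingest_weather(city, api_key, bucket, session, s3_client, now=None):
    """Fetch the current weather for `city` and save the raw JSON to S3."""
    now = now or datetime.now(timezone.utc)
    response = session.get(
        API_URL,
        params={"q": city, "appid": api_key},
        timeout=30,
    )
    response.raise_for_status()  # make the job run fail on HTTP errors
    key = build_s3_key(city, now)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(response.json()).encode("utf-8"),
        ContentType="application/json",
    )
    return key


def main():
    # Imported here so the helpers above can be tried out without AWS access.
    import boto3
    import requests

    bucket = "S3BucketName"  # bucket name from the steps above
    key = ingest_weather(
        city="London",            # placeholder city
        api_key="YOUR_API_KEY",   # placeholder; avoid hard-coding a real key
        bucket=bucket,
        session=requests.Session(),
        s3_client=boto3.client("s3"),
    )
    print(f"Saved s3://{bucket}/{key}")
```

To use this as the Glue job script, add a call to main() as the last line of the file. Passing the HTTP session and S3 client in as arguments is a design choice of this sketch; it lets you try ingest_weather locally with stand-in objects before running the job.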

Please bear in mind this tutorial needs to be adjusted for the latest state of the AWS Glue Python script runner.
1) The Python requests and json libraries do not need to be bundled as dependencies, so the .whl library import can be dropped.
2) Wrapping the main call in a guard (if __name__ == "__main__":) doesn't work. Remove the guard to get the script to run.