Data Solutions: AWS GLUE VS AWS DATA PIPELINE

Today we will try to understand the differences between AWS Glue and AWS Data Pipeline.

Are you considering designing your ETL pipeline in the AWS cloud? The table below will help you understand which AWS ETL service to choose according to your needs:

	AWS Glue	AWS Data Pipeline
Definition	•Serverless	•A web service that helps you create complex data pipelines. It relies on EC2 instances to execute the tasks in a pipeline: it spins up an EC2 instance to run a job and terminates the instance once the job has completed
Resiliency	•Fault tolerant, Scalable, Highly available and Distributed	•Fault tolerant, Highly available, Scalable and Distributed
ETL Design	•GUI based as well as developer friendly. It allows developers to write ETL transformation code using PySpark	•GUI based, with predefined ETL templates that make building complex pipelines quick and easy using drag-and-drop functionality
Pricing	•Cost effective. You pay only for execution time (around $0.44 per DPU per hour)	•Low-frequency activities cost around $0.66 per activity per month, while high-frequency activities cost around $1 per activity per month
Data Sources	•Supports many more data sources, because developers can import Python libraries to connect to data sources that are not predefined	•Limited to the predefined data sources that are available within Data Pipeline
Scheduling	•Supports event-driven ETL pipeline triggers	•Supports three types of triggers (Scheduled, Conditional, and On-demand)
Streaming	•Serverless streaming for building continuous ingestion pipelines that prepare streaming data. It can consume data from streaming sources like Kinesis and Kafka, clean and transform it on the fly, and make it available for analysis in seconds
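
To make the Pricing row more concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the approximate rates quoted in the table above; the function names and the example workload are made up for illustration, and real AWS pricing varies by region and changes over time. The Data Pipeline figure also does not include the cost of the EC2 instances that run the activities, which is billed separately.

```python
# Approximate rates taken from the table above (USD). These are assumptions
# for illustration only; check the current AWS pricing pages before relying on them.
GLUE_RATE_PER_DPU_HOUR = 0.44
PIPELINE_LOW_FREQ_PER_ACTIVITY_MONTH = 0.66
PIPELINE_HIGH_FREQ_PER_ACTIVITY_MONTH = 1.00


def glue_monthly_cost(dpus, hours_per_run, runs_per_month):
    """Estimate the monthly Glue cost: DPUs x hours per run x runs x rate."""
    dpu_hours = dpus * hours_per_run * runs_per_month
    return round(dpu_hours * GLUE_RATE_PER_DPU_HOUR, 2)


def data_pipeline_monthly_cost(low_freq_activities, high_freq_activities):
    """Estimate the monthly Data Pipeline fee (excluding EC2 instance costs)."""
    cost = (low_freq_activities * PIPELINE_LOW_FREQ_PER_ACTIVITY_MONTH
            + high_freq_activities * PIPELINE_HIGH_FREQ_PER_ACTIVITY_MONTH)
    return round(cost, 2)


if __name__ == "__main__":
    # Hypothetical workload: a 10-DPU Glue job that runs 30 minutes a day.
    print("Glue:", glue_monthly_cost(dpus=10, hours_per_run=0.5, runs_per_month=30))
    # Hypothetical pipeline with two low-frequency and one high-frequency activity.
    print("Data Pipeline:", data_pipeline_monthly_cost(2, 1))
```

With these example numbers the Glue job comes to about $66 per month, while the Data Pipeline activity fee is about $2.32 per month before adding the EC2 time, so the comparison depends heavily on how long the underlying compute actually runs.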

Any comments or thoughts are much appreciated!
