Today we will try to understand the difference between AWS Glue and AWS Data Pipeline.
Are you considering designing your ETL pipeline in the AWS cloud? The table below will help you understand which AWS ETL service to choose according to your needs:
| | AWS Glue | AWS Data Pipeline |
| --- | --- | --- |
| Definition | Serverless | A web service that helps you create complex data pipelines. It relies on EC2 instances to execute the tasks in a pipeline: it spins up an EC2 instance to run the job and terminates the instance after the job is completed |
| Resiliency | Fault tolerant, scalable, highly available and distributed | Fault tolerant, highly available, scalable and distributed |
| ETL Design | GUI based as well as developer friendly. It allows developers to write ETL transformation code using PySpark | GUI based, with predefined ETL templates that make building complex pipelines quick and easy using drag-and-drop functionality |
| Pricing | Cost effective. You pay only for the execution time (around $0.44 per hour per DPU) | The low-frequency model can cost around $0.66 per month, while the high-frequency model can cost around $1 per month, per job execution (each activity) |
| Data Sources | Supports many more data sources, since developers can import Python libraries to define data sources that are not predefined | Limited to the predefined data sources that are available within Data Pipeline |
| Scheduling | Supports event-driven ETL pipeline triggers | Supports three types of triggers (Scheduled, Conditional, and On-demand) |
| Streaming | Serverless streaming for building continuous ingestion pipelines that prepare streaming data. Can consume data from streaming sources like Kinesis and Kafka, clean and transform it on the fly, and make it available for analysis in seconds | |
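
To make the Pricing row a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It only plugs in the approximate figures quoted in the table ($0.44 per DPU-hour for Glue, $0.66 and $1 per activity per month for Data Pipeline). The function names are made up for illustration, the per-activity reading of the Data Pipeline price is an assumption, and the sketch ignores any minimum billing duration, rounding, or EC2 charges that AWS may apply, so treat the numbers as estimates and check the current AWS pricing pages before relying on them.

```python
# Approximate rates taken from the comparison table (not official figures).
GLUE_RATE_PER_DPU_HOUR = 0.44          # USD per DPU per hour
PIPELINE_LOW_FREQ_PER_ACTIVITY = 0.66  # USD per activity per month (assumed reading)
PIPELINE_HIGH_FREQ_PER_ACTIVITY = 1.00 # USD per activity per month (assumed reading)


def glue_job_cost(dpus, runtime_minutes, rate=GLUE_RATE_PER_DPU_HOUR):
    """Estimate the cost of one Glue job run: DPUs x hours x hourly rate."""
    hours = runtime_minutes / 60
    return dpus * hours * rate


def data_pipeline_monthly_cost(activities, rate_per_activity):
    """Estimate the monthly Data Pipeline charge for a number of activities."""
    return activities * rate_per_activity


if __name__ == "__main__":
    # Hypothetical workload: a 10-DPU Glue job that runs 30 minutes, once a day.
    per_run = glue_job_cost(dpus=10, runtime_minutes=30)
    print(f"Glue, per run:      ${per_run:.2f}")
    print(f"Glue, 30 runs:      ${per_run * 30:.2f}")

    # Hypothetical pipeline with 5 low-frequency activities.
    pipeline = data_pipeline_monthly_cost(5, PIPELINE_LOW_FREQ_PER_ACTIVITY)
    print(f"Data Pipeline, mo.: ${pipeline:.2f} (EC2 cost not included)")
```

Note that the Data Pipeline estimate covers only the service fee. The EC2 instances it launches to run each job are billed separately, whereas the Glue figure already includes the compute.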
Any comments or thoughts are much appreciated!