Azure Data Factory: Upsert using Data Flows

Today we will learn how to perform an upsert in Azure Data Factory (ADF) using data flows.

Scenario:
  • We will ingest a CSV file stored in Azure Storage (ADLS Gen2) into Azure SQL using the upsert method
Steps:

1. Create two connections (linked services) in ADF (a programmatic sketch follows below):
       1. One for the CSV file stored in ADLS
       2. One for the target Azure SQL database
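
The same two linked services can also be created programmatically. Below is a minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, and connection details are placeholders, not values from this walkthrough.

    # Sketch: create the two linked services (ADLS Gen2 source, Azure SQL target)
    # with the Azure SDK. All <angle-bracket> values are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AzureBlobFSLinkedService,
        AzureSqlDatabaseLinkedService,
        LinkedServiceResource,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Linked service 1: the ADLS Gen2 account holding the source CSV
    adls_ls = LinkedServiceResource(
        properties=AzureBlobFSLinkedService(
            url="https://<storage-account>.dfs.core.windows.net",
            account_key="<storage-account-key>",
        )
    )
    client.linked_services.create_or_update(
        "<resource-group>", "<factory-name>", "ls_adls_source", adls_ls
    )

    # Linked service 2: the target Azure SQL database
    sql_ls = LinkedServiceResource(
        properties=AzureSqlDatabaseLinkedService(
            connection_string=(
                "Server=tcp:<server>.database.windows.net,1433;"
                "Database=<database>;User ID=<user>;Password=<password>;"
            )
        )
    )
    client.linked_services.create_or_update(
        "<resource-group>", "<factory-name>", "ls_azure_sql_target", sql_ls
    )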



Informatica Cloud - Copy Files from source to target

Today we will learn how to copy files from a source server to a target server using the Informatica Cloud File Processor connector.

Steps:

1. Create an Informatica File Processor connection to define the source and target destinations (a conceptual Python equivalent follows these steps):
       1. Source file directory: enter the source file path (e.g.: \\Source_Server_name\directory1)
       2. Target file directory: enter the target file path (e.g.: \\target_Server_name\target_dir\current)
       3. Select file: All (to select all files in the source directory)
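
Conceptually, the File Processor connection just performs a directory-to-directory copy. The Python sketch below shows the equivalent operation outside Informatica, reusing the example UNC paths from the steps above.

    # Conceptual equivalent of the File Processor copy task: copy every file
    # from the source directory to the target directory ("Select file: All").
    import shutil
    from pathlib import Path

    SOURCE_DIR = Path(r"\\Source_Server_name\directory1")
    TARGET_DIR = Path(r"\\target_Server_name\target_dir\current")

    TARGET_DIR.mkdir(parents=True, exist_ok=True)

    for entry in SOURCE_DIR.iterdir():
        if entry.is_file():
            shutil.copy2(entry, TARGET_DIR / entry.name)   # copy with metadata
            print(f"Copied {entry.name}")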

Informatica Cloud - Ingest Data from REST API into Azure SQL


Today we will learn how to fetch data from a REST API and ingest it into Azure SQL using Informatica Cloud Data Integration.

Steps to ingest data from a REST API into Azure SQL DB via Informatica (a minimal sketch of the flow follows the steps):

1. Create a test SQL Server database in Azure
  • Set up a new server for the database
  • For testing, use “Public endpoint” while setting up the Azure server for the database (this must be configured properly for production use)
  • Add your current computer’s IP address to the Azure server firewall
  • Add the Informatica server’s IP address to the Azure server firewall
    • If you don’t add the Informatica server’s IP, Informatica will throw the error “client with IP *** is not allowed to access the server”
  • Click Create to create the server
  • Once the server is created, click OK to create the database
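
For reference, the flow the Informatica mapping automates is “fetch JSON from the REST endpoint, then write rows into the Azure SQL table”. The Python sketch below shows that flow directly; the endpoint URL, table, and column names are hypothetical.

    # Minimal sketch of the REST -> Azure SQL flow. The endpoint, table,
    # and columns are hypothetical; the connection string uses placeholders.
    import pyodbc
    import requests

    API_URL = "https://api.example.com/v1/items"   # hypothetical endpoint

    CONN_STR = (
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=tcp:<server>.database.windows.net,1433;"
        "Database=<database>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
    )

    records = requests.get(API_URL, timeout=30).json()

    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        for rec in records:
            cursor.execute(
                "INSERT INTO dbo.items (id, name) VALUES (?, ?)",
                rec["id"], rec["name"],
            )
        conn.commit()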

Azure Data Factory - Upsert using Pipeline approach instead of data flows

Today we will learn how to perform an upsert in Azure Data Factory (ADF) using the pipeline approach instead of data flows.

Task:
  • We will load data from a CSV file (stored in ADLS Gen2) into Azure SQL with upsert using Azure Data Factory. Once the data load finishes, we will move the file to an Archive directory and append a timestamp to the file name to record when the file was loaded into the database (sketches of the upsert and the archive step follow below).
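
In the pipeline approach, the upsert itself is usually pushed down to the database: a Copy activity lands the CSV rows in a staging table, and a Stored Procedure (or Script) activity runs a T-SQL MERGE into the target. Below is a sketch of that MERGE, run here via pyodbc; the staging/target tables (stg.customers, dbo.customers) and their columns are hypothetical.

    # Sketch of the upsert step: MERGE rows from a staging table (loaded by
    # the Copy activity) into the target. Table and column names are hypothetical.
    import pyodbc

    CONN_STR = (
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=tcp:<server>.database.windows.net,1433;"
        "Database=<database>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
    )

    MERGE_SQL = """
    MERGE dbo.customers AS tgt
    USING stg.customers AS src
        ON tgt.id = src.id
    WHEN MATCHED THEN
        UPDATE SET tgt.name = src.name
    WHEN NOT MATCHED THEN
        INSERT (id, name) VALUES (src.id, src.name);
    """

    with pyodbc.connect(CONN_STR) as conn:
        conn.execute(MERGE_SQL)
        conn.commit()
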
Benefits of using Pipeline:
  • Triggering a data flow adds cluster start-up time (~5 minutes) to your job execution time. You can use the Time to Live (TTL) setting on your Azure integration runtime (IR) to reduce this, but even then a cluster may take around 2 minutes to start a Spark context. So, if you are building a near-real-time data load process, the pipeline approach works best: it does not need a cluster to run and can execute in seconds.
Pre-requisites:
  • An Azure Data Factory resource
  • An Azure Storage account (Gen 2)
  • An Azure SQL Database
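
The post-load archive step can be built in the pipeline with Copy + Delete activities; the sketch below shows the same move-and-timestamp operation with the azure-storage-file-datalake Python SDK. Account, container, and file names are placeholders.

    # Sketch of the archive step: move the loaded CSV into an Archive directory,
    # appending a load timestamp to the file name. All names are placeholders.
    from datetime import datetime, timezone

    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential="<storage-account-key>",
    )
    fs = service.get_file_system_client("<container>")

    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    file_client = fs.get_file_client("incoming/customers.csv")

    # rename_file expects "<filesystem>/<new path>", so the move and the
    # timestamped rename happen in one call
    file_client.rename_file(f"<container>/Archive/customers_{stamp}.csv")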

Informatica Cloud - Copy data from SQL to Azure ADLS

Today we will learn how to load data from a SQL database into Azure ADLS as a file.

  1. Purchase and enable the Microsoft Azure Data Lake Storage (ADLS) Gen 2 connector for Informatica
  2. Register your Informatica app with Azure AD
    1. This app will act as a service principal 
    2. Note down the client secret and client ID
    3. Give appropriate permissions to the Informatica app (in Azure)
      • Storage Blob Data Contributor (role on the storage account)
      • Internet access so Informatica can talk to Azure ADLS
  3. Create an Informatica connection to Azure ADLS Gen 2 (a sanity-check sketch using the same values follows this list)
    1. Enter the following information:
      • Client ID
      • Client Secret
      • Tenant ID
      • Account Name (Azure Storage account name)
      • File System Name (Container name within a storage account)
      • Directory Path: /   (leave as default)
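
As a quick sanity check that the values in step 3 are correct, the sketch below uses the same client ID, client secret, tenant ID, account name, and file system name to write a test file through the Python SDK; every <angle-bracket> value is a placeholder.

    # Sanity-check sketch: log in with the registered app (service principal)
    # and write a small test file to the container. All values are placeholders.
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    credential = ClientSecretCredential(
        tenant_id="<tenant-id>",
        client_id="<client-id>",
        client_secret="<client-secret>",
    )

    service = DataLakeServiceClient(
        account_url="https://<account-name>.dfs.core.windows.net",
        credential=credential,
    )

    fs = service.get_file_system_client("<file-system-name>")   # container name
    file_client = fs.get_file_client("connection_test.txt")     # under "/" root
    file_client.upload_data(b"hello from the service principal", overwrite=True)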