My personal journey through the world of data, and my ongoing efforts to solve some of the technical challenges I've encountered along the way.
Informatica EDC profiling 10.4.1 Error - Could not create id creator for schema dbo and table due to missing row count result summary
Create multiple dropdown list filters in a tableau dashboard to filter multiple sheets
Import data to Azure Cosmos DB using Azure DocumentDB Data Migration Tool
Tableau visual error - field does not exist in the table
- The field name gets renamed in the underlying data source
- The field gets deleted from the underlying data source
How to resolve Configuration Package template Decimal Error in Business Central
- Open the configuration package
- Navigation path: open the search and type "CONFIG"
- Click on Configuration Packages
Get Started with Azure Cosmos DB - SQL API
Introduction
Create a view in AWS Athena by merging two SQL queries using Union All
Setup Billing Alerts for AWS Services using CloudWatch
AWS Glue crawler cannot extract CSV headers properly
AWS Glue Fatal exception com.amazonaws.services.glue.readers unable to parse file data.csv
- Convert it from Excel
- Open the xlsx/csv in Excel and go to File > Save As
- Select Tools > Web Options
- Go to Encoding > select "UTF-8"
- Now upload the file to S3
- Preview the file in S3
- Select the file > Preview
- If everything is fine, you will see the CSV data in the S3 preview
- If your file has special Unicode characters, S3 will show an error in the preview
- Convert to UTF-8 programmatically (best approach)
- Write a Python script that uses the xlrd library to read your xlsx and write it back out as a UTF-8 CSV
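For reference, a minimal sketch of such a script is below. It assumes an xlrd release that can still read .xlsx files (1.x; xlrd 2.0+ only reads .xls), and the file names are placeholders.

```python
# Minimal sketch: read data.xlsx with xlrd and write it back out as a UTF-8 CSV.
# Assumes xlrd 1.x (xlrd 2.0+ dropped .xlsx support); file names are placeholders.
import csv
import xlrd

book = xlrd.open_workbook("data.xlsx")
sheet = book.sheet_by_index(0)  # first worksheet

with open("data.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    for row_idx in range(sheet.nrows):
        writer.writerow([sheet.cell_value(row_idx, col_idx)
                         for col_idx in range(sheet.ncols)])
```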
Create an AWS Glue crawler to load CSV from S3 into Glue and query via Athena
- Go to Glue and create a Glue crawler
- Select the crawler source type as Data stores
- Add a data store
- Choose your S3 bucket folder
- Note: Please don't select the CSV file itself. Instead, choose the entire directory
- Add another source = No
- Choose IAM role
- Select an existing IAM role if you have one; otherwise, create a new IAM role
- Create a schedule = Run on demand
- Configure crawler's output
- Select the Glue Catalog's database where you want the metadata table to be created
- This table will only hold the schema, not the data
- Now go to the Glue Data Catalog > Databases > Tables
- Here, you will see your new table created that is mapped to the database you specified
- You can also see the database in the Glue Data Catalog
- Now, go to Athena
- Check if you have already configured an output location for Athena queries
- Athena stores each query's results and metadata in an S3 location, so if no path is configured yet, select a temporary S3 path
- Select the database in Athena
- You will see your new table created
- Run a SQL query to fetch records from the table created above and verify the data
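As a side note, the same last step can also be done programmatically. The sketch below is a rough example using boto3; the region, database, table, and S3 output location are placeholders and not part of the original walkthrough.

```python
# Minimal sketch: run an Athena query against the crawled table via boto3.
# Region, database, table, and S3 output location below are placeholders.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = athena.start_query_execution(
    QueryString="SELECT * FROM my_csv_table LIMIT 10",
    QueryExecutionContext={"Database": "my_glue_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/tmp/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Print the result rows (the first row is the header).
if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```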
Setup Git on AWS EC2 Linux and clone the repo on Local Windows Machine
- Set up and launch an EC2 Linux instance
- Install Git on EC2 Linux
- Create multiple users on EC2 Linux to access the machine and use Git
- Create an empty Git repo on EC2 Linux
- Give users access to Git
- Clone the Git repo on a local Windows machine

- Amazon Linux 2 AMI
- General purpose t2.micro
- Set up an inbound security rule to allow your machine to SSH into the EC2 instance
- SSH, port 22: <your machine's public IP>
- You can also pick "My IP" in the dropdown
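The steps above use the console; purely as an illustration, a similar inbound rule could be added with boto3 as sketched below. The security group ID and IP address are placeholders.

```python
# Minimal sketch: open SSH (port 22) to a single IP on an existing security group.
# The security group ID and the /32 CIDR below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": "203.0.113.10/32",
                          "Description": "SSH from my machine"}],
        }
    ],
)
```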
Log in to your Linux EC2 instance as the default user (ec2-user) through PuTTY
- In PuTTYgen, load the .pem file and convert it into a .ppk file
- In PuTTY > Auth, load the .ppk file
Enter the command: sudo yum install git -y
Enter git --version to verify that Git has been installed successfully on the EC2 instance
The next step is to create users on the EC2 Linux instance for login and for committing code to Git:
- Open PuTTYgen and click Generate
  - Select type of key to generate = RSA
  - Number of bits = 2048
- Once key generation is finished
  - Click "Save public key"
  - Click "Save private key"
    - Give your private key a name (e.g. test2pvt.ppk)
- Log in to your Linux EC2 instance as the default user (ec2-user) through PuTTY
  - In Auth, load the .ppk file of the default user
- Create a new user
  - sudo adduser new_user1
- Switch to the new user
  - sudo su - new_user1
- Create a .ssh directory
  - mkdir .ssh
- Assign the appropriate permissions
  - chmod 700 .ssh
- Create the authorized_keys file
  - touch .ssh/authorized_keys
- Assign the appropriate permissions
  - chmod 600 .ssh/authorized_keys
- Open the public key file you saved for the new user and copy the algorithm name and the key
- Add that key to the new user's authorized_keys file
  - vi .ssh/authorized_keys
Note: put the key on a single line, with the algorithm name in front of the key and the key file name at the end (e.g. test2pvt, the name of the file you created for this user)
- Log in as your new user using PuTTY
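If you want to sanity-check the new user's key-based login without PuTTY, a quick paramiko sketch like the one below can do it. It assumes the private key was also exported from PuTTYgen in OpenSSH format; the host name, user name, and key path are placeholders.

```python
# Minimal sketch: verify key-based SSH login for the new user with paramiko.
# Assumes the private key was exported from PuTTYgen in OpenSSH format;
# the host name, user name, and key path are placeholders.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(
    hostname="ec2-xx-xx-xx-xx.compute-1.amazonaws.com",
    username="new_user1",
    key_filename="C:/keys/test2pvt_openssh.pem",
)

stdin, stdout, stderr = client.exec_command("whoami")
print(stdout.read().decode().strip())  # should print: new_user1
client.close()
```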
Now we will create an empty Git repo and configure user access:
- On your EC2 Linux instance, type the commands below:
  - mkdir testGitRepo.git
  - cd testGitRepo.git
  - git init --bare (this initializes an empty bare repo)
- Now go to your local Windows machine
- Create a folder for Git on your local Windows machine
- Open a command prompt
- Navigate to that folder (cd <path/to/the/git/folder/on/your/local/machine>)
- Generate an RSA key for your local machine
  - Type: ssh-keygen -t rsa -b 2048
- It will create a public and a private key and store them on your local machine
  - Path: Users/<username>/.ssh/id_rsa.pub
- Open the id_rsa.pub file in Notepad
  - Copy the key
  - Note: make sure you copy the whole key and that it stays on one line
- Go back to your EC2 Linux machine
- Open the authorized_keys file using the vi editor
  - Type: vi .ssh/authorized_keys
- If you see an existing key, paste this new key on a new line after it. Otherwise, simply paste the new key.
- Save and exit the vi editor
- Go back to your local machine and enter the commands below
  - git init
  - git add .
  - git commit -m "initial git commit"
  - git remote add origin ec2-user@<your-full-ec2-public-ip>:/home/ec2-user/testGitRepo.git
  - git config --global remote.origin.receivepack "git receive-pack"
- Try a git pull (clone the repo from EC2)
- Enjoy!