AWS EMR: Read CSV file from S3 bucket using Spark dataframe

Today we will learn how to use Spark within AWS EMR to read a CSV file from an S3 bucket

Steps:
  • Create an S3 bucket and place a CSV file inside it
    
  • SSH into the EMR Master node
    • Get the Master Node Public DNS from EMR Cluster settings
    • On Windows, open PuTTY and SSH into the master node using your key pair (PEM file)
  • Type "pyspark"
    • This will launch spark with python as default language
  • Create a Spark dataframe to read the CSV from the S3 bucket
    • Command: df = spark.read.csv("<S3 path to csv>", header=True, sep=',')
  • Type "df.show()" to view the results of the dataframe in tabular format
  • You are done
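
For reference, here is a minimal sketch of the whole sequence inside the pyspark shell. The bucket and file names below are placeholders, not from the original post - replace them with your own S3 path:

# Inside the pyspark shell, the SparkSession is already available as "spark".
# "my-bucket" and "sample.csv" are hypothetical names.
df = spark.read.csv("s3://my-bucket/data/sample.csv", header=True, sep=",")
df.printSchema()   # confirm the columns were picked up from the header row
df.show(10)        # display the first 10 rows in tabular format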



AWS EMR: Create a Hive table from a CSV file stored in an S3 bucket

Today we will learn how to create a Hive table (inside an EMR cluster) from a CSV file stored in an S3 bucket

Steps:
  • Go to your EMR cluster and copy the "Master Public DNS"
    • This is the public DNS address of your master node
  • If you are using a Windows machine, download and install PuTTY to SSH into the master node
  • Open PuTTY and log in with your AWS key pair (PEM file)
  • At the "login as:" prompt, type hadoop
  • You are now logged in to the master node
  • Type vi <scriptname>
    • It will open the vi editor
  • Press "i" to start writing in the vi editor
  • Copy and paste your script (a sample script is shown after these steps)
  • Press Esc
    • Type :wq
    • Hit Enter
    • This writes the script to the file and exits the vi editor
  • Run the script using the "hive -f <scriptname>" command
  • You are done.
    • A Hive database and a table have been created
  • To verify the results, go to Hue
    • The first time, it will ask you to enter a username and password
    • These will become your Hue login credentials
  • Write a SELECT statement to verify the data in the table that you created above
  • You will see the data inside your table
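
For reference, here is a minimal example of the kind of Hive script you could paste in the vi step above. The database name, table name, columns and S3 path are all hypothetical - adjust them to your own CSV:

-- example script (illustrative only); LOCATION must point to the S3 folder containing the CSV
CREATE DATABASE IF NOT EXISTS demo_db;

CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.sample_table (
  id    INT,
  name  STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my-bucket/data/'
TBLPROPERTIES ('skip.header.line.count'='1');   -- skip the CSV header row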

How to create an AWS EMR cluster with Hadoop, Hive and Spark on it

Today we will learn how to create an AWS EMR Hadoop cluster with Hive and Spark on it
Steps:
  • Go to EMR and create a cluster
  • Select the Core Hadoop option under Applications

  • Click "Go to advance options"
  • Select Spark under software configuration
    • Additionally, select Zeppelin
      • Zeppelin lets you use notebook to write spark queries/scripts
  • Click Next
  • Under General cluster settings, tick mark EMRFS consistent view
    • Consistent view provides consistency checking for list and read-after-write (for new put requests) for objects in Amazon S3
  • Create an EC2 key pair
    • To create an Amazon EC2 key pair:
      • On the Key Pairs page, click Create Key Pair
      • In the Create Key Pair dialog box, enter a name for your key pair, such as mykeypair
      • Click Create
      • Save the resulting PEM file in a safe location
  • Specify Key Pair in the cluster settings
  • Click Next and create the cluster
  • Wait for the cluster to start
  • Once the cluster is up and running, you can SSH to the cluster.
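
As an alternative to the console wizard above, the same kind of cluster can be created from a script. Below is a rough boto3 (Python) sketch; the release label, instance types, instance count and key pair name are assumptions - adjust them to your account:

# Illustrative only: creates an EMR cluster with Hadoop, Hive, Spark and Zeppelin.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="my-demo-cluster",
    ReleaseLabel="emr-5.30.0",                     # pick the EMR release you need
    Applications=[{"Name": "Hadoop"}, {"Name": "Hive"},
                  {"Name": "Spark"}, {"Name": "Zeppelin"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "Ec2KeyName": "mykeypair",                 # the key pair created above
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    VisibleToAllUsers=True,
)
print(response["JobFlowId"])                       # the id of the new cluster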

Microstrategy web dossier - display dynamic images based on a page filter

Today we will learn how to display dynamic images in a MicroStrategy web dossier based on a drop-down list (filter)
Steps to perform:
  • You need to store the images at a central location (it can be any location)
  • You need to install and configure a web server at the same location where you are storing your images, so every image has an HTTP path
  • You will need to store that path and the image name in a table
    • Fields: 
      • ImageName
      • ImageHTML: this will be the HTTP path wrapped in an HTML image tag pointing to your image
        • e.g: <img src = "full path to the server and image.jpg" height=450px width=800px>
  • Go to your web dossier and import the above dataset (table)
    • while importing the dataset, right click the field "ImageHTML"
      • Change the data type to "HTML"
    • Click save dataset
  • Now, create a filter and drag the "image name" object to the filter.
  • Add a new visualization (Grid type).
    • Drag the ImageName and ImageHtml fields to the Visualization rows.
    • Right click the filter > Select target > select visualization name you created > Apply
    • Right click ImageName > format > No fill
    • Right click ImageHtml > format > No Fill
    • Right click a row in ImageName > format > No fill
    • Shrink/drag the ImageName header to the extreme left (0 inches) so that it is no longer visible and does not take any visualization grid space. This way, your image will fill the whole grid.
    • Make column headers white color and vertical/horizontal lines to None.
    • You are done.
    • Click image names in the filter and you will see images changing dynamically.

Centralize logs with Azure Monitor and Storage account (ADLS)

Today we will learn how to send the logs of each resource within Azure to Azure Monitor as well as to an Azure Storage account (ADLS)
  • If you want to store the logs for a longer duration, you can use Azure Data Lake Storage (ADLS) for log storage
  • If you want to view the logs of each resource/activity within Azure on the fly, you can send each resource's logs to Azure Monitor (a centralized hub for viewing logs)

Today we will store the logs of Azure data factory in ADLS as well as view them in Azure Monitor.

Steps to be performed at Data factory side and Azure Monitor:
  • Go to your data factory
  • Click Diagnostic settings
  • Configure the diagnostic settings
  • Select "Send to Log Analytics"
    • This will send the logs to Azure Monitor
    • Specify the subscription and Log Analytics workspace
      • If you haven't created a log analytics workspace, create one.
  • Select "Archive to a storage account"
    • This will send the logs to ADLS
    • Specify the subscription and Storage account where you want the logs to be stored
    • If you specify this option, then you also have to specify the retention (in days) for logs
  • Select what type of logs you want to be stored
    • Tick mark all the logs.
  • Now go to Azure Monitor to view the logs
    • Type "Monitor" in the search bar
  • Select a scope
    • Select what resource's logs you want to view
    • Select the Metric you want to view
    • Select the Aggregation
    • You are done. 
    • The specific metric will be shown
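
If you prefer querying the raw log records instead of a metric chart, you can also open Logs (Log Analytics) from the Monitor blade and run a Kusto query. A minimal sketch, assuming the logs land in the default AzureDiagnostics table (the exact columns depend on your diagnostic settings):

AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DATAFACTORY"
| where Category == "PipelineRuns"
| sort by TimeGenerated desc
| take 20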
Steps to see the logs in the Azure Storage (ADLS):
  • Install Azure Storage Explorer on your computer.
    • This utility will let you view your ADLS accounts and the data inside them
    • Connect it to your storage account
  • Navigate to your Storage account
    • Here, you will see different containers created automatically for data factory logs
      • Pipeline runs
      • Trigger runs
      • Activity runs
    • You will also see a $logs container that stores the logs for all resources in day-wise folders

Update changes to Data factory using Azure Resource Manager (ARM) template

Today we will learn how to use an Azure Resource Manager (ARM) template to publish DEV code changes to a UAT data factory

Suppose you are a developer who has developed data factory pipelines. Now, you have to move the code to UAT by creating a new data factory for UAT and publishing the code for pipelines in the UAT data factory. 

Doing all these changes manually can be time-consuming and can cause human errors. The safest and fastest way to do this is to use an ARM template.

Steps to use ARM template:
  • Go to your DEV data factory
    • Click Export ARM template
    • It will download a zip file.
      • Unzip it and you will find a JSON file called "arm_template.json"
      • Open the JSON and change the connection settings (from DEV to UAT) that you have defined in your DEV data factory (an illustrative snippet is shown after these steps)
    • Once you are done with the changes, open the ARM template deployment page by typing "Custom deployment" in the search bar.
      • Click Template deployment in the Marketplace
    • Click "Build your own template in the editor"
    • Click Load file and load the arm_template.json file that you changed above.
    • After the file is loaded, you will see a lot of resources listed in the resources section (maybe or maybe not, depending on how you designed your DEV data factory)
    • Click Save 
    • Now it will ask you to specify a few settings:
      • Subscription
      • Resource Group
      • Location
      • Factory Name: <change it to UAT factory for it to create another factory instance>
      • Tick "I agree" checkbox.
    • Click Purchase for ARM to start deploying the resources
      • It will create a New data factory instance
      • It will publish all the code of DEV to the above created instance
    • You are done.
    • In some cases, the connections are not created properly. You have to manually test each connection and fix any errors that come up.
    • Enjoy
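
For illustration, the part of arm_template.json you typically edit is the linked service connection information. A rough, hypothetical example of such an entry (the linked service name, server and database are placeholders, and the exact structure depends on how your factory was exported):

{
  "name": "[concat(parameters('factoryName'), '/AzureSqlLinkedService')]",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "apiVersion": "2018-06-01",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<uat-sql-server>.database.windows.net,1433;Database=<uat-db>;..."
    }
  }
}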

Azure - How to find the Managed Identity information in Azure Active Directory (AAD)

Today we will learn how to find the Managed Identity information in Azure Active Directory (AAD)
When you create a resource such as a data factory in Azure, Azure automatically creates and assigns a Managed Identity to the resource in the Azure Active Directory (AAD) tenant that is trusted by the subscription in use.

Steps to find the created resource:
  • Suppose you created a resource - Azure Data Factory (ADF)
    • ADF Name: ADF-Test-1
  • Once the data factory is up and running, go to Azure AD
  • Navigate to Enterprise applications in the Overview section
  • Change the Application Type filter to All applications and click apply
  • Enter your data factory name (created above) in the search
  • You will see your resource in the name list below.

Azure SQL - Script to create logins and give appropriate permissions to the user / login

Today we will learn how to create a script that creates logins and gives appropriate permissions within Azure SQL
Run the below script on your Azure SQL server (note: Azure SQL Database does not support switching databases with USE, so run the [master] section and the user-database sections on separate connections if needed):

--create a login. supply the login account name
-- login can also be a user's azure login email that you want to add to SQL to give permissions
use [master]
create login [<login account>] from external provider
go

----add the login to the database.
use [<Database name>]
go
create user [<login account>] from login [<login account>] with default_schema = [DBO];
go

---Create a db_executor role within the database
use [<Database name>]
go
create role [db_executor]
go
grant execute on schema::dbo to [db_executor]
go


---give db executor permission to the user for the database
exec sp_addrolemember N'db_executor',N'<login account>'
go

---give data reader permission to the user for the database
exec sp_addrolemember N'db_datareader',N'<login account>'
go

---give data writer permission to the user for the database
exec sp_addrolemember N'db_datawriter',N'<login account>'
go

---give ddl admin permission to the user for the database
exec sp_addrolemember N'db_ddladmin',N'<login account>'
go

SSIS - Lookup transformation to validate fields or to get extra fields and add to the source

Today we will learn how to use the Lookup transformation within SSIS to fetch a field from another table based on a condition, or to validate some of the source fields against another table
Steps:
  • Drag the Lookup task in SSIS and attach it to the source
  • Double click lookup
    • In General, specify Cache mode = Full cache
      • Specify how to handle rows with no matching entries = Redirect rows to no match output
    • In Connection, use either a table or a query for the lookup
      • We will use a query (an illustrative query is shown after these steps)
    • In Columns:
      • Create Mappings with source and lookup table
      • Select the lookup column that you want to fetch
  • Drag the Lookup Match output to the appropriate destination table
    • Create the mappings of source and destination
      • Map source fields to target destination
  • Drag the Lookup No Match output to another destination table where you want to save the rows that did not match
    • In the mappings of the destination, ignore the input column that you are selecting from the lookup.
      • This will tell us which rows are missing the lookup value
      • PurchName is the field we are selecting from the lookup. For non-matched rows, PurchName will come back empty, so we ignore it.
  • You are done
  • Run and test
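
For illustration, the lookup query used in the Connection step above could be as simple as the following. The table and column names are hypothetical; PurchName is the column being fetched in the steps above:

-- illustrative lookup query; replace the table and columns with your own
SELECT PurchId, PurchName
FROM   dbo.PurchTable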

SSIS - split output based on a condition from source data

Today we will learn how to use the Conditional Split task available within SSIS to split the output from the source data
  • Drag a conditional split task in the SSIS and attach it to your source
  • Open the expression editor for the Conditional Split
    • Create two output expressions, for example:
      • Column a > 1
      • Column a < 2
      • (Conditions are evaluated in order, so a row that satisfies both expressions goes to the first output.)
    • Drag output 1 (a > 1) to the first destination
    • Drag output 2 (a < 2) to the second destination
  • The complete flow is: source -> Conditional Split -> two destinations
  • You are done. 
  • Run and test.




AX 2012 - create a SSRS report that takes ID as a parameter and displays the associated Sales order

Today we will learn how to create a sales report that takes a sales ID as a parameter and prints the sales order corresponding to the supplied sales ID


Create a temporary table named TEC_SalesInvoiceTmp with TableType: TempDB.
Create three classes: a data contract, a controller, and a report data provider (RDP).
// Data contract - carries the RecId of the record the report is run for
[DataContractAttribute]
class SalesReportDC
{
    RecId   recid;
}

[DataMemberAttribute("RecId")]
public RecId parmSalesRecid(RecId _recid = recid)
{
    recid = _recid;
    return recid;
}
-----------------------------------------------------------------------------
// Controller - launches the report and passes the caller's RecId into the contract
class SalesReportController extends SrsReportRunController
{
}

protected void preRunModifyContract()
{
    SalesReportDC   contract;

    contract = this.parmReportContract().parmRdpContract() as SalesReportDC;
    contract.parmSalesRecid(this.parmArgs().record().RecId);
}
public static void main(Args _args)
{
    SalesReportController controller = new SalesReportController();
    controller.parmReportName(ssrsReportStr(SalesReportRDP, SalesReport));  //reportname,design
    controller.parmArgs(_args);
   // controller.setRange(_args, controller.parmReportContract().parmQueryContracts().lookup(controller.getFirstQueryContractKey()));

    //if (controller.prompt())
   // {
        controller.startOperation();
    //}
}
-----------------------------------------------------------------
// Report data provider (RDP) - fills the temp table that the report design reads
[SRSReportParameterAttribute(classStr(SalesReportDC))]
class SalesReportRDP extends SRSReportDataProviderBase
{
    TEC_SalesInvoiceTmp     salesInvoiceTmp;    // temp table
   // SalesLine       salesline;
    //InventTable     inventtable;
    //SalesTable      salestable;
    CustInvoiceJour         custinvoicejour;
    CustInvoiceTrans        custinvoicetrans;
    RecId                   recId;
}
[SRSReportDataSetAttribute(tableStr(TEC_SalesInvoiceTmp))]
public TEC_SalesInvoiceTmp getSalesInvoiceTmp()
{
    select salesInvoiceTmp;
    return salesInvoiceTmp;
}
private void insertData()
{
    SalesTable          salesTable;
   //  select salesTable   where salesTable.SalesId    ==  custinvoicejour.SalesId;
    salesTable                       = custinvoicejour.salesTable();
    salesInvoiceTmp.ItemId           = custinvoicetrans.ItemId;
    SalesInvoiceTmp.DeliveryDate     = salesTable.DeliveryDate;
    SalesInvoiceTmp.SalesId          = custinvoicetrans.SalesId;
    SalesInvoiceTmp.SalesName        = salesTable.SalesName;
    SalesInvoiceTmp.Price            = custinvoicetrans.SalesPrice;
    SalesInvoiceTmp.NameAlias        = custinvoicetrans.itemName();
    SalesInvoiceTmp.insert();
}
public void processReport()
{
    SalesReportDC           salesReportContract = this.parmDataContract();
    Query                   query = new Query();
    QueryBuildDataSource    qbds,qbds1,qbds2;
    QueryRun                qr;
    ;

    recId               =   salesReportContract.parmSalesRecid();
    qbds = query.addDataSource(tableNum(custinvoicejour));
    qbds1 = qbds.addDataSource(tableNum(custinvoicetrans));
    qbds.addRange(fieldNum(CustInvoiceJour,recId)).value(queryValue(recid));
    qbds1.relations(true);

    qbds1.joinMode(JoinMode::InnerJoin);

    qr = new QueryRun(query);

    while(qr.next())
    {
        custinvoicejour       =   qr.get(tableNum(custinvoicejour));
        custinvoicetrans      =   qr.get(tableNum(custinvoicetrans));
        //salestable       =   qr.get(tableNum(salestable));
        this.insertData();
    }

   // select salesInvoiceTmp where salesInvoiceTmp.ItemId == salesReportContract.parmSalesRecid();
    super();
}

Now create a menu item of type Output that points to the controller class.

Drag the menu item into a menu.

Now go to Visual Studio and design the report. Select a precision design. After designing, add the report to the AOT and deploy (publish) it.

Now view your report in AX.






AX 2012 - view row headers on all the pages of a SSRS report

Today we will learn how to view row headers on all pages of an AX 2012 SSRS report

If you want to view the row headers on all pages:
  • Select the tablix, then click the small drop-down arrow at the right of the grouping pane (next to Column Groups) and select Advanced Mode.


Here we have two rows above our data, so you will see two static members under the Row Groups pane, above the Details group.

That means we have to change the properties of both static members.

Properties:
  • FixedData = true
  • KeepWithGroup = above
  • RepeatOnNewPage = true

If you want to hide the first row, which contains the from-date and to-date fields, change these properties for the first static member.
Properties:
  • FixedData = true
  • KeepWithGroup = None
  • RepeatOnNewPage = true


Add it to the AOT and deploy it.

View the Output.