AWS EMR: Read CSV file from S3 bucket using Spark dataframe

Today we will learn how to use Spark within AWS EMR to read a CSV file from an S3 bucket

Steps:
  • Create an S3 bucket and place a CSV file inside it
    
  • SSH into the EMR Master node
    • Get the Master Node Public DNS from EMR Cluster settings
    • On Windows, open PuTTY and SSH into the master node using your key pair (PEM file)
  • Type "pyspark"
    • This will launch spark with python as default language
  • Create a Spark dataframe to read the CSV from the S3 bucket
    • Command: df = spark.read.csv("<S3 path to csv>", header=True, sep=',')
  • Type "df.show()" to view the results of the dataframe in tabular format
  • You are done
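
For reference, here is a minimal sketch of the whole sequence inside the pyspark shell. The bucket and file names below are placeholders, not from the original post - replace them with your own S3 path:

# Inside the pyspark shell, the SparkSession is already available as "spark".
# "my-bucket" and "sample.csv" are hypothetical names.
df = spark.read.csv("s3://my-bucket/data/sample.csv", header=True, sep=",")
df.printSchema()   # confirm the columns were picked up from the header row
df.show(10)        # display the first 10 rows in tabular format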



AWS EMR: Create a Hive table from a CSV file stored in an S3 bucket

Today we will learn how to create a Hive table (inside an EMR cluster) from a CSV file stored in an S3 bucket

Steps:
  • Go to your EMR cluster and copy the "Master Public DNS"
    • This is the public DNS address of your master node
  • If you are using a Windows machine, download and install PuTTY to SSH into the master node
  • Open PuTTY and log in with your AWS key pair (PEM file)
  • At the "login as:" prompt, type hadoop
  • You are now logged in to the master node
  • Type vi <scriptname>
    • It will open the vi editor
  • Press "i" to start writing in the vi editor
  • Copy and paste your script (a sample script is shown after these steps)
  • Press Esc
    • Type :wq
    • Hit Enter
    • This writes the script to the file and exits the vi editor
  • Run the script using the "hive -f <scriptname>" command
  • You are done.
    • A Hive database and a table have been created
  • To verify the results, go to Hue
    • The first time, it will ask you to enter a username and password
    • These will become your Hue login credentials
  • Write a SELECT statement to verify the data in the table that you created above
  • You will see the data inside your table
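
For reference, here is a minimal example of the kind of Hive script you could paste in the vi step above. The database name, table name, columns and S3 path are all hypothetical - adjust them to your own CSV:

-- example script (illustrative only); LOCATION must point to the S3 folder containing the CSV
CREATE DATABASE IF NOT EXISTS demo_db;

CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.sample_table (
  id    INT,
  name  STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my-bucket/data/'
TBLPROPERTIES ('skip.header.line.count'='1');   -- skip the CSV header row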

How to create an AWS EMR cluster with Hadoop, Hive and Spark on it

Today we will learn how to create an AWS EMR Hadoop cluster with Hive and Spark on it
Steps:
  • Go to EMR and create a cluster
  • Select the Core Hadoop option under Applications

  • Click "Go to advance options"
  • Select Spark under software configuration
    • Additionally, select Zeppelin
      • Zeppelin lets you use notebook to write spark queries/scripts
  • Click Next
  • Under General cluster settings, tick mark EMRFS consistent view
    • Consistent view provides consistency checking for list and read-after-write (for new put requests) for objects in Amazon S3
  • Create an EC2 key pair
    • To create an Amazon EC2 key pair:
      • On the Key Pairs page, click Create Key Pair
      • In the Create Key Pair dialog box, enter a name for your key pair, such as mykeypair
      • Click Create
      • Save the resulting PEM file in a safe location
  • Specify Key Pair in the cluster settings
  • Click Next and create the cluster
  • Wait for the cluster to start
  • Once the cluster is up and running, you can SSH to the cluster.
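
As an alternative to the console wizard above, the same kind of cluster can be created from a script. Below is a rough boto3 (Python) sketch; the release label, instance types, instance count and key pair name are assumptions - adjust them to your account:

# Illustrative only: creates an EMR cluster with Hadoop, Hive, Spark and Zeppelin.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="my-demo-cluster",
    ReleaseLabel="emr-5.30.0",                     # pick the EMR release you need
    Applications=[{"Name": "Hadoop"}, {"Name": "Hive"},
                  {"Name": "Spark"}, {"Name": "Zeppelin"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "Ec2KeyName": "mykeypair",                 # the key pair created above
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    VisibleToAllUsers=True,
)
print(response["JobFlowId"])                       # the id of the new cluster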

Microstrategy web dossier - display dynamic images based on a page filter

Today we will learn how to display dynamic images in a MicroStrategy web dossier based on a drop-down list (filter)
Steps to perform:
  • You need to store the images at a central location (it can be any location)
  • You need to install and configure a web server at the same location where you are storing your images, so every image has an HTTP path
  • You will need to store that path and the image name in a table
    • Fields: 
      • ImageName
      • ImageHTML: this will be the HTTP path wrapped in an HTML image tag pointing to your image
        • e.g: <img src = "full path to the server and image.jpg" height=450px width=800px>
  • Go to your web dossier and import the above dataset (table)
    • while importing the dataset, right click the field "ImageHTML"
      • Change the data type to "HTML"
    • Click save dataset
  • Now, create a filter and drag the "image name" object to the filter.
  • Add a new visualization (Grid type).
    • Drag the ImageName and ImageHtml fields to the Visualization rows.
    • Right click the filter > Select target > select visualization name you created > Apply
    • Right click ImageName > format > No fill
    • Right click ImageHtml > format > No Fill
    • Right click a row in ImageName > format > No fill
    • Shrink/drag the ImageName header to the extreme left (0 inches) so that it is no longer visible and does not take any visualization grid space. This way, your image will fill the whole grid.
    • Make column headers white color and vertical/horizontal lines to None.
    • You are done.
    • Click image names in the filter and you will see images changing dynamically.

Centralize logs with Azure Monitor and Storage account (ADLS)

Today we will learn how to send the logs of each resource within Azure to Azure Monitor as well as to an Azure Storage account (ADLS)
  • If you want to store the logs for a longer duration, you can use Azure Data Lake Storage (ADLS) for log storage
  • If you want to view the logs of each resource/activity within Azure on the fly, you can send each resource's logs to Azure Monitor (a centralized hub for viewing logs)

Today we will store the logs of Azure data factory in ADLS as well as view them in Azure Monitor.

Steps to be performed at Data factory side and Azure Monitor:
  • Go to your data factory
  • Click Diagnostic settings
  • Configure the diagnostic settings
  • Select "Send to Log Analytics"
    • This will send the logs to Azure Monitor
    • Specify the subscription and Log Analytics workspace
      • If you haven't created a log analytics workspace, create one.
  • Select "Archive to a storage account"
    • This will send the logs to ADLS
    • Specify the subscription and Storage account where you want the logs to be stored
    • If you specify this option, then you also have to specify the retention (in days) for logs
  • Select what type of logs you want to be stored
    • Tick mark all the logs.
  • Now go to Azure Monitor to view the logs
    • Type "Monitor" in the search bar
  • Select a scope
    • Select what resource's logs you want to view
    • Select the Metric you want to view
    • Select the Aggregation
    • You are done. 
    • The specific metric will be shown
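
If you prefer querying the raw log records instead of a metric chart, you can also open Logs (Log Analytics) from the Monitor blade and run a Kusto query. A minimal sketch, assuming the logs land in the default AzureDiagnostics table (the exact columns depend on your diagnostic settings):

AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DATAFACTORY"
| where Category == "PipelineRuns"
| sort by TimeGenerated desc
| take 20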
Steps to see the logs in the Azure Storage (ADLS):
  • Install Azure Storage Explorer on your computer.
    • This utility will let you view your ADLS accounts and the data inside them
    • Connect it to your storage account
  • Navigate to your Storage account
    • Here, you will see different containers created automatically for data factory logs
      • Pipeline runs
      • Trigger runs
      • Activity runs
    • You will also see a $logs container that stores the logs for all resources in day-wise folders

Update changes to Data factory using Azure Resource Manager (ARM) template

Today we will learn how to use an Azure Resource Manager (ARM) template to publish DEV code changes to a UAT data factory

Suppose you are a developer who has developed data factory pipelines. Now, you have to move the code to UAT by creating a new data factory for UAT and publishing the code for pipelines in the UAT data factory. 

Doing all these changes manually can be time-consuming and can cause human errors. The safest and fastest way to do this is to use an ARM template.

Steps to use ARM template:
  • Go to your DEV data factory
    • Click Export ARM template
    • It will download a zip file.
      • Unzip it and you will find a JSON file called "arm_template.json"
      • Open the JSON and change the connection settings (from DEV to UAT) that you have defined in your DEV data factory (an illustrative snippet is shown after these steps)
    • Once you are done with the changes, open the ARM template deployment page by typing "Custom deployment" in the search bar.
      • Click Template deployment in the Marketplace
    • Click "Build your own template in the editor"
    • Click Load file and load the arm_template.json file that you changed above.
    • After the file is loaded, you will see a lot of resources listed in the resources section (maybe or maybe not, depending on how you designed your DEV data factory)
    • Click Save 
    • Now it will ask you to specify a few settings:
      • Subscription
      • Resource Group
      • Location
      • Factory Name: <change it to UAT factory for it to create another factory instance>
      • Tick "I agree" checkbox.
    • Click Purchase for ARM to start deploying the resources
      • It will create a New data factory instance
      • It will publish all the code of DEV to the above created instance
    • You are done.
    • In some cases, the connections are not created properly. You have to manually test each connection and fix any errors that come up.
    • Enjoy
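
For illustration, the part of arm_template.json you typically edit is the linked service connection information. A rough, hypothetical example of such an entry (the linked service name, server and database are placeholders, and the exact structure depends on how your factory was exported):

{
  "name": "[concat(parameters('factoryName'), '/AzureSqlLinkedService')]",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "apiVersion": "2018-06-01",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<uat-sql-server>.database.windows.net,1433;Database=<uat-db>;..."
    }
  }
}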

Azure - How to find the Managed Identity information in Azure Active Directory (AAD)

Today we will learn how to find the Managed Identity information in Azure Active Directory (AAD)
When you create a resource such as a data factory in Azure, Azure automatically creates and assigns a Managed Identity to the resource in the Azure Active Directory (AAD) tenant that is trusted by the subscription in use.

Steps to find the created resource:
  • Suppose you created a resource - Azure Data Factory (ADF)
    • ADF Name: ADF-Test-1
  • Once the data factory is up and running, go to Azure AD
  • Navigate to Enterprise applications in the Overview section
  • Change the Application Type filter to All applications and click apply
  • Enter your data factory name (created above) in the search
  • You will see your resource in the name list below.

Azure SQL - Script to create logins and give appropriate permissions to the user / login

Today we will learn how to create a script that creates logins and gives appropriate permissions within Azure SQL
Run the below script on your Azure SQL server (note: Azure SQL Database does not support switching databases with USE, so run the [master] section and the user-database sections on separate connections if needed):

--create a login. supply the login account name
-- login can also be a user's azure login email that you want to add to SQL to give permissions
use [master]
create login [<login account>] from external provider
go

----add the login to the database.
use [<Database name>]
go
create user [<login account>] from login [<login account>] with default_schema = [DBO];
go

---Create a db_executor role within the database
use [<Database name>]
go
create role [db_executor]
go
grant execute on schema::dbo to [db_executor]
go


---give db executor permission to the user for the database
exec sp_addrolemember N'db_executor',N'<login account>'
go

---give data reader permission to the user for the database
exec sp_addrolemember N'db_datareader',N'<login account>'
go

---give data writer permission to the user for the database
exec sp_addrolemember N'db_datawriter',N'<login account>'
go

---give ddl admin permission to the user for the database
exec sp_addrolemember N'db_ddladmin',N'<login account>'
go

SSIS - Lookup transformation to validate fields or to get extra fields and add to the source

Today we will learn how to use the Lookup transformation within SSIS to fetch a field from another table based on a condition, or to validate some of the source fields against another table
Steps:
  • Drag the Lookup task in SSIS and attach it to the source
  • Double click lookup
    • In General, specify Cache mode = Full cache
      • Specify how to handle rows with no matching entries = Redirect rows to no match output
    • In Connection, use either a table or a query for the lookup
      • We will use a query (an illustrative query is shown after these steps)
    • In Columns:
      • Create Mappings with source and lookup table
      • Select the lookup column that you want to fetch
  • Drag the Lookup Match output to the appropriate destination table
    • Create the mappings of source and destination
      • Map source fields to target destination
  • Drag the Lookup No Match output to another destination table where you want to save the rows that did not match
    • In the mappings of the destination, ignore the input column that you are selecting from the lookup.
      • This will tell us which rows are missing the lookup value
      • PurchName is the field we are selecting from the lookup. For non-matched rows, PurchName will come back empty, so we ignore it.
  • You are done
  • Run and test
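
For illustration, the lookup query used in the Connection step above could be as simple as the following. The table and column names are hypothetical; PurchName is the column being fetched in the steps above:

-- illustrative lookup query; replace the table and columns with your own
SELECT PurchId, PurchName
FROM   dbo.PurchTable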

SSIS - split output based on a condition from source data

Today we will learn how to use the Conditional Split task available within SSIS to split the output from the source data
  • Drag a conditional split task in the SSIS and attach it to your source
  • Open the expression editor for the Conditional Split
    • Create two output expressions, for example:
      • Column a > 1
      • Column a < 2
      • (Conditions are evaluated in order, so a row that satisfies both expressions goes to the first output.)
    • Drag output 1 (a > 1) to the first destination
    • Drag output 2 (a < 2) to the second destination
  • The complete flow is: source -> Conditional Split -> two destinations
  • You are done. 
  • Run and test.




AX 2012 - create a SSRS report that takes ID as a parameter and displays the associated Sales order

Today we will learn how to create a sales report that takes a sales ID as a parameter and prints the sales order corresponding to the supplied sales ID


Create a temporary table named TEC_SalesInvoiceTmp with TableType: TempDB.
Create three classes: a data contract, a controller, and a report data provider (RDP).
// Data contract - carries the RecId of the record the report is run for
[DataContractAttribute]
class SalesReportDC
{
    RecId   recid;
}

[DataMemberAttribute("RecId")]
public RecId parmSalesRecid(RecId _recid = recid)
{
    recid = _recid;
    return recid;
}
-----------------------------------------------------------------------------
// Controller - launches the report and passes the caller's RecId into the contract
class SalesReportController extends SrsReportRunController
{
}

protected void preRunModifyContract()
{
    SalesReportDC   contract;

    contract = this.parmReportContract().parmRdpContract() as SalesReportDC;
    contract.parmSalesRecid(this.parmArgs().record().RecId);
}
public static void main(Args _args)
{
    SalesReportController controller = new SalesReportController();
    controller.parmReportName(ssrsReportStr(SalesReportRDP, SalesReport));  //reportname,design
    controller.parmArgs(_args);
   // controller.setRange(_args, controller.parmReportContract().parmQueryContracts().lookup(controller.getFirstQueryContractKey()));

    //if (controller.prompt())
   // {
        controller.startOperation();
    //}
}
-----------------------------------------------------------------
// Report data provider (RDP) - fills the temp table that the report design reads
[SRSReportParameterAttribute(classStr(SalesReportDC))]
class SalesReportRDP extends SRSReportDataProviderBase
{
    TEC_SalesInvoiceTmp     salesInvoiceTmp;    // temp table
   // SalesLine       salesline;
    //InventTable     inventtable;
    //SalesTable      salestable;
    CustInvoiceJour         custinvoicejour;
    CustInvoiceTrans        custinvoicetrans;
    RecId                   recId;
}
[SRSReportDataSetAttribute(tableStr(TEC_SalesInvoiceTmp))]
public TEC_SalesInvoiceTmp getSalesInvoiceTmp()
{
    select salesInvoiceTmp;
    return salesInvoiceTmp;
}
private void insertData()
{
    SalesTable          salesTable;
   //  select salesTable   where salesTable.SalesId    ==  custinvoicejour.SalesId;
    salesTable                       = custinvoicejour.salesTable();
    salesInvoiceTmp.ItemId           = custinvoicetrans.ItemId;
    SalesInvoiceTmp.DeliveryDate     = salesTable.DeliveryDate;
    SalesInvoiceTmp.SalesId          = custinvoicetrans.SalesId;
    SalesInvoiceTmp.SalesName        = salesTable.SalesName;
    SalesInvoiceTmp.Price            = custinvoicetrans.SalesPrice;
    SalesInvoiceTmp.NameAlias        = custinvoicetrans.itemName();
    SalesInvoiceTmp.insert();
}
public void processReport()
{
    SalesReportDC           salesReportContract = this.parmDataContract();
    Query                   query = new Query();
    QueryBuildDataSource    qbds,qbds1,qbds2;
    QueryRun                qr;
    ;

    recId               =   salesReportContract.parmSalesRecid();
    qbds = query.addDataSource(tableNum(custinvoicejour));
    qbds1 = qbds.addDataSource(tableNum(custinvoicetrans));
    qbds.addRange(fieldNum(CustInvoiceJour,recId)).value(queryValue(recid));
    qbds1.relations(true);

    qbds1.joinMode(JoinMode::InnerJoin);

    qr = new QueryRun(query);

    while(qr.next())
    {
        custinvoicejour       =   qr.get(tableNum(custinvoicejour));
        custinvoicetrans      =   qr.get(tableNum(custinvoicetrans));
        //salestable       =   qr.get(tableNum(salestable));
        this.insertData();
    }

   // select salesInvoiceTmp where salesInvoiceTmp.ItemId == salesReportContract.parmSalesRecid();
    super();
}

Now create a menu item of type Output that points to the controller class.

Drag the menu item into a menu.

Now go to Visual Studio and design the report. Select a precision design. After designing, add the report to the AOT and deploy (publish) it.

Now view your report in AX.






AX 2012 - view row headers on all the pages of a SSRS report

Today we will learn how to view row headers on all pages of an AX 2012 SSRS report

If you want to view the row headers on all pages:
  • Select the tablix, then click the small drop-down arrow at the right of the grouping pane (next to Column Groups) and select Advanced Mode.


Here we have two rows above our data, so you will see two static members under the Row Groups pane, above the Details group.

That means we have to change the properties of both static members.

Properties:
  • FixedData = true
  • KeepWithGroup = above
  • RepeatOnNewPage = true

If you want to hide the first row, which contains the from-date and to-date fields, change these properties for the first static member.
Properties:
  • FixedData = true
  • KeepWithGroup = None
  • RepeatOnNewPage = true


Add it to the AOT and deploy it.

View the Output.