Concept of Informatica (Big Data Management) BDM

Last updated on Dec 16 2021
Santosh Singh

Table of Contents

Concept of Informatica (Big Data Management) BDM

Informatica Big Data Management (BDM) is a GUI-based integrated development tool. Organizations use it to build Data Quality, Data Integration, and Data Governance processes for their big data platforms.

Informatica BDM has a built-in Smart Executor that supports various processing engines, such as Apache Spark, Blaze, Apache Hive on Tez, and Apache Hive on MapReduce.

Informatica BDM is used to perform data ingestion into a Hadoop cluster, processing on the cluster, and extraction of data from the Hadoop cluster.

In Blaze mode, the Informatica mapping is processed by Blaze™, Informatica’s native engine that runs as a YARN-based application.

In Spark mode, the Informatica mappings are translated into Scala code.

In Hive on MapReduce mode, Informatica mappings are translated into MapReduce code and executed natively on the Hadoop cluster.

Informatica BDM integrates seamlessly with the Hortonworks Data Platform (HDP) Hadoop cluster in all related aspects, including its default authorization system. Ranger can be used to enforce fine-grained, role-based authorization to data as well as metadata stored inside the HDP cluster.

Informatica BDM integrates with Ranger in all modes of execution. Its Smart Executor lets organizations run their Informatica mappings seamlessly on one or more execution engines under the purview of their existing security setup.

Authentication

Authentication is the process of reliably verifying that a user is who they claim to be. Kerberos is the widely accepted authentication mechanism on Hadoop, including the Hortonworks Data Platform. The Kerberos protocol relies on a Key Distribution Center (KDC), a network service that issues tickets permitting access.

Informatica BDM supports Kerberos authentication with both Active Directory and MIT-based Key Distribution Centers. Kerberos authentication is supported in all modes of execution in Informatica BDM.
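The core idea behind Kerberos tickets can be illustrated with a minimal, self-contained sketch. This is a toy simulation of a shared-secret ticket flow, not the real Kerberos protocol (which uses symmetric encryption, timestamps, and nonces); all names and keys here are made up for illustration.

```python
import hashlib
import hmac
import time

class KDC:
    """Toy Key Distribution Center: issues tickets signed with the target service's key."""
    def __init__(self, user_keys, service_keys):
        self.user_keys = user_keys        # secrets shared between users and the KDC
        self.service_keys = service_keys  # secrets shared between services and the KDC

    def issue_ticket(self, user, user_proof, service, lifetime=300):
        # The KDC authenticates the user via the shared secret before issuing a ticket.
        if not hmac.compare_digest(self.user_keys.get(user, b""), user_proof):
            raise PermissionError("authentication failed")
        expiry = int(time.time()) + lifetime
        payload = f"{user}|{service}|{expiry}".encode()
        sig = hmac.new(self.service_keys[service], payload, hashlib.sha256).hexdigest()
        return payload, sig

def service_accepts(service_key, payload, sig):
    """The target service verifies the ticket on its own, without contacting the KDC."""
    expected = hmac.new(service_key, payload, hashlib.sha256).hexdigest()
    _user, _svc, expiry = payload.decode().split("|")
    return hmac.compare_digest(expected, sig) and int(expiry) > time.time()

kdc = KDC({"alice": b"alice-secret"}, {"hive": b"hive-service-key"})
ticket, sig = kdc.issue_ticket("alice", b"alice-secret", "hive")
print(service_accepts(b"hive-service-key", ticket, sig))  # True
```

The key property mirrored here is that the service can validate a ticket locally because it shares a secret with the KDC, which is what lets a Hadoop service trust authenticated requests at scale.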

Authorization

Authorization is the process of determining whether a user is allowed to perform certain operations on a given system. In HDP Hadoop clusters, authorization plays an important role in ensuring that users access only the data the Hadoop administrator allows them to.
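A fine-grained, role-based authorization check in the style of Apache Ranger can be sketched as follows. The policy structure, roles, and paths are illustrative assumptions, not Ranger's actual policy format or API.

```python
# Each policy grants a set of operations on a resource prefix to one role.
POLICIES = [
    ("analyst", "/warehouse/sales", {"read"}),           # analysts may only read sales data
    ("etl",     "/warehouse",       {"read", "write"}),  # ETL jobs may read and write anywhere
]

USER_ROLES = {"maria": {"analyst"}, "bdm_svc": {"etl"}}

def is_authorized(user, resource, operation):
    """Allow only if some policy grants the operation on the resource to one of the user's roles."""
    for role, prefix, ops in POLICIES:
        if role in USER_ROLES.get(user, set()) and resource.startswith(prefix) and operation in ops:
            return True
    return False  # default deny

print(is_authorized("maria", "/warehouse/sales/2021", "read"))   # True
print(is_authorized("maria", "/warehouse/sales/2021", "write"))  # False
```

The default-deny behavior at the end is the essential point: any request not explicitly covered by a policy is rejected, which is how a Hadoop service backed by Ranger keeps users inside the boundaries set by the administrator.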

1. Blaze – YARN Application

When a mapping executes on Informatica Blaze, the optimizer first invokes the Hadoop service to fetch metadata, such as a Hive table’s partitioning details.

[Figure: Blaze – YARN]

The job is then submitted to the Blaze runtime. The illustration shows how Blaze interacts with a Hadoop service such as HiveServer2.

When an Informatica mapping executes in Blaze mode, a call is made to the Hive Metastore to determine the structure of the tables.

The Blaze runtime then loads the optimized mapping into memory. The mapping then interacts with the corresponding Hadoop service to read or write data.

The Hadoop service itself is integrated with Ranger and ensures authorization takes place before the request is served.

2. Spark

Informatica BDM can execute mappings as Spark Scala code on the HDP Hadoop cluster. The illustration details the steps involved in Spark execution mode.

[Figure: Spark]

The Spark executor translates Informatica mappings into Spark Scala code. As part of this translation, if Hive sources or targets are involved, the Spark executor makes a call to the Hive Metastore to determine the structure of the Hive tables and optimize the Scala code.

Then, this Scala code is submitted to YARN for execution. When the Spark code accesses the data, the corresponding Hadoop service relies on Ranger for authorization.

3. Hive on MapReduce

Informatica BDM can execute mappings as MapReduce code on the Hadoop cluster. The illustration below shows the steps involved in Hive on MapReduce mode.

[Figure: Hive on MapReduce]

When a mapping is executed in Hive on MapReduce mode, the Hive executor on the Informatica node translates the Informatica mapping into MapReduce code and submits the job to the Hadoop cluster.

If Hive sources or targets are involved, the Hive executor makes a call to the Hive Metastore to determine the table structure and optimize the mapping accordingly. As the MapReduce job interacts with Hadoop services such as HDFS and Hive, each Hadoop service authorizes the requests with Ranger.
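The two MapReduce phases that a translated mapping runs on the cluster can be illustrated with a toy aggregation, the kind of job a GROUP BY mapping would compile to. The data, column names, and phase functions below are invented for the example; a real job would run distributed across the cluster.

```python
from collections import defaultdict

# Toy input: (region, sale amount) rows, standing in for a Hive table.
rows = [("EU", 250.0), ("US", 40.0), ("US", 120.0), ("EU", 10.0)]

def map_phase(rows):
    # The map tasks emit (key, value) pairs; the framework shuffles them by key.
    for region, amount in rows:
        yield region, amount

def reduce_phase(pairs):
    # Shuffle/sort: group values by key, then each reduce task aggregates one key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

print(reduce_phase(map_phase(rows)))  # {'EU': 260.0, 'US': 160.0}
```

Note that only the map and reduce functions are job-specific; the shuffle between them is supplied by the framework, which is why a declarative mapping can be compiled down to just these two phases.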

4. Hive on Tez

Tez can be enabled in Informatica BDM by a configuration change and is transparent to the mapping developer.

[Figure: Hive on Tez]

Mappings running on Hive on Tez therefore follow a pattern similar to Hive on MapReduce. When a mapping is executed in Hive on Tez mode, the Hive executor on the Informatica node translates the Informatica mapping into a Tez job and submits it to the Hadoop cluster.

If Hive sources or targets are involved, the Hive executor makes a call to the Hive Metastore to determine the table structure and optimize the mapping accordingly. As the Tez job interacts with Hadoop services such as HDFS and Hive, each Hadoop service authorizes the requests with Ranger.

So, this brings us to the end of this blog. This Tecklearn ‘Concept of Informatica (Big Data Management) BDM’ blog helps you with commonly asked questions if you are looking for a job in Informatica. If you wish to learn Informatica and build a career in the Data Warehouse and ETL domain, check out our interactive Informatica Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/informatica-training-and-certification/

Informatica Training

About the Course

Tecklearn’s Informatica Training will help you master Data Integration concepts such as ETL and Data Mining using Informatica PowerCenter. It will also make you proficient in Advanced Transformations, Informatica Architecture, Data Migration, Performance Tuning, and Installation & Configuration of Informatica PowerCenter. You will get trained in Informatica Workflows, data warehousing, Repository Management and other processes.

Why Should you take Informatica Training?

  • Informatica professionals earn up to $130,000 per year – Indeed.com
  • GE, eBay, PayPal, FedEx, EMC, Siemens, BNY Mellon & other top Fortune 500 companies use Informatica.
  • Key advantages of Informatica PowerCenter: Excellent GUI interfaces for Administration, ETL Design, Job Scheduling, Session monitoring, Debugging, etc.

What you will Learn in this Course?

Informatica PowerCenter 10 – An Overview

  • Informatica & Informatica Product Suite
  • Informatica PowerCenter as ETL Tool
  • Informatica PowerCenter Architecture
  • Component-based development techniques

Data Integration and Data Warehousing Fundamentals

  • Data Integration Concepts
  • Data Profile and Data Quality Management
  • ETL and ETL architecture
  • Brief on Data Warehousing

Informatica Installation and Configuration

  • Configuring the Informatica tool
  • How to install Informatica, operational administration activities, and Integration Services

Informatica PowerCenter Transformations

  • Visualize PowerCenter Client Tools
  • Data Flow
  • Create and Execute Mapping
  • Transformations and their usage
  • Hands On

Informatica PowerCenter Tasks & Workflows

  • Informatica PowerCenter Workflow Manager
  • Reusability and Scheduling in Workflow Manager
  • Workflow Task and job handling
  • Flow within a Workflow
  • Components of Workflow Monitor

Advanced Transformations

  • Look Up Transformation
  • Active and Passive Transformation
  • Joiner Transformation
  • Types of Caches
  • Hands On

More Advanced Transformations – SQL (Pre-SQL and Post-SQL)

  • Load Types – Bulk, Normal
  • Reusable and Non-Reusable Sessions
  • Categories for Transformation
  • Various Types of Transformation – Filter, Expression, Update Strategy, Sorter, Router, XML, HTTP, Transaction Control

Various Types of Transformation – Rank, Union, Stored Procedure

  • Error Handling and Recovery in Informatica
  • High Availability and Failover in Informatica
  • Best Practices in Informatica
  • Debugger
  • Performance Tuning

Performance Tuning, Design Principles & Caches

  • Performance Tuning Methodology
  • Mapping design tips & tricks
  • Caching & Memory Optimization
  • Partition & Pushdown Optimization
  • Design Principles & Best Practices

Informatica PowerCenter Repository Management

  • Repository Manager tool (functionalities, create and delete, migrate components)
  • PowerCenter Repository Maintenance

Informatica Administration & Security

  • Features of PowerCenter 10
  • Overview of the PowerCenter Administration Console
  • Integration and repository service properties
  • Services in the Administration Console (services, handle locks)
  • Users and groups

Command Line Utilities

  • Infacmd, infasetup, pmcmd, pmrep
  • Automate tasks via command-line programs

More Advanced Transformations – XML

  • Java Transformation
  • HTTP Transformation

Got a question for us? Please mention it in the comments section and we will get back to you.
