Concept of Informatica (Big Data Management) BDM

Last updated on Dec 16 2021
Santosh Singh

Table of Contents

Concept of Informatica (Big Data Management) BDM

Informatica Big Data Management (BDM) is a GUI-based integrated development tool. Organizations use it to build Data Quality, Data Integration, and Data Governance processes for their big data platforms.

Informatica BDM has a built-in Smart Executor that supports various processing engines, such as Apache Spark, Blaze, Apache Hive on Tez, and Apache Hive on MapReduce.

Informatica BDM is used to perform data ingestion into a Hadoop cluster, processing on the cluster, and extraction of data from the Hadoop cluster.

In Blaze mode, the Informatica mapping is processed by Blaze™, Informatica’s native engine that runs as a YARN-based application.

In Spark mode, the Informatica mappings are translated into Scala code.

In Hive on MapReduce mode, Informatica mappings are translated into MapReduce code and executed natively on the Hadoop cluster.

Informatica BDM integrates seamlessly with the Hortonworks Data Platform (HDP) Hadoop cluster in all related aspects, including its default authorization system. Ranger can be used to enforce fine-grained, role-based authorization to data as well as metadata stored inside the HDP cluster.

Informatica BDM integrates with Ranger in all modes of execution. Its Smart Executor lets organizations run their Informatica mappings seamlessly on one or more execution engines under the purview of their existing security setup.

Authentication

Authentication is the process of reliably verifying that a user is who they claim to be. Kerberos is the widely accepted authentication mechanism on Hadoop, including the Hortonworks Data Platform. The Kerberos protocol relies on a Key Distribution Center (KDC), a network service that issues tickets permitting access.

Informatica BDM supports Kerberos authentication with both Active Directory and MIT-based Key Distribution Centers. Kerberos authentication is supported in all modes of execution in Informatica BDM.
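The core idea behind Kerberos tickets can be illustrated with a minimal, self-contained sketch. This is a toy simulation of a shared-secret ticket flow, not the real Kerberos protocol (which uses symmetric encryption, timestamps, and nonces); all names and keys here are made up for illustration.

```python
import hashlib
import hmac
import time

class KDC:
    """Toy Key Distribution Center: issues tickets signed with the target service's key."""
    def __init__(self, user_keys, service_keys):
        self.user_keys = user_keys        # secrets shared between users and the KDC
        self.service_keys = service_keys  # secrets shared between services and the KDC

    def issue_ticket(self, user, user_proof, service, lifetime=300):
        # The KDC authenticates the user via the shared secret before issuing a ticket.
        if not hmac.compare_digest(self.user_keys.get(user, b""), user_proof):
            raise PermissionError("authentication failed")
        expiry = int(time.time()) + lifetime
        payload = f"{user}|{service}|{expiry}".encode()
        sig = hmac.new(self.service_keys[service], payload, hashlib.sha256).hexdigest()
        return payload, sig

def service_accepts(service_key, payload, sig):
    """The target service verifies the ticket on its own, without contacting the KDC."""
    expected = hmac.new(service_key, payload, hashlib.sha256).hexdigest()
    _user, _svc, expiry = payload.decode().split("|")
    return hmac.compare_digest(expected, sig) and int(expiry) > time.time()

kdc = KDC({"alice": b"alice-secret"}, {"hive": b"hive-service-key"})
ticket, sig = kdc.issue_ticket("alice", b"alice-secret", "hive")
print(service_accepts(b"hive-service-key", ticket, sig))  # True
```

The key property mirrored here is that the service can validate a ticket locally because it shares a secret with the KDC, which is what lets a Hadoop service trust authenticated requests at scale.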

Authorization

Authorization is the process of determining whether a user is allowed to perform certain operations on a given system. In HDP Hadoop clusters, authorization plays an important role in ensuring that users access only the data the Hadoop administrator allows them to.
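A fine-grained, role-based authorization check in the style of Apache Ranger can be sketched as follows. The policy structure, roles, and paths are illustrative assumptions, not Ranger's actual policy format or API.

```python
# Each policy grants a set of operations on a resource prefix to one role.
POLICIES = [
    ("analyst", "/warehouse/sales", {"read"}),           # analysts may only read sales data
    ("etl",     "/warehouse",       {"read", "write"}),  # ETL jobs may read and write anywhere
]

USER_ROLES = {"maria": {"analyst"}, "bdm_svc": {"etl"}}

def is_authorized(user, resource, operation):
    """Allow only if some policy grants the operation on the resource to one of the user's roles."""
    for role, prefix, ops in POLICIES:
        if role in USER_ROLES.get(user, set()) and resource.startswith(prefix) and operation in ops:
            return True
    return False  # default deny

print(is_authorized("maria", "/warehouse/sales/2021", "read"))   # True
print(is_authorized("maria", "/warehouse/sales/2021", "write"))  # False
```

The default-deny behavior at the end is the essential point: any request not explicitly covered by a policy is rejected, which is how a Hadoop service backed by Ranger keeps users inside the boundaries set by the administrator.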

1. Blaze – YARN Application

When a mapping executes on Informatica Blaze, the optimizer first invokes the Hadoop service to fetch metadata, such as a Hive table’s partitioning details.

[Figure: Blaze – YARN]

The job is then submitted to the Blaze runtime. The illustration shows how Blaze interacts with a Hadoop service such as HiveServer2.

When an Informatica mapping executes in Blaze mode, a call is made to the Hive Metastore to determine the structure of the tables.

The Blaze runtime then loads the optimized mapping into memory. The mapping then interacts with the corresponding Hadoop service to read or write data.

The Hadoop service itself is integrated with Ranger and ensures authorization takes place before the request is served.

2. Spark

Informatica BDM can execute mappings as Spark Scala code on the HDP Hadoop cluster. The illustration details the steps involved in Spark execution mode.

[Figure: Spark]

The Spark executor translates Informatica mappings into Spark Scala code. As part of this translation, if Hive sources or targets are involved, the Spark executor makes a call to the Hive Metastore to determine the structure of the Hive tables and optimize the Scala code.

Then, this Scala code is submitted to YARN for execution. When the Spark code accesses the data, the corresponding Hadoop service relies on Ranger for authorization.

3. Hive on MapReduce

Informatica BDM can execute mappings as MapReduce code on the Hadoop cluster. The illustration below shows the steps involved in Hive on MapReduce mode.

[Figure: Hive on MapReduce]

When a mapping is executed in Hive on MapReduce mode, the Hive executor on the Informatica node translates the Informatica mapping into MapReduce code and submits the job to the Hadoop cluster.

If Hive sources or targets are involved, the Hive executor makes a call to the Hive Metastore to determine the table structure and optimize the mapping accordingly. As the MapReduce job interacts with Hadoop services such as HDFS and Hive, each Hadoop service authorizes the requests with Ranger.
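The two MapReduce phases that a translated mapping runs on the cluster can be illustrated with a toy aggregation, the kind of job a GROUP BY mapping would compile to. The data, column names, and phase functions below are invented for the example; a real job would run distributed across the cluster.

```python
from collections import defaultdict

# Toy input: (region, sale amount) rows, standing in for a Hive table.
rows = [("EU", 250.0), ("US", 40.0), ("US", 120.0), ("EU", 10.0)]

def map_phase(rows):
    # The map tasks emit (key, value) pairs; the framework shuffles them by key.
    for region, amount in rows:
        yield region, amount

def reduce_phase(pairs):
    # Shuffle/sort: group values by key, then each reduce task aggregates one key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

print(reduce_phase(map_phase(rows)))  # {'EU': 260.0, 'US': 160.0}
```

Note that only the map and reduce functions are job-specific; the shuffle between them is supplied by the framework, which is why a declarative mapping can be compiled down to just these two phases.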

4. Hive on Tez

Tez can be enabled in Informatica BDM by a configuration change and is transparent to the mapping developer.

[Figure: Hive on Tez]

Mappings running on Hive on Tez therefore follow a pattern similar to Hive on MapReduce. When a mapping is executed in Hive on Tez mode, the Hive executor on the Informatica node translates the Informatica mapping into a Tez job and submits it to the Hadoop cluster.

If Hive sources or targets are involved, the Hive executor makes a call to the Hive Metastore to determine the table structure and optimize the mapping accordingly. As the Tez job interacts with Hadoop services such as HDFS and Hive, each Hadoop service authorizes the requests with Ranger.

So, this brings us to the end of this blog. This Tecklearn ‘Concept of Informatica (Big Data Management) BDM’ blog helps you with commonly asked questions if you are looking for a job in Informatica. If you wish to learn Informatica and build a career in the Data Warehouse and ETL domain, check out our interactive Informatica Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/informatica-training-and-certification/

Informatica Training

About the Course

Tecklearn’s Informatica Training will help you master Data Integration concepts such as ETL and Data Mining using Informatica PowerCenter. It will also make you proficient in Advanced Transformations, Informatica Architecture, Data Migration, Performance Tuning, and Installation & Configuration of Informatica PowerCenter. You will get trained in Informatica Workflows, data warehousing, Repository Management and other processes.

Why Should you take Informatica Training?

  • Informatica professionals earn up to $130,000 per year – Indeed.com
  • GE, eBay, PayPal, FedEx, EMC, Siemens, BNY Mellon & other top Fortune 500 companies use Informatica.
  • Key advantages of Informatica PowerCenter: Excellent GUI interfaces for Administration, ETL Design, Job Scheduling, Session monitoring, Debugging, etc.

What you will Learn in this Course?

Informatica PowerCenter 10 – An Overview

  • Informatica & Informatica Product Suite
  • Informatica PowerCenter as ETL Tool
  • Informatica PowerCenter Architecture
  • Component-based development techniques

Data Integration and Data Warehousing Fundamentals

  • Data Integration Concepts
  • Data Profile and Data Quality Management
  • ETL and ETL architecture
  • Brief on Data Warehousing

Informatica Installation and Configuration

  • Configuring the Informatica tool
  • How to install Informatica, operational administration activities, and Integration Services

Informatica PowerCenter Transformations

  • Visualize PowerCenter Client Tools
  • Data Flow
  • Create and Execute Mapping
  • Transformations and their usage
  • Hands On

Informatica PowerCenter Tasks & Workflows

  • Informatica PowerCenter Workflow Manager
  • Reusability and Scheduling in Workflow Manager
  • Workflow Task and job handling
  • Flow within a Workflow
  • Components of Workflow Monitor

Advanced Transformations

  • Look Up Transformation
  • Active and Passive Transformation
  • Joiner Transformation
  • Types of Caches
  • Hands On

More Advanced Transformations – SQL (Pre-SQL and Post-SQL)

  • Load Types – Bulk, Normal
  • Reusable and Non-Reusable Sessions
  • Categories for Transformation
  • Various Types of Transformation – Filter, Expression, Update Strategy, Sorter, Router, XML, HTTP, Transaction Control

Various Types of Transformation – Rank, Union, Stored Procedure

  • Error Handling and Recovery in Informatica
  • High Availability and Failover in Informatica
  • Best Practices in Informatica
  • Debugger
  • Performance Tuning

Performance Tuning, Design Principles & Caches

  • Performance Tuning Methodology
  • Mapping design tips & tricks
  • Caching & Memory Optimization
  • Partition & Pushdown Optimization
  • Design Principles & Best Practices

Informatica PowerCenter Repository Management

  • Repository Manager tool (functionalities, create and delete, migrate components)
  • PowerCenter Repository Maintenance

Informatica Administration & Security

  • Features of PowerCenter 10
  • Overview of the PowerCenter Administration Console
  • Integration and repository service properties
  • Services in the Administration Console (services, handle locks)
  • Users and groups

Command Line Utilities

  • Infacmd, infasetup, pmcmd, pmrep
  • Automate tasks via command-line programs

More Advanced Transformations – XML

  • Java Transformation
  • HTTP Transformation

Got a question for us? Please mention it in the comments section and we will get back to you.
