Performance Tuning and Partitioning in Informatica

Last updated on Dec 16 2021
Santosh Singh

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks so that the ETL load completes within an acceptable time.

Tuning starts with identifying bottlenecks in the source, target, and mapping, and then moves on to session tuning. It may also require tuning the system resources on which the Informatica PowerCenter services run.

We can use the test load option to run sessions while we tune session performance.

Once we have addressed the bottlenecks, we can further optimize session performance by increasing the number of pipeline partitions in the session.

Adding partitions improves performance by utilizing more of the system hardware while processing the session.

Determining the best way to improve performance can be complicated, so it is better to change one variable at a time. If session performance does not improve, we can return to the original configuration.

Ultimately, the goal of performance tuning is for the sessions to run within the available load window for the Informatica server.

We can increase session performance with the help of the following tasks:

  • Network: The performance of the Informatica server is tied to its network connections.

Generally, data moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid unnecessary network hops.

  • Flat files: If flat files are stored on a machine other than the Informatica server, move them to the machine that hosts the Informatica server.
  • Fewer connections: Minimize the connections between sources, targets, and the Informatica server to improve session performance. Moving the target database onto the server machine may also improve session performance.
  • Staging areas: Staging areas force the Informatica server to make multiple passes over the data. Removing staging areas can improve session performance, so use them only when they are mandatory.
  • Informatica servers: We can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers improves session performance.
  • ASCII: Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character in one byte, whereas Unicode mode takes two bytes per character.
  • Source qualifier: If a session joins multiple source tables in one Source Qualifier, optimizing the query can improve performance. Single-table SELECT statements with an ORDER BY or GROUP BY clause can also benefit from optimizations such as adding indexes (see the first sketch after this list).
  • Drop constraints: If the target has key constraints and indexes, they slow down data loading. To improve session performance, drop constraints and indexes before running the session (while loading facts and dimensions) and rebuild them after the session completes (see the second sketch after this list).
  • Parallel sessions: Running parallel sessions by using concurrent batches also reduces the data load time, so concurrent batches increase session performance.
  • Partitioning: Partitioning a session improves performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
  • Incremental aggregation: If a session contains an Aggregator transformation, we can use incremental aggregation to improve session performance.
  • Transformation errors: Avoid transformation errors to improve session performance. Before saving the mapping, validate it, and if any transformation errors occur, rectify them.
  • Lookup transformations: If the session contains a Lookup transformation, we can improve session performance by enabling the lookup cache. The cache improves speed by keeping previously read data so it does not have to be fetched again.
  • Filter transformations: If the session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier.
  • Group transformations: Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it. Use the sorted ports option to improve session performance, i.e., sort the data before applying the transformation.
  • Packet size: We can improve session performance by configuring the network packet size, which determines how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure Database Connections.
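
For illustration, here is a minimal sketch of the Source Qualifier optimization mentioned above, assuming a hypothetical SQL override that joins ORDERS and CUSTOMERS source tables; the table and index names are examples only, not taken from any specific mapping:

  -- Hypothetical Source Qualifier SQL override joining two source tables
  SELECT o.order_id, o.order_date, c.customer_name
  FROM   orders o
  JOIN   customers c ON c.customer_id = o.customer_id
  ORDER  BY o.order_id;

  -- A supporting index on the join column helps the database optimize the query
  CREATE INDEX idx_customers_id ON customers (customer_id);

The idea is to let the database perform the join and the sort once, with index support, rather than leaving that work to downstream transformations.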
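
Similarly, here is a minimal sketch of the drop-and-rebuild approach for target constraints and indexes, written as pre- and post-session SQL against a hypothetical SALES_FACT target; the index and constraint names are assumptions:

  -- Pre-session SQL: drop the index and disable the constraint before the bulk load
  DROP INDEX idx_sales_fact_cust;
  ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_cust;

  -- Post-session SQL: rebuild the index and re-enable the constraint after the load
  CREATE INDEX idx_sales_fact_cust ON sales_fact (customer_id);
  ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_cust;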

Partitioning in Informatica

The PowerCenter Integration Service creates a default partition type at each partition point. If we have the Partitioning option, we can change the partition type. The partition type controls how the PowerCenter Integration Service distributes data among partitions at partition points.

When we configure the partitioning information for a pipeline, we must define a partition type at each partition point in the pipeline. The partition type determines how the PowerCenter Integration Service redistributes data across partition points.

The Workflow Manager provides the following partition types:

  1. Database partitioning: The PowerCenter Integration Service queries the IBM DB2 or Oracle system for table partition information and reads partitioned data from the corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2 source instances on a multi-node table space, and with DB2 targets (see the sketch after this list).
  2. Hash partitioning: Use hash partitioning when we want the PowerCenter Integration Service to distribute rows to the partitions by group. For instance, we need to sort items by item ID, but we do not know how many items have a particular ID number.

There are two types of hash partitioning:

o Hash auto-keys: The PowerCenter Integration Service uses all grouped or sorted ports as a compound partition key. Use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.

o Hash user keys: The PowerCenter Integration Service uses a hash function to group rows of data among partitions. We define the number of ports that make up the partition key.

  3. Key range: We specify one or more ports to form a compound partition key. The PowerCenter Integration Service passes data to each partition depending on the ranges we define for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
  4. Pass-through: The PowerCenter Integration Service passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where we want to create an additional pipeline stage to improve performance but do not want to change the distribution of data across partitions.
  5. Round-robin: The PowerCenter Integration Service distributes blocks of data to one or more partitions. Use round-robin partitioning so that each partition processes rows based on the number and size of the blocks.
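
As an illustration of the database partitioning case above (item 1), here is a minimal sketch of an Oracle table partitioned by range; the SALES_FACT table, its columns, and its partitions are hypothetical. With database partitioning, the Integration Service can read such table partitions in parallel:

  -- Hypothetical Oracle source table partitioned by range on SALE_DATE
  CREATE TABLE sales_fact (
    sale_id   NUMBER,
    sale_date DATE,
    amount    NUMBER(12,2)
  )
  PARTITION BY RANGE (sale_date) (
    PARTITION p_2020 VALUES LESS THAN (TO_DATE('2021-01-01', 'YYYY-MM-DD')),
    PARTITION p_2021 VALUES LESS THAN (TO_DATE('2022-01-01', 'YYYY-MM-DD')),
    PARTITION p_max  VALUES LESS THAN (MAXVALUE)
  );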

Key Points of Informatica Partitions

Below are some essential points to keep in mind when using partitions in Informatica:

  • We cannot create a partition key for round-robin, hash auto-keys, and pass-through partitions.
  • If we have a bitmap index on the target and use pass-through partitioning to update the target table, the session may fail because the bitmap index creates a locking problem.
  • Partitioning increases the total DTM buffer memory requirement, so ensure there is enough free memory to avoid memory allocation failures.
  • When we use pass-through partitioning, Informatica makes multiple connection requests to the database server, so ensure that the database is configured to accept the additional connection requests.
  • We can use native database options as partitioning alternatives to increase the degree of parallelism of query processing.

For example, in an Oracle database, we can specify a PARALLEL hint or alter the degree of parallelism (DOP) of the table, as shown below.
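
A minimal sketch of both options, reusing the hypothetical SALES_FACT table from the earlier examples and a degree of parallelism of 4:

  -- Option 1: hint a single query to run in parallel
  SELECT /*+ PARALLEL(s, 4) */ *
  FROM   sales_fact s
  WHERE  s.sale_date >= TO_DATE('2021-01-01', 'YYYY-MM-DD');

  -- Option 2: set a default degree of parallelism on the table itself
  ALTER TABLE sales_fact PARALLEL 4;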

  • We can also combine Informatica partitioning with native database-level parallelism as the requirement demands.

For example, create two pass-through pipelines, each sending its query to the Oracle database with a PARALLEL hint, as in the sketch below.
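
A minimal sketch of that idea, assuming a hypothetical ITEMS source split on ITEM_ID at 500000, with each pass-through partition given its own SQL override carrying a PARALLEL hint:

  -- SQL override for pass-through partition 1
  SELECT /*+ PARALLEL(i, 2) */ * FROM items i WHERE i.item_id < 500000;

  -- SQL override for pass-through partition 2
  SELECT /*+ PARALLEL(i, 2) */ * FROM items i WHERE i.item_id >= 500000;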

So, this brings us to the end of this blog. This Tecklearn ‘Performance Tuning and Partitioning in Informatica’ blog helps you with commonly asked questions if you are looking out for a job in Informatica. If you wish to learn Informatica and build a career in the Data Warehouse and ETL domain, then check out our interactive Informatica Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/informatica-training-and-certification/

Informatica Training

About the Course

Tecklearn’s Informatica Training will help you master Data Integration concepts such as ETL and Data Mining using Informatica PowerCenter. It will also make you proficient in Advanced Transformations, Informatica Architecture, Data Migration, Performance Tuning, and Installation & Configuration of Informatica PowerCenter. You will get trained in Informatica Workflows, data warehousing, Repository Management, and other processes.

Why Should you take Informatica Training?

  • Informatica professionals earn up to $130,000 per year – Indeed.com
  • GE, eBay, PayPal, FedEx, EMC, Siemens, BNY Mellon & other top Fortune 500 companies use Informatica.
  • Key advantages of Informatica PowerCenter: Excellent GUI interfaces for Administration, ETL Design, Job Scheduling, Session monitoring, Debugging, etc.

What you will Learn in this Course?

Informatica PowerCenter 10 – An Overview

  • Informatica & Informatica Product Suite
  • Informatica PowerCenter as ETL Tool
  • Informatica PowerCenter Architecture
  • Component-based development techniques

Data Integration and Data Warehousing Fundamentals

  • Data Integration Concepts
  • Data Profile and Data Quality Management
  • ETL and ETL architecture
  • Brief on Data Warehousing

Informatica Installation and Configuration

  • Configuring the Informatica tool
  • How to install Informatica and configure the administration and Integration Services

Informatica PowerCenter Transformations

  • Visualize PowerCenter Client Tools
  • Data Flow
  • Create and Execute Mapping
  • Transformations and their usage
  • Hands On

Informatica PowerCenter Tasks & Workflows

  • Informatica PowerCenter Workflow Manager
  • Reusability and Scheduling in Workflow Manager
  • Workflow Task and job handling
  • Flow within a Workflow
  • Components of Workflow Monitor

Advanced Transformations

  • Look Up Transformation
  • Active and Passive Transformation
  • Joiner Transformation
  • Types of Caches
  • Hands On

More Advanced Transformations – SQL (Pre-SQL and Post-SQL)

  • Load Types – Bulk, Normal
  • Reusable and Non-Reusable Sessions
  • Categories for Transformation
  • Various Types of Transformation – Filter, Expression, Update Strategy, Sorter, Router, XML, HTTP, Transaction Control

Various Types of Transformation – Rank, Union, Stored Procedure

  • Error Handling and Recovery in Informatica
  • High Availability and Failover in Informatica
  • Best Practices in Informatica
  • Debugger
  • Performance Tuning

Performance Tuning, Design Principles & Caches

  • Performance Tuning Methodology
  • Mapping design tips & tricks
  • Caching & Memory Optimization
  • Partition & Pushdown Optimization
  • Design Principles & Best Practices

Informatica PowerCenter Repository Management

  • Repository Manager tool (functionalities, create and delete, migrate components)
  • PowerCenter Repository Maintenance

Informatica Administration & Security

  • Features of PowerCenter 10
  • Overview of the PowerCenter Administration Console
  • Integration and repository service properties
  • Services in the Administration Console (services, handle locks)
  • Users and groups

Command Line Utilities

  • Infacmd, infasetup, pmcmd, pmrep
  • Automate tasks via command-line programs

More Advanced Transformations – XML

  • Java Transformation
  • HTTP Transformation

Got a question for us? Please mention it in the comments section and we will get back to you.
