Top ETL Testing Interview Questions and Answers

Last updated on Feb 18 2022
Sunder Rangnathan


What is ETL?

ETL is an important component of the data warehousing architecture; it manages the data for any business process. ETL stands for Extract, Transform and Load. Extract is the process of reading data from a database. Transform converts the data into a format appropriate for reporting and analysis. Load writes the data into the target database.

What exactly do you mean by ETL?

ETL stands for Extract, Transform, Load and is widely regarded as an essential component of the data warehousing architecture. Its main task is to handle data management for complex business processes, which is useful to the business in many ways. Extracting simply means reading the data from a database. Transformation means converting the data into a form suitable for analysis and reporting. Load handles the process of writing the data into the database that the users want to target.

What are the various tools used in ETL?

  • Cognos Decision Stream
  • Oracle Warehouse Builder
  • Business Objects XI
  • SAS Business Warehouse
  • SAS Enterprise ETL Server

What do ETL testing operations include?

ETL testing includes

  • Verifying whether the data is transformed correctly according to business requirements
  • Verifying that the projected data is loaded into the data warehouse without any truncation or data loss
  • Making sure that the ETL application reports invalid data and replaces it with default values
  • Making sure that data loads within the expected time frame to improve scalability and performance (see the SQL sketch below)
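
For example, here is a minimal SQL sketch of the kind of checks an ETL tester might run; the table and column names (src_orders, tgt_orders, order_amount) are hypothetical, and the exact syntax may vary slightly between databases:

    -- Completeness check: source and target row counts should match
    SELECT (SELECT COUNT(*) FROM src_orders) AS source_rows,
           (SELECT COUNT(*) FROM tgt_orders) AS target_rows;

    -- Truncation / data-loss check: rows present in the source but missing in the target
    SELECT s.order_id
    FROM src_orders s
    LEFT JOIN tgt_orders t ON t.order_id = s.order_id
    WHERE t.order_id IS NULL;

    -- Default-value check: invalid (NULL) amounts should have been replaced during the load
    SELECT COUNT(*) AS null_amounts
    FROM tgt_orders
    WHERE order_amount IS NULL;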

Name a few ETL bugs.

Following is a list of common ETL bugs:

  • Bugs related to ECP
  • Bugs related to load conditions
  • Source-related bugs
  • Bugs related to calculations
  • Bugs related to the user interface

Mention the types of data warehouse applications. What is the difference between data mining and data warehousing?

The types of data warehouse applications are

  • Info Processing
  • Analytical Processing
  • Data Mining

Data mining can be defined as the process of extracting hidden predictive information from large databases and interpreting the data, while data warehousing may make use of data mining for faster analytical processing of the data. Data warehousing is the process of aggregating data from multiple sources into one common repository.

What is a fact? What are the types of facts?

It is a central component of a multi-dimensional model which contains the measures to be analysed. Facts are related to dimensions.

Types of facts are

  • Additive Facts
  • Semi-additive Facts
  • Non-additive Facts

Explain what Grain of Fact is.

  • Grain of fact can be defined as the level at which the fact information is stored. It is also known as fact granularity.

Explain what a factless fact schema is and what measures are.

  • A fact table without measures is known as a factless fact table. It can be used to view the number of occurring events. For example, it can record an event such as the employee count in a company.
  • The numeric data based on the columns in a fact table is known as measures.

Explain what are Cubes and OLAP Cubes?

Cubes are data processing units comprised of fact tables and dimensions from the data warehouse. They provide multi-dimensional analysis.

OLAP stands for Online Analytical Processing, and an OLAP cube stores large volumes of data in a multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized by dimensions.
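
As an illustration only, many relational databases (for example SQL Server, Oracle and PostgreSQL) can emulate a small OLAP-style aggregation with GROUP BY CUBE; the sales_fact table and its columns below are hypothetical:

    -- Aggregate the measure (sales_amount) across every combination of two dimensions
    SELECT region, product_category, SUM(sales_amount) AS total_sales
    FROM sales_fact
    GROUP BY CUBE (region, product_category);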

Explain what tracing level is and what the types are.

The tracing level is the amount of data stored in the log files. Tracing levels can be classified into two types: Normal and Verbose. The normal level logs the tracing information in a summarized manner, while the verbose level logs information for each and every row.

Explain what a transformation is.

A transformation is a repository object which generates, modifies or passes data. Transformations are of two types: Active and Passive.

Explain the use of Lookup Transformation?

The Lookup Transformation is useful for

  • Getting a related value from a table using a column value
  • Updating a slowly changing dimension table
  • Verifying whether records already exist in the table (see the sketch below)
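
Conceptually, a lookup behaves like a left join against a reference table. A rough SQL equivalent of the uses above, with stg_sales and customer_dim as hypothetical table names:

    -- Fetch a related value (customer_name) for each staged row using a key column,
    -- and flag rows whose key does not yet exist in the dimension table
    SELECT s.customer_id,
           d.customer_name,
           CASE WHEN d.customer_id IS NULL THEN 'NEW' ELSE 'EXISTING' END AS lookup_status
    FROM stg_sales s
    LEFT JOIN customer_dim d ON d.customer_id = s.customer_id;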

Explain what partitioning, hash partitioning and round-robin partitioning are.

To improve performance, transactions are subdivided; this is called partitioning. Partitioning enables the Informatica Server to create multiple connections to various sources.

The types of partitions are

Round-Robin Partitioning:

  • Data is distributed evenly among all partitions by Informatica
  • This partitioning is applicable when the number of rows to process in each partition is approximately the same

Hash Partitioning:

  • The Informatica Server applies a hash function to the partitioning keys to group data among partitions
  • It is used when you need to ensure that groups of rows with the same partitioning key are processed in the same partition (see the sketch below)
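
The idea behind the two schemes can be sketched in plain SQL. The MOD-based expressions below only illustrate how rows map to partitions and are not how Informatica implements it internally; the orders table and its columns are hypothetical:

    -- Round-robin style: spread rows evenly, ignoring the data values
    SELECT order_id,
           MOD(ROW_NUMBER() OVER (ORDER BY order_id), 4) AS partition_no
    FROM orders;

    -- Hash/key style: rows with the same key always land in the same partition
    SELECT order_id,
           customer_id,
           MOD(customer_id, 4) AS partition_no
    FROM orders;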

Mention the advantage of using the DataReader Destination Adapter.

The advantage of using the DataReader Destination Adapter is that it populates an ADO recordset (consisting of records and columns) in memory and exposes the data from the DataFlow task by implementing the DataReader interface, so that other applications can consume the data.

Using SSIS (SQL Server Integration Services), what are the possible ways to update a table?

The possible ways to update a table using SSIS are:

  • Use a SQL command
  • Use a staging table (see the sketch below)
  • Use a cache
  • Use the Script Task
  • Use the full database name for updating if MSSQL is used
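
A minimal sketch of the staging-table approach, assuming SQL Server style T-SQL; stg_customer and dim_customer are hypothetical names:

    -- 1. A data flow bulk-loads the changed rows into a staging table (stg_customer).
    -- 2. An Execute SQL task then applies the changes to the target in one set-based statement.
    UPDATE d
    SET    d.email      = s.email,
           d.updated_at = CURRENT_TIMESTAMP
    FROM   dim_customer d
    JOIN   stg_customer s ON s.customer_id = d.customer_id;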

In case you have a non-OLEDB (Object Linking and Embedding Database) source for the lookup, what would you do?

If you have a non-OLEDB source for the lookup, you have to use a cache to load the data and use it as the source.

In what case do you use dynamic cache and static cache in connected and unconnected transformations?

  • A dynamic cache is used when you have to update a master table and for slowly changing dimension (SCD) types
  • A static cache is used for flat files

Explain what a data source view is.

A data source view allows you to define the relational schema that will be used in the Analysis Services databases. Dimensions and cubes are created from data source views rather than directly from data source objects.

Explain the difference between OLAP tools and ETL tools.

The difference between ETL and OLAP tools is that:

An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some data cleansing along the way.

Example: DataStage, Informatica, etc.

An OLAP tool is meant for reporting; in OLAP, the data is available in a multi-dimensional model.

Example: Business Objects, Cognos, etc.

How can you extract SAP data using Informatica?

  • You can extract SAP data from Informatica with the PowerConnect option
  • Install and configure the PowerConnect tool
  • Import the source into the Source Analyzer. PowerConnect acts as a gateway between Informatica and SAP. The next step is to generate the ABAP code for the mapping; only then can Informatica pull data from SAP
  • PowerConnect is used to connect to and import sources from external systems

Mention the difference between Power Mart and Power Center.

Power Center | Power Mart
Designed to process huge volumes of data | Designed to process low volumes of data
Supports ERP sources such as SAP, PeopleSoft, etc. | Does not support ERP sources
Supports local and global repositories | Supports only a local repository
Can convert a local repository into a global repository | Has no option to convert a local repository into a global one

Explain the differences between Unconnected and Connected lookups.

Connected Lookup | Unconnected Lookup
Participates directly in the mapping data flow | Used when a lookup function is called from another transformation, such as an expression transformation
Multiple values can be returned | Returns only one output port
Can be connected to other transformations and returns a value | Cannot be connected to another transformation
Either a static or a dynamic cache can be used | Uses only a static cache
Supports user-defined default values | Does not support user-defined default values
Multiple columns can be returned from the same row or inserted into the dynamic lookup cache | Designates one return port and returns one column from each row

 

Explain what a staging area is and what the purpose of a staging area is.

A staging area is where you hold data temporarily on the data warehouse server. Data staging includes the following steps:

  • Source data extraction and data transformation (restructuring)
  • Data transformation (data cleansing, value transformation)
  • Surrogate key assignments (see the sketch below)
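
A rough sketch of the surrogate key assignment step, assuming the dimension table uses an auto-generated (identity/sequence) key; the table and column names are hypothetical:

    -- dim_product.product_key is an auto-generated surrogate key;
    -- product_code is the natural (business) key arriving from staging
    INSERT INTO dim_product (product_code, product_name)
    SELECT s.product_code, s.product_name
    FROM stg_product s
    LEFT JOIN dim_product d ON d.product_code = s.product_code
    WHERE d.product_code IS NULL;  -- only new natural keys receive a new surrogate key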

Explain these terms: Session, Worklet, Mapplet and Workflow.

  • Mapplet: It arranges or creates sets of transformations
  • Worklet: It represents a specific set of tasks
  • Workflow: It is a set of instructions that tells the server how to execute tasks
  • Session: It is a set of parameters that tells the server how to move data from sources to targets

ETL Testing Vs DB Testing

Compare ETL Testing and DB Testing
ETL Testing | DB Testing
Used for Business Intelligence reporting | Goal is to integrate data
Applied to a business flow environment based on historical data | Applicable to business flow (transactional) systems
Informatica, Cognos and QuerySurge can be used | QTP and Selenium are used for automation
Analysing the data can have a potential impact | The architectural implementation has a high impact
Uses a dimensional model | Uses an entity-relationship model
Analytics are processed | Transactions are processed
Denormalized data is used | Normalized data is used

There is a group of parameters that directs the server regarding the movement of data from the source to the target. What is it called?

It is called a Session.

Do you have any idea about ETL testing and the operations that are a part of it?

Well, there are certain important tasks involved. ETL testing means verifying that the data is transformed in the right manner as per the needs of the business. It also includes verifying the projected data: users can check whether the data is loaded successfully into the warehouse without any loss or truncation. It also helps to assure scalability and performance, and it makes sure that the ETL application reports invalid data and replaces it with default values.

How is Power Center different from Power Mart?

Power Mart is good to consider only when the data processing requirements are low. Power Center, on the other side, can process bulk volumes of data in a short span of time. Power Center can easily support ERP sources such as SAP, while Power Mart offers no support for ERP sources. Power Mart supports only a local repository, while Power Center supports both local and global repositories.

What is Bus Schema?

A bus schema is used to identify the dimensions common to the various business processes. It comes with conformed dimensions along with a standardized definition of information.

Explain what data purging is.

Data purging is the process of deleting data from a data warehouse. It deletes junk data like rows with null values or extra spaces.

Explain what Schema Objects are.

Schema objects are the logical structures that directly refer to the database's data. Schema objects include tables, views, sequences, synonyms, indexes, clusters, function packages and database links.

What is partitioning in ETL?

Transactions need to be divided for better performance; this process is known as partitioning. It ensures that the server can access the sources directly through multiple connections.

Name a few tools that you can easily use with ETL.

There are many tools that can be considered. However, a user does not always need all of them at the same time; which tool is used simply depends on preference and the task that needs to be accomplished. Some of the commonly used ones are Oracle Warehouse Builder, Cognos Decision Stream, SAS Enterprise ETL Server and SAS Business Warehouse.

What do you understand by the term fact in the ETL and what are the types of the same?

Basically, a fact is regarded as the central component of a multi-dimensional model and holds the measures to be analyzed. Facts provide the link to the dimensions that matter in ETL. The commonly used types of facts in ETL are additive facts, semi-additive facts, and non-additive facts.

What exactly do you know about the tracing level and the types of the same?

Log files have a limit on how much data can be stored in them. The tracing level is simply the amount of data that can be stored in these log files, which determines how much information is available for troubleshooting. There are two types:

  1. Verbose
  2. Normal

Do you think data warehousing and data mining are different from one another? How are they associated with warehousing applications?

Well, the important warehousing applications generally include analytical processing, information processing, as well as data mining. Data mining extracts hidden predictive information from databases holding very large amounts of data, and warehousing sometimes depends on mining for the operations involved. Data mining is useful for the analytical process. Data can be aggregated from different sources through the warehousing approach, while this is not possible with mining alone.

Do you have any information regarding the Grain of Fact?

The level at which the fact information is stored is known as the grain of fact. The other name for this is fact granularity. Fixing the grain up front matters because it determines the level of detail captured by every measure in the fact table.

Is it possible to load the data and use it as a source?

Yes, in ETL it is possible. This task can be accomplished simply by using the cache. Users should make sure the cache is loaded and optimized before it is used for this task; when it is, the desired outcome can be achieved without much extra effort.

What is Factless fact table in ETL?

It is defined as a fact table without measures. A number of events can be tracked with it; for example, it can record events related to employees or to management, and this can be done in a very reliable manner.

What exactly do you mean by the Transformation? What are the types of same?

It is basically regarded as a repository object which is capable of producing data and can also modify and pass it in a reliable manner. The two commonly used types of transformations are active and passive.

What is the exact purpose of an ETL according to you?

It is actually very beneficial for extracting data from legacy systems and loading it into a target database after transformation.

Can you define measures in a simple statement?

Well, they can be called the numeric data which is based on the columns present in a fact table.

When will you make use of the Lookup Transformation?

It is one of the finest and, in fact, one of the most useful approaches in ETL. It simply makes sure that users can get a related value from a table with the help of a column value. In addition to this, it helps in updating a slowly changing dimension table. Also, there are situations when you need to check whether records already exist in the table; dealing with such cases is also made possible with the help of the Lookup transformation.

What do you understand by Data Purging?

There are needs and situations when data has to be deleted from the data warehouse. Deleting data in bulk is a daunting task. Purging is an approach that can delete multiple records at the same time while letting users maintain speed as well as efficiency, and a lot of extra space can be freed up this way.

Can you tell something about Bus Schema?

Identifying the dimensions that are common across business processes is very important in ETL, and this is largely handled by the bus schema.

Are you familiar with the Dynamic and the Static Cache?

When it comes to updating the master table, the dynamic cache can be opted for. Users are also free to use it for slowly changing dimensions. On the other side, users can simply manage flat files through the static cache. It is possible to deploy both the dynamic and the static cache, depending on the task and the overall complexity of the final outcome.

What do you mean by staging area?

It is an area used to hold the information or data temporarily on the server that manages the data warehouse. There are certain steps involved, such as source data extraction, data transformation and, most notably, surrogate key assignments.

What are the types of Partitioning you are familiar with?

There are two types of Partitioning that is common in ETL and they are:

  1. Hash
  2. Round-robin

Can you tell a few benefits of using the DataReader Destination Adapter?

There are ADO recordsets, which generally consist of columns and records. When it comes to populating them in a simple manner, the DataReader Destination Adapter is very useful. It exposes the data from the data flow through the DataReader interface so that other applications can consume it as required.

Is it possible to extract the SAP data with the help of Informatica?

There is a PowerConnect option that simply lets users perform this task in a very reliable manner. It is necessary to import the source into the Source Analyzer and generate the ABAP code for the mapping before you can accomplish this task.

What do you mean by the term Mapplet?

This is actually an approach that is useful for creating or arranging different sets of transformations. It also lets the user reuse that logic elsewhere, which matters a great deal in data warehouse work.

What are commercial ETL tools?

  1. Ab Initio
  2. Adeptia ETL
  3. Business Objects Data Services
  4. Informatica PowerCenter
  5. Business Objects Data Integrator (BODI)
  6. Confluent
  7. DBSoftLab

Can you explain the fact table?

In a data warehouse, the fact table is the central table of the star schema.

What are the types of measures?

There are three types of measures (see the sketch below):

  • Additive Measures – Can be aggregated across all dimensions of the fact table.
  • Semi-Additive Measures – Can be aggregated across only some of the dimensions of the fact table.
  • Non-Additive Measures – Cannot be aggregated across any dimension of the fact table.
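
For instance, a sales amount is additive, while an account balance is only semi-additive because summing it over time is meaningless; the sales_fact and account_balance_fact tables below are hypothetical:

    -- Additive: sales_amount can be summed across every dimension, including date
    SELECT store_id, SUM(sales_amount) AS total_sales
    FROM sales_fact
    GROUP BY store_id;

    -- Semi-additive: a balance can be summed across accounts,
    -- but across dates you take a snapshot (e.g. the latest day) rather than a SUM
    SELECT SUM(balance) AS total_balance
    FROM account_balance_fact
    WHERE snapshot_date = DATE '2022-01-31';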

Give a brief on Grain of Fact.

Grain of fact is defined as the level or stage at which the fact information is stored. It is also called fact granularity.

Define Transformation.

In ETL, transformation involves data cleansing, sorting, combining or merging, and applying business rules to the data to improve its quality and accuracy in the ETL process (a small sketch follows).
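
A small sketch of such transformation logic expressed as SQL during a load; the staging and target table names, columns and threshold below are hypothetical:

    -- Cleanse, standardize and apply a simple business rule while loading
    INSERT INTO tgt_customer (customer_id, customer_name, country_code, segment)
    SELECT customer_id,
           TRIM(UPPER(customer_name)),                       -- data cleansing
           COALESCE(country_code, 'UNKNOWN'),                -- default for missing values
           CASE WHEN lifetime_value >= 10000 THEN 'PREMIUM'  -- business rule
                ELSE 'STANDARD'
           END
    FROM stg_customer;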

What is Lookup Transformation?

The Lookup transformation accomplishes lookups by joining data in input columns with columns in a reference dataset. You use the lookup to access additional data in a related table, based on values in common columns.

Is it possible to update a table using SSIS?

Yes, it is actually possible, and users can perform this task simply and without worrying about anything. Users generally have several options to accomplish it easily. The methods that can be helpful in this matter are using a staging table, using a SQL command, using the full database name in MSSQL, as well as using the cache or the Script Task.

How can you define a Workflow?

It is basically a group of instructions that tells the server how to execute the tasks.

Tell one basic difference between the Connected Lookup and Unconnected ones?

A connected lookup participates directly in the mapping flow, while an unconnected lookup does not; the latter is used only when a lookup function is called from within another transformation. Several values can be returned from a connected lookup, while an unconnected lookup returns only one output port.

Tell something about the Data source view and how it is significant?

Several Analysis Services databases depend on a relational schema, and the prime task of the data source view is to define that schema. Data source views are also helpful in creating cubes and dimensions, with the help of which users can set up the dimensions in a very easy manner.

What are objects in the Schema?

These are basically considered the logical structures that are related to the data. They generally contain tables, views, synonyms, sequences, indexes, clusters as well as function packages. In addition to this, database links are also schema objects.

What are Cubes in the ETL and how they are different from that of OLAP?

Cubes are among the units on which data processing largely depends; they are built from the fact tables and dimensions of the warehouse and provide useful information across those dimensions, so multi-dimensional analysis can be assured from them. On the other side, Online Analytical Processing (OLAP) cubes store large amounts of data in more than one dimension. This is generally done to make the reporting process smoother and more reliable. All the facts in an OLAP cube are categorized by dimensions, and this is exactly what makes them easy to understand.

Compare Unconnected Vs Connected Lookups

Connected Lookup | Unconnected Lookup
Either a dynamic or a static cache can be used | Can use only a static cache
Multiple columns can be returned from the same row | Can return only one output port
Supports user-defined default values | Does not support user-defined default values
Can pass any number of values to another transformation | Can pass one output value to one transformation
The cache includes all the lookup columns used in the mapping | The cache includes all the lookup/output ports of the lookup condition and the return port

Define Bus Schema?

In a data warehouse, the bus schema is used for identifying the dimensions that are most common across business processes. In one word, it consists of conformed dimensions and a standardized definition of facts.

What does data purging mean?

Data purging is a common term used in data warehousing for deleting or erasing data from storage.

What do Schema Objects mean?

Schema objects can be defined as logical structures; the database stores schema objects logically within a database tablespace. Schema objects can be tables, clusters, views, sequences, indexes, function packages and DB links.

Can you brief the terms Mapplet, Session, Workflow and Worklet?

  • Mapplet: A reusable object that contains a set of transformations.
  • Worklet: Represents a set of workflow tasks.
  • Workflow: A set of instructions that tells the server how to execute the tasks.
  • Session: A set of instructions that describes how to move the data from sources to targets.

What are Cubes and OLAP Cubes?

Cubes are data processing units comprised of fact tables and dimensions from the data warehouse. They provide a multi-dimensional analysis.

OLAP stands for 'Online Analytical Processing,' and OLAP Cubes store voluminous data in a multi-dimensional form for reporting purposes. They consist of facts, called 'measures,' categorized by dimensions.

Mention the types of Data Warehouse applications. What is the difference between Data Mining and Data Warehousing?

Types of data warehouse applications are:

  • Info Processing
  • Analytical Processing
  • Data Mining

Data mining can be defined as the process of extracting hidden predictive information from large databases and interpreting the data, while data warehousing may make use of a data mine for the analytical processing of the data in a faster way.

Data warehousing is the process of aggregating data from multiple sources into one common repository.

Why is ETL Testing required?

  • To keep an eye on data that is being transferred from one system to another
  • To keep track of the efficiency and speed of the process
  • To achieve fast and accurate results

Compare ETL Testing with Manual Testing.

Criteria | ETL Testing | Manual Testing
Basic procedure | Writing scripts for automating the testing process | A method of observing and testing by hand
Requirements | No need for additional technical knowledge other than an understanding of the software | Needs technical knowledge of SQL and shell scripting
Efficiency | Fast and systematic, and provides top results | Needs time and effort, and is prone to errors

What are the responsibilities of an ETL Tester?

An ETL Tester:

  • Requires in-depth knowledge of the ETL tools and processes
  • Needs to write SQL queries for various scenarios during the testing phase (see the sketch after this list)
  • Should be able to carry out different types of tests and keep a check on the other functionalities of the process
  • Needs to carry out quality checks on a regular basis
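
For example, duplicate and referential-integrity checks are typical queries an ETL tester writes; tgt_orders and dim_customer below are hypothetical tables:

    -- Duplicate check: the business key should be unique in the target
    SELECT order_id, COUNT(*) AS copies
    FROM tgt_orders
    GROUP BY order_id
    HAVING COUNT(*) > 1;

    -- Referential integrity: every fact row must point at an existing dimension row
    SELECT f.order_id
    FROM tgt_orders f
    LEFT JOIN dim_customer d ON d.customer_key = f.customer_key
    WHERE d.customer_key IS NULL;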

What are the various tools used in ETL?

  • Cognos Decision Stream
  • Oracle Warehouse Builder
  • Business Objects XI
  • SAS Business Warehouse
  • SAS Enterprise ETL Server

Define ETL Processing.

ETL Testing Process:

Although there are many ETL tools, there is a simple testing process commonly used in ETL Testing. It is as important as the implementation of the ETL tool into your business. Having a well-defined ETL Testing strategy can make the testing process much easier. Hence, this process needs to be completed before you start the data integration with the selected ETL tool.

In this ETL Testing process, a group of experts comprising the programming and developing team will start writing SQL statements. The development team may customize them according to the requirements.

ETL Testing process has the following stages:

  • Analyzing requirements:  Understanding the business structure and their particular requirements.
  • Validation and test estimation:  Estimating the time and expertise required to carry on with the procedure.
  • Test planning and designing the testing environment:  Based on the inputs from the estimation, an ETL environment is planned and worked out.
  • Test data preparation and execution:  Data for the test is prepared and executed as per the requirements.
  • Summary report: Upon completion of the test run, a brief summary report is prepared to record findings and wrap up the process.

What do ETL Testing operations include?

ETL Testing includes:

  • Verifying whether the data is transformed accurately according to business requirements
  • Verifying that the projected data is loaded into the data warehouse without any truncation or data loss
  • Making sure that the ETL application reports any invalid data and replaces with default values
  • Making sure that the data loads within the expected time frame to improve scalability and performance

List a few ETL bugs.

  • Calculation Bug
  • User Interface Bug
  • Source Bug
  • Load Condition Bug
  • ECP-related Bug

What is Fact? What are the types of Facts?

Fact is a central component of a multi-dimensional model that contains the measures to be analyzed. Facts are related to dimensions.

Types of facts are:

  • Additive Facts
  • Semi-additive Facts
  • Non-additive Facts

What is ETL?

ETL stands for Extract-Transform-Load. It is an important component in the data warehouse with which one can manage the data of any business.

  • Extract reads the data from the database.
  • Transform converts the data into a form that can be used for analysis and reporting.
  • Load writes the data into the respective target database.

What are the operations included in ETL testing?

Following are the operations included in ETL testing:

  • Verification of the conversion of data into the required business format.
  • Verification of the loading of data into the respective data warehouse without any truncation of the main data.
  • Ensuring that there is no invalid data and, if any is found, replacing it with default data.
  • Ensuring that data loads within the expected time frame to improve the performance and scalability of the loading.

What are the different types of data warehouse applications?

Following are the different types of data warehouse applications:

  • Processing of information.
  • Processing of analytics.
  • Data mining.

What is the difference between data mining and data warehousing?

Following is the table explaining the difference between data mining and data warehousing:

Parameter | Data Mining | Data Warehousing
Definition | Data mining refers to extracting information from hidden patterns in data. | Data warehousing refers to collecting data from various places and storing it in one place.
Key features | Likely outcomes can be predicted; patterns can be discovered automatically. | Data can be obtained within a fixed time frame; heterogeneous data can be combined into the final database.
Advantages | Direct marketing of the product; detailed analysis of trends in the marketplace. | Better productivity and performance; cost-effective.

 

Name the types of tools used in ETL.

Following are the different types of tools that are used in ETL:

  • Warehouse builder from Oracle.
  • Decision stream from Cognos.
  • Business warehouse from SAS.
  • Enterprise ETL server from SAS.
  • Business Objects XI.

Define fact.

Fact is defined as the central component related to the multi-dimensional model. The multi-dimensional model contains measures that are to be analyzed.

What are the different types of facts?

Following are the different types of facts:

  • Additive facts
  • Semi-additive facts
  • Non-additive facts

What do Cubes mean?

Cubes are defined as data processing units that consist of fact tables and dimensions obtained from the data warehouse. They are used for multi-dimensional analysis.

What does OLAP Cubes mean?

OLAP cubes stand for Online Analytical Processing cubes. They are used for storing multi-dimensional data on a large scale for reporting. They consist of measures that are categorized by dimensions.

What does the tracing level mean?

The tracing level is defined as the amount of data that is stored in the log files.

What are the different types of tracing levels?

Following are the two different types of tracing levels:

  • Normal
  • Verbose

Explain the types of tracing levels.

The normal level is the first type of tracing level and logs the tracing information in a summarized manner. The verbose level is the second type and logs the tracing information at every row.

What do you mean by the term “Grain of Fact”?

Grain of fact, also known as fact granularity, is defined as the level at which the fact information is stored.

Define measures.

Measures are defined as the numeric data on the basis of the columns in a fact table.

What is a factless fact schema?

A factless fact schema is the fact table without any measures. It is used to view the number of occurrences of the events.

What is a transformation?

A transformation is defined as a repository object in which the generation, modification, and passing of data take place.

How many types of transformation are there?

There are two types of transformation:

  • Active transformation
  • Passive transformation

What is an active transformation?

An active transformation can change the number of rows that pass through it, in addition to modifying the rows of data. An example of an active transformation is the Filter transformation.

What is a passive transformation?

A passive transformation does not change the number of rows; the number of input and output rows stays the same. An example of a passive transformation is the Lookup transformation.

What is the use of Lookup transformation?

Following are the uses of Lookup transformation:

  • With the use of a column value, the related value can be found in a reference table.
  • It can be used to update a slowly changing dimension table.
  • The Lookup transformation can be used to verify whether records already exist.

What is partitioning?

Partitioning is defined as the division of the data processing to improve performance. There are two types of partitioning:

  • Round-robin partitioning
  • Hash partitioning

What is round-robin partitioning?

Round-robin partitioning is a type of partitioning that distributes the data uniformly across all partitions and is applied when the number of rows to process is approximately equal in each partition.

What is hash partitioning?

Hash partitioning is a type of partitioning that groups the data based on partition keys and is used to ensure that rows in the same processing group land in the same partition. Hash partitioning is applied by the Informatica server.

What is the advantage of using the DataReader Destination Adapter?

The advantage of using the DataReader Destination Adapter is that the records and columns are populated in memory so that the data from the DataFlow task is available for consumption by other applications.

What is Informatica?

Informatica is a software development company that offers products related to data integration. Products from Informatica are used for ETL, data quality, master data management, data masking, etc.

Name the list of transformations that are available on Informatica.

Following are the list of transformations that are available on Informatica:

  • Rank transformation
  • Sequence Generator transformation
  • Transaction Control transformation
  • Source Qualifier transformation
  • Normalizer transformation

What is filter transformation?

Filter transformation is an active transformation that filters records on the basis of a filter condition.

What is SSIS?

SSIS stands for SQL Server Integration Services. It is a component of the Microsoft SQL Server database software that is used for a wide range of data integration tasks. SSIS is used in ETL testing because it is fast and flexible and because it makes the movement of data from one database to another easy.

How to update the table with the help of SSIS?

Following are the ways to update the table with the help of SSIS:

  • By using SQL command.
  • By using a staging table.
  • With the help of cache.
  • By using the script task.

Name the two types of ETL testing that are available.

Following are the two types of ETL testing that are available:

  • Application testing
  • Data-centric testing

Define dimensions.

Dimensions are the place where the summarized data are stored.

Why do we need ETL testing?

Following are the reasons why we need ETL testing:

  • With the help of ETL testing, one can check for the efficiency and speed of the process.
  • To keep an eye on the transfer of the data from one system to the other.
  • To get familiar with the ETL process before running the entire business using ETL.

What do you mean by the term “staging area”?

During the process of data integration, the data is stored temporarily at a place where it is cleaned and checked for any duplication. This storage area is known as the staging area.

Define ETL mapping sheets.

An ETL mapping sheet is where one can find all the information related to the source file, including all of its rows and columns. These sheets are very helpful in ETL tool testing.

 

Name a few test cases.

Following is the list of test cases:

  • Correctness issues
  • Data check
  • Mapping doc validation

What is the use of mapping doc validation?

With the help of mapping doc validation, one can check if the provided information is available in the mapping doc.

What is the purpose of data check as a test case?

With the help of the data check test case, one can easily verify data checks, number checks, and null checks (a small sketch follows).
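
A brief sketch of such checks, with tgt_orders as a hypothetical target table:

    -- Null check: mandatory columns must be populated
    SELECT COUNT(*) AS missing_keys
    FROM tgt_orders
    WHERE customer_key IS NULL;

    -- Number check: a numeric column must stay within its valid range
    SELECT COUNT(*) AS bad_amounts
    FROM tgt_orders
    WHERE order_amount < 0;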

What is the use of the correctness issue test case?

As the name suggests, the correctness issues test case helps in understanding the misspelled data, null data, and inaccurate data.

What is the difference between power mart and power center?

Following is the table explaining the difference between power mart and power center:

Power Mart | Power Center
Used only for a local repository | Used for local and global repositories
Has no specification for converting a local repository into a global one | Can convert a local repository into a global one
ERP sources are not supported | ERP sources such as SAP are supported
The main purpose of Power Mart is to process low volumes of data | The main purpose of Power Center is to process huge amounts of data

What is the difference between unconnected and connected lookup?

Following are the difference between unconnected and connected lookup:

Unconnected Lookup | Connected Lookup
The cache used is static | The cache used can be either static or dynamic
Only a single output port can be returned | Multiple output ports can be used
Can pass a value to only a single transformation | Can pass values to multiple transformations

What is the difference between OLAP tools and ETL tools?

Following is the table explaining the difference between OLAP tools and ETL tools:

OLAP Tools | ETL Tools
OLAP tools are used for reporting data from the OLAP database | ETL tools are used for extracting data from source systems and loading it into a specified database
Cognos is an example of an OLAP tool | Informatica is an example of an ETL tool

What do you understand by the term data purging?

Data purging is defined as the process of deleting junk data from the data warehouse.

What are schema objects?

Schema objects are the logical structures that are used for referring to the database. These objects are tables, indexes, database links, function packages, etc.

What is the purpose of a staging area?

Following are the purposes of staging area:

  • Restructuring of the data for proper extraction and transformation.
  • Cleansing data and transforming values.
  • Assigning surrogate keys.

Explain the following terms:

  • Mapplet: This is used for arranging a set of transformations.
  • Worklet: This is used for representing a specific set of tasks.
  • Workflow: This is a set of instructions for the server to execute the tasks.
  • Session: This is a set of instructions that tells the server how to move data from the sources to the target.

What is the data source view?

  • A data source view is used for defining the relational schema that is used in the Analysis Services databases.

Explain the steps for the extraction of SAP data using Informatica.

Following are the steps for the extraction of SAP data using Informatica:

  • SAP data is extracted using Informatica through the option called PowerConnect.
  • Install and configure the PowerConnect tool, import the source into the Source Analyzer, and generate the ABAP code for the mapping.

What is the use of dynamic cache and static cache in connected and unconnected transformation?

A static cache is used for flat files, while a dynamic cache is used for updating the master table and for slowly changing dimensions.

 

So, this brings us to the end of the ETL Testing Interview Questions blog. This 'Tecklearn Top ETL Testing Interview Questions and Answers' post helps you with commonly asked questions if you are looking for a job in the ETL Testing or Data Warehousing domain. If you wish to learn ETL Testing and build a career in the Data Warehousing domain, then check out our interactive ETL Testing Training, which comes with 24*7 support to guide you throughout your learning period.

https://www.tecklearn.com/course/etl-testing-training/

ETL Testing Training

About the Course

Today’s businesses have to work with data in multiple formats extracted from multiple sources. All this makes ETL Testing all the more important. Tecklearn’s ETL Testing training offers an in-depth understanding of Data warehousing and business intelligence concepts through real-world examples. You will also gain the essential knowledge of ETL testing, Performance Tuning, cubes, etc., through hands-on projects, and this will help you to become a successful ETL Testing expert.

Why Should you take ETL Testing Training?

  • An ETL Developer can earn $100,000 per year – indeed.com
  • Global Big Data Analytics Market to reach $40.6 Billion in 4 years.
  • Most companies estimate that they’re analysing a mere 12% of the available data. – Forrester Research

What you will Learn in this Course?

Introduction to ETL testing

  • Introduction to ETL testing
  • Life cycle of ETL Testing
  • Database concepts and ETL in Business Intelligence
  • Understanding the difference between OLTP and OLAP and data warehousing

Database Testing and Data Warehousing Testing

  • Introduction to Relational Database Management Systems (RDBMS)
  • Concepts of Relational database
  • Data warehousing testing versus database testing
  • Integrity constraints
  • Test data warehousing testing
  • Hands On

ETL Testing Scenarios

  • Data warehouse workflow
  • ETL Testing scenarios and ETL Mapping
  • Data Warehouse Testing
  • Data Mismatch and Data Loss Testing
  • Creation of Data warehouse workflow
  • Create ETL Mapping
  • Hands On

Various Testing Scenarios

  • Introduction to various testing scenarios
  • Structure validation and constraint validation
  • Data correctness, completeness, quality and Data validation
  • Negative testing
  • Hands On

Data Checks using SQL

  • Using SQL for checking data
  • Understanding database structure
  • Working with SQL Scripts
  • Hands On

Reports & Cube testing

  • Reports and Cube Testing
  • Scope of Business Intelligence Testing
  • Hands On

Got a question for us? Please mention it in the comments section and we will get back to you.

 

 

 

 

 
