Concept of Data Warehouse and Dimension Modelling

Last updated on Sep 27 2021
Ketan Gandhi Gandhi

Table of Contents

Concept of Data Warehouse and Dimension Modelling

In today’s competitive market, most successful companies respond quickly to market changes and opportunities. The requirement to respond quickly is by effective and efficient use of data and information. “Data Warehouse” is a central repository of data that is organized by category to support the organization’s decision makers. Once data is stored in a data warehouse, it can be accessed for analysis.

The term “Data Warehouse” was first invented by Bill Inmon in 1990. According to him, “Data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.”

Ralph Kimball provided a definition of data warehouse based on its functionality. He said, “Data warehouse is a copy of transaction data specifically structured for query and analysis.”

Data Warehouse (DW or DWH) is a system used for analysis of data and reporting purposes. They are repositories that saves data from one or more heterogeneous data sources. They store both current and historical data and are used for creating analytical reports. DW can be used to create interactive dashboards for the senior management.

For example, analytic reports can contain data for quarterly comparisons or for annual comparison of sales report for a company.

Data in DW comes from multiple operational systems like sales, human resource, marketing, warehouse management, etc. It contains historical data from different transaction systems but it can also include data from other sources. DW is used to separate data processing and analysis workload from transaction workload and enables to consolidate the data from several data sources.

The Need for Data Warehouse

For example − You have a home loan agency, where data comes from multiple SAP/non-SAP applications such as marketing, sales, ERP, HRM, etc. This data is extracted, transformed and loaded into DW. If you have to do quarterly/annual sales comparison of a product, you cannot use an operational database as this will hang the transaction system. This is where the need for using DW arises.

Characteristics of a Data Warehouse

Some of the key characteristics of DW are −

  • It is used for reporting and data analysis.
  • It provides a central repository with data integrated from one or more sources.
  • It stores current and historical data.

Data Warehouse vs. Transactional System

Following are few differences between Data Warehouse and Operational Database (Transaction System) −

  • Transactional system is designed for known workloads and transactions like updating a user record, searching a record, etc. However, DW transactions are more complex and present a general form of data.
  • Transactional system contains the current data of an organization whereas DW normally contains historical data.
  • Transactional system supports parallel processing of multiple transactions. Concurrency control and recovery mechanisms are required to maintain consistency of the database.
  • Operational database query allows to read and modify operations (delete and update), while an OLAP query needs only read-only access of stored data (select statement).
  • DW involves data cleaning, data integration, and data consolidations.

DW has a three-layer architecture − Data Source Layer, Integration Layer, and Presentation Layer. The following diagram shows the common architecture of a Data Warehouse system.

image1 1

Types of Data Warehouse System

Following are the types of DW system −

  • Data Mart
  • Online Analytical Processing (OLAP)
  • Online Transaction Processing (OLTP)
  • Predictive Analysis

Data Mart

Data Mart is the simplest form of DW and it normally focuses on a single functional area, such as sales, finance or marketing. Hence, data mart usually gets data only from few data sources.

Sources could be an internal transaction system, a central data warehouse, or an external data source application. De-normalization is the norm for data modeling techniques in this system.

image2 1

Online Analytical Processing (OLAP)

An OLAP system contains less number of transactions but involves complex calculations like use of Aggregations − Sum, Count, Average, etc.

What is Aggregation?

We save tables with aggregated data like yearly (1 row), quarterly (4 rows), monthly (12 rows) and now we want to compare data, like Yearly only 1 row will be processed. However, in an un-aggregated data, all the rows will be processed.

OLAP system normally stores data in multidimensional schemas like Star Schema, Galaxy schemas (with Fact and Dimensional tables are joined in logical manner).

In an OLAP system, response time to execute a query is an effectiveness measure. OLAP applications are widely used by Data Mining techniques to get data from OLAP systems. OLAP databases store aggregated historical data in multi-dimensional schemas. OLAP systems have data latency of a few hours as compared to Data Marts where latency is normally closer to few days.

Online Transaction Processing (OLTP)

An OLTP system is known for large number of short online transactions like insert, update, delete, etc. OLTP systems provide fast query processing and also responsible to provide data integrity in multi-access environment.

For an OLTP systems, effectiveness is measured by the number of transactions processed per second. OLTP systems normally contain only current data. The schema used to store transactional databases is the entity model. Normalization is used for data modeling techniques in OLTP system.

OLTP vs OLAP

The following illustration shows the key differences between an OLTP and OLAP system.

image3 1

Indexes − In an OLTP system, there are only few indexes while in an OLAP system there are many indexes for performance optimization.

Joins − In an OLTP system, large number of joins and data is normalized; however, in an OLAP system there are less joins and de-normalized.

Aggregation − In an OLTP system, data is not aggregated while in an OLAP database more aggregations are used.

OBIEE – Dimensional Modeling

Dimensional modeling provides set of methods and concepts that are used in DW design. According to DW consultant, Ralph Kimball, dimensional modeling is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability and performance. According to him, although transaction-oriented ER is very useful for the transaction capture, it should be avoided for end-user delivery.

Dimensional modeling always uses facts and dimension tables. Facts are numerical values which can be aggregated and analyzed on the fact values. Dimensions define hierarchies and description on fact values.

Dimension Table

Dimension table stores the attributes that describe objects in a Fact table. A Dimension table has a primary key that uniquely identifies each dimension row. This key is used to associate the Dimension table to a Fact table.

Dimension tables are normally de-normalized as they are not created to execute transactions and only used to analyze data in detail.

Example

In the following dimension table, the customer dimension normally includes the name of customers, address, customer id, gender, income group, education levels, etc.

Customer ID Name Gender Income Education Religion
1 Brian Edge M 2 3 4
2 Fred Smith M 3 5 1
3 Sally Jones F 1 7 3

Fact Tables

Fact table contains numeric values that are known as measurements. A Fact table has two types of columns − facts and foreign key to dimension tables.

Measures in Fact table are of three types −

  • Additive − Measures that can be added across any dimension.
  • Non-Additive − Measures that cannot be added across any dimension.
  • Semi-Additive − Measures that can be added across some dimensions.

Example

Time ID Product ID Customer ID Unit Sold
4 17 2 1
8 21 3 2
8 4 1 1

This fact tables contains foreign keys for time dimension, product dimension, customer dimension and measurement value unit sold.

Suppose a company sells products to customers. Every sale is a fact that happens within the company, and the fact table is used to record these facts.

Common facts are − number of unit sold, margin, sales revenue, etc. The dimension table list factors like customer, time, product, etc. by which we want to analyze the data.

Now if we consider the above Fact table and Customer dimension then there will also be a Product and time dimension. Given this fact table and these three dimension tables, we can ask questions like: How many watches were sold to male customers in 2010?

Difference between Dimension and Fact Table

The functional difference between dimension tables and fact tables is that fact tables hold the data we want to analyze and dimension tables hold the information required to allow us to query it.

Aggregate Table

Aggregate table contains aggregated data which can be calculated by using different aggregate functions.

An aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement.

Common aggregate functions include −

  • Average()
  • Count()
  • Maximum()
  • Median()
  • Minimum()
  • Mode()
  • Sum()

These aggregate tables are used for performance optimization to run complex queries in a data warehouse.

Example

You save tables with aggregated data like yearly (1 row), quarterly (4 rows), monthly (12 rows) and now you have to do comparison of data, like Yearly only 1 row will be processed. However in an un-aggregated table, all the rows will be processed.

MIN Returns the smallest value in a given column
MAX Returns the largest value in a given column
SUM Returns the sum of the numeric values in a given column
AVG Returns the average value of a given column
COUNT Returns the total number of values in a given column
COUNT (*) Returns the number of rows in a table

Select Avg (salary) from employee where title = ‘developer’. This statement will return the average salary for all employees whose title is equal to ‘Developer’.

Aggregations can be applied at database level. You can create aggregates and save them in aggregate tables in the database or you can apply aggregate on the fly at the report level.

Note − If you save aggregates at the database level it saves time and provides performance optimization.

So, this brings us to the end of blog. This Tecklearn ‘Concept of Data Warehouse and Dimension Modelling’ blog helps you with commonly asked questions if you are looking out for a job in Oracle BI. If you wish to learn OBIEE and build a career in Business Intelligence domain, then check out our interactive, Oracle Business Intelligence Enterprise Edition (OBIEE) Training, that comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/oracle-business-intelligence-enterprise-edition-training-and-certification/

Oracle Business Intelligence Enterprise Edition (OBIEE) Training

About the Course

Oracle Business Intelligence Enterprise Edition (OBIEE) Training and Certification training course lets you master the Oracle Business Intelligence and Analytics platform. You will gain in-depth knowledge on OBIEE advanced features, BI dashboards, reports, etc., through hands-on training sessions. This course is meant to provide detailed knowledge of the OBIEE 12c. You will be trained in various Business Intelligence scenarios, BI life cycle, concepts of OLAP and working with charts and graphs.

Why Should you take OBIEE Training?

  • The average salary for “oracle business intelligence consultant” ranges from approximately $76,703 per year for Business Consultant to $113,285 per year for Full Stack Developer. -Indeed.com.
  • Oracle Business Intelligence has a market share of around 8% globally.
  • OBIEE gives you experience with next-generation solutions that can help strategize, plan and optimize business operations. Allow you to become an Oracle Certified Expert or Oracle Implementation Specialist of an in-demand solution.

What you will Learn in this Course?

Data Modelling and Data Warehousing Concepts

  • Introduction to Oracle Business Intelligence Enterprise Edition (OBIEE)
  • Data models
  • Data warehousing
  • Data modelling implications and the impact of Data modelling on business intelligence

Business Intelligence Concepts

  • Overview of Business Intelligence
  • Description of BI Stack: BI technology, BI Server and BI Scheduler
  • Need for reporting in business
  • Difference between OLTP and OLAP
  • Multidimensional and relational analytical processing

OBIEE Installation

  • OBIEE Installation (12c)
  • Oracle Business Intelligence Suite
  • Architecture of OBIEE
  • Key features and components

Online Analytical Processing (OLAP)

  • Concept of Online Analytical Processing
  • Significance of OLAP in business intelligence life cycle
  • Star Schema and Designing with Star Schema
  • Enterprise information model
  • Snow flake and constellation

Oracle BI Repository

  • What is Oracle Business Intelligence Repository?
  • Directory structure installation, services,
  • Analytics and interactive reporting,
  • Dashboard creation and multiple report creation

Business Intelligence Repository Business Model

  • How to build a Business Model and Mapping Layer in BI Repository
  • Data format, Conditional format, Removing filters,
  • Report Saving
  • Creation of new folder
  • Working with the Enterprise Manager
  • Testing and validation of the Repository
  • Understanding the process of Cache disabling
  • Dashboard prompt and filtering

Dashboard Creation

  • Creation and Setup of OBIEE Dashboard
  • Basics of OBIEEE dashboard
  • Deploying Dashboard Builder for building Dashboards
  • Edit, Share and Save options in Dashboard analysis
  • Process of Cache creation and clearing
  • ODBC functions
  • Logical Table Source, Summary and Detail Report

OBIEE Repository

  • OBIEE Repository
  • Creation of Test Report and adding calculations
  • Deployment of OBIEE analysis
  • Repository variables, session and presentation variables

OBIEE Security & Management

  • Securing the Oracle Business Intelligence Suite with Enterprise Manager
  • Creation of alerts and notifications
  • Administration, Maintenance and Grouping
  • Types of security in OBIEE
  • Task and folder level security

Differences between OBIEE 11g and 12c

  • Hands on Variables

New Features of OBIEE 12c

  • Repository Development
  • Catalog Development
  • Deployment-Online
  • RPD Deployment

0 responses on "Concept of Data Warehouse and Dimension Modelling"

Leave a Message

Your email address will not be published. Required fields are marked *