Home bi & dw basics OLAP interview questions & answers

OLAP interview questions & answers

What is the difference between OLAP and DSS?


Answer-1:
Data-driven DSS is a type of DSS that emphasizes access to and manipulation of a time-series of internal company data and sometimes external data. Simple file systems accessed by query and retrieval tools provide the most elementary level of functionality.

Data warehouse systems that allow the manipulation of data by computerized tools tailored to a specific task and setting or by more general tools and operators provide additional functionality. Data-driven DSS with On-line Analytical Processing (OLAP) provides the highest level of functionality and decision support that is linked to analysis of large collections of historical data. Executive Information Systems (EIS) and Geographic Information Systems (GIS) are special purpose Data-Driven DSS.

 

Compare OLAP database and OLTP database.

The data warehouse and the OLTP data base are both relational databases. However, the objectives of both these databases are different.

The OLTP database records transactions in real time and aims to automate clerical data entry processes of a business entity. Addition, modification and deletion of data in the OLTP database is essential and the semantics of the application used in the front end impact on the organization of the data in the database.

The data warehouse on the other hand does not cater to real time operational requirements of the enterprise. It is more a storehouse of current and historical data and may also contain data extracted from external data sources.

Differences
Data warehouse database OLTP database
Designed for analysis of business measures by categories and attributes Designed for real time business operations.
Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.
Loaded with consistent, valid data; requires no real time validation Optimized for validation of incoming data during transactions; uses validation data tables.
Supports few concurrent users relative to OLTP Supports thousands of concurrent users.

However, the data warehouse supports OLTP system by providing a place for the latter to offload data as it accumulates and by providing services which would otherwise degrade the performance of the database.

What is difference between dc and adc?


DC: Domain Controller

ADC: Active Directory Integrated Domain Controller

In DC the zone database is stored in .dns text file where as in ADC the zone database is stored in active directory itself not in .dns file.

What is the difference between ETL tool and OLAP tool? What are various ETL in the market? What are various OLAP tools? what is the future for both for next five years?


Answer-1:
ETL is a extraction,transformation,loading tool i.e u can extract , u can transform using different transformations available in tool and aggreagte the data. The output of this ETL tool is used as input to OLAP tool

OLAP is online analytical process, where u can get online reports after doing some joines,creating some cubes

ETL tools in market

1 INFORMATICA-- univeral tool ,good market

2 ABINITO -- fastest loading tool,very good market

3 DATASTAGE-- difficult work, no good market

4 BODI-- good market

5 ORACLE WAREHOUSE BUILDER-- good market

What is Cube?

Answer-1:
An OLAP cube is a data structure that allows fast analysis of data The arrangement of data into cubes overcomes a limitation of relational databases.

Answer-2:
In database theory, an OLAP cube is an abstract representation of a projection of an RDBMS relation. Given a relation of order N, consider a projection that subtends X, Y, and Z as the key and W as the residual attribute. Characterizing this as a function,

W : (X,Y,Z) → W,

the attributes X, Y, and Z correspond to the axes of the cube, while the W value into which each ( X, Y, Z ) triple maps corresponds to the data element that populates each cell of the cube.

What is fact?

Answer-1:
Facts in data warehousing are numerical values which helps to evaluate the analysis. Sales quantity, number of days, Sales amount etc are all example of facts.

What is dimension?

Answer-1:
Dimensions in data warehousing refer to the attributes by which the facts are analysed. Region, customer, productnames etc are all examples of dimensions.

What is heirarchy?

Answer-1:
Each of the elements of a dimension could be summarized using a hierarchy. The hierarchy is a series of parent-child relationships, typically where a parent member represents the consolidation of the members which are its children. Parent members can be further aggregated as the children of another parent.

For example May 2005 could be summarized into Second Quarter 2005 which in turn would be summarized in the Year 2005. Similarly the cities could be summarized into regions, countries and then global regions etc.

What is star schema?

Answer-1:
The star schema (sometimes referenced as star join schema) is the simplest style of data warehouse schema, consisting of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The "facts" that the data warehouse helps analyze are classified along different "dimensions": the fact tables hold the main data, while the usually smaller dimension tables describe each value of a dimension and can be joined to fact tables as needed.

What is snowflake schema?

Answer-1:
A snowflake schema is a way of arranging tables in a relational database such that the entity relationship diagram resembles a snowflake in shape. At the center of the schema are fact tables which are connected to multiple dimensions. When the dimensions consist of only single tables, you have the simpler star schema. When the dimensions are more elaborate, having multiple levels of tables, and where child tables have multiple parent tables ("forks in the road"), a complex snowflake starts to take shape. Generally, whether a snowflake or a star schema is used only affects the dimensional tables. The fact table is unchanged.

What is slowly chanding dimension?

Answer-1:
Dimension is a term in data management and data warehousing that refers to logical groupings of data such as geographical location, customer information, or product information. Slowly Changing Dimensions (SCD) are dimensions that have data that slowly changes.

Explain different types of SCD?

Answer-1:

There are mainly 3 different types in SCD.
Type-1: The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a name. (Assuming you won't ever need to know how it used to be misspelled in the past.)

Type-2: The Type 2 method tracks historical data by creating multiple records in the dimensional tables with separate keys. With Type 2, we have unlimited history preservation as a new record is inserted each time a change is made.

Type-3: The Type 3 method track changes using separate columns. Whereas Type 2 had unlimited history preservation, Type 3 has limited history preservation, as it's limited to the number of columns we designate for storing historical data.