Thursday, January 6, 2011

Dataware house Basics

Wednesday, January 5, 2011

Dataware housing Glossary

DataWare housing Glossary

Data warehousing
Data warehousing is the process of extracting, integrating, aggregating, filtering, summarizing,
standardizing, transforming, cleansing and quality checking the organization applications data
and storing it in a consolidated database. This database will end up being the only source from
which the management will access and retrieve information for decision making.
Data mart
Data marts can be viewed as a more specialized data warehouse. The size of a data mart is much
smaller than that of a data warehouse and therefore the time required to implement a data mart
is less than the time required to implement a data warehouse. The cost of implementing a data
mart is less than that of a data warehouse.
Data extraction
Data extraction is the process of extracting data from the OLTP databases or from external
sources and storing it in a consolidated area. The extracted data will be used for populating the
data warehouse. Some of the data extraction techniques that can be used are snapshot
extraction, time stamp extraction, etc.
Data transformation
The data that is extracted from the OLTP databases and external data sources is raw data. Data
transformation will have to be carried out on the extracted data before it can be stored in the
data warehouse. Some of the data transformation tasks that have to be carried out are: Data
Substitution, Data Standardization, Data Filtering and Data Summarization.
Data standardization
As the data which is going to be stored in the data warehouse file will be retrieved from multiple
OLTP databases it will have to be standardized. An examples of data standardization is the
standardization of the data stored in date fields or the standardization of data that is stored in
flags.
Data filtering
Data filtering is the process of extracting only the required data from the OLTP databases and
external data sources. The filtered data that has been extracted will be used for populating the
data warehouse files. For example the end user may want to store only the last five years sales
data in the data warehouse.
Data cleansing
Data cleansing is the process of cleansing and enhancing the data that is stored in the data
warehouse. Some of the data cleansing activities are the standardization of data, ensuring the
accuracy and quality of the data that is stored in the data warehouse, initializing of fields in the
data warehouse, ensuring the data integrity of the data that is stored in the data warehouse,
etc.
Meta data
Meta data can be thought of as data about data. It forms an important part of any data
warehouse and could also determine the success of a data warehouse. There are basically two
kinds of meta data that can be found in a data warehouse. These two kinds of meta data are
technical meta data and business meta data. Meta data will be useful for those who are going to
administer and use the data warehouse.
Operational data
Operational data is the data that is stored in the organizations OLTP applications. Operational
data will usually be stored in relational databases or in flat files and is also called as real time
data. One of the characteristics of operational data is that it is volatile. Operational data is not
ideal for analysis. Informational data can be derived from the operational data.
Operational database
Operational databases contains an organizations operational data that is generated by their OLTP
systems. The data that is present in these databases is not ideal for carrying out data analysis.
The data in these operational databases will be used for populating the files in a data warehouse.
OLAP
OLAP otherwise know as On Line Analytical Processing is a multi dimensional analysis technique
that end users can use for carrying out analysis on the data that is stored in a data warehouse.
Some the basic concepts that are associated with OLAP are drill down and drill up along a
dimension, slice and dice through the data, drill across, etc.
Non volatile data
No volatile data is data that will not undergo much change. The data that is stored in the data
warehouse is non volatile data. This is one of the characteristics of the data that is stored in a
data warehouse. It is advisable to carry out analysis on non volatile data.
Data scrubbing
The physical transformation and purifying of the operational data that is going to be stored in the
data warehouse is known as data scrubbing. Data scrubbing can be defined as the process of
filtering, merging, standardizing, initializing and translating the operational data in order to create
informational data that can be stored in the data warehouse.
Data dumping
It is important to store the right data in a data warehouse. Data should not be stored in a data
warehouse just because it is available. Storing of unnecessary data in a data warehouse is called
as data dumping. It is up to the data warehouse team to ensure that the data warehouse does
not contain any unnecessary data.
Meta data users
Meta data users are those who access the meta data that is provided along with a datawarehouse. Meta data users can be classified into two categories. They are technical users andbusiness users.
Aggregate data
Aggregate data is the data that has undergone data summarization. By storing aggregate data in
the data warehouse it will be possible to improve the performance times of the end users
queries. The drawback of storing aggregate data is that the time required to populate the files
containing the aggregate data could be high.
Data dictionary
A database about data and database structures. A data dictionary contains details about all data
elements that are present in the databases. It will contain the name of each data element, the
structures of each data element such as the length and type of the data elements and
information about their usage.
Data quality
The quality of the data that is present in the data warehouse is very important. In order to
ensure data quality the data warehouse team should ensure that no junk or corrupted data is
stored in the data warehouse. Also the fields in the data warehouse that do not contain any data
should be initialized. The quality of the data that is stored in the data warehouse could determine
the success of the data warehouse.
Data accuracy
The accuracy of the data that is stored in the data warehouse plays an important role in
determining how successful the data warehouse is going to be. It is up to the data warehouse
team to ensure that the data stored in the data warehouse is accurate.

Saturday, January 1, 2011

Data warehousing ppt

Introduction to Data warehousing

Data warehousing Concepts


Operational vs. informational data:

Operational data is the data you use to run your business. This data is what is typically stored, retrieved, and updated by your Online Transactional Processing (OLTP) system. An OLTP system may be, for example, a reservations system, an accounting application, or an order entry application.

Informational data is created from the wealth of operational data that exists in your business and some external data useful to analyze your business. Informational data is what makes up a data warehouse. Informational data is typically:
  • Summarized operational data
  • De-normalized and replicated data
  • Infrequently updated from the operational systems
  • Optimized for decision support applications
  • Possibly "read only" (no updates allowed)
  • Stored on separate systems to lessen impact on operational systems  
  •  A data mart is a scaled down deployment of a data warehouse that contains data focusing on a departmental user’s analytical requirements. For example, the Ohio-based Huntington Bank Corporation set up a data mart for its general ledger system, to get the ledger system's functional information to the bank's financial analysts and budget coordinators quickly.  
  •  Data mining is the process of examining data for trends and patterns that might have evaded human analysis. For example, Shoko’s Sunday circulars contained coupons advertising health and beauty aids, consumables, and household chemicals, which were are all located on the left-hand side of the stores. Shoko’s data mining exercise revealed that people who were coming in to shop gravitated to the left-hand side of the store for the promotional items and were not necessarily shopping the whole store. Consequently, it added apparel promotions to the Sunday circulars.
  • An on-line Analytical Processing (OLAP) application is intended to provide end-users an ability to perform any business logic and statistical analysis that is relevant. This analysis must happen fast, i.e., it must deliver most responses to users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds.
    Multidimensional databases are non-relational DBMS products that are specialized for use for the kinds of queries in data warehouses. This is in contrast to using specialized analysis tools that run on top of a traditional RDBMS.