Saturday, January 1, 2011

Data Warehousing Concepts


Operational vs. informational data:

Operational data is the data you use to run your business. It is typically stored, retrieved, and updated by your Online Transaction Processing (OLTP) system. An OLTP system may be, for example, a reservations system, an accounting application, or an order entry application.

Informational data is derived from the operational data that exists in your business, together with external data that is useful for analyzing the business. Informational data is what makes up a data warehouse. Informational data is typically:
  • Summarized operational data
  • De-normalized and replicated data
  • Infrequently updated from the operational systems
  • Optimized for decision support applications
  • Possibly "read only" (no updates allowed)
  • Stored on separate systems to lessen the impact on operational systems

Related concepts:

  • A data mart is a scaled-down deployment of a data warehouse that contains data focused on one department's analytical requirements. For example, the Ohio-based Huntington Bank Corporation set up a data mart for its general ledger system, to get the ledger system's functional information to the bank's financial analysts and budget coordinators quickly.
  • Data mining is the process of examining data for trends and patterns that might have evaded human analysis. For example, Shoko’s Sunday circulars contained coupons advertising health and beauty aids, consumables, and household chemicals, which were all located on the left-hand side of the stores. Shoko’s data mining exercise revealed that people who came in to shop gravitated to the left-hand side of the store for the promotional items and were not necessarily shopping the whole store. Consequently, it added apparel promotions to the Sunday circulars.
  • An Online Analytical Processing (OLAP) application is intended to let end users perform whatever business logic and statistical analysis is relevant to them. This analysis must be fast: most responses should reach users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds.
  • Multidimensional databases are non-relational DBMS products that are specialized for the kinds of queries run against data warehouses. The alternative approach is to use specialized analysis tools that run on top of a traditional RDBMS.
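
To make the ideas above more concrete, here is a minimal Python sketch of how operational rows might be summarized into informational data and then queried along different dimensions, in the spirit of an OLAP roll-up. The `orders` records, the dimension names, and the `build_summary` and `roll_up` helpers are all hypothetical, invented for illustration; a real warehouse or multidimensional database would do this at far larger scale and with its own tooling.

```python
from collections import defaultdict

# Hypothetical operational rows, as an order entry (OLTP) system might store them.
orders = [
    {"order_id": 1, "region": "East", "product": "apparel", "month": "2010-11", "amount": 40.0},
    {"order_id": 2, "region": "East", "product": "household", "month": "2010-11", "amount": 15.5},
    {"order_id": 3, "region": "West", "product": "apparel", "month": "2010-11", "amount": 22.0},
    {"order_id": 4, "region": "East", "product": "apparel", "month": "2010-12", "amount": 60.0},
    {"order_id": 5, "region": "West", "product": "household", "month": "2010-12", "amount": 9.5},
]

# Assumed dimensions for this sketch; individual order ids are dropped on purpose.
DIMENSIONS = ("region", "product", "month")


def build_summary(rows):
    """Summarize operational rows into totals keyed by (region, product, month)."""
    summary = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in DIMENSIONS)
        summary[key] += row["amount"]
    return dict(summary)


def roll_up(summary, keep):
    """Aggregate the summary over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for key, amount in summary.items():
        totals[tuple(key[i] for i in positions)] += amount
    return dict(totals)


# Built once (for example, in a periodic load), then only read by analysts.
summary = build_summary(orders)

by_region = roll_up(summary, ["region"])
by_product_month = roll_up(summary, ["product", "month"])

print(by_region)
print(by_product_month)
```

The summary is computed once from the operational rows and is only read afterwards, which mirrors the "summarized", "infrequently updated", and "read only" characteristics listed earlier. Each call to `roll_up` answers a different analytical question from the same stored totals without touching the operational data again.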

