Data Warehousing & ETL
Organizations make decisions and service customers based on the data at their disposal. A data warehouse is often used to examine business trends to establish a strategy for the future. Each year, a large number of business intelligence and data warehousing initiatives fail because of erroneous or incomplete data.
PII’s in-depth experience working with data and turning it into information can help your organization succeed at Data Warehousing. We’ve had numerous successful data warehousing projects and welcome the opportunity to help turn your data into information. PII understands the need to move massive amounts of data on a daily basis. Informatica delivers data integration software and services to solve the problem of data manipulation across disparate systems, helping organizations gain greater business value from all their information assets. Informatica’s platform-neutral software reduces costs, speeds time to results, and scales to handle data integration projects of any size or complexity. With a proven track record of success, Informatica helps companies and government organizations of all sizes realize the full business potential of their enterprise data. That’s why Informatica is known as the data integration company.
ETL, it’s an acronym commonly used and seldom understood. The process of Extraction, Transformation and Loading of a data warehouse is serious business. Without clear cut rules and processes you will end up with a Data Warehouse that contains incorrect data. PII prides itself on helping clients develop a clear definition of standards and processes for leveraging the Informatica Suite of products. With project plans, test cases and thorough Q & A processes your organization can be assured that their data will be safe, secure and accurate.
Extract
The first part of an ETL process is to extract the data from the source systems. Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization / format. Common data source formats are relational databases and flat files, but may include non-relational database structures such as IMS or other data structures such as VSAM or ISAM. Extraction converts the data into a format for transformation processing.
Transform
The transform stage applies a series of rules or functions to the extracted data to derive the data to be loaded. Some data sources will require very little manipulation of data. In other cases, one or more of the following transformations types may be required:
Selecting only certain columns to load (or selecting null columns not to load)
- Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the warehouse stores M for male and F for female) ,this is termed as Data Cleansing
- Encoding free-form values (e.g., mapping “Male” and “1″ and “Mr” onto M)
- Deriving a new calculated value (e.g., sale_amount = qty * unit_price)
- Joining together data from multiple sources (e.g., lookup, merge, etc.)
- Summarizing multiple rows of data (e.g., total sales for each store, and for each region)
- Generating surrogate key values
- Transposing or pivoting (turning multiple columns into multiple rows or vice versa)
- Splitting a column into multiple columns (e.g., putting a comma-separated list specified as a string in one column as individual values in different columns)
Load
The load phase loads the data into the data warehouse (DW). Depending on the requirements of the organization, this process ranges widely. Some data warehouses might weekly overwrite existing information with cumulative, updated data, while other DW (or even other parts of the same DW) might add new data hourly. The timing and scope to replace or append are strategic design choices dependent on the time available and the business needs. More complex systems can maintain a history and audit trail of all changes to the data.
