Wednesday, March 23, 2016

Delta Load Implementation

As ETL tool such as Data Services and SSIS evolve, we can see there are request to extract SAP ECC tables directly out from the source (without going through SAP BW). There are a lot of considerations when we land on that approach mainly around the corporate datawarehouse roadmap and data policy, the standard EAI layer and the delta load mechanism.

When this approach is adopted, an organization should evaluate their existing datawarehouse such as SAP BW and the implication of single source of truth when data is extracted out into another system. Is the staging system supposedly to be a global datawarehouse in long run, is it planned to be the regional datawarehouse layer or just to cater for silo project as a 'dump' or staging area to transform some data to meet silo business unit reporting needs?

On the technical side, I always put emphasize on the delta mechanism as there is quite an in depth configuration at the ECC end. The common delta method may be timestamp and there are certain tables that store the timestamp records when transactional records were added. There are also log table to detect changes or deletion in records. And during data load there is additive or overwrite records functions. Another interesting area to take into account is the initial load. Some huge extraction from multiple tables with complex join require setup tables to populate huge historical dataset first before the delta take place. This is a pointer noted for performance consideration. 

I came across design using Data Services to extract data directly from huge ECC tables that prior to loading read from an ABAP program which identify the delta mechanism or logic such as reversal of posting etc. Any approach that the organization takes for data extraction, it is good to cover all the area during IT strategy or solution stage from BI roadmap to the technical and security assessment of the solution.

P/s this is a good read on delta methods