In the overall data warehousing value chain on the Microsoft BI stack using SSIS, the data extraction layer poses one of the biggest performance bottlenecks. As part of data extraction, the ETL must sync up with multiple heterogeneous sources such as databases, flat files, and XML files. These processes are additionally subject to stringent SLAs for batch loading into the staging area. To minimize adverse impact in production and ensure the required throughput, performance tuning must be carefully planned and executed within SSIS.
In our end-to-end data warehouse implementations on the Microsoft BI stack with SSIS as the ETL tool, there is a clear set of rules for performance tuning. Packages designed and developed to the functional requirements are typically validated by the testing team against a sample data set (5-10% of the production data volume). Production performance, however, often differs substantially, invariably breaching SLAs.
We recommend the following steps to avoid performance bottlenecks in a planned implementation.
STEP 1: Identifying and troubleshooting performance issues
- Ascertain whether the root cause is related to the environment or infrastructure availability
- Take mitigation action and inform the relevant owners responsible for troubleshooting
- If these checks come back negative, move on to the next phase
- Isolate the execution of the various parts of the package to find the operations that should be optimized first; work on a copy of the original package
- Define the overall execution speed as the sum of source speed (time to read the data from source adapters), transformation speed (time that data rows need to pass through all transformations), and destination speed (time to write the data to the destination adapters)
- Establish the overall package execution speed by executing the package using DTEXEC
- Isolate the source and transformation speeds to find out how fast SSIS is reading from a source and performing transformations. Remove the destination associated with the source.
- If the package has more than one destination, isolate one destination at a time.
- Replace the destination with a Row Count transformation and measure the execution performance from source through transformations to Row Count
- Remove the transformations and measure the source speed of your data flow.
- Calculate the destination speed and the transformation speed from the previous two results.
- Isolate individual transformations to identify the performance of a specific transformation for further troubleshooting.
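The isolation runs above yield the per-stage times by simple subtraction. A minimal sketch, using hypothetical timings (the numbers are illustrative, not from a real package):

```python
def component_speeds(total_time, source_plus_transform_time, source_time):
    """Derive per-stage times (seconds) from the three isolation runs:
    - total_time: full package execution
    - source_plus_transform_time: destination replaced by a Row Count
    - source_time: transformations removed, source read only
    """
    transform_time = source_plus_transform_time - source_time
    destination_time = total_time - source_plus_transform_time
    return {
        "source": source_time,
        "transformations": transform_time,
        "destination": destination_time,
    }

# Example: 600 s total, 250 s with Row Count, 100 s source-only read.
speeds = component_speeds(total_time=600,
                          source_plus_transform_time=250,
                          source_time=100)
```

In this example the destination accounts for 350 of the 600 seconds, so destination tuning would be attacked first.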
After identifying the potential root cause with the above approach, follow the performance tuning steps below.
STEP 2: Performance tuning steps in ETL Package
- Avoid designing multiple control flow tasks in a serial manner
- Set up accurate precedence constraints
- Identify dependent tasks, e.g. Customer-related table loads, Product-related table loads, etc.
- Group the dependent tasks into separate sequence containers
- The ETL package will then have multiple sequence containers that are independent of each other and can be executed in parallel
- Strive for a combination of serial and parallel processing, which will improve the overall package execution time.
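The payoff of grouping independent work into parallel containers can be sketched outside SSIS as well. In this illustration the two functions stand in for independent "sequence containers" (the names and sleep times are invented); run concurrently, the wall time approaches the slowest group rather than the sum of both:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_customer_tables():
    time.sleep(0.1)  # simulate I/O-bound load work
    return "customer done"

def load_product_tables():
    time.sleep(0.1)
    return "product done"

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    # Both "containers" execute concurrently instead of back to back.
    results = list(pool.map(lambda f: f(), [load_customer_tables,
                                            load_product_tables]))
elapsed = time.perf_counter() - start  # roughly 0.1 s, not 0.2 s
```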
- Avoid using “select * from” on source tables: it brings in fields that may not be relevant and fetches unwanted data
- Identify only the fields relevant for mapping and transformation, and expose just those to the downstream data flow for an efficient result
- Select the right data types and keep them as narrow as possible for optimal memory allocation
- Avoid binary large object (BLOB) types, because these get swapped to disk (this includes VARCHAR(MAX) columns).
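The column-pruning and narrow-type advice can be sketched briefly. The table and column names below are hypothetical, and the row widths are illustrative round numbers; the point is that an explicit column list replaces `select *`, and narrower rows mean more rows fit into each SSIS buffer:

```python
def build_select(table, columns):
    """Build an explicit column list instead of SELECT *."""
    return "SELECT {} FROM {}".format(", ".join(columns), table)

query = build_select("dbo.Customer", ["CustomerID", "FirstName", "LastName"])

# With a 10 MB buffer, halving the estimated row width roughly doubles
# the number of rows SSIS can hold per buffer.
buffer_bytes = 10 * 1024 * 1024
rows_wide = buffer_bytes // 500    # 500-byte rows (all columns, wide types)
rows_narrow = buffer_bytes // 250  # 250-byte rows after pruning/narrowing
```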
It is a well-known fact in the database world that bulk operations are faster than row-by-row operations. For SSIS:
- Use the Bulk Insert task, or the “Table or view - fast load” option in the OLE DB destination
- Minimize the number of entries that are added to the log file
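The bulk-versus-row-by-row difference comes down to round trips: one call per batch instead of one call per row. A minimal, database-free sketch of the batching idea (the batch size of 4 is illustrative):

```python
def chunks(rows, batch_size):
    """Split rows into batches so each batch costs one round trip,
    instead of paying one round trip per row."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

rows = list(range(10))
batches = list(chunks(rows, 4))
# 10 rows become 3 round trips instead of 10.
```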
SSIS transformations can be broadly classified as given below; exercise due diligence in selecting the right type.
- Non-blocking transformations - each row is passed through the transformation without any waits, e.g. Copy Column, Data Conversion, etc.
- Partially blocking transformations - the transformation waits until a sufficient number of rows is stored, then passes them through, e.g. Merge, Merge Join, Union All, etc.
- Blocking transformations - all rows must be read into the transformation before any can be passed through; these consume the most memory and are the slowest, e.g. Sort, Fuzzy Grouping, Fuzzy Lookup, etc.
- The same functionality can often be achieved through combinations of non-blocking transformations
- Time-consuming transformations such as Sort and Merge should, where possible, be moved to the database side, e.g. by using an ORDER BY clause in the source query
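The blocking/non-blocking distinction maps neatly onto streaming versus materializing. A minimal Python analogy (the row data is invented): a non-blocking transformation emits each row as it arrives, while a blocking one must buffer the entire input before the first output row can appear.

```python
def non_blocking_upper(rows):
    """Non-blocking analogue: each row streams through immediately."""
    for row in rows:
        yield row.upper()

def blocking_sort(rows):
    """Blocking analogue: every row must be held in memory before
    any output can be produced."""
    return sorted(rows)

streamed = list(non_blocking_upper(iter(["b", "a"])))  # works on a stream
sorted_rows = blocking_sort(["b", "a"])                # needs the full set
```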
At execution time, before the data is read from the sources, SSIS automatically tunes buffer sizes to achieve maximum memory utilization based on a series of input parameters. These need to be optimized for the specific workload, and multiple trials may be needed to achieve the desired results:
- Estimated Row Size, DefaultMaxBufferRows, DefaultMaxBufferSize, MinBufferSize (not configurable)
- Configure the DefaultMaxBufferRows and DefaultMaxBufferSize properties; depending on the configured values, SSIS tunes buffer sizes at execution time using one of the following scenarios:
- When Estimated Row Size * DefaultMaxBufferRows exceeds MaxBufferSize, SSIS reduces the number of rows that will be stored in a buffer to manage the memory footprint.
- When Estimated Row Size * DefaultMaxBufferRows is less than MinBufferSize, SSIS increases the number of rows that will be stored in a buffer to maximize memory utilization.
- When Estimated Row Size * DefaultMaxBufferRows is between MinBufferSize and DefaultMaxBufferSize, SSIS attempts to size the buffer as closely as possible to the result of Estimated Row Size * DefaultMaxBufferRows, using a multiple of MinBufferSize, to increase memory utilization.
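The three scenarios above can be approximated in a few lines. This is a simplified sketch for reasoning about configurations, not the SSIS engine's internal logic (in particular, the rounding to a multiple of MinBufferSize in the middle case is omitted); the property names follow the text, and the example sizes are illustrative:

```python
def rows_per_buffer(estimated_row_size, default_max_buffer_rows,
                    max_buffer_size, min_buffer_size):
    """Approximate how many rows SSIS would place in one buffer."""
    requested = estimated_row_size * default_max_buffer_rows
    if requested > max_buffer_size:
        # Requested buffer too large: reduce rows to fit the ceiling.
        return max_buffer_size // estimated_row_size
    if requested < min_buffer_size:
        # Requested buffer too small: grow rows up to the floor.
        return min_buffer_size // estimated_row_size
    # Within bounds: the configured row count is honored.
    return default_max_buffer_rows

MIN_BUF = 64 * 1024          # 64 KB floor (illustrative)
MAX_BUF = 100 * 1024 * 1024  # 100 MB ceiling (illustrative)

within = rows_per_buffer(1024, 10_000, MAX_BUF, MIN_BUF)   # 10 MB requested
too_big = rows_per_buffer(1024, 200_000, MAX_BUF, MIN_BUF) # ~200 MB requested
too_small = rows_per_buffer(16, 1_000, MAX_BUF, MIN_BUF)   # 16 KB requested
```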
While the extract and load phases of the pipeline touch disk (read and write respectively), the transformations themselves should process entirely in memory. If transformations spill to disk (for example, large sort operations), performance degrades sharply.
- Construct the packages to partition and filter data so that all transformations fit in memory
- Try to always use full-cache lookups
- To run multiple lookups against the same query, use the Cache connection manager to reuse the cache.
- When you need a solution for range lookups, try to use the Script transformation instead
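The full-cache lookup idea is simple to picture: the reference set is read into memory once up front, and each incoming row is then matched with an in-memory probe instead of a per-row query against the database. A minimal sketch with invented data:

```python
# Pretend reference table, loaded once before the data flow starts.
reference_rows = [(1, "US"), (2, "UK"), (3, "IN")]
cache = {key: value for key, value in reference_rows}

# Each incoming row is matched with a dictionary probe; unmatched
# keys fall through to a default, mimicking a no-match output.
incoming = [1, 3, 1, 2]
matched = [cache.get(key, "UNKNOWN") for key in incoming]
```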
- Pushing data into an indexed table takes much longer because the database engine must maintain the index alongside the inserts into the table, which causes a considerable delay.
- Depending on the insert pattern, inserting data into an indexed table can also fragment the index. A fragmented index is slower to access and maintain than a non-fragmented one, so it is highly recommended to follow the index drop/recreate procedure below:
- Drop all indexes on destination table
- Push data (perform ETL operations)
- Create all indexes on destination table
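The drop/load/recreate sequence can be expressed as generated T-SQL. A sketch that only builds the statement strings (nothing is executed, and the table and index names are hypothetical):

```python
def index_rebuild_plan(table, indexes):
    """Emit the drop / bulk-load / recreate sequence as T-SQL strings.
    `indexes` maps index name -> key column."""
    plan = ["DROP INDEX {} ON {};".format(ix, table) for ix in indexes]
    plan.append("-- bulk load into {} happens here".format(table))
    plan.extend(
        "CREATE INDEX {} ON {} ({});".format(ix, table, col)
        for ix, col in indexes.items()
    )
    return plan

plan = index_rebuild_plan("dbo.FactSales", {"IX_Date": "DateKey"})
```

Each statement would be run in order by the ETL's SQL task of choice; the key point is that the indexes are rebuilt once, after the load, instead of being maintained row by row during it.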