SSIS tips and Tricks

Finally a technical article after a reallllly long time............

Tip 1: Ensure that when you fetch data from an SQL server data source always use the "(NOLOCK)" option in the sql statement
example: select studentid,name from student (NOLOCK)
This ensures that a deadlock does not take place (best option to use in a multi user environment :) ).

Tip 2: The maximum packet size that can be transferred across a network is approximately 4 MB. Here we could improve the performance of our data fetch from the source and data migration to destination by modifying the connection property of your SSIS package. Right click on the the connection and click properties, you will find a property called packet size: Make sure this packet size is about 32767 to get the maximum benefit across a network.

Tip 3: When you have multiple tasks running in parallel in a package(The maximum an SSIS package can actually subdivide these tasks or SSIS can spawn a maximum of 8 processes), so it would make it feasible that you do not execute more than 8 tasks in parallel as it would not benefit the package. Also when you are porting data from multiple sources where constraints (parent child hierarchy among tables) are not a priority, then it is beneficial to load all the tables with the low volumes of data to the tables with high volumes of data.

Tip 4: In a multiprocessor environment with extremely high memory, when you have a single source and multiple destinations to be populated, a good practice is to divide the job in such a way that there is a 1:1 relation ship between the source and destination rather than 1:N relationship between the source and the destinations:
Dont Do-->

Please Do-->

Tip 8: An important note, whenever bulk operations are performed on a database hosted on a server with limited memory and disk space, it is extremely important to keep track of the size of the tempdb. Make sure you set the recovery model to simple for the tempdb (by default it should be simple).
Also keep track of the size using the following set of SQL's-
use tempdb
sp_spaceused

Tip 9: This is extremely important for SSIS testers and that is load testing. Use the execute SQL task for checking the maximum amount of rows/second. First create a simple SSIS package with only the execute SQL task as shown in the following diagram:

Now let us say that we are executing the following query:
select * from geography
We get 300 rows as the number of records fetched.
The time taken to execute the job is 5 seconds.
Then the total number of rows/second will be in the 300/5 = 60 rows/second.
Note: The lower the result is from the above tip, the better our SSIS package is performing.

Tip 10: Merge query vs Bulk Insert – This is a highly argumentative subject and I will touch only the surface of this. When the dataset is massive, a truncate and bulk insert would be much faster when compared to a Merge query (also note that the merge query is dependent on constraints), hence slowing the data insertion. But if the dataset is minimum, a merge query should be beneficial.

Tip 11: I am just going to suggest the readers to understand the CDC command that comes with SQL server 2008 and above versions. This is extremely beneficial in understanding the data transported to a destination table and also as to when the data was transferred. Cool to understand.

Tip 11: Last and final one for now.... Use the logging feature only when required. There is a performance implication associated with logging.

Comments

Ishwar said…
Also another important aspect when it comes to improving your SSIS packages is to figure out whether your transformation is a row (non blocking), semi blocking or a blocking transformations.
Non Blocking Transformations: These are the transformations where the buffers are reused while performing any transaction. Also the UOW is at the lowest level and that is at the row level.

Blocking & Semi Blocking Transformations: These are asynchronous transformations where the existing buffers cannot be utilized but a new one has to be created in the process like an Aggregate task or a union all task.

Popular posts from this blog

Rhino - ETL

Microsoft acquires LinkedIn

Redshift Experience