Presentation is loading. Please wait.

Presentation is loading. Please wait.

Best Practices in Loading Large Datasets Asanka Padmakumara (BSc,MCTS) SQL Server Sri Lanka User Group Meeting Oct 2013.

Similar presentations


Presentation on theme: "Best Practices in Loading Large Datasets Asanka Padmakumara (BSc,MCTS) SQL Server Sri Lanka User Group Meeting Oct 2013."— Presentation transcript:

1 Best Practices in Loading Large Datasets Asanka Padmakumara (BSc,MCTS) SQL Server Sri Lanka User Group Meeting Oct 2013

2 Agenda  Definition of “Large Dataset”  Database and table structures for large data sets.  Loading data using T-SQL  Loading data using SSIS  Q & A

3 A Large Dataset….  A Collection of data sets: large complex difficult to process ~ Wikipedia Large data set to you is depend on your hardware configuration.

4 A Large table…. A large table is one that does not perform as desired or one where the maintenance costs have gone beyond pre-defined maintenance periods. if one user’s activities significantly affect another or if maintenance operations affect other user’s abilities. In effect, this even limits availability. ~Microsoft White Paper ( Partitioned Tables and Indexes )

5 Database and table structures  Try to have multiple file group across multiple disks. Can backup only the file group. To get Performance Data on a separate I/O path. Index on a separate I/O path  Partition the tables and Indexes Partition means Break large table into parts. Easy to insert-Switch partition in to table Easy to delete- Move partition from table Can rebuild index only of the partition

6 Database and table structures

7  Have Clustered index for the most using column.  Have non Clustered indexes for other most using columns.  Try to have correct data types. Int, smallint makes a different when no of rows are high.

8 Loading Data with T-SQL  TSQL Commands and Utilities  BCP  BULK INSERT  OPENROWSET

9 Loading Data with T-SQL  BCP  bulk copy is an utility program: bcp.exe  copies data between Microsoft SQL Server and a data file in a user-specified format  Can generate format file for data.  performance is improved if the data being imported is sorted according to the clustered index on the table  IN – Insert to table, Out- Export to file  Syntax: bcp AdventureWorks2012.sales.CurrencyRate out F:\DemoData\Currency.dat –c –usa – SSLLAPTOP266\SQL2012

10 Loading Data with T-SQL  Bulk Insert  Transact-SQL statement. faster than the BCP utility  Unlike BCP, Can’t export data to files. Only insert  you can specify up to 1024 fields only, If more than that use BCP.  Syntax : BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail FROM 'f:\orders\lineitem.tbl’

11 Loading Data with T-SQL  OPENROWSET  Alternative to accessing tables in a linked server and is a one-time  SELECT a.* FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;','SELECT Name FROM Department AS a;)  Use BULK keyword for use OPENROWSET for bulk loading  SELECT a.* FROM OPENROWSET( BULK 'c:\testvalues.txt',FORMATFILE = 'c:\test\values.fmt') AS a;  IGNORE_CONSTRAINTS, IGNORE_TRIGGERS

12 Loading Data with T-SQL  Considerations…. Disable Index Alter index [IXYourIndex] ON YourTable DISABLE Disable constrain Do not disable clustered index. Table become read- only. If you disable it have to rebuild index. Enable Index ALTER INDEX [IXYourIndex] ON YourTable REBUILD

13 Deleting Data with T-SQL  Truncate instead of delete.(if possible)  (if not) Move the Partition to new table, then truncate it.-New table should be in same file group Delete batch by batch avoid growing log file delete is a single auto-commit transaction Disable Triggers Simple recovery mode to minimize growing log

14 Loading data using SSIS Buffer size Can increase buffer size of dataflow task Default size=10MB, can set up to=100MB Cachelookup If same table is looked up more than once inside SSIS package, cache that lookup Vender specific providers will give you more performance. Ex: Oracle provider instead of Microsoft Oledb

15 Loading data using SSIS  Use raw file instead on temp table when using staging table  Avoid using Slowly changing dimension control. Use merge statement instead of SCD control  Use parallel flow inside dataflow task to multi threading  Minimize usage of blocking,Partially blocking transformations Ex: Aggregate, Sort, Merge, Merge Join, and Union All

16 Demo  Demo # 1  Delete data from partitioned table.  Demo # 2  Import/Export data using BCP

17 Q & A


Download ppt "Best Practices in Loading Large Datasets Asanka Padmakumara (BSc,MCTS) SQL Server Sri Lanka User Group Meeting Oct 2013."

Similar presentations


Ads by Google