Karuta’s ASP & M$ SQLserver

Tips, code, and other little things from my studies, so I don't forget…

Archive for February 2017

.NET – FASTEST WAY TO LOAD TEXT FILE TO SQL – SQLBULKCOPY

Posted by karuta on February 7, 2017

An exercise in coding: loading a text file to SQL. There are a billion different ways to do it, and depending on your source data format and such, you can parse the file a million ways too. But how fast can you get data from disk to SQL using .NET code (VB.NET or C#)? This post is going to show the differences between some of the ways you can load data from disk to SQL.

I am sure I could do more, but this is a good sampling. Let’s assume a 1,000,000-row file, comma separated, with 3 columns: string, int, string, of variable length. Let’s assume our destination is SQL Server 2005, with the table already created and no keys or anything on it.

We will call our table TestData. Our test app will be a VB.NET Console Application, running on the same box as SQL 2005. Now, there are many ways to read files. A few are: reading them line by line, ReadToEnd(), and using the JET engine to read in a CSV, etc. From the testing I have been doing, all of these seem to work fairly fast; maybe a comparison of those is for another blog post, but for brevity’s sake, let’s just say they are all comparable. Now, I chose 3 methods of inserting data, all sharing the setup sketched below.
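Quick note on that setup: all three methods assume a filename variable pointing at the test file and a connection string named “MyDB” in App.config. A minimal sketch, where the path, module name, and connection string values are placeholders you would adjust:

Imports System.IO
Imports System.Data
Imports System.Data.SqlClient
Imports System.Configuration

Module LoaderSetup
    ' Placeholder path to the 1,000,000-row comma-separated test file.
    Public filename As String = "C:\temp\testdata.txt"
    ' App.config is assumed to contain a matching entry, e.g.:
    ' <add name="MyDB" connectionString="Data Source=.;Initial Catalog=TestDB;Integrated Security=True" />
End Module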

1) StreamReader.ReadLine, Insert Line By Line

Sub Method1()
    Dim i As Long = 0

    Dim sr As StreamReader = New StreamReader(filename)
    Dim line As String = sr.ReadLine()
    Dim dbConn As SqlConnection = New SqlConnection(ConfigurationManager.ConnectionStrings("MyDB").ToString())
    Dim dbCmd As SqlCommand = New SqlCommand()
    dbCmd.Connection = dbConn

    Do
        Dim fields() As String = line.Split(","c)

        ' Naive concatenated INSERT: one statement, one connection
        ' open/close, and one round trip per row.
        dbCmd.CommandText = "INSERT INTO dbo.TestData (Column1,Column2,Column3) " & _
            " VALUES ('" & fields(0) & "'," & fields(1) & ",'" & fields(2) & "')"

        dbConn.Open()
        dbCmd.ExecuteNonQuery()
        dbConn.Close()
        i = i + 1
        line = sr.ReadLine()

    Loop While line IsNot Nothing AndAlso line <> String.Empty

    sr.Close()
End Sub

2) StreamReader.ReadLine, Batch Insert With DataAdapter

Sub Method2()
    Dim i As Long = 0
    Dim dbConn As SqlConnection = New SqlConnection(ConfigurationManager.ConnectionStrings("MyDB").ToString())
    Dim sr As StreamReader = New StreamReader(filename)
    Dim line As String = sr.ReadLine()

    ' Use the first line to figure out how many columns we need; the
    ' DataColumns get the default names Column1, Column2, Column3.
    Dim strArray As String() = line.Split(","c)
    Dim dt As DataTable = New DataTable()
    Dim row As DataRow

    For Each s As String In strArray
        dt.Columns.Add(New DataColumn())
    Next

    Do
        row = dt.NewRow()
        row.ItemArray = line.Split(","c)
        dt.Rows.Add(row)

        i = i + 1
        line = sr.ReadLine()
    Loop While line IsNot Nothing AndAlso line <> String.Empty

    sr.Close()

    ' The SELECT supplies the destination schema; the command builder
    ' derives the INSERT command from it.
    Dim dataAdapter As New SqlDataAdapter()
    dataAdapter.SelectCommand = New SqlCommand("SELECT TOP 1 Column1,Column2,Column3 FROM dbo.TestData", dbConn)
    Dim cmdBuilder As SqlCommandBuilder = New SqlCommandBuilder(dataAdapter)

    dbConn.Open()

    dataAdapter.UpdateBatchSize = 1000   ' send the inserts in batches of 1,000
    dataAdapter.Update(dt)               ' pushes every Added row to dbo.TestData

    dbConn.Close()
End Sub

3) StreamReader.ReadLine, SqlBulkCopy

Sub Method3()
    Dim i As Long = 0
    Dim dbConn As SqlConnection = New SqlConnection(ConfigurationManager.ConnectionStrings("MyDB").ToString())
    Dim sr As StreamReader = New StreamReader(filename)
    Dim line As String = sr.ReadLine()

    Dim strArray As String() = line.Split(","c)
    Dim dt As DataTable = New DataTable()
    Dim row As DataRow

    For Each s As String In strArray
        dt.Columns.Add(New DataColumn())
    Next

    Do
        row = dt.NewRow()
        row.ItemArray = line.Split(","c)
        dt.Rows.Add(row)

        i = i + 1
        line = sr.ReadLine()
    Loop While line IsNot Nothing AndAlso line <> String.Empty

    sr.Close()

    ' TableLock takes a bulk update lock for the duration of the copy,
    ' which speeds things up considerably.
    Dim bc As SqlBulkCopy = New SqlBulkCopy(dbConn, SqlBulkCopyOptions.TableLock, Nothing)
    bc.DestinationTableName = "dbo.TestData"
    bc.BatchSize = dt.Rows.Count   ' send everything as a single batch

    dbConn.Open()
    bc.WriteToServer(dt)
    bc.Close()
    dbConn.Close()
End Sub

The results of the 3 methods are surprising. The thing is, most people are going to use Method1 because it is the first thing you think of, and probably the easiest to code (everyone learns loops in school). Now, nitpickers will say “use a stored proc” or “use parameters”, and in best practice, yes, but that saves minimal time here, so for the sake of the example bear with me.
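For what it’s worth, here is roughly what the parameterized variant of Method1 would look like. This is a sketch, not something I benchmarked, and the SqlDbType sizes are assumptions; it saves re-parsing the SQL per row, but the one-round-trip-per-row cost still dominates:

' Hypothetical parameterized version of Method1's command, built once
' outside the loop and reused for every row.
dbCmd.CommandText = "INSERT INTO dbo.TestData (Column1,Column2,Column3) " & _
    " VALUES (@c1, @c2, @c3)"
dbCmd.Parameters.Add("@c1", SqlDbType.VarChar, 255)
dbCmd.Parameters.Add("@c2", SqlDbType.Int)
dbCmd.Parameters.Add("@c3", SqlDbType.VarChar, 255)

' ...then inside the Do loop, instead of rebuilding the SQL string:
dbCmd.Parameters("@c1").Value = fields(0)
dbCmd.Parameters("@c2").Value = CInt(fields(1))
dbCmd.Parameters("@c3").Value = fields(2)
dbCmd.ExecuteNonQuery()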

Method2 is less intuitive, and really tricky to get working (at least I had some issues with it), but once it works, it makes a little more sense than Method1.

Method3 is something hardly anyone ever hears about or uses, but once they do, they never go back.

Side note: about 5 years ago I worked on a program that inserted huge files, and they were taking 10-20 minutes apiece. I was using VB6, and converted the line-by-line insert to call BCP from code, which got it down to 2-3 minutes. So I knew about BCP and BULK INSERT; I just didn’t know that capability was built into .NET. Now I do. Anyway, on to the results.

Method 1 – 14.69 minutes to insert 1 million records

Method 2 – 7.88 minutes to insert 1 million records

Method 3 – 0.28 minutes to insert 1 million records

So yeah. Wow. That is not a typo for Method 3: roughly 17 seconds. Now, give Method2 some credit, it cut Method1’s time roughly in half, but Method3 just KILLS them both. Man, just awesome. When you run something like that and see that kind of performance, you can’t help but smile.

A few caveats and gotchas:

Method2 – the BatchSize property (UpdateBatchSize) I have set to 1000. If you set it to 0, the adapter uses the maximum batch size. I tried that and it locked my machine up. Yikes.

Method3 – the SqlBulkCopyOptions make a difference: TableLock speeds up the operation. The BatchSize here I have set to the number of rows, so everything goes over in one batch. It might run differently with different batch sizes; I really didn’t experiment with it, but adding the copy options and the BatchSize sped up the operation.
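A side note on SqlBulkCopy: if your DataTable columns don’t happen to line up with the destination table by position, you can map them explicitly. A sketch, where the ordinals and the 10,000-row batch size are just example values:

' Sketch: explicit column mappings and a tuned batch size, for when the
' DataTable's column order does not match the destination table.
Dim bc As SqlBulkCopy = New SqlBulkCopy(dbConn, SqlBulkCopyOptions.TableLock, Nothing)
bc.DestinationTableName = "dbo.TestData"
bc.ColumnMappings.Add(0, "Column1")   ' source ordinal -> destination column
bc.ColumnMappings.Add(1, "Column2")
bc.ColumnMappings.Add(2, "Column3")
bc.BatchSize = 10000        ' example value; 0 means one single batch
bc.BulkCopyTimeout = 0      ' no timeout, for very large loads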

So, the fastest way I have found in .NET to load data from files to SQL, hands down: SqlBulkCopy. Since the load took only 17 seconds, the next step is to try different file reading methods and see what the times look like there; like I said, maybe that will be my next post. 🙂
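One idea I may try for that post: a chunked variant of Method3 that flushes the DataTable every 50,000 rows, so a huge file never has to sit in memory all at once. A rough sketch, untested, and the 50,000-row chunk size is an arbitrary example:

Sub Method3Chunked()
    Dim dbConn As SqlConnection = New SqlConnection(ConfigurationManager.ConnectionStrings("MyDB").ToString())
    Dim sr As StreamReader = New StreamReader(filename)
    Dim dt As DataTable = New DataTable()
    For c As Integer = 1 To 3
        dt.Columns.Add(New DataColumn())
    Next

    Dim bc As SqlBulkCopy = New SqlBulkCopy(dbConn, SqlBulkCopyOptions.TableLock, Nothing)
    bc.DestinationTableName = "dbo.TestData"

    dbConn.Open()
    Dim line As String = sr.ReadLine()
    Do While line IsNot Nothing AndAlso line <> String.Empty
        Dim row As DataRow = dt.NewRow()
        row.ItemArray = line.Split(","c)
        dt.Rows.Add(row)
        If dt.Rows.Count >= 50000 Then
            bc.WriteToServer(dt)   ' flush a chunk...
            dt.Rows.Clear()        ' ...and reuse the same DataTable
        End If
        line = sr.ReadLine()
    Loop
    If dt.Rows.Count > 0 Then bc.WriteToServer(dt)

    bc.Close()
    dbConn.Close()
    sr.Close()
End Sub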

p.s. Yes, I am up at 2:00 AM posting this. Insomnia, anyone? In any event, I got to watch the repeat of the Nevada Democratic Debate 🙂


Posted in asp

SQL SERVER MANAGEMENT STUDIO AND EXCEL – COLUMN HEADINGS ON COPY/PASTE

Posted by karuta on February 7, 2017

Here is a tip about a setting that I think should be on by default, but isn’t. How many times are you writing T-SQL in SQL Server Management Studio (SSMS) and need to copy the results out and paste them into Excel, either to do more analysis or to send them off to someone? What happens when you have a result set with 20 columns? 30? 50?

Well, the old way to get the column headings into Excel was to just type them in. Ouch. That gets old really fast. But there is a setting in SSMS that most people don’t even know exists, and once they find out about it, they say: dang! I wish I had known about that.

Well, fire up SSMS, go to Tools->Options, Query Results, SQL Server, Results to Grid. Check “Include column headers when copying or saving the results”, hit OK, and any new query window will include the column headings with your result set when you copy the data!

 

[Screenshot: SSMS Options dialog, Query Results > Results to Grid, with “Include column headers when copying or saving the results” checked]

Posted in sql server

SQL SERVER 2008 – SAVING CHANGES IS NOT PERMITTED

Posted by karuta on February 7, 2017

Finally getting around to doing some work on SQL 2008, and after about 3 minutes I run into this error: “Saving changes is not permitted… blah blah blah”. See screenshot below.

[Screenshot: the “Saving changes is not permitted” error dialog]

This is different from SQL 2005. Microsoft trying to save us from ourselves, maybe? The thing is, I never enabled the option “Prevent saving changes that require the table to be re-created”; it seems to be enabled by default. It would be awesome if this error told you exactly where the setting lives.

 

Well, it happens to be in Tools->Options, Designers, Table and Database Designers. Uncheck the box and go about your merry way!

 

[Screenshot: SSMS Options dialog, Designers > Table and Database Designers, with the “Prevent saving changes that require the table to be re-created” box unchecked]

Posted in sql server