I learned a long time ago, that the quickest way to do something was to not do it at all.
A friend of mine asked me if I could review an update statement that was confounding their group looking for ways to optimize it.
The table has, just short of 50 columns, a considerable number of indexes and the column in question, has a datatype set to NVarChar(8), consists of some numbers, combinations of letters and numbers, etc.
The process would take 7 ½ hours to run this update, as it parses through 100K rows at a time. There is one index that includes the CN1 column in it, but no index on just the CN1 column. This is what the update statement looks like:
UPDATE TOP (@BatchSize) [dbo].[Table_b4]
SET CN1 = (CASE
WHEN TRY_CAST(CN1 AS INT) IS NULL THEN CN1
ELSE FORMAT(CAST(CN1 AS INT), ‘val1’)
WHERE CN1 NOT LIKE ‘[0-x][0-x][0-x][0-x][0-x][0-x][0-x][0-x]’
Now as we review this update statement and try to optimize it, the question came to me to ask, “Why am I updating this at all?” We have to cast the VarChar data as an integer, then format the data the data and verify that none of the existing values aren’t like the lovely section in the WHERE clause.
Back in Oracle 10g, I had numerous wide, SAS tables that required extensive CPU and IO to update them. I introduced the guys to CTAS and then exchange partition, but there’s also a version of this in SQL Server, called Swap Schema.
The idea behind it, is instead of updating an existing table, create a new table as select on the original one, building out the data on the insert as you would for the update, then swap the new table with the original, skipping the update.
The new process would look like the following:
- DBO, (Database Owner)
- STAGING, (For the build and switch)
First, create the table in the STAGING schema, based on a select from the original table in the DBO schema, but with the data inserted as the update:
CREATE TABLE STAGING.LRG_Tbl
AS SELECT * FROM DBO.LRG_Tbl
WHERE TRY_CAST(CN1 AS INT) IS NULL
THEN FORMAT(CAST(CN1 AS INT), ‘val1’)
INSERT INTO STAGING.LRG_Tbl
SELECT * FROM DBO.LRG_Tbl where CN1!=’val1′;
You get the idea… The goal here is to perform the CTAS with the data pre-built in the updated format and then insert the last of the data.
Once the table is built, now we need to do the switch and update statistics-
ALTER SCHEMA DBO TRANSFER STAGING.LRG_Tbl
GRANT SELECT ON “DBO.LRG_Tbl” TO “f_read_only”
You will grant all permissions to the object to match the previous object and the process is complete. This is a process that will need to be done in a quiet time, but if the process that takes just a matter of minutes replaces a current process that takes 7+ hours, it may be worth it.
If you decide to perform a CTAS naming the table differently than the original, you would need to use the SP_RENAME proc to correct the name and this could be a more complicated process. With this process, the privileges aren’t impacted.
If you select from the table, you’ll now see that it has the data updated without ever having to have performed an update.