Performance head of using SELECT * in query

In some cases we tend to write ‘SELECT * …’ in our queries. Even though we have code maintenance issues with ‘SELECT * …’ query (say for example in case a new column gets in to the schema, application coding against this query breaks), most important issue would be performance. In this tutorial I would like to show a sample code to differentiate how performance would get hit due to SELECT * query from others.

There can be argument that we can still make SELECT * query give us performance by covering all columns of the table in Covering Indexes and there by making a seek possible, but in general realistic scenarios we do not tend to put all the columns in an index. And more realistically we tend to put an index on top of most frequently used predicate. So a proper combination of predicate with columns listed in SELECT would give you a good performance.

In the following example we do have a TestTable with both Clustered and NonClustered indexes on Id and Data columns respectively. Then we will insert 300,000 rows of test data into it for our testing purpose. Later we have two SQL queries with ‘SELECT *’ and ‘SELECT COLUMNS’ with predicate on NonClustered index column i.e., Data. As our SELECT COLUMN query does only retrieve Data column, this query is very fast. When it comes to our SELECT * query, it retrieves all the data so a Clustered index scan is inevitable and it gives bot I/O and CPU performance heads.

Check out following code, comments are provided for ease of understanding –

/**************************** Create TestDB Database *****************************/ 

IF  NOT EXISTS (SELECT [name] FROM master.[dbo].[sysdatabases] WHERE [name] = 'TestDB')
BEGIN
       CREATE DATABASE [TestDB];
END
GO

USE [TestDB]
GO


/**************************** Create TestTable Table ******************************/

IF EXISTS(SELECT * FROM sys.sysobjects WHERE type = 'U' AND name = 'TestTable')
BEGIN
        DROP TABLE [dbo].[TestTable];
END


CREATE TABLE [dbo].[TestTable](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Data] [int] NOT NULL,
    [InsertedDate] [datetime] NOT NULL DEFAULT GETDATE()
)


IF EXISTS (SELECT name FROM sys.indexes WHERE name = N'PK_Individual_Id')
BEGIN
        DROP INDEX PK_Individual_Id ON [dbo].[TestTable];
END


CREATE 
        CLUSTERED INDEX PK_Individual_Id
ON 
        [dbo].[TestTable] (Id);
GO


IF EXISTS (SELECT name FROM sys.indexes WHERE name = N'IX_Individual_Data')
BEGIN
        DROP INDEX IX_Individual_Data ON [dbo].[TestTable];
END


CREATE 
        NONCLUSTERED INDEX IX_Individual_Data
ON 
        [dbo].[TestTable] (Data);
GO


/************************** Insert Test Data using CTE ****************************
**********************************************************************************/

WITH TestData AS
 (
    -- Anchor Query
    SELECT
           1 As SampleId,
           1 AS SampleData

    -- Union All
    UNION ALL

    -- Recursive Query
    SELECT
           td.SampleId + 1,
           CAST(ABS(CHECKSUM(NEWID())) % 300 AS INT)
    FROM
           TestData td
    WHERE
           td.SampleId < 30000

 )
 INSERT
 INTO
       [dbo].[TestTable] (Data)
 SELECT
       SampleData
 FROM
       TestData OPTION (MAXRECURSION 30000);

GO 10


/**************************** Clear Buffers and Cache *****************************/

DBCC FREESESSIONCACHE;
DBCC FREEPROCCACHE;
DBCC FREESYSTEMCACHE('ALL');
DBCC DROPCLEANBUFFERS;
GO


/**********************************************************************************
As we are requesting all the rows in the table, here a Clustered index scan is 
inevitable. Examining Execution plan shows 81% of total cost towards batch. So sending 
unnecessary data is going to give poor performance.
**********************************************************************************/

SELECT
        *
FROM
        [dbo].[TestTable]
WHERE
        [Data] = 251;


/**************************** Clear Buffers and Cache *****************************/

DBCC FREESESSIONCACHE;
DBCC FREEPROCCACHE;
DBCC FREESYSTEMCACHE('ALL');
DBCC DROPCLEANBUFFERS;
GO


/**********************************************************************************
As we are only requesting only data which is present in the non-clustered index itself.
This query is not returning unnecessary data and will give more performance. Examining 
Execution plan shows it got 0% cost towards complete batch cost.
**********************************************************************************/

SELECT
        [Data]
FROM
        [dbo].[TestTable]
WHERE
        [Data] = 251;


/***************************** Drop TestTable Table *******************************/

IF EXISTS(SELECT * FROM sys.sysobjects WHERE type = 'U' AND name = 'TestTable')
BEGIN
    DROP TABLE [dbo].[TestTable];
END
GO


/***************************** Drop TestDB Database ******************************/

USE master
GO

IF EXISTS (SELECT [name] FROM master.[dbo].[sysdatabases] WHERE [name] = 'TestDB')
BEGIN
       DROP DATABASE [TestDB];
END
GO

 

Output –

SelectAllPerformance1

You may also like...