SQL Server: should I use information_schema tables over sys tables?

Tags: SQL, SQL Server, SQL Server 2008, Stored Procedures, Metadata

Sql Problem Overview


In SQL Server there are two schemas for metadata:

  • INFORMATION_SCHEMA
  • SYS

I have heard that INFORMATION_SCHEMA tables are based on the ANSI standard. When developing, e.g., stored procedures, would it be wise to use INFORMATION_SCHEMA tables over sys tables?
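For a concrete sense of the difference, here is a minimal sketch of the same request (listing the columns of a table) written against each set of views. It assumes SQL Server 2005 or later and a hypothetical table `dbo.MyTable`. The INFORMATION_SCHEMA version filters on schema and table names; the sys version filters on the object_id returned by `OBJECT_ID`.

```sql
-- Hypothetical table dbo.MyTable; both queries list its columns in order.

-- ANSI-style: INFORMATION_SCHEMA view, filtered by schema and table *names*.
SELECT c.COLUMN_NAME, c.DATA_TYPE, c.IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_SCHEMA = 'dbo'
  AND c.TABLE_NAME   = 'MyTable'
ORDER BY c.ORDINAL_POSITION;

-- SQL Server catalog view: filtered by object_id.
-- TYPE_NAME(user_type_id) returns the declared type name (alias types included).
SELECT c.name, TYPE_NAME(c.user_type_id) AS data_type, c.is_nullable
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.MyTable')
ORDER BY c.column_id;
```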

Sql Solutions


Solution 1 - Sql

Unless you are writing an application that you know for a fact will need to be portable, or you only want quite basic information, I would just default to using the proprietary SQL Server system views to begin with.

The INFORMATION_SCHEMA views only show objects that are compatible with the SQL-92 standard. This means there is no information schema view for even quite basic constructs such as indexes (these are not defined in the standard and are left as implementation details), let alone any SQL Server proprietary features.
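As an illustration of the index gap, a sketch like the following lists the indexes and key columns of a table using only catalog views, since there is no INFORMATION_SCHEMA equivalent to query. `dbo.MyTable` is a hypothetical table name; heaps have no rows in `sys.index_columns`, so they drop out of the inner joins.

```sql
-- Indexes on the hypothetical table dbo.MyTable and the columns they use.
-- No INFORMATION_SCHEMA view covers indexes, so this uses sys views only.
SELECT i.name        AS index_name,
       i.type_desc,                  -- e.g. CLUSTERED or NONCLUSTERED
       i.is_unique,
       c.name        AS column_name,
       ic.key_ordinal,               -- 0 for INCLUDE columns
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id  = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.MyTable')
ORDER BY i.index_id, ic.key_ordinal;
```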

Additionally, it is not quite the panacea for portability that one may assume. Implementations still differ between systems: Oracle does not implement it "out of the box" at all, and the MySQL docs say:

> Users of SQL Server 2000 (which also follows the standard) may notice a strong similarity. However, MySQL has omitted many columns that are not relevant for our implementation, and added columns that are MySQL-specific. One such column is the ENGINE column in the INFORMATION_SCHEMA.TABLES table.

Even for bread-and-butter SQL constructs such as foreign key constraints, the INFORMATION_SCHEMA views can be dramatically less efficient to work with than the sys views, as they do not expose the object IDs that would allow efficient querying.
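To make the foreign key point concrete, here is a sketch of the same listing written both ways, assuming SQL Server 2005 or later. The INFORMATION_SCHEMA version has to match constraints by schema and name, and because SQL Server's KEY_COLUMN_USAGE has no column linking a foreign key column to the column it references, it pairs them by ORDINAL_POSITION. That pairing assumes the foreign key columns are declared in the same order as the referenced key. The sys version joins on integer ids throughout.

```sql
-- INFORMATION_SCHEMA: every join is on schema + constraint *name*.
-- Assumption: FK columns are declared in the referenced key's column order.
SELECT rc.CONSTRAINT_NAME,
       fkc.TABLE_NAME  AS referencing_table,
       fkc.COLUMN_NAME AS referencing_column,
       pkc.TABLE_NAME  AS referenced_table,
       pkc.COLUMN_NAME AS referenced_column
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS AS rc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS fkc
  ON fkc.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
 AND fkc.CONSTRAINT_NAME   = rc.CONSTRAINT_NAME
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS pkc
  ON pkc.CONSTRAINT_SCHEMA = rc.UNIQUE_CONSTRAINT_SCHEMA
 AND pkc.CONSTRAINT_NAME   = rc.UNIQUE_CONSTRAINT_NAME
 AND pkc.ORDINAL_POSITION  = fkc.ORDINAL_POSITION;

-- sys views: every join is on object_id / column_id.
SELECT fk.name                               AS constraint_name,
       OBJECT_NAME(fkc.parent_object_id)     AS referencing_table,
       pc.name                               AS referencing_column,
       OBJECT_NAME(fkc.referenced_object_id) AS referenced_table,
       rcol.name                             AS referenced_column
FROM sys.foreign_keys AS fk
JOIN sys.foreign_key_columns AS fkc
  ON fkc.constraint_object_id = fk.object_id
JOIN sys.columns AS pc
  ON pc.object_id = fkc.parent_object_id
 AND pc.column_id = fkc.parent_column_id
JOIN sys.columns AS rcol
  ON rcol.object_id = fkc.referenced_object_id
 AND rcol.column_id = fkc.referenced_column_id
ORDER BY fk.name, fkc.constraint_column_id;
```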

For example, see the question "SQL query slow-down from 1 second to 11 minutes - why?" and the corresponding execution plans.

[Execution plan images not reproduced: INFORMATION_SCHEMA query plan; sys query plan]

Solution 2 - Sql

I would always try to use the Information_schema views over querying the sys schema directly.

The views are ISO-compliant, so in theory you should be able to migrate any queries easily across different RDBMSs.

However, there have been some cases where the information that I need is just not available in a view.
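One example of such a gap, sketched below under the assumption of SQL Server 2005 or later: SQL Server's INFORMATION_SCHEMA.COLUMNS has no identity flag, so an INFORMATION_SCHEMA-based query has to fall back on the proprietary `COLUMNPROPERTY` function anyway, while `sys.identity_columns` exposes identity columns directly along with their seed and increment.

```sql
-- INFORMATION_SCHEMA.COLUMNS has no identity column, so the filter
-- needs the SQL Server-specific COLUMNPROPERTY function.
SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE COLUMNPROPERTY(OBJECT_ID(QUOTENAME(c.TABLE_SCHEMA) + '.' + QUOTENAME(c.TABLE_NAME)),
                     c.COLUMN_NAME, 'IsIdentity') = 1;

-- The catalog view exposes identity columns directly (user tables only here).
SELECT SCHEMA_NAME(t.schema_id) AS table_schema,
       t.name                   AS table_name,
       ic.name                  AS column_name,
       ic.seed_value,
       ic.increment_value
FROM sys.identity_columns AS ic
JOIN sys.tables AS t
  ON t.object_id = ic.object_id;
```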

I've provided some links with further information on the views and on querying the SQL Server catalog.

http://msdn.microsoft.com/en-us/library/ms186778.aspx

http://msdn.microsoft.com/en-us/library/ms189082.aspx

Solution 3 - Sql

INFORMATION_SCHEMA is more suitable for external code that may need to interface with a variety of databases. Once you start programming in the database, portability kind of goes out the window. If you are writing stored procedures, that tells me you have committed to a particular database platform (for better or for worse). If you have committed to SQL Server, then by all means, use the sys views.
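For stored procedure work in particular, one practical difference (sketched here, assuming SQL Server 2005 or later) is that INFORMATION_SCHEMA.ROUTINES.ROUTINE_DEFINITION returns only the first 4000 characters of a routine's definition, while `sys.sql_modules.definition` is nvarchar(max) and holds the full text. The query compares the two lengths for every procedure in the current database; for procedures longer than 4000 characters the INFORMATION_SCHEMA value comes back truncated.

```sql
-- Compare definition lengths per procedure in the current database.
-- ROUTINE_DEFINITION is capped at 4000 characters; sys.sql_modules is not.
-- Both columns are NULL for procedures created WITH ENCRYPTION.
SELECT r.ROUTINE_SCHEMA,
       r.ROUTINE_NAME,
       LEN(r.ROUTINE_DEFINITION) AS info_schema_length,
       LEN(m.definition)         AS sys_length
FROM INFORMATION_SCHEMA.ROUTINES AS r
JOIN sys.sql_modules AS m
  ON m.object_id = OBJECT_ID(QUOTENAME(r.ROUTINE_SCHEMA) + '.' + QUOTENAME(r.ROUTINE_NAME))
WHERE r.ROUTINE_TYPE = 'PROCEDURE'
ORDER BY sys_length DESC;
```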

Solution 4 - Sql

I won't repeat the other answers, but I will add a performance perspective. INFORMATION_SCHEMA views, as Martin Smith mentions in his answer, are not the most efficient source of this information, since they have to expose standard columns that must be collected from multiple underlying sources. sys views can be more efficient from that perspective, so if you have high performance requirements and don't have to worry about portability, you should probably go with sys views.

For example, the first query below uses information_schema.tables to check if a table exists. The second one uses sys.tables to do the same thing.

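```sql
-- Existence check via the ANSI view: matches on schema and table names.
-- The printed text is the relative cost shown in the query plans.
if exists (select * from information_schema.tables where table_schema = 'dbo' and table_name = 'MyTable')
	print '75% cost';

-- Existence check via the catalog view: matches on object_id.
if exists (select * from sys.tables where object_id = object_id('dbo.MyTable'))
	print '25% cost';
```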

When you view the IO for these, the first query has 4 logical reads to sysschobjs and sysclsobjs, while the second one has none. Also, the first one does two non-clustered index seeks and a key lookup, while the second one only does a single clustered index seek. According to the query plans, the first one costs about 3x as much as the second. If you have to do this many times in a large system, say at deployment time, this could add up and cause performance problems. But this really only applies to heavily loaded systems; most line-of-business systems don't have performance issues at this level.

Again, the cost of each of these queries is very small compared to other queries in most systems, but if your system has a lot of this type of activity, it could add up.
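To reproduce the comparison on your own system, a sketch like the following turns on per-statement I/O reporting around the two existence checks above. The logical reads appear in the Messages output, and enabling the actual execution plan in SSMS shows the relative costs. `dbo.MyTable` is the same hypothetical table, and the exact read counts will likely vary by SQL Server version and database.

```sql
-- Report logical reads for each statement in this batch.
SET STATISTICS IO ON;

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'MyTable')
    PRINT 'found via INFORMATION_SCHEMA.TABLES';

IF EXISTS (SELECT * FROM sys.tables
           WHERE object_id = OBJECT_ID(N'dbo.MyTable'))
    PRINT 'found via sys.tables';

SET STATISTICS IO OFF;
```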

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | juur | View Question on Stackoverflow |
| Solution 1 - Sql | Martin Smith | View Answer on Stackoverflow |
| Solution 2 - Sql | codingbadger | View Answer on Stackoverflow |
| Solution 3 - Sql | Peter Radocchia | View Answer on Stackoverflow |
| Solution 4 - Sql | Tombala | View Answer on Stackoverflow |