Showing posts with label Best Practices.

Wednesday, October 04, 2017

Sizing database files

It has been a while since I wrote some of my best practices posts, so I decided to revisit them to see if anything has changed and whether I could add some additional info.

This post will demonstrate that there is a difference in performance if you don't size your database files appropriately. It is a good practice to have your database sized correctly for the next 6 to 12 months; you don't want your server wasting cycles growing files all the time.

Figure out how big your files are now, estimate how much they will grow over the next year, and size your files accordingly. Check back every month or so to see if your estimates were correct.
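
One quick way to see the current size and growth settings for every file on the server is to query sys.master_files; size and growth are stored in 8 KB pages, so this converts them to MB:

SELECT DB_NAME(database_id) AS database_name,
       name AS logical_name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS VARCHAR(10)) + ' percent'
            ELSE CAST(growth * 8 / 1024 AS VARCHAR(10)) + ' MB'
       END AS growth_setting
FROM sys.master_files
ORDER BY database_name, type_desc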

By default, SQL Server creates databases with very small files when you don't specify the sizes. If you have people creating databases on your servers, consider adding a DDL trigger to notify you when a new database is added, so that you can talk to the database creator and size the files. You can also change the defaults on the server (the file sizes and growth settings of the model database) so that you don't get the 10% growth either.
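
Here is a minimal sketch of such a DDL trigger; the trigger name is made up, and the PRINT is just a placeholder for whatever notification you use (an email via sp_send_dbmail, an insert into an audit table, etc.):

CREATE TRIGGER NotifyOnCreateDatabase
ON ALL SERVER
FOR CREATE_DATABASE
AS
BEGIN
    DECLARE @dbName NVARCHAR(128)
    SET @dbName = EVENTDATA().value('(/EVENT_INSTANCE/DatabaseName)[1]', 'nvarchar(128)')
    -- Placeholder notification; swap in sp_send_dbmail or an insert into an audit table
    PRINT 'New database created: ' + @dbName
END
GO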

First let's see what the difference is when we have a database where the files will have to grow versus one where the files are big enough for the data that will be inserted.
Here we are creating two databases, one with much bigger files than the other.


The TestBigger database is correctly sized for the data that will be inserted

CREATE DATABASE [TestBigger]
 ON  PRIMARY 
( NAME = N'TestBigger', FILENAME = N'f:\TempTestBigger.mdf' , 
SIZE = 509600KB , FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'TestBigger_log', FILENAME = N'f:\TempTestBigger_log.ldf' , 
SIZE = 502400KB , FILEGROWTH = 10%)
GO


The TestSmaller database is very small; its files will have to be expanded many times to accommodate all the data I will be inserting.

CREATE DATABASE [TestSmaller]
 ON  PRIMARY 
( NAME = N'TestSmaller', FILENAME = N'f:\TempTestSmaller.mdf' , 
SIZE = 5280KB , FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'TestSmaller_log', FILENAME = N'f:\TempTestSmaller_log.ldf' , 
SIZE = 504KB , FILEGROWTH = 10%)
GO



These two stored proc calls are just to verify that the files match what we specified. You can also use sp_helpdb to check the size of a database that you created without specifying file sizes.

EXEC sp_helpdb 'TestBigger'

name            filename                    filegroup  size
TestBigger      f:\TempTestBigger.mdf       PRIMARY    509632 KB
TestBigger_log  f:\TempTestBigger_log.ldf   NULL       502400 KB

EXEC sp_helpdb 'TestSmaller'

name             filename                     filegroup  size
TestSmaller      f:\TempTestSmaller.mdf       PRIMARY    5280 KB
TestSmaller_log  f:\TempTestSmaller_log.ldf   NULL       512 KB


Next, we are creating two identical tables, one in each database


USE TestSmaller
GO
CREATE TABLE test (SomeName VARCHAR(100), 
SomeID VARCHAR(36), SomeOtherID VARCHAR(100), SomeDate DATETIME)

USE TestBigger
GO
CREATE TABLE test (SomeName VARCHAR(100),
 SomeID VARCHAR(36), SomeOtherID VARCHAR(100), SomeDate DATETIME)

This query is just used so that the data is cached for the two inserts later on; this way the data doesn't have to be fetched from disk for either insert. You can discard the results after the query is done.

USE master
GO


SELECT TOP 1000000 c1.name,NEWID(),NEWID(),GETDATE() 
FROM sys.sysobjects c1
CROSS JOIN sys.sysobjects c2
CROSS JOIN sys.sysobjects c3
CROSS JOIN sys.sysobjects c4


Here is the first insert into the bigger database

INSERT TestBigger.dbo.test
SELECT TOP 1000000 c1.name,NEWID(),NEWID(),GETDATE() 
FROM sys.sysobjects c1
CROSS JOIN sys.sysobjects c2
CROSS JOIN sys.sysobjects c3
CROSS JOIN sys.sysobjects c4

Here is the second insert into the smaller database

INSERT TestSmaller.dbo.test
SELECT TOP 1000000 c1.name,NEWID(),NEWID(),GETDATE() 
FROM sys.sysobjects c1
CROSS JOIN sys.sysobjects c2
CROSS JOIN sys.sysobjects c3
CROSS JOIN sys.sysobjects c4


On several machines I tested, it takes half the time or less to insert the data into the bigger database compared to the smaller database. On some machines it is almost 5 times faster to insert into the bigger database.

How about on your machine? Does the insert into the bigger database take less than half the time of the insert into the smaller database?
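
If you want to measure this yourself, one simple approach is to capture the time around each insert, as in this sketch (truncate dbo.test in both databases first if you rerun, so each test starts from the same point):

DECLARE @start DATETIME2 = SYSDATETIME()

INSERT TestBigger.dbo.test
SELECT TOP 1000000 c1.name,NEWID(),NEWID(),GETDATE()
FROM sys.sysobjects c1
CROSS JOIN sys.sysobjects c2
CROSS JOIN sys.sysobjects c3
CROSS JOIN sys.sysobjects c4

-- repeat the same pattern for TestSmaller.dbo.test and compare the two numbers
SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms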


Check the sizes of the databases again

EXEC sp_helpdb 'TestBigger'

name            filename                    filegroup  size
TestBigger      f:\TempTestBigger.mdf       PRIMARY    509632 KB
TestBigger_log  f:\TempTestBigger_log.ldf   NULL       502400 KB

EXEC sp_helpdb 'TestSmaller'

name             filename                     filegroup  size
TestSmaller      f:\TempTestSmaller.mdf       PRIMARY    215296 KB
TestSmaller_log  f:\TempTestSmaller_log.ldf   NULL       427392 KB




As you can see, the bigger database did not expand at all, while the smaller database expanded a lot.


Autogrow
If you do use autogrow, make sure you don't use the default 10%. Take a look at this message:
Date 10/03/2017 12:57:56 PM
Log SQL Server (Current - 11/25/2012 5:00:00 AM)
Source spid62
Message
Autogrow of file 'MyDB_Log' in database 'MyDB' took 104381 milliseconds. Consider using ALTER DATABASE to set a smaller FILEGROWTH for this file.
See that? The autogrow took over 100 seconds. You don't want to grow a one terabyte file by ten percent; that would be one hundred gigabytes, which is huge. Use a smaller, fixed increment and don't use a percentage; the bigger the file gets, the longer it will take to expand.
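
The fix is a one-line ALTER DATABASE; the database and file names below are just examples, and 256MB is an arbitrary increment you would tune for your own workload:

ALTER DATABASE [MyDB]
MODIFY FILE (NAME = N'MyDB_Log', FILEGROWTH = 256MB)
GO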


File placement
Separate the log files from the data files by placing them on separate hard drives. Placing the files on separate drives allows I/O activity to occur at the same time for both the data and log files. Instead of having huge files, consider having smaller files in separate filegroups. Put different tables used in the same join queries in different filegroups as well; this will improve performance because of parallel disk I/O when searching for joined data.

Put heavily accessed tables and the nonclustered indexes that belong to those tables in different filegroups. This will improve performance because of parallel I/O if the files are located on different physical disks. Just remember that you can't separate a clustered index from its base table; you can only do this for nonclustered indexes. Of course, people can get very creative: I once worked with a database where each table was placed in its own filegroup, there were hundreds of files... what a mess.
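
As a rough sketch of what that looks like (the filegroup name, drive letter, and index name are made up for illustration, reusing the test table from earlier):

ALTER DATABASE [TestBigger] ADD FILEGROUP [Indexes]
GO
ALTER DATABASE [TestBigger]
ADD FILE ( NAME = N'TestBigger_Indexes', FILENAME = N'g:\TestBigger_Indexes.ndf' ,
SIZE = 102400KB , FILEGROWTH = 102400KB ) TO FILEGROUP [Indexes]
GO
USE TestBigger
GO
-- the nonclustered index now lives on different disks than the base table
CREATE NONCLUSTERED INDEX IX_test_SomeID ON dbo.test (SomeID) ON [Indexes]
GO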


Tempdb
There are all kinds of recommendations about how many data files you should have for tempdb. Start with 4 files and add more if you see contention. Paul Randal has a detailed post here: A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor core.
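
If you do need to add tempdb data files, it is a plain ALTER DATABASE; the logical file name, path, and sizes below are placeholders:

ALTER DATABASE tempdb
ADD FILE ( NAME = N'tempdev2', FILENAME = N't:\tempdb2.ndf' ,
SIZE = 1024MB , FILEGROWTH = 256MB )
GO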

If you can, place tempdb on its own physical drive as well, separated from the user databases.

Consider solid state drives or flash storage for tempdb.

See also http://support.microsoft.com/kb/2154845 for recommendations by Microsoft Customer Service and Support


Test, test, test
Never ever blindly follow what you read on the internet; make sure that you test it out first on a QA server before promoting the changes to production!



Saturday, January 05, 2008

The World Is Small, The Risk Of Your Data Being Stolen Is Not!

Remember the How Is Your Sensitive Data Encrypted In The Database? post I wrote a while back? A colleague just informed me that he got a letter from that same datacenter. The letter states that his personal data was on one of the servers that were stolen. I told him that this is the reason we encrypt our data and also why we encrypt outside of the DB. The world is small indeed.

Here is a picture of the letter:

(Image: IdentityTheft)

Friday, November 23, 2007

Whitepaper on Malware to Attack Databases

Brian Kelly on his blog mentions a whitepaper by Cesar Cerrudo: Data0: Next generation malware for stealing databases. This whitepaper describes how malware could be crafted to steal information out of databases.



The attack will use the following techniques:
  • Discovery
  • Exploitation
  • Escalate Privileges (if necessary)
  • Cover Tracks


Print it out and read it while you wait in line on Black Friday.

Wednesday, September 19, 2007

SQL Injection Cheat Sheet

What is SQL Injection? From wikipedia: SQL injection is a technique that exploits a security vulnerability occurring in the database layer of an application. The vulnerability is present when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and thereby unexpectedly executed.
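
To make the idea concrete, here is a tiny T-SQL illustration (dbo.Users is a made-up table); the first statement builds the query by concatenating user input, the second passes the same input safely as a parameter with sp_executesql:

DECLARE @UserInput NVARCHAR(100)
DECLARE @sql NVARCHAR(MAX)

SET @UserInput = N''' OR 1=1 --'

-- Vulnerable: the input terminates the string literal and injects OR 1=1
SET @sql = N'SELECT * FROM dbo.Users WHERE UserName = ''' + @UserInput + N''''
-- EXEC (@sql) would now return every row in dbo.Users

-- Safer: the input is passed as a value, never as part of the statement text
EXEC sp_executesql
    N'SELECT * FROM dbo.Users WHERE UserName = @UserName',
    N'@UserName NVARCHAR(100)',
    @UserName = @UserInput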

Here is a nice SQL injection cheat sheet. Currently it covers mostly MySQL and Microsoft SQL Server, with some ORACLE and some PostgreSQL.

http://ferruh.mavituna.com/makale/sql-injection-cheatsheet/

Table Of Contents
About SQL Injection Cheat Sheet
Syntax Reference, Sample Attacks and Dirty SQL Injection Tricks

Line Comments
SQL Injection Attack Samples

Inline Comments
Classical Inline Comment SQL Injection Attack Samples
MySQL Version Detection Sample Attacks

Stacking Queries
Language / Database Stacked Query Support Table
About MySQL and PHP
Stacked SQL Injection Attack Samples

If Statements
MySQL If Statement
SQL Server If Statement
If Statement SQL Injection Attack Samples

Using Integers

String Operations
String Concatenation

Strings without Quotes
Hex based SQL Injection Samples

String Modification & Related

Union Injections
UNION – Fixing Language Issues

Bypassing Login Screens

Enabling xp_cmdshell in SQL Server 2005
Other parts are not so well formatted, but check them out for yourself: drafts, notes, and other material. Scroll down and see.

Saturday, July 14, 2007

Best Practice: Backups

What if I told you to take your latest production backup, restore it on a different machine and try using the database? Are you comfortable with that task? Do you think it will work? When was the last time you tested your backups?

Do you even have a backup?
Why am I asking all these things? Because your data is only as good as your last good backup. Is your data backed up regularly? You will say "Of course it is, we use [Insert expensive backup solution here] for all our enterprise backups". Prove it: go to work on Monday and ask them to give you the latest backup. I bet that out of 100 people who ask this question of their backup team, several will find there is no backup file.
Here is another problem: three years ago the backups were taking about 1 hour. The backup started at 12 and would be done at 1; at 1:30 a job from another machine would FTP the file down. Two years later the backup takes 2 hours to complete, and you didn't realize this. Can you guess what will happen if you try to restore one of those backups that were moved by FTP? I will tell you: it won't work. And what if there is no backup at all when the FTP runs? Oh yes, a 0 KB file will be created.
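
One quick way to sanity-check a backup file from T-SQL is shown below (the path is just an example, and a real test restore on another server is still the only proof that counts):

-- Verifies that the backup file is readable and complete, without restoring it
RESTORE VERIFYONLY FROM DISK = N'\\backupserver\sql\MyDB_full.bak'

-- Shows what is actually inside the file (database name, backup date, sizes)
RESTORE HEADERONLY FROM DISK = N'\\backupserver\sql\MyDB_full.bak'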

Where do you keep your backups?
Are your backups in the same building? If you say yes, then you have a big problem. Let me tell you a little story. I worked for a company in New York City between 2001 and 2005. This company had their office in WTC tower one. To be safe, they kept their backups in WTC tower two. Well, I don't have to tell you what happened to those backups. If you do store your backups offsite (and why wouldn't you?), make sure the location is at least 100 miles away. If you don't want to go that far from your current location, then pick a location which is safe from floods and fires and is not a worthwhile target for an attack.

Where is your Source Code?
Do you back up your source code? Most people will say they keep it in Subversion or Visual SourceSafe. But does that get backed up? What happens if your building goes up in flames? What we do is take a full source code backup every day. In addition to that, we also take differential backups every n revisions. We have jobs that create these backups and then FTP them to 3 different locations. If you have 20 developers and you lose 6 hours of work, then you have lost 120 hours * $$ (you do the math), and that is the best case scenario. If the backup was in the building together with all the workstations, then you have a much bigger problem to deal with.
SQL developers are notorious for not using source control; they will tell you that the database backup is their source control. A source control system does not have to be expensive; we use Subversion (which is free and better than VSS). You can use either Tortoise or the plugin for Visual Studio to do your check-ins.

Friday, June 08, 2007

Three New SQL Server Best Practices Articles On TechNet

Predeployment I/O Best Practices

The I/O system is important to the performance of SQL Server. When configuring a new server for SQL Server or when adding or modifying the disk configuration of an existing system, it is good practice to determine the capacity of the I/O subsystem prior to deploying SQL Server. This white paper discusses validating and determining the capacity of an I/O subsystem. A number of tools are available for performing this type of testing. This white paper focuses on the SQLIO.exe tool, but also compares all available tools. It also covers basic I/O configuration best practices for SQL Server 2005.
On This Page

Overview

Determining I/O Capacity

Disk Configuration Best Practices & Common Pitfalls

SQLIO

Monitoring I/O Performance Using System Monitor

Conclusion

Resources



Partial Database Availability

This white paper outlines the fundamental recovery and design patterns involving the use of filegroups in implementing partial database availability in SQL Server 2005. As databases become larger and larger, the infrastructure assets and technology that provide availability become more and more important.

The database filegroups feature introduced in previous versions of SQL Server enables the use of multiple database files in order to host very large databases (VLDB) and minimize backup time. With data spanning multiple filegroups, it is possible to construct a database layout whereby the failure of certain data resources does not render the entire solution unavailable. This increases the availability of solutions that use SQL Server and further reduces the surface area of failure that would render the database totally unavailable.



Comparing Tables Organized with Clustered Indexes versus Heaps

In SQL Server 2005, any table can either have a clustered index or be organized as a heap (without a clustered index). This white paper summarizes the advantages and disadvantages, the differences in performance characteristics, and other behaviors of tables that are ordered as lists (clustered indexes) or heaps. The performance for six distinct scenarios where DML operations are performed on these tables is measured, and detailed observations are presented. This white paper provides best practice recommendations on the merits of the two types of table organization, along with examples of when you might want to use one or the other.
On This Page

Introduction

Clustered Indexes and Heaps

Test Objectives

Test Methodology

Test Results and Observations

Recommendations

Appendix: Test Environment

Monday, October 17, 2005

Do Not Drop And Create Indexes On Your Tables

When you drop and recreate a clustered index this way, the nonclustered indexes are dropped and recreated twice: once when you drop the clustered index and then again when you create the clustered index.

Use the DROP_EXISTING clause of the CREATE INDEX statement instead; this recreates the clustered index in one atomic step and avoids rebuilding the nonclustered indexes, since the clustered index key values used by the row locators remain the same.

Here is an example:

CREATE UNIQUE CLUSTERED INDEX pkmyIndex ON MyTable(MyColumn)
WITH DROP_EXISTING