Header Ads

25 Point Basic MySQL Setup/Optimization Checklist

Daily I run into new web programmers that are using PHP and MySQL to create their blogs and websites. I created this checklist as a guide for new and experienced to make sure they are covering the basics of a MySQL server setup.

This guide is by no means all inclusive, but should help to cover some of the major gaps in knowledge and commonly overlooked fundamentals that I run into on a daily basis.

The checklist is separated into 5 equal sections: Server Setup, Schema Design, Table Design, Index Optimization, Query Optimization, and a 6th Bonus Tips section. 
 
Section 1 – Server Setup

1 - Root User
For security reasons, the root MYSQL user must be setup with a secure password, and should only have access from localhost. It is a bad idea to allow outside access to the root account. Create additional users if you need to access the database remotely!

2 - Backup and Restore
Before allowing a database to be used in a production environment, there should be a usable backup and restore process. I use the phrase “in case the database server is completely destroyed” because the backup location and method needs to be completely independent from the database server. Note: Even a weekly database backup is better than no backup at all.

3 - Benchmarking
There’s no easy way to determine bottlenecks and trouble unless a method to benchmark performance is in place. The slow query log should always be enabled, and it’s a good idea to install a benchmarking program. Monyog is an external program that provides a number of real-time reports useful for monitoring and performance tuning.

4 - DNS
If you do not allow outside access, or you can access your server from known IP addresses, disabling DNS look-ups can speed up server operations. Additionally, if the MySQL server loses it’s DNS look-up service, the usability of the entire Database can all but halt.

5 - Privileges
When adding users to a database, only give them the permissions that are absolutely required, and be specific in where they can access from. “GRANT ALL ON EVERYTHING TO USER@ANYWHERE” is a really bad idea. If you do need to give users full permissions for installations or another purpose, it’s a good practice to change them back to the minimum once complete.
Section 2 – Schema Design

6 - Naming
A standardized naming scheme should always be used. The best practice is to use lowercase letters with an underscore connecting names, such as `my_personal_database`. Tables, and individual column names should carry the same naming convention. Use descriptive names for every column including id columns. `id` is not descriptive whereas `contact_id` is.

7 - Collation
Use the same collation for all parts of the database, and avoid using UTF-8 or multi-byte formats unless you specifically need them. Keeping the same format on all tables and columns can help prevent data corruption and conversion problems. UTF-8 requires significantly more disk space and overhead which can reduce performance. If you need UTF-8, use it , but don’t make your entire database UTF-8 because you’re lazy.

8 - Foreign Keys
Always use foreign keys to ensure that bad / incomplete data stays out of the database. Nothing replaces good application level programming, but foreign keys are the best way to prevent putting bad data into your database in the first place. You will need InnoDB to use foreign keys, but the benefit is worth it.

9 - Logically Segmented Data in Tables
Tables should be segmented logically by the data they contain and their association with other tables. In this manner, there may be more total tables, but will help eliminate tables with a huge number of columns which can really hurt performance. Additionally, it will make querying easier as it’s unlikely that every column is needed for every query. This also allows for Single-to-many relationships such as storing multiple addresses related to a single entry. Don’t be afraid of 20 tables with 20 columns each, be afraid of 1 table with 400 columns!

10 - Reserved Words
Avoid using reserved words for any name in your database schema. Words like date, time, decimal, etc. are often used, and can wreak havoc trying to get queries to work properly, and can cause even more difficulty in debugging. You can technically use these words if they are placed in back-ticks (`date`), but this is a bad practice and should always be avoided.

Section 3 – Table Design

11 - Data types
MySQL has many data types, probably more than any other database. Using the correct data type for the data being stored is one of the most important aspects in design. Failing at this step can break a database’s speed and the usability of an application.

Whole Numbers – BIGINT, INT, MEDIUMINT, SMALLINT, TINYINT
Decimals – DECIMAL, FLOAT, DOUBLE, REAL
Dates – TIMESTAMP*, DATETIME, DATE, TIME, YEAR
Strings – CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, SET

Additionally, using the correct data type allows for the use of MySQL’s built-in functions which can sort, do math, comparisons, date conversions, etc. For example, I often see dates stored in VARCHAR columns, which completely prevents MySQL from sorting, or using any date related function.

12 - Large Numerical Keys
It’s common for new programmers to use a BIGINT(20) when they need a key column. While admirable, this is a waste of disk space. An UNSIGNED INT(10) has over 4 billion possible numbers, which is more than most will ever use. Even so, by that time, you will want to look into partitioning, and will have a variety of other problems on your hand. Don’t use BIGINT’s unless you need to store very-very large numbers.

13 - Smallest Length
Using the smallest length data length is important. Every byte of savings adds up when a database’s size and usage goes up. Lazy programming by using VARCHAR(255) or DECIMAL(20,2) creates unnecessary overhead and causes problems down the road. Give yourself one extra byte of space if needed, but 100 is a overkill.

14 - Avoid TEXT and BLOB Columns
TEXT and BLOB type columns can eat up a server’s resources when being selected. While these are most certainly needed to store larger amounts of data, they should only be used for that purpose. VARCHAR can hold up to 255 bytes and should always be used before TEXT whenever possible.

15 - Non-Relational Storage
A huge design mistake is storing data in a non-relational format. It’s common to see data stored in a CSV format like (value1,value2,value3) in a single column. This effectively bypasses MySQL’s ability to use the data. It’s best to use multiple tables for single-to-many relationships, as this allows for MySQL to handle the data in an elegant manner. There are some situations where storing csv-like data would make sense, but for all intensive purposes, avoid storing data like this.

Section 4 – Index Optimization

16 - Use proper indexes
MySQL supports several types of indexes (PRIMARY, UNIQUE, NORMAL, PARTIAL, and FULLTEXT). It is important to use the correct type of index for the job. It is also important to only use indexes when needed, and not to create duplicate indexes. For example a primary key column already has an index, so adding a second UNIQUE INDEX on the primary key is a complete waste of overhead and disk space.

17 - Multi-Column Indexes
If there is a data set that is constantly queried with more than one column in the WHERE clause, it may be a good idea to create a multi-column index. If you have an index on (`user_id`,`user_category`) the index will work when both are in the where clause or the first column (`user_id`) is in the where clause. However, the index will not be used if only `user_category` is in the where clause.

18 - Modifying Indexed Fields During a Query
Unless you specify the length of an index, modifying an indexed column will prevent the index from being used.

For example, if there is an index on `credit_card_number` and you perform a query like this:
 
SELECT `user_id` FROM `my_table` WHERE LEFT(`credit_card_number`,4) = '5666';
The index will not be used. If this was a common scenario, you could create a partial index of (`credit_card_number`,4), and the above query would use the index.

19 - Indexes With a High Cardinality
Indexes work best when there are many unique values in relation to the total number of rows. This allows the database engine to quickly reduce the number of possible rows in the result set. Indexes on columns with only a few unique values are inefficient and will end up being a waste of overhead.

20 - Unique and NULL Column Indexes
Allowing NULLS in index columns adds an additional byte of storage per row to the index. This again equates to a waste of space and overhead and will slow down MySQL’s performance. It’s better to use no value rather than NULL.

Section 5 – Query Optimization

21 - Specific Column Names
Always use specific column names instead of * when querying a table. SELECT * is lazy programming. While it is completely valid syntax, you won’t know the columns that will be returned. If you don’t know what you’re going to get with a query, there’s no reason to use it.. right? Write out any column names that you need data from. This way your code is intuitive, you won’t have problems trying to use data from a column that doesn’t exist, and the next person using your script wont hate you.

22 - MySQL’s Built-in Functions
MySQL has a variety of very advanced, and very fast, built-in functions. They probably are much more efficient than php or most other application level scripts. These functions can greatly increase your application’s speed, and reduce its complexity. MySQL has everything from mathematical operations, date comparisons, even spacial functions for calculation geographic equations. Learn to use them.

23 - Selecting TEXT and BLOB Columns
When a TEXT or BLOB column is select in a query, MySQL will create a temporary internal table. If large result sets are selected with TEXT or BLOB columns, this can create a major load on the database, and unnecessary overhead. This relates back to SELECT *, don’t select a TEXT or BLOB type column unless you actually need to use the data.

24 - Use Transactions
Transactions are another great way of preventing incomplete or corrupted data while inserting or altering data. When using a transaction, you can insert or alter any number of rows of data. If there is an error, all of the queries in the transaction will be aborted. Think of inserting 50,000 rows into a report table, and having 10 arbitrary rows not insert correctly. That entire set of data is now corrupt, and a transaction would have prevented that.

25 - SQL_NO_CACHE
SQL_NO_CACHE is a great way to prevent MySQL from caching a query’s result. This is important for results with a rapidly changing data, or very large result sets. Both of these situations can eat up server resources without any gain to the application or end-user.

Bonus

1 - TIMESTAMP vs. DATETIME
TIMESTAMP and DATETIME store dates in the exact same format (YYYY-MM-DD HH:MM:SS) but TIMESTAMP uses less space to do so. The only limitation is that TIMESTAMP cannot be used for dates older than Jan 1st, 1970.

2 - SIGNED INT
Unless you need to store negative numbers, only use UNSIGNED INT and other numerical data type fields. There’s no reason to allow for negative numbers if they will never be used.

3 - Collation: _ci vs. _cs
The _ci in a collation stands for “case insensitive”. If you care about case sensitivity use a collation that ends in _cs. The can be very important for searching and other operations where John ≠ john!

4 - InnoDB vs. MyISAM
If you’re using MyISAM as a storage engine only because it was the default, you may be making a mistake. InnoDB is superior in several areas (Reliability, Backups, Foreign Keys, and Performance in many situations) and while maybe not always the best option (Full Text Indexing), you should know why you’re using the engine you’re using. You can also mix the 2, but this can make performance tuning especially difficult.

5 - Consult a Professional
When you get a project and the database design, usage or other factor is just over your head, it’s a good idea to consult a professional. It may cost a fair sum, but the cost down the road could be substantial. Planning is always cheaper than reacting.



No comments

Powered by Blogger.