Check if SQL Server is Under CPU Pressure

Last fall, I wrote about how to check wait conditions in SQL Server.  As I pointed out in that post, if a high percentage of your server’s wait time is due to waits of the type SOS_SCHEDULER_YIELD, you may be experiencing CPU pressure on your server.

You can use the following query to further investigate if your server is waiting on CPU resources.

SELECT scheduler_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255

Here is an example of the output of this query:

scheduler_id current_tasks_count runnable_tasks_count
———— ——————- ——————–
0            8                   0
1            11                  0

If you run this query frequently and see the runnable_tasks_count consistenty above zero, that is an indication that the server is in fact CPU-bound.  If you see runnable_tasks_count in double digits for any length of time, that is cause for immediate concern.

The sys.dm_os_schedulers dynamic management view which is used in this query gives information on the SQL Server schedulers.  Schedulers are responsible for allocating system resources (such as CPU time) to SQL Server processes.  sys.dm_os_schedulers contains one row per scheduler for schedulers mapped to individual processors.  “Active worker” schedulers are those dedicated to regular user-controlled tasks; these are assigned IDs less than 255.  Schedulers with IDs above 255 are called “hidden”schedulers; these are used by system processes.  You should be more concerned with the “active worker” schedulers, and that’s why the query explicitly excludes schedulers 255 and up.

Optionally, instead of checking for particular scheduler_id values, you could modify the query to check the scheduler status, as follows:

SELECT scheduler_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE status = ‘VISIBLE ONLINE’

The WHERE clause (status = ‘VISIBLE ONLINE’) will limit the result set to only those schedulers that are available for user processes.

These queries should work in SQL Server 2005, 2008, 2008 R2, and 2012.

One final point… before running out to upgrade your server, you should ensure that your queries and stored procedures are using the correct plans. An incorrect or corrupt plan can cause SQL Server to consume far more of the CPU resources than should normally be necessary.  In that case, adding physical resources to your server is not the answer.

Drupal/PHP: User Sessions and Garbage Collection

I recently had to troubleshoot a web site running on the LAMP platform.  I have deep knowledge of the Windows platform technologies, but only a passing knowledge of the technologies on which this particular site was built.

The specific technologies involved were PHP 5.2 (5.2.6), MySQL 5 (5.0.51a), Apache 2.2.9, and Debian Linux (lenny9).  The web site was based on Drupal 6 (6.19).  Everything was running on a single virtualized server.  The problem was that the database backups were taking so much time and consuming so many server resources that the web site was becoming unavailable for 10-20 minutes every night.

Following is a description of what I did to find and correct the problem.  This description will highlight some SQL queries that are useful for troubleshooting MySQL, as well as configuration settings that are key to management of Drupal user sessions and PHP garbage collection.

The original administrator for the site had set up a process to back up the MySQL database, compress the backup file into a ZIP, and move the ZIP file to a secondary network location.  As the site outages corresponded to the times when this backup process was running, it seemed clear that it was the cause of the problems.  The question was why.

The resources allocated to the virtual server, particularly RAM, were somewhat limited.  So an initial guess at the source of the problem was simply that the server was underpowered.

I examined the web server logs and confirmed that the web server was producing HTTP 503 errors during the period of time that the backups were occurring.  Further examination of the web server logs showed that the periods of time that the 503 errors were occurring each night had been gradually increasing over time.  In fact, the site had run for more than 18 months without a problem.  After that, 503 errors began to appear.  These errors had gone unnoticed for about three months.  Simple site monitors were in place to verify the availability of the web site, but those monitors only checked the site every 10 to 15 minutes.  Because of this, the initial site outages were rarely long enough to trigger the site monitors.  After about three months the outages became long enough to consistently trigger the site monitors, and forced my investigation into the source of the problem.

In all of this time the server resources were unchanged, so the theory that the site was underpowered was called into question by the slow-but-steady escalation of the problem.

On the other hand, data was constantly being added to the site, so I began to consider the possibility that the growth of the database was to blame.  It seemed feasible that the backups would take longer as the database grew, as would the corresponding site outages.

I ran the following SQL query on the MySQL server, which returned the size in MB of each database on the server.  It showed that the database for the web site was 5GB in size.

SELECT s.schema_name,  
   SUM(t.data_length + t.index_length)/1024/1024 total_size,  
  (SUM(t.data_length + t.index_length)-SUM(t.data_free))/1024/1024 data_used,
   SUM(data_free)/1024/1024 data_free
FROM INFORMATION_SCHEMA.SCHEMATA s INNER JOIN INFORMATION_SCHEMA.TABLES t
    ON s.schema_name = t.table_schema
GROUP BY s.schema_name
ORDER BY total_size DESC;

5GB seemed an excessively large size for that database, and provided a likely explanation for why the backup operations were taking so long.  To further investigate, I adjusted the SQL query to show the sizes of each individual table on the server.  The new query looked like this…

SELECT s.schema_name, table_name,  
  SUM(t.data_length + t.index_length)/1024/1024 total_size,  
(SUM(t.data_length + t.index_length)-SUM(t.data_free))/1024/1024 data_used, 
  SUM(data_free)/1024/1024 data_free
FROM INFORMATION_SCHEMA.SCHEMATA s INNER JOIN INFORMATION_SCHEMA.TABLES t
    ON s.schema_name = t.table_schema
GROUP BY s.schema_name, table_name
ORDER BY total_size DESC;

… which revealed that the "sessions" table in the database was 4GB.  Ah-ha!  Now the questions centered on the “sessions” table. What was it used for?  And what was a reasonable size for it?

A little research showed that the "sessions" table is a Drupal table that stores user session information for the web site.  If user sessions don’t expire, the “sessions” table grows indefinitely. 

One final SQL query, shown here, revealed that there was data in the “sessions” table that was more than two years old.  Notice the use of the FROM_UNIXTIME function to transform the timestamp values to a human-readable date format.

SELECT uid, sid, hostname,  
  FROM_UNIXTIME(timestamp, ‘%Y-%d-%m %H:%I:%S %X’) AS datecreated, 
  cache, session
from sessions
order by datecreated;

That gave me the explanation for why the problem became worse over time… user sessions were not expiring, which meant that the “sessions” table kept growing, and therefore the database backup and compress operations were taking longer and longer to complete.

So… the final question… why was this happening?  The blog post found at http://success.grownupgeek.com/index.php/2008/08/14/drupal-sessions-table/  held the answer.  It explains that  the settings.php file contains the Drupal settings that control the use of the “sessions” table.  The default settings are as follows:

ini_set(‘session.cache_expire’,     200000);
ini_set(‘session.cache_limiter’,    ‘none’);
ini_set(‘session.cookie_lifetime’,  2000000);
ini_set(‘session.gc_maxlifetime’,   200000);

200000 seconds sets a session length of more than 2 days. The blog post recommended settings for those values that were more like this (45 minute sessions):

ini_set(‘session.cache_expire’,     2700);
ini_set(‘session.cache_limiter’,    ‘none’);
ini_set(‘session.cookie_lifetime’,  0);
ini_set(‘session.gc_maxlifetime’,   2700);

I think that 45 minutes may be a bit short, but it does make sense to reduce the session length from the default settings (which are overlong, in my opinion).

I checked the configuration for the site (the file was located at /var/www/citebank.org/sites/default/settings.php) to see if the settings had been modified from the default. Nope; the site was still running at the default of more than 2 days per user session.  Still, even with the default settings there should not have been data more than two years old in the “sessions” table.

What was going on?  Going back to the blog post, I noticed a user comment that held the answer.  The comment reads “Debian/Ubuntu doesn’t have garbage collection set up by default in the apt-get version of PHP5”.  

Garbage collection is the process that cleans old data from the “sessions” table.  So, if garbage collection wasn’t configured, it would explain why the table contained such old data.  The site I was troubleshooting was running on PHP5 on Debian Linux.  apt-get  is a command line tool for managing software on Linux.  I did not know how (or if there even was a way) to find out if apt-get was used to install PHP, but it seemed likely.

The blog comment went on to describe some additional modifications to the settings.php file.  To ensure that garbage collection is configured, the following two lines need to exist (or be added) to the settings.php file:

ini_set(‘session.gc_probability’,  1);
ini_set(‘session.gc_divisor’,   100);

At this point, I was convinced that I had enough information to correct the problems.  I did have one concern, though.  Given that there was 4GB and millions of rows worth of data in the “sessions” table, I assumed that changing the garbage collection settings would cause Drupal to issue delete statements against all of the rows that were older than the configured length of a session.  That would be bad… millions of logged DELETE operations against a database happening all at once will bring the server to its knees.

I knew that a TRUNCATE TABLE statement issued against a SQL Server table was a quick (re: not logged) way of clearing out a table, but I wasn’t sure if that was also true on MySQL.  A little research showed that the version of MySQL on which the site was running would implement a TRUNCATE TABLE statement by actually dropping and recreating the “sessions” table… quick and easy.  And, no side effects other than a handful of active user sessions being prematurely dropped.  In this case, that was an acceptable side effect.

After having learned all of this, here are the steps that I took to correct the problem.

  1. Edit the settings.php file and modify the existing session variables: session.cache_expire, session.cache_limiter, session_cookie_lifetime, and session.gc_maxlifetime.
  2. Add the additional session variables to settings.php: session.gc_probability and session.gc_divisor.
  3. Issue a “TRUNCATE TABLE sessions” command against the MySQL database.
  4. Restart Apache.

That’s it.  A lot of pain caused by a few missing configuration settings.  Lesson learned:  to avoid unexpected problems, be sure to appropriately configure session lifetime and garbage collection when setting up a web site based on Drupal.

Determine the Space Used By MySQL 5 Schemas and Tables

I recently posted a script that can be used to obtain information about the space used by tables in SQL Server 2005/2008 databases.  The following scripts can be used to obtain equivalent information for MySQL 5+ databases.

Get the size of and free space of each schema

SELECT s.schema_name, 
   ROUND((SUM(t.data_length + t.index_length))/1024/1024,2) TotalSizeInMB, 
   ROUND(((SUM(t.data_length + t.index_length))-SUM(t.data_free))/1024/1024,2) DataUsedInMB, 
   ROUND(SUM(data_free)/1024/1024,2) DataFreeInMB, 
   ROUND((((SUM(t.data_length + t.index_length))-SUM(t.data_free)) /
      ((SUM(t.data_length + t.index_length)))*100),2) PctUsedInMB
FROM INFORMATION_SCHEMA.SCHEMATA s INNER JOIN INFORMATION_SCHEMA.TABLES t 
   ON s.schema_name = t.table_schema
GROUP BY s.schema_name
ORDER BY total_size_in_mb DESC;

 
Example output:

schema_name total_size_in_mb data_used_in_mb data_free_in_mb pct_used_in_mb
staging     961.22           932.68          28.54           97.03
production  950.34           913.37          36.97           96.11
mysql       0.50             0.50            0.00            100.00

 

Get the space used by each table in each schema

SELECT s.schema_name, table_name,
   ROUND((SUM(t.data_length + t.index_length))/1024/1024,2) TotalSizeInMB, 
   ROUND(((SUM(t.data_length + t.index_length))-SUM(t.data_free))/1024/1024,2) DataUsedInMB,
   ROUND(SUM(data_free)/1024/1024,2) DataFreeInMB,
   ROUND((((SUM(t.data_length + t.index_length))-SUM(t.data_free)) /
      
((SUM(t.data_length + t.index_length)))*100),2) PctUsedInMB
FROM INFORMATION_SCHEMA.SCHEMATA s INNER JOIN INFORMATION_SCHEMA.TABLES t
   ON s.schema_name = t.table_schema
GROUP BY s.schema_name, table_name
ORDER BY schema_name, total_size_in_mb DESC;

 
Example output:

                          total_size data_used data_free pct_used
schema_name table_name     in_mb      in_mb     in_mb     in_mb
production  search_index   476.66     476.66    0.00      100.00
production  search_dataset 104.57     104.57    0.00      100.00
production  biblio         96.37      93.53     2.84      97.05
production  node           41.40      41.40     0.00      100.00
staging     node_revisions 17.53      16.76     0.77      95.61
staging     search_total   17.13      17.13     0.00      100.00
staging     biblio_keyword 11.92      11.92     0.00      100.00
staging     xmlsitemap     9.65       9.65      0.00      100.00

 

Get the tables with free space within them

Use this script to find tables with more than 100KB of free space.  These tables are candidates for an OPTIMIZE TABLE operation.

SELECT s.schema_name, t.table_name,
   ROUND(SUM(data_free)/1024,2) DataFreeInKB
FROM INFORMATION_SCHEMA.SCHEMATA s INNER JOIN INFORMATION_SCHEMA.TABLES t
   ON s.schema_name = t.table_schema
GROUP BY s.schema_name, t.table_name
HAVING DataFreeInKB > 100
ORDER BY schema_name, DataFreeInKB DESC;

 
Example output:

schema_name table_name  data_free_in_kb
production  cache_block 12285.47
production  cache_page  11714.50
production  cache_form  4349.23
production  biblio      2911.40
production  sessions    2560.51
staging     biblio      11367.68
staging     watchdog    4858.80
staging     cache_form  3522.44
staging     sessions    1732.30
staging     node        1706.16
staging     cache_block 904.29

Follow

Get every new post delivered to your Inbox.