Mar
19

Practical LAMP Optimisations

Programming No Comments

There are numerous articles floating around in internet land eschewing the various ways to squeeze the very last drop of performance from PHP. Unfortunately most of the time these articles list micro-optimisations that are quite frankly a complete waste of time. Two of the all-time classics have to be “echo is faster than print” and the whole “single vs double quoted strings”.

What most of the articles seem to ignore, or at least gloss over, is that PHP rarely stands on it’s own. More often than not it’s a single component of a larger (typically LAMP based) application.

There’s no point changing all instances of print to echo or worrying about which quoting style to use, while ignoring the mammoth database query that runs on every page request against fifteen unindexed tables containing millions of records.

The best way to ensure good application performance is to write Good Code™ and implement Good Database Design™ in the first place. Both of which are topics I’ll be covering in the future. In the mean time, here’s my guide to practical optimisations for LAMP applications…

  • Use an opcode cache such as APC or XCache.
    This is mostly likely the single biggest performance gain you will achieve with PHP!
  • Use E_STRICT and don’t suppress errors.
    Setting error_reporting to E_STRICT and not using @ to suppress errors and warnings will catch most common code bugs such as undefined variables and unquoted strings. Use the following to ignore errors when absolutely necessary:
    $old = error_reporting(0);
    action();
    error_reporting($old);
  • Identify slow queries.
    Turn on the Slow Query Log in MySQL and use EXPLAIN to examine offending queries.
  • Make Fewer HTTP Requests.
    Combine and minify your Javascript and CSS files. Consider implementing CSS sprites to reduce requests for images.
  • Enable output compression on the webserver.
    This will reduce the amount of data travelling over the network, reducing the time required to load each page.
  • Use the Expires header.
    This will greatly reduce the number of requests for static content such as images, Javascript and CSS.

The above tips should be at the top of your optimisation list as they are almost certainly the ones that will provide the largest performance increases. Once they have been taken care of we move into the realms of more specific optimisations of code, queries and server settings; which I’ll be covering in future posts.

Jan
20

MySQL Datatypes

MySQL No Comments

Choosing the wrong data type to store your data can have a dramatic effect on performance. This post will cover the general guidelines for choosing the correct data type along with summaries of the various data types available in MySQL.

General

  • Smaller is better (usually) - smaller data types take up less space in memory and on disk. Choose the smallest data type you don’t think you’ll exceed.
  • Strings are slow - character data types require extra processing to handle character sets and collations, use the built in data types for dates/times and store IP addresses as integers.
  • Avoid NULL - nullable columns require extra processing and use more storage space; not to mention storing NULL values is bad database design in 9/10 cases.

Integers

MySQL has five integer data types ranging from 8 to 64 bits, all of which can be signed or unsigned. The “width” setting is only used for formatting purposes by the MySQL interactive tools (the command-line client for example). It does not restrict the range of allowed values and has no bearing on how the values are stored or processed during computation; INT(1) is identical to INT(20).

Type Bytes Minimum Value Maximum Value
(Signed/Unsigned) (Signed/Unsigned)
TINYINT 1 -128 127
0 255
SMALLINT 2 -32,768 32,767
0 65,535
MEDIUMINT 3 -8,388,608 8,388,607
0 16,777,215
INT 4 -2,147,483,648 2,147,483,647
0 4,294,967,295
BIGINT 8 -9,223,372,036,854,775,808 9,223,372,036,854,775,807
0 18,446,744,073,709,551,615

Real Numbers

MySQL has three datatypes for storing real numbers. FLOAT and DOUBLE are used to represent approximate numeric values and DECIMAL is used to store exact numeric values.

DECIMAL should be used in cases where it is important to preserve exact precision, for example with monetary data, as you can specify the number of digits allowed before and after the decimal point. The precision choosen will affect the space required to store the value with digits being packed into binary strings (nine digits per four bytes).

Type Bytes Minimum Value Maximum Value
FLOAT 4 ±1.175494351E-38 ±3.402823466E+38
DOUBLE 8 ±2.2250738585072014E-308 ±1.7976931348623157E+308
DECIMAL Varies The maximum range of DECIMAL values is the same as for DOUBLE, but the actual range for a given DECIMAL column is constrained by the choice of precision.

Strings

The two main string data types are VARCHAR and CHAR.

VARCHAR columns store variable length strings and can require less storage space than CHAR columns as they only use as much space as they need. VARCHAR uses 1 or 2 extra bytes to record the value’s length. VARCHAR columns generally help performance as they use less space, however, extra work is needed if an UPDATE causes a row to grow and no longer fit in its original location.

VARCHAR is generally worth using when the maximum column length is much greater than the average length, or when updates are rare. CHAR columns are used for fixed length fields such as MD5 hashes, very short values such as Y or N and for data that’s changed frequently.

The above defines how the values are stored on disk, not necessarily in memory. MySQL often allocates fixed-size chunks of memory to hold values internally, which may lead to performance issues if being over-generous with the size of VARCHAR columns.

Type Bytes Maximum Length
CHAR L 255
VARCHAR L + 1/2 65,535

BLOB and TEXT

BLOB and TEXT data types are designed to store large amount of binary and character data respectively. There are four sub-types for each determining how much data each type can hold. When sorting BLOB and TEXT columns, MySQL sorts only the first max_sort_length bytes of the column. In addition, MySQL can’t index the full length of these columns and can’t use the indexes for sorting.

Type Bytes Maximum Length
TINYBLOB / TINYTEXT L + 1 255
BLOB / TEXT L + 2 65,535
MEDIUMBLOB / MEDIUMTEXT L + 3 16,777,215
LONGBLOB / LONGTEXT L + 4 4,294,967,295

Dates and Times

MySQL has five datatypes for storing temporal values, DATETIME, TIMESTAMP, DATE, TIME and YEAR. The last three (DATE, TIME and YEAR) should be sufficiently self explanatory in regards of what types of value they store. DATETIME and TIMESTAMP are used to store both date and time values in a single column, and are very similar.

DATETIME columns can hold a large range of values, are independent of timezone and are packed into an integer in YYYYMMDDHHMMSS format.

TIMESTAMP columns store the number of seconds elasped since the UNIX epoch (Midnight, 1st January 1970), the same as a UNIX timestamp. The values displayed also depend on the current time zone settings. TIMESTAMP columns also have special behavior when inserting and updating rows.

Type Bytes Minimum Value Maximum Value
TIMESTAMP 4 1970-01-01 00:00:01 2038-01-09 03:14:07
DATETIME 8 1000-01-01 00:00:00 9999-12-31 23:59:59
DATE 3 1000-01-01 9999-12-31
TIME 3 -838:59:59 838:59:59
YEAR 1 1901 2155

Enums

Enums columns are typically used for string values where the range of permitted values is quite small, e.g. state, gender, status, etc. Each Enum column can store up to 65.535 distinct values, with each value being stored as an integer representing its position in the field definition list. An Enum column therefore requires one or two bytes depending on the number of items in the list. Enum columns are sorted using the integer values and not the actual string values.

Nov
13

Random Data Generator Class

Programming 1 Comment

As a programmer, I’ve often needed to generate random data of one kind or another, whether it be names, addresses or just random text. Such a task is usually very boring and tedious, and I’ve thought on several occassions ‘I should really write a tool to automate this’. Well now I have…

Introducing SPF_Random

SPF_Random is the latest class to join Si’s PHP Framework (SPF) and includes methods to generate various forms of random data:

  • names (forenames, surnames and full names)
  • addresses
  • postcodes
  • phone numbers
  • dates
  • job titles
  • lorem ipsum text (words or paragraphs)

It’s not perfect (addresses and phone numbers are UK oriented for example) and I’m sure there are better solutions elsewhere on the interweb; but it’s certainly a very quick and simple way to generate personal details for a hundred random people. Plus it’s a completely stand-alone class, you don’t need to download the whole framework to be able to use it.

For the future I’m thinking of adding IP addresses, alphanumeric strings and item lists (especially useful for <select> lists).

Nov
5

MySQL Storage Engines

MySQL 2 Comments

Storage engines are responsible for storing and retrieving all the data stored “in” MySQL. Each storage engine has its own strengths and weaknesses that determine its suitability in a particular situation.

MyISAM

MyISAM is the default storage engine in MySQL and provides a good comprimise between performance and features.

Pros:

  • platform independent
  • concurrent inserts
  • full-text indexes
  • compression

Cons:

  • no transations
  • table-level locking
  • potentially long repair times

Good For:

  • applications with many reads and few writes.

InnoDB

InnoDB is a transactional storage engine that uses MVCC and row-level locking, and includes automatic crash recovery.

Pros:

  • ACID transactions
  • row-level locking and MVCC
  • crash recovery
  • clustered indexes
  • foreign key constraints

Cons:

  • alterations to table structure can be slow on large tables

Good For:

  • online ordering and other transaction based applications.

Memory

Memory tables store all their data in memory which means they are very fast because there is no waiting for disk I/O. They also use hash indexes which makes them very fast for lookup queries. The table definition of a Memory table will survive a server restart, but all data will be lost.

Pros:

  • very fast

Cons:

  • uses fixed-length rows which can waste memory
  • table-level locking
  • no support for TEXT or BLOB datatypes
  • no transactions

Used for:

  • lookup or mapping tables
  • caching results of periodically aggregated data
  • intermediate results when analysing data

Archive

The archive engine is optimised for high-speed inserting and data compression. It supports only INSERT and SELECT queries and doesn’t support DELETE, REPLACE, or UPDATE queries, or indexes. Rows are buffered and compressed using zlib as they are inserted which means much less disk I/O than MyISAM tables.

Pros:

  • fast INSERTs
  • compression

Cons:

  • no support for indexes, SELECTs will perform a full table scan.

Uses:

  • storing large amounts of rarely accessed data in a very small footprint, e.g. logs and audit records

CSV

The CSV storage engine stores data in text files using comma-seperated values. Other applications can open the table data file directly and read the contents. Likewise, if an application exports a CSV and saves it in the server’s data directory, the server can read the file straight away. CSV tables do not support indexes.

Uses:

  • data interchange and certain types of logging.

Other Engines

There are several other storage engines available.

  • Blackhole - Essentially a no-op storage engine, all INSERTS are discarded although they are recorded in the binary log and can be replayed on slaves.
  • Federated - Federated tables refer to tables on a remote MySQL server.
  • NDB Cluster - A specialised storage engine designed for high-performance with redundancy and load-balancing capabilities.
  • Falcon - A next-generation storage engine designed for todays hardware (64bit CPUs and plenty of memory).
  • Maria - A replacement for MyISAM that includes transactions, row-level locking, MVCC, and better crash recovery.

Blackhole, Federated and NDB Cluster are suitable only for specific purposes and should only be used after careful consideration. Falcon and Maria are the two modern storage engines although neither are currently considered production-stable.

Oct
26

Yet Another Redesign

Projects 1 Comment

If I spent as much time writing posts for this site as I do redesigning it…

After reviewing the previous design over a period of time, and listening to comments made by some of my friends, it was time for some design tweaks.

Gone are the rounded corners and grey headers. Now there’s sexy gradients and borders. There’s also a splash of colour (orange and green) to offset the heavy use of blue.

The biggest change is to the links pages. These now use the feeds from my new Delicious account, which makes it insanely easy for me to share the cool stuff I find on the interweb. The only problem now will be the amount of cool stuff I find. I think some bi-monthly pruning may be required to remove all the ‘fluff’.

The old links pages and Weekly Round-up posts have been removed and their links moved across to Delicious. The Daily Reading and a new Friends category have been placed in the blog sidebar, which means Leyton should stop pestering me about linking to his site.

This still leaves the problem of me actually writing some posts… So the future plan is to create some summary posts on a variety of topics so that this site will become my own quick reference guide. The first few posts will be MySQL based and cover storage engines, datatypes, query optimisations and other goodies.

Oct
20

Deployed - Ayerst Environmental

Projects No Comments

My first solo web design gig is now live!

Ayerst Environmental are an asbestos consultancy based on the outskirts of London. They wanted a revamping of their original (and very dated) site along with the ability to modify the content themselves.

After investigating a couple of different approaches, I decided on using WordPress for the backend of the site. The main reasons for this were my previous experience with WordPress and it’s ease of use. I felt a “full blown” CMS would be overly complicated both in terms of setting up and future maintenance. The KISS principle certainly applies in this case.

The result is a clean and consistent design with content that can be easily updated when needed.

Jul
30

Cuil Is Not So Cool

Technology 1 Comment

For those of you that haven’t already heard, there a new player on the search scene. Cuil (pronounced “cool”) is taking on the Big 3 (Google, Yahoo and Microsoft) and claims a larger search index (120 billion web pages) than any of them.

Comparisons with Google are inevitable, especially given the founders are ex-Googlers, and the general reaction so far is that Cuil fails to live up to expectations, primarily with regard to result relevance. Indeed, my first impressions were certainly not that great…

Having recently released a website for my Air Cadet squadron I thought I’d see how easy it is to find using Cuil. The query was “1343 squadron”; the results were, pretty useless, seeming to be a random collection of results for either the number 1343 or the word squadron. There was no apparent linking of the two. Appending “atc” to the query to narrow the search, results in absolutely nothing found, whereas the top four Google results for both queries are directly relevant to the search.

The biggest problem for Cuil at the moment is the risk of people trying it out, only to discover it doesn’t help them find what they’re looking for and simply returning to Google. The tech-savvy crowd will no doubt stick around to see how things progress, but I doubt the majority of web users will be so forgiving of search results that are lacking in relevance.

Further reading:

Jul
23

The Number 1 Error Made By Online Retailers

Technology No Comments

The other week I was discussing the issues I have with my knees when running and it was suggested I buy a knee support. So being the web-literate and lazy person that I am, I did a quick Google search in order to find a purveyor of said items. Every site I went to required me to create an account before being able to make a purchase.

Which brings me on nicely to the Number 1 Error Made By Online Retailers…
Why, oh why, oh why, do I need to sign up, create an account or join your community in order to purchase something from you?

All I want is a knee support, a bunch of flowers, a simple birthday gift; I’m not looking for a long term relationship at this point. If you do well this time and manage to deliver what I ordered, on time and for a reasonably price I may very well come back, but I don’t know you well enough yet. I don’t want to worry about how you store my password, address and credit card details, or whether you’re going to send me spam.

The number one thing that will make me leave your website and probably never come back is requiring me to open an account in order to complete a simple and potentially one-off purchase. I don’t have to do it when I shop in the high street, so why do you feel it’s ok on the Web?

A couple of years ago I used to spend a lot of time and money at Overclockers.co.uk. Their prices weren’t the lowest but the website was simple and uncluttered, and more importantly I didn’t need to create an account. Just find what I want to buy, go to the checkout, fill in the absolute minimum details for completing a credit card transaction over the internet and that’s it. Then they redesigned their site and included the apparently obligatory “My Account” functionality. Unfortunately the redesigned site was almost identical to Dabs and eBuyer and had totally lost its simple and original look and feel. Seeing as I already had accounts with eBuyer and Dabs and didn’t want another one, I stopped shopping there.

If you’re a web developer creating online retail sites, think twice before requiring an account before allowing a user to make a purchase, you could be causing your client to lose sales…

Jun
13

Calculus In 20 Minutes

Random Verbiage No Comments

An entire Calculus 1 course in 20 minutes. I even vaguely understood some of it as well…

Part 1, Part 2

Jun
13

Open Source PHP Projects and E_STRICT

Programming No Comments

PHP 5 included a new error reporting directive, E_STRICT, which according to the manual allows PHP to “suggest changes to your code which will ensure the best interoperability and forward compatibility“.

Basically it catches things that have been deprecated (such as call-time pass-by-reference) or are technically incorrect (such as calling a static method in a non-static way, and vice-versa).

I have this enabled on my development and production servers. Mainly because I want to know when I do something stupid like call a static method incorrectly, but also because I’m a PHP purist and believe the PHP error log shouldn’t contain anything unless explicitly put there by the application.

Recently I’ve noticed that numerous PHP applications and libraries emit all kinds of E_STRICT notices. Offenders noted so far are ADOdb, Smarty, Wordpress, Plogger and Gallery v2.

All of these are fanastic projects but the fact the code within them emits E_STRICT notices means they’re harder for me to implement and use. I usually end up having to globally search replace something, or call error-reporting() to set a lower error reporting level. The latter is made a lot harder when the application has multiple entry points (e.g. WordPress and Plogger).

The reasons behind these E_STRICT notices are usually due to the project remaining backwardly compatible with PHP4. Especially with regard to call-time pass-by-reference, as this was used extensively in PHP4 to speed up code execution when passing objects around. However the incorrect calling of a static method is a different story, it’s just sloppy OOP in my opinion. Simply testing with E_STRICT turned on would highlight these and fixing them would save a lot of headaches for those of us who run servers with E_STRICT turned on.

Alas, for that to happen we’d need projects to forego the backward compatibility with PHP4, and that is extremely unlikely. No large project with any sense is going to tear their community in two because of people like me harping on about E_STRICT.

Our only hope is for new projects to spring up focused solely on PHP5 and “proper” OOP…