Nov
13

Random Data Generator Class

Programming 1 Comment

As a programmer, I’ve often needed to generate random data of one kind or another, whether it be names, addresses or just random text. Such a task is usually very boring and tedious, and I’ve thought on several occassions ‘I should really write a tool to automate this’. Well now I have…

Introducing SPF_Random

SPF_Random is the latest class to join Si’s PHP Framework (SPF) and includes methods to generate various forms of random data:

  • names (forenames, surnames and full names)
  • addresses
  • postcodes
  • phone numbers
  • dates
  • job titles
  • lorem ipsum text (words or paragraphs)

It’s not perfect (addresses and phone numbers are UK oriented for example) and I’m sure there are better solutions elsewhere on the interweb; but it’s certainly a very quick and simple way to generate personal details for a hundred random people. Plus it’s a completely stand-alone class, you don’t need to download the whole framework to be able to use it.

For the future I’m thinking of adding IP addresses, alphanumeric strings and item lists (especially useful for <select> lists).

Nov
5

MySQL Storage Engines

MySQL 2 Comments

Storage engines are responsible for storing and retrieving all the data stored “in” MySQL. Each storage engine has its own strengths and weaknesses that determine its suitability in a particular situation.

MyISAM

MyISAM is the default storage engine in MySQL and provides a good comprimise between performance and features.

Pros:

  • platform independent
  • concurrent inserts
  • full-text indexes
  • compression

Cons:

  • no transations
  • table-level locking
  • potentially long repair times

Good For:

  • applications with many reads and few writes.

InnoDB

InnoDB is a transactional storage engine that uses MVCC and row-level locking, and includes automatic crash recovery.

Pros:

  • ACID transactions
  • row-level locking and MVCC
  • crash recovery
  • clustered indexes
  • foreign key constraints

Cons:

  • alterations to table structure can be slow on large tables

Good For:

  • online ordering and other transaction based applications.

Memory

Memory tables store all their data in memory which means they are very fast because there is no waiting for disk I/O. They also use hash indexes which makes them very fast for lookup queries. The table definition of a Memory table will survive a server restart, but all data will be lost.

Pros:

  • very fast

Cons:

  • uses fixed-length rows which can waste memory
  • table-level locking
  • no support for TEXT or BLOB datatypes
  • no transactions

Used for:

  • lookup or mapping tables
  • caching results of periodically aggregated data
  • intermediate results when analysing data

Archive

The archive engine is optimised for high-speed inserting and data compression. It supports only INSERT and SELECT queries and doesn’t support DELETE, REPLACE, or UPDATE queries, or indexes. Rows are buffered and compressed using zlib as they are inserted which means much less disk I/O than MyISAM tables.

Pros:

  • fast INSERTs
  • compression

Cons:

  • no support for indexes, SELECTs will perform a full table scan.

Uses:

  • storing large amounts of rarely accessed data in a very small footprint, e.g. logs and audit records

CSV

The CSV storage engine stores data in text files using comma-seperated values. Other applications can open the table data file directly and read the contents. Likewise, if an application exports a CSV and saves it in the server’s data directory, the server can read the file straight away. CSV tables do not support indexes.

Uses:

  • data interchange and certain types of logging.

Other Engines

There are several other storage engines available.

  • Blackhole – Essentially a no-op storage engine, all INSERTS are discarded although they are recorded in the binary log and can be replayed on slaves.
  • Federated – Federated tables refer to tables on a remote MySQL server.
  • NDB Cluster – A specialised storage engine designed for high-performance with redundancy and load-balancing capabilities.
  • Falcon – A next-generation storage engine designed for todays hardware (64bit CPUs and plenty of memory).
  • Maria – A replacement for MyISAM that includes transactions, row-level locking, MVCC, and better crash recovery.

Blackhole, Federated and NDB Cluster are suitable only for specific purposes and should only be used after careful consideration. Falcon and Maria are the two modern storage engines although neither are currently considered production-stable.