
New Design and a New Domain

April 18th, 2008

Filed under: General, Personal — Brenton Alker @ 19:17

Well, I finally bought myself a domain. You can now find me at my new home on the web: blog.tekerson.com (Not that you get a choice in the matter, the old domain redirects).
To go along with it, I’ve got a new design (for those of you that actually come back). I think it’s better suited than the old one.
That is all!

Initial thoughts on CouchDB

April 14th, 2008

Filed under: CouchDB, Software — Brenton Alker @ 15:18

My interest in CouchDB was piqued late last year, when Jan Lehnardt discussed it on the PHP Abstract Podcast. I have since read as much as I can about it, which unfortunately doesn’t amount to a great deal. However, I think the idea has merit as an alternative to relational databases in various scenarios.

I have been using memcached to store aggregated objects and it often seems pointless to break them into disparate parts to fit into a relational schema. While I realise CouchDB is not an object cache, being able to store data "as is", in a form (JSON) that is directly malleable by many languages makes a lot of sense.

The "views" used to retrieve data select the documents to return by passing them through user-defined JavaScript functions, instead of querying with SQL as in most relational databases. This gives a great deal of flexibility for data retrieval.
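To make the idea concrete, here is a rough model of that filtering written in Python. It is only a sketch of the concept, not CouchDB's API: real views are JavaScript functions stored in the database, and the sample documents and the `select_documents` helper below are made up for illustration.

```python
import json

# Hypothetical documents, stored "as is" without a fixed schema.
documents = [
    {"_id": "1", "type": "post", "title": "Initial thoughts on CouchDB", "tags": ["CouchDB"]},
    {"_id": "2", "type": "comment", "post": "1", "body": "Nice write-up"},
    {"_id": "3", "type": "post", "title": "MySQL queue", "tags": ["MySQL", "Code"]},
]


def select_documents(docs, view_function):
    """Pass every document through a user-defined function; keep those it accepts."""
    return [doc for doc in docs if view_function(doc)]


def posts_tagged_couchdb(doc):
    """User-defined filter: documents need not share fields, so use .get()."""
    return doc.get("type") == "post" and "CouchDB" in doc.get("tags", [])


result = select_documents(documents, posts_tagged_couchdb)
print(json.dumps(result, indent=2))
```

Because the filter is ordinary code rather than a query against fixed columns, it can inspect whatever fields a document happens to have. That flexibility is also why every document has to be passed through the function at least once.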

The major concern that I haven’t seen adequately addressed is performance, especially with larger datasets. With the unstructured nature of the data and the flexibility of the view functions, it seems performance could be a challenge. Though given Erlang’s ground-up concurrency and some of the algorithmic genius they have apparently borrowed from Google’s MapReduce (datasets don’t get much larger than that), I might yet be proved wrong. I hope I am.

Using a MySQL table as a thread-safe queue

April 8th, 2008

Filed under: Code, MySQL — Brenton Alker @ 10:56

As part of the current application I am developing, I have the need for a reliable queue that is not going to allow duplicate reads when popped from multiple threads or processes.

The queue in this case is an outgoing mail queue. The system needs to read the task from the queue, generate the email by substituting in the member’s details, and then send it to the mail server.

When reading the queue with only 1 process, it is easy — read the queue, process the email, delete the queue entry, repeat. Concurrency adds the problem that 2 (or more) processes could read the same email, and we really don’t want an email going to a member 2 or 3 times.
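The duplicate read can be reproduced without a database. The sketch below is a deliberately simplified, single-threaded simulation in Python: the in-memory `queue` list, the two "workers" and the helper names are all hypothetical, and the interleaving is forced by hand so that both workers read before either one deletes.

```python
# One pending entry in a made-up in-memory queue.
queue = [{"id": 1, "task": "welcome email"}]
sent = []  # records (worker, entry id) for every email "sent"


def read_head(q):
    """Read the next entry without removing it (the naive 'read the queue' step)."""
    return q[0] if q else None


def delete_entry(q, entry_id):
    """Remove an entry after processing; harmless if it is already gone."""
    q[:] = [e for e in q if e["id"] != entry_id]


# Unlucky interleaving: both workers read before either deletes.
seen_by_a = read_head(queue)
seen_by_b = read_head(queue)

for worker, entry in (("A", seen_by_a), ("B", seen_by_b)):
    sent.append((worker, entry["id"]))  # stands in for generating and sending the email
    delete_entry(queue, entry["id"])

print(sent)  # [('A', 1), ('B', 1)]
```

Both workers end up sending entry 1, which is exactly the duplicate email the queue is supposed to prevent.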

My first solution involved a single "master" thread that would read the queue and delegate the processing to worker threads. While this worked, it was complicated and error-prone. After some discussion with people in #mysql on freenode, I found what should be a suitable database-level method.

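By selecting the queue entry with the FOR UPDATE modifier, the row is placed under an exclusive lock, the same lock used when a row is being updated. Other transactions that try to update the row, or to read it with their own locking read (such as another SELECT … FOR UPDATE), will block until the locking transaction commits or rolls back. (N.B. this only works with the InnoDB storage engine, and the SELECT and the UPDATE that follows must run inside the same transaction, i.e. with autocommit off.)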

SELECT id, task FROM queue WHERE processing = 0 ORDER BY id LIMIT 1 FOR UPDATE;

The process now holds a lock on that row, so no other process issuing the same locking SELECT can claim it. The row can then be updated to mark it as "being processed", or deleted from the queue, depending on your needs.

UPDATE queue SET processing = 1 WHERE id = :id;

With the queue entry safely belonging to the thread, it can now take as long as it needs to process it. By keeping the time between the SELECT…FOR UPDATE and the UPDATE to a minimum, throughput should be significantly better than with the original non-concurrent solution.
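Put together, the claim step might look something like the following Python sketch. It is written under several assumptions rather than taken from the application: `claim_next_task` is a hypothetical helper, `conn` is assumed to be a DB-API 2.0 connection to MySQL with autocommit disabled and the "%s" parameter style (as common MySQL drivers use), and the query is limited to one row (`ORDER BY id LIMIT 1`) so each worker claims a single entry.

```python
def claim_next_task(conn):
    """Claim one unprocessed queue entry, or return None if the queue is empty.

    Assumes `conn` is a DB-API 2.0 connection to MySQL (InnoDB table named
    `queue`), autocommit off, "%s" parameter style. All of that is assumed.
    """
    cur = conn.cursor()
    try:
        # Locking read: other workers running this same statement wait here
        # until this transaction commits or rolls back.
        cur.execute(
            "SELECT id, task FROM queue WHERE processing = 0 "
            "ORDER BY id LIMIT 1 FOR UPDATE"
        )
        row = cur.fetchone()
        if row is None:
            conn.rollback()  # nothing to claim; end the transaction
            return None
        task_id, task = row
        # Mark the entry as taken while the row lock is still held.
        cur.execute("UPDATE queue SET processing = 1 WHERE id = %s", (task_id,))
        conn.commit()  # releases the lock; the entry now belongs to this worker
        return task_id, task
    except Exception:
        conn.rollback()  # never leave the row locked after a failure
        raise
    finally:
        cur.close()
```

The slow work (building and sending the email) happens only after `claim_next_task` returns, so the lock is held just for the SELECT and UPDATE. One thing this sketch does not handle is a worker that crashes after claiming an entry: the row would stay at `processing = 1` and would need some separate clean-up.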
