Home > Uncategorized > Have a Coding Buddy

Have a Coding Buddy

Often, all a developer needs to solve a problem is to hear himself or herself state said problem out loud, and the answer (which had previously plagued them) seems evident with obnoxious clarity.  I had a similar moment just now.  I was about to post the following question on Stack Overflow.  I encourage you to follow along, and try to solve the problem with me.  You’ll see where I had my epiphany and stopped typing.  All I had to do was state the problem.  My great friend Frank and I often do this to one another on a very regular basis.  Here’s the (one sided) conversation:  “So, I’m having this problem where [fill in the first 70% of the explanation here]… Oh.  Never mind.  Duh.”  For this reason, I’d highly encourage every developer to find, and maintain a tight relationship with, an exit buddy coding buddy.

Exit-Buddy

Here’s the contents of the question I nearly posted:

Premise: I have a MySQL table defined as follows:

CREATE TABLE IF NOT EXISTS `stuff` (
    `id` int(10) unsigned NOT NULL auto_increment,
    `hash` varchar(255) NOT NULL,
    `field1` varchar(255) NOT NULL,
    `field2` varchar(255) NOT NULL,
    `field3` varchar(255) NOT NULL,
    PRIMARY KEY  (`id`),
    KEY `hash` (`hash`)
);

This table can easily grow to a few million rows or more.  Every record has a ‘hash’ value which is the result of running some of the data fields through SHA1().  In practice, I see very high rates of duplication here.  Hundreds of thousands of records are loaded into this table using ‘LOAD DATA LOCAL INFILE’ queries on a nightly basis.

I implemented this solution for ensuring all duplicate records were removed, in the form:

DELETE stuff
    FROM stuff
    LEFT OUTER JOIN (
    SELECT MIN(id) AS id
        FROM stuff
        GROUP BY hash
    ) AS keepers ON (stuff.id = keepers.id )
    WHERE keepers.id IS NULL
;

I find that after around 6k rows, the complexity of the DELETE query begins to really slow down.  Is there a more time/space efficient solution to achieving the same end?  Or perhaps a different way to phrase my DELETE query so as to preclude certain overheads?  As far as I’m aware, there isn’t a way to combine ‘LOAD DATA LOCAL INFILE’ with ‘INSERT IGNORE’ – otherwise I’d simply

… And that’s when I thought I might just – on a whim (just in case I’m wrong) – check the MySQL documentation.  In other words, “RTFM.”  And what would you know.  All I needed was to state the problem, and the obvious answer smacked me in the face.  In this particular case, my coding buddy was a website (Stack Overflow), but usually it’s another developer.  In any case, you need to have a coding buddy.

If you’re interested in more bulk loading goodness, you can get it straight from the horses mouth.

Update: Evidently, Jeff Atwood seems to agree with me, although his language and arguments are far more graceful.

  1. No comments yet.
  1. No trackbacks yet.