Have a Coding Buddy
Often, all a developer needs to solve a problem is to hear himself or herself state said problem out loud, and the answer (which had previously plagued them) seems evident with obnoxious clarity. I had a similar moment just now. I was about to post the following question on Stack Overflow. I encourage you to follow along, and try to solve the problem with me. You’ll see where I had my epiphany and stopped typing. All I had to do was state the problem. My great friend Frank and I often do this to one another on a very regular basis. Here’s the (one sided) conversation: “So, I’m having this problem where [fill in the first 70% of the explanation here]… Oh. Never mind. Duh.” For this reason, I’d highly encourage every developer to find, and maintain a tight relationship with, an exit buddy coding buddy.

Here’s the contents of the question I nearly posted:
Premise: I have a MySQL table defined as follows:
CREATE TABLE IF NOT EXISTS `stuff` ( `id` int(10) unsigned NOT NULL auto_increment, `hash` varchar(255) NOT NULL, `field1` varchar(255) NOT NULL, `field2` varchar(255) NOT NULL, `field3` varchar(255) NOT NULL, PRIMARY KEY (`id`), KEY `hash` (`hash`) );
This table can easily grow to a few million rows or more. Every record has a ‘hash’ value which is the result of running some of the data fields through SHA1(). In practice, I see very high rates of duplication here. Hundreds of thousands of records are loaded into this table using ‘LOAD DATA LOCAL INFILE’ queries on a nightly basis.
I implemented this solution for ensuring all duplicate records were removed, in the form:
DELETE stuff FROM stuff LEFT OUTER JOIN ( SELECT MIN(id) AS id FROM stuff GROUP BY hash ) AS keepers ON (stuff.id = keepers.id ) WHERE keepers.id IS NULL ;
I find that after around 6k rows, the complexity of the DELETE query begins to really slow down. Is there a more time/space efficient solution to achieving the same end? Or perhaps a different way to phrase my DELETE query so as to preclude certain overheads? As far as I’m aware, there isn’t a way to combine ‘LOAD DATA LOCAL INFILE’ with ‘INSERT IGNORE’ – otherwise I’d simply
… And that’s when I thought I might just – on a whim (just in case I’m wrong) – check the MySQL documentation. In other words, “RTFM.” And what would you know. All I needed was to state the problem, and the obvious answer smacked me in the face. In this particular case, my coding buddy was a website (Stack Overflow), but usually it’s another developer. In any case, you need to have a coding buddy.
If you’re interested in more bulk loading goodness, you can get it straight from the horses mouth.
Update: Evidently, Jeff Atwood seems to agree with me, although his language and arguments are far more graceful.