I’m excited about the opportunity to pull apart data on campaign contributions over the past couple of years. It’s the first time I’ve had a chance to examine all the contributions a person makes to different candidates over the course of an election cycle. I’ve got the data, I know the initial questions I want to answer and I thought I knew how to reach those answers.
Instead, frustration. I think the problem is that there’s a whole lot of data. Well, let me rephrase that: it’s a big file, but by no means huge. Yet even relatively simple manipulations of the data in either Excel or FileMaker invariably bog down. They just can’t get the job done. They either stall outright or are so darn slow that they might as well have.
I think both Excel and FileMaker should be able to handle much larger files, so perhaps I’m doing something wrong along the way. It’s all the more frustrating because I’m anxious to get to those interesting results.
My fallback plan is to reinstall Windows and Microsoft Access, which typically handles lots of data with ease.
But although I’ve been through this same experience on earlier occasions, I still have a hard time believing that on this dimension, the Mac experience falls so short.
It shouldn’t be that hard to take 25,000 or so contributions, sort out the ones that came from individuals rather than PACs or corporations, and then aggregate and rank them to get a list of the top individual donors. I’ve gotten that far, at least a couple of times.
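For what it’s worth, here is a minimal sketch of that filter, aggregate and rank step in plain Python, which should cope with 25,000 rows without strain. The column names (`donor_name`, `donor_type`, `amount`) and the sample rows are made up for illustration; the real file will almost certainly label things differently.

```python
from collections import defaultdict

def top_individual_donors(rows, limit=10):
    """Sum contributions per individual donor and rank by total, largest first."""
    totals = defaultdict(float)
    for row in rows:
        # Assumed column: "donor_type" separates individuals from PACs or corporations.
        if row["donor_type"].strip().lower() != "individual":
            continue
        totals[row["donor_name"].strip()] += float(row["amount"])
    # Sort by total descending, then by name so ties come out in a stable order.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]

# Tiny made-up sample, only to show the shape of the input.
sample_rows = [
    {"donor_name": "Donor A", "donor_type": "individual", "amount": "500"},
    {"donor_name": "Example PAC", "donor_type": "pac", "amount": "5000"},
    {"donor_name": "Donor B", "donor_type": "individual", "amount": "250"},
    {"donor_name": "Donor A", "donor_type": "individual", "amount": "1000"},
]

print(top_individual_donors(sample_rows))
```

If the data can be exported as a CSV file with a header row, the standard library’s `csv.DictReader` yields rows in this same dictionary shape, so the function could be fed straight from the file.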
Then the data need to be cleaned up. Names get written down differently, say with or without a middle initial, or with an initial followed by a period in one place and not in another, and sometimes someone’s name might appear as “Steve” in one record and “Steven” in another. So once I’ve got a rough list of the biggest rollers, I’ll have to comb through the data manually and make sure that all the different variations are included in the total for each person.
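Some of that combing could probably be narrowed down before doing it by hand. The sketch below builds a rough matching key for each name (lowercased, periods removed, middle initials dropped, a few nicknames expanded) and lists the spellings that collapse to the same key. It assumes names are written in “First Last” order, the nickname table is a hypothetical starting point, and the surname in the sample is invented. It only suggests candidates; two different people can share a key, so the manual check still matters.

```python
import re
from collections import defaultdict

# Hypothetical nickname table; it would grow as variants turn up in the data.
NICKNAMES = {"steve": "steven"}

def name_key(name):
    """Build a rough matching key: lowercase, no periods, no middle initials."""
    words = re.sub(r"[.,]", " ", name.lower()).split()
    # Drop single-letter words that sit between the first and last word.
    core = [w for i, w in enumerate(words)
            if len(w) > 1 or i == 0 or i == len(words) - 1]
    if core:
        core[0] = NICKNAMES.get(core[0], core[0])
    return " ".join(core)

def group_variants(names):
    """Return only the keys that more than one distinct spelling maps to."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}

# Invented sample names, for illustration only.
sample_names = ["Steve J. Example", "Steven J Example", "Steven Example", "Pat Sample"]
print(group_variants(sample_names))
```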
Then the next stage is to look for clumps of donors. Officers or employees of the same company, for example. Or employers of the same lobbyist. The goal is to see the patterns of influence beyond individual donors, to find the connections between them.
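As a rough sketch of the clumping idea, assuming each record carries an `employer` field (a made-up column name, and one the real data may not have or may fill in inconsistently), contributions can be grouped by employer and ranked by combined total:

```python
from collections import defaultdict

def employer_clumps(rows, min_donors=2):
    """Group donors by employer; return (employer, donor count, total) tuples."""
    donors = defaultdict(set)
    totals = defaultdict(float)
    for row in rows:
        # Assumed column: "employer"; blank values are skipped.
        employer = row["employer"].strip().lower()
        if not employer:
            continue
        donors[employer].add(row["donor_name"].strip())
        totals[employer] += float(row["amount"])
    clumps = [(emp, len(donors[emp]), totals[emp])
              for emp in donors if len(donors[emp]) >= min_donors]
    # Largest combined total first, employer name as the tie-breaker.
    return sorted(clumps, key=lambda clump: (-clump[2], clump[0]))

# Invented sample records, for illustration only.
sample_rows = [
    {"donor_name": "Donor A", "employer": "Acme Corp", "amount": "1000"},
    {"donor_name": "Donor B", "employer": "acme corp", "amount": "500"},
    {"donor_name": "Donor C", "employer": "Other Co", "amount": "2000"},
    {"donor_name": "Donor D", "employer": "", "amount": "300"},
]

print(employer_clumps(sample_rows))
```

The same grouping would work for a lobbyist field, if one exists. Employer names will need the same kind of cleanup as donor names before the counts can be trusted.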
Anyway, that’s where I’m heading. Any advice/assistance much appreciated.