Comment by didgetmaster
3 years ago
I downloaded all the .CSV files from that site and quickly loaded them into a table. It took only a couple of minutes, but I didn't stop to check for duplicate rows across the various files.
When I added up the totals, I got: APC - 7,225,399; LP - 5,286,181; PDP - 5,285,900; NNPP - 1,529,575.
Note: I was using a beta version of a new database tool I created to do this.
This should be quick to whip up in a few minutes in pandas, I'd think, assuming the column headers are identical and in the same order across files. It would translate into a pandas concat call over the loaded files and, on the merged table, a value_counts for the column where the vote is retained.
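A minimal sketch of that approach, under stated assumptions: the column names (`unit`, `APC`, `LP`) and the sample rows are made up for illustration and are not taken from the real files, and the helper `party_totals` is hypothetical. It also assumes each row carries per-party vote counts, so the columns are summed; `value_counts` would be the right call instead if each row recorded a single vote. Exact duplicate rows are dropped before totaling, since duplicates across files were a stated concern.

```python
import io

import pandas as pd


def party_totals(csv_sources, party_columns):
    """Concatenate the per-file tables, drop exact duplicate rows,
    and sum the vote columns. Assumes every file has the same headers."""
    frames = [pd.read_csv(src) for src in csv_sources]
    merged = pd.concat(frames, ignore_index=True)
    deduped = merged.drop_duplicates()
    return deduped[party_columns].sum()


# Made-up sample data standing in for two downloaded CSV files.
# The row for unit U2 appears in both files on purpose.
file_a = io.StringIO("unit,APC,LP\nU1,10,5\nU2,3,7\n")
file_b = io.StringIO("unit,APC,LP\nU2,3,7\nU3,1,1\n")

totals = party_totals([file_a, file_b], ["APC", "LP"])
print(totals)
```

With real files, `csv_sources` could be a list of paths (for example from `glob.glob("*.csv")`). Note that `drop_duplicates` only removes rows that match in every column, so near-duplicates with differing values would still be counted.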
I have no doubt that a pandas expert (or a Postgres expert, or a MySQL expert, or ...) can whip up something fairly quickly to load the data and find the totals.
My tool is designed for people who are not experts but just have a basic understanding of relational tables (e.g. someone comfortable with a spreadsheet) to be able to load a data set like this and analyze it with just a few clicks of a mouse. Using it, I was able to do the whole thing in about 2 minutes.
BTW: When my numbers did not match those of another HN commenter on this thread, I investigated and found a bug in my code. Once that was fixed, the numbers were correct. (I guess that's why it is still in beta!)