At WordCamp San Francisco 2011, Matt Mullenweg gave a presentation entitled State of the Word. During the presentation, he talked about the 2011 WordPress User/Developer survey they ran.
Today they released an anonymized copy of the data as a compressed CSV file. I took a quick look at the CSV and whipped up the following MySQL script to load it.
CREATE TABLE `survey` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `year_submitted` year,
  `how_use` varchar(255) DEFAULT NULL,
  `job_type` varchar(255) DEFAULT NULL,
  `c_do` text,
  `c_cms_blog` varchar(255) DEFAULT NULL,
  `c_customize` varchar(255) DEFAULT NULL,
  `c_number` varchar(255) DEFAULT NULL,
  `c_percent` varchar(255) DEFAULT NULL,
  `c_done_with_wp` varchar(255) DEFAULT NULL,
  `c_living` varchar(255) DEFAULT NULL,
  `d_do` text,
  `d_cms_blog` varchar(255) DEFAULT NULL,
  `d_customize` varchar(255) DEFAULT NULL,
  `d_number` varchar(255) DEFAULT NULL,
  `d_percent` varchar(255) DEFAULT NULL,
  `d_done_with_wp` varchar(255) DEFAULT NULL,
  `d_cost` varchar(255) DEFAULT NULL,
  `d_living` varchar(255) DEFAULT NULL,
  `u_do` text,
  `u_installed` varchar(255) DEFAULT NULL,
  `u_installed_other` varchar(255) DEFAULT NULL,
  `u_customize` varchar(255) DEFAULT NULL,
  `u_living` varchar(255) DEFAULT NULL,
  `x_living` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = MyISAM COMMENT = 'WordPress 2011 Survey Results';

LOAD DATA INFILE '/var/lib/mysql/anon-data.csv'
INTO TABLE `survey`
FIELDS ENCLOSED BY '"' TERMINATED BY ','
LINES TERMINATED BY '\r'
IGNORE 1 LINES
(`how_use`, `job_type`,
 `c_do`, `c_cms_blog`, `c_customize`, `c_number`, `c_percent`, `c_done_with_wp`, `c_living`,
 `d_do`, `d_cms_blog`, `d_customize`, `d_number`, `d_percent`, `d_done_with_wp`, `d_cost`, `d_living`,
 `u_do`, `u_installed`, `u_installed_other`, `u_customize`, `u_living`,
 `x_living`);

UPDATE `survey` SET `year_submitted` = YEAR(NOW());
This has only been tested on MySQL 5.1.54-1ubuntu4. It should work on any recent copy of MySQL, but YMMV. Also, I added two fields to the table that are not in the CSV. One is a simple ID field to make it easier to reference individual responses; the other is `year_submitted`. I added the latter so that, if they reuse this survey next year, I can simply add that year’s responses to the same table and track the differences. If I find the time, I may try digging into the data to see if I can find anything interesting in it (but don’t hold your breath on me finding the time to do so).
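One caveat if the table does get reused: the final UPDATE in the script has no WHERE clause, so running it again after loading a second file would stamp every row, old and new, with the current year. Here is a rough sketch of how a later load might avoid that by tagging rows as they are read with the SET clause of LOAD DATA. The file name `anon-data-2012.csv` and the year 2012 are made up for illustration, and the sketch assumes the next file keeps the same columns and line endings, which may not hold. Untested, so YMMV.

```sql
-- Hypothetical follow-up load; the file name and year are assumptions.
LOAD DATA INFILE '/var/lib/mysql/anon-data-2012.csv'
INTO TABLE `survey`
FIELDS ENCLOSED BY '"' TERMINATED BY ','
LINES TERMINATED BY '\r'
IGNORE 1 LINES
(`how_use`, `job_type`,
 `c_do`, `c_cms_blog`, `c_customize`, `c_number`, `c_percent`, `c_done_with_wp`, `c_living`,
 `d_do`, `d_cms_blog`, `d_customize`, `d_number`, `d_percent`, `d_done_with_wp`, `d_cost`, `d_living`,
 `u_do`, `u_installed`, `u_installed_other`, `u_customize`, `u_living`,
 `x_living`)
-- Tag only the rows being loaded, leaving earlier years untouched.
SET `year_submitted` = 2012;

-- Count responses per answer and year to compare the surveys side by side.
SELECT `year_submitted`, `how_use`, COUNT(*) AS `responses`
FROM `survey`
GROUP BY `year_submitted`, `how_use`
ORDER BY `year_submitted`, `responses` DESC;
```

The same GROUP BY pattern should work for any of the other columns, such as `job_type`, by swapping the column name.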