At WordCamp San Francisco 2011, Matt Mullenweg gave a presentation entitled State of the Word. During the presentation, he talked about the 2011 WordPress User/Developer survey they ran.

Then today they released an anonymized copy of the data as a compressed CSV file. I took a quick look at the CSV and whipped up the following MySQL script to load the data.

CREATE TABLE `survey` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `year_submitted` year,
 `how_use` varchar(255) DEFAULT NULL,
 `job_type` varchar(255) DEFAULT NULL,
 `c_do` text,
 `c_cms_blog` varchar(255) DEFAULT NULL,
 `c_customize` varchar(255) DEFAULT NULL,
 `c_number` varchar(255) DEFAULT NULL,
 `c_percent` varchar(255) DEFAULT NULL,
 `c_done_with_wp` varchar(255) DEFAULT NULL,
 `c_living` varchar(255) DEFAULT NULL,
 `d_do` text,
 `d_cms_blog` varchar(255) DEFAULT NULL,
 `d_customize` varchar(255) DEFAULT NULL,
 `d_number` varchar(255) DEFAULT NULL,
 `d_percent` varchar(255) DEFAULT NULL,
 `d_done_with_wp` varchar(255) DEFAULT NULL,
 `d_cost` varchar(255) DEFAULT NULL,
 `d_living` varchar(255) DEFAULT NULL,
 `u_do` text,
 `u_installed` varchar(255) DEFAULT NULL,
 `u_installed_other` varchar(255) DEFAULT NULL,
 `u_customize` varchar(255) DEFAULT NULL,
 `u_living` varchar(255) DEFAULT NULL,
 `x_living` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = MyISAM COMMENT = 'WordPress 2011 Survey Results';

LOAD DATA INFILE '/var/lib/mysql/anon-data.csv'
 INTO TABLE `survey`
 FIELDS ENCLOSED BY '"' TERMINATED BY ','
 LINES TERMINATED BY '\r'
 IGNORE 1 LINES
 (`how_use`, `job_type`, `c_do`, `c_cms_blog`, `c_customize`,
`c_number`, `c_percent`, `c_done_with_wp`, `c_living`, `d_do`, `d_cms_blog`,
`d_customize`, `d_number`, `d_percent`, `d_done_with_wp`, `d_cost`, `d_living`,
`u_do`, `u_installed`, `u_installed_other`, `u_customize`, `u_living`,
`x_living`);

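-- Stamp only the rows that were just loaded, so rows from other years keep their value.
UPDATE `survey` SET `year_submitted` = YEAR(NOW()) WHERE `year_submitted` IS NULL;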

This has only been tested on MySQL 5.1.54-1ubuntu4. It should work on any recent copy of MySQL, but YMMV. Also, I added two extra fields to the table. One is a simple ID field to make it easier to reference individual responses; the other is `year_submitted`. I added the latter so that if they reuse this survey next year, I can simply add that year’s responses to the same table and track the differences. If I find the time, I may try digging into the data to see if I can find anything interesting in it (but don’t hold your breath on me finding the time to do so).
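
As a rough sketch of the kind of year-over-year comparison that `year_submitted` makes possible, something like the following should count responses per year. It assumes the `survey` table above has been loaded; picking `how_use` as the column to break down is just an example, and `responses` is an alias I made up for the count.

```sql
-- Count responses per survey year and per answer to the `how_use` question.
-- Assumes the `survey` table defined above; `responses` is only an alias.
SELECT `year_submitted`,
       `how_use`,
       COUNT(*) AS `responses`
  FROM `survey`
 GROUP BY `year_submitted`, `how_use`
 ORDER BY `year_submitted`, `responses` DESC;
```

With only the 2011 data loaded, this gives a single year's breakdown; once another year's responses are in the same table, the rows for each year line up for comparison.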
