Looking for cool data to visualize for my Google I/O talk (High Performance KML for Google Earth and Maps), I found a few datasets from the US Census and American Community Survey that were pretty interesting.
(Download this KML at the bottom of the page... read the warning ;P)
- County and State-level data for 1900-1990 census population counts (described and also available here) were found in compiled form here at the National Bureau of Economic Research
- 2000 & 2010 County and State data was downloaded from queries on the US Census American FactFinder #2 page
- Data on houses lacking indoor plumbing (1940-1990) was found here
- Data on citizens and children lacking health insurance (1987-2009) was found here and here
Most of this data was pretty messy, came in a number of different formats (Excel, CSV, raw text), and required a lot of scrubbing. I used Google Refine, Excel, OpenOffice Calc, and good ol' Notepad++ to massage it into a usable format (CSV).
I then found state and county polygon data on Google Fusion Tables. By maintaining the FIPS codes for all the states and counties in the Census/ACS data, merging with the geometries in Fusion Tables was pretty easy. I then downloaded a master CSV file from Fusion Tables that had a row for each state/county, and columns for each of the time-series values.
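For anyone who wants to try the same kind of join outside Fusion Tables, here is a minimal sketch in plain Python. It is not what Fusion Tables does internally; it only illustrates why keeping the FIPS codes makes the merge easy. The column names (`fips`, `name`, `pop_1900`, `pop_1910`) and the sample values are made up for illustration. The one real gotcha it handles is that spreadsheet tools tend to strip leading zeros from FIPS codes, so the codes are zero-padded before matching.

```python
import csv
import io


def normalize_fips(code, width):
    """Zero-pad a FIPS code so '6037' and '06037' compare equal."""
    return str(code).strip().zfill(width)


def merge_on_fips(data_rows, geometry_rows, width=5):
    """Inner-join two lists of dict rows on their 'fips' column."""
    geometry_by_fips = {
        normalize_fips(row["fips"], width): row for row in geometry_rows
    }
    merged = []
    for row in data_rows:
        key = normalize_fips(row["fips"], width)
        geometry = geometry_by_fips.get(key)
        if geometry is None:
            continue  # no matching polygon; silently skipped in this sketch
        merged.append({**geometry, **row, "fips": key})
    return merged


# Tiny made-up inputs standing in for the real CSV exports.
census_csv = "fips,pop_1900,pop_1910\n6037,100,200\n06075,300,400\n"
geometry_csv = "fips,name\n06037,County A\n06075,County B\n"

census_rows = list(csv.DictReader(io.StringIO(census_csv)))
geometry_rows = list(csv.DictReader(io.StringIO(geometry_csv)))
merged_rows = merge_on_fips(census_rows, geometry_rows)
```

Use `width=2` for state-level codes and `width=5` for county-level codes.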
Here is the Fusion Table of all the raw data I used to make the map: raw data
To generate a KML from this CSV file I used the Python-based pyKML library, developed by Tyler Erickson. It is based on lxml's Objectify API, and supports schema validation against the KML XSDs. I’ll try to share some code snippets once I clean them up... right now it’s a bunch of spaghetti :)
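Until the real snippets are cleaned up, here is a rough, hedged sketch of the general idea of turning time-series rows into KML. It is not the code used for this map, and it uses the standard library's `xml.etree.ElementTree` as a stand-in for pyKML so it runs without extra installs. The encoding is an assumption: one extruded polygon per region per census year, with the value as the polygon height and a `<TimeSpan>` that lasts until the next census. The row layout (`name`, `ring`, `values`) and the `metres_per_unit` scale factor are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize without a namespace prefix


def kml(tag):
    """Return the namespace-qualified name for a KML tag."""
    return "{%s}%s" % (KML_NS, tag)


def placemark_for_year(name, year, value, ring, metres_per_unit=1.0):
    """One extruded polygon whose height encodes `value` during `year`."""
    placemark = ET.Element(kml("Placemark"))
    ET.SubElement(placemark, kml("name")).text = "%s (%d)" % (name, year)

    # Visible from this census year until the next one (assumed 10 years).
    span = ET.SubElement(placemark, kml("TimeSpan"))
    ET.SubElement(span, kml("begin")).text = "%d" % year
    ET.SubElement(span, kml("end")).text = "%d" % (year + 10)

    polygon = ET.SubElement(placemark, kml("Polygon"))
    ET.SubElement(polygon, kml("extrude")).text = "1"
    ET.SubElement(polygon, kml("altitudeMode")).text = "relativeToGround"
    boundary = ET.SubElement(polygon, kml("outerBoundaryIs"))
    linear_ring = ET.SubElement(boundary, kml("LinearRing"))
    height = value * metres_per_unit
    ET.SubElement(linear_ring, kml("coordinates")).text = " ".join(
        "%f,%f,%f" % (lon, lat, height) for lon, lat in ring
    )
    return placemark


def build_kml(rows):
    """Build a KML document string with one Placemark per row per year."""
    root = ET.Element(kml("kml"))
    document = ET.SubElement(root, kml("Document"))
    for row in rows:
        for year, value in sorted(row["values"].items()):
            document.append(
                placemark_for_year(row["name"], year, value, row["ring"])
            )
    return ET.tostring(root, encoding="unicode")


# Made-up square "region" with made-up values, purely for illustration.
square = [(-100.0, 40.0), (-99.0, 40.0), (-99.0, 41.0),
          (-100.0, 41.0), (-100.0, 40.0)]
kml_text = build_kml(
    [{"name": "Example", "ring": square, "values": {1900: 1000.0, 1910: 2500.0}}]
)
```

Repeating every polygon for every time step is what makes a file like this balloon in size, which is worth keeping in mind before scaling it up to thousands of counties.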
I also used the Google Charts API Wizard to dynamically generate graphs for each state/county during the animations (you must start the tour, pause or wait until the end and then click on a polygon to see the chart). This was made possible by a lot of <BalloonStyle> + <ExtendedData> hackery (see this tutorial).
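To give a feel for that hackery, here is a minimal sketch of the pattern, not the exact markup from this KML. It assumes the Image Charts style of URL (`cht`, `chs`, `chd` and friends) served from `https://chart.googleapis.com/chart`, stores the URL in an `<ExtendedData>` field, and lets `<BalloonStyle>` pull it in with the `$[fieldName]` substitution syntax. The field name `chart_url`, the helper names, and the sample numbers are all made up, and the fragment leaves out the KML namespace for brevity.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def chart_url(title, years, values, size="300x150"):
    """Build an Image Charts style line-chart URL for one region."""
    params = {
        "cht": "lc",                                     # line chart
        "chs": size,                                     # width x height in pixels
        "chtt": title,                                   # chart title
        "chd": "t:" + ",".join(str(v) for v in values),  # text-encoded data
        "chds": "0,%s" % max(values),                    # data scaling range
        "chxt": "x,y",                                   # show both axes
        "chxl": "0:|" + "|".join(str(y) for y in years), # x-axis labels
        "chxr": "1,0,%s" % max(values),                  # y-axis range
    }
    return "https://chart.googleapis.com/chart?" + urlencode(params)


def placemark_with_chart(name, url):
    """Placemark fragment whose balloon shows the chart stored in ExtendedData."""
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name

    style = ET.SubElement(placemark, "Style")
    balloon = ET.SubElement(style, "BalloonStyle")
    # $[name] and $[chart_url] are filled in by Google Earth when the
    # balloon opens; chart_url refers to the Data element defined below.
    ET.SubElement(balloon, "text").text = (
        '<h3>$[name]</h3><img src="$[chart_url]"/>'
    )

    extended = ET.SubElement(placemark, "ExtendedData")
    data = ET.SubElement(extended, "Data", name="chart_url")
    ET.SubElement(data, "value").text = url
    return placemark


# Made-up series, purely for illustration.
url = chart_url("Example", [1990, 2000, 2010], [10, 20, 30])
fragment = ET.tostring(placemark_with_chart("Example", url), encoding="unicode")
```

Because the balloon template only references `$[chart_url]`, the same `<BalloonStyle>` could be shared across every polygon, with only the `<ExtendedData>` value changing per state or county.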
The final product is almost 650K lines of KML including 200K+ polygon vertices and point coordinates!
Ultimately, we need to add a feature to KML that allows you to associate altitudes with timestamps, like the way <gx:Track> works. But in the meantime, this will do :)
U.S. Population by State
California Population by Counties
Percent of children lacking health care in 1987 by State
Chart API for children without health insurance in 1994 in Texas
Percent of homes without complete indoor plumbing 1940
Warning: This is a massive KML (3MB compressed, 33MB uncompressed). The "Population (1900-2010) (County)" Tour is particularly cumbersome for laptops and older computers, because there are 3000+ animated county polygons. Click on the tour in the left-hand panel to view the data.
Here are some of the KML tags I used to make this: