---------------------------------------------------------------

This collection of samples documents the winning analysis for the
2009 ASA/JSM (American Statistical Association / Joint Statistical Meetings)
Data Analysis Expo:

   http://www.amstat.org/meetings/jsm/2009/
   http://stat-computing.org/dataexpo/2009/
   
--------------------------------------------------------------

Team Members: Rick Wicklin & Robert Allison



Details:

There were 22 CSV files (one per year, 1987-2008),
with one line per flight, describing over 123,000,000 flights.
Each year's CSV file is about 500-600 MB in size.
Each year's SAS data set is about 1 GB in size.
The master SAS data set (all years) is about 20 GB,
and its index takes an extra 3 GB.

Unlike wimpier software such as Microsoft Excel (whose worksheets
top out at 1,048,576 rows), SAS easily handles data sets of this size.

We used a Linux machine to import each year's CSV file into a
separate SAS data set and to create a master data set containing
all years. We then sorted and indexed the SAS data sets on the
variables Year and UniqueCarrier, because those are the variables
we most frequently used to subset the data.
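
The SAS import and indexing steps themselves are not shown on this
page. As a rough analogue only, the Python sketch below uses the
standard-library sqlite3 module in place of SAS: it loads each year's
CSV into its own table, appends the same rows to a combined table,
and indexes the combined table on (Year, UniqueCarrier). The table
names, the four-column subset, and the treatment of "NA" as missing
are assumptions for illustration, not the authors' code.

```python
import csv
import io
import sqlite3

# Hypothetical subset of the expo's columns; the real yearly CSVs carry
# many more, but only these are loaded in this sketch.
SCHEMA = [
    ("Year", "INTEGER"),
    ("Month", "INTEGER"),
    ("UniqueCarrier", "TEXT"),
    ("ArrDelay", "INTEGER"),
]


def _convert(value, sql_type):
    """Map missing-value markers ("NA" is assumed here) to NULL and cast
    integer columns; text columns pass through unchanged."""
    if value in ("", "NA"):
        return None
    return int(value) if sql_type == "INTEGER" else value


def load_years(conn, csv_by_year):
    """Load each year's CSV into its own table (flights_<year>), append the
    rows to a combined table (flights_all), then index the combined table
    on (Year, UniqueCarrier)."""
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in SCHEMA)
    marks = ", ".join("?" for _ in SCHEMA)
    conn.execute(f"CREATE TABLE IF NOT EXISTS flights_all ({columns})")
    for year, handle in sorted(csv_by_year.items()):
        table = f"flights_{int(year)}"  # int() keeps the table name safe
        conn.execute(f"CREATE TABLE {table} ({columns})")
        rows = [
            tuple(_convert(row[name], sql_type) for name, sql_type in SCHEMA)
            for row in csv.DictReader(handle)
        ]
        conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
        conn.executemany(f"INSERT INTO flights_all VALUES ({marks})", rows)
    # SQLite has no physical sort step like PROC SORT; the composite index
    # is what speeds up WHERE Year = ... AND UniqueCarrier = ... subsets.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_year_carrier "
        "ON flights_all (Year, UniqueCarrier)"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = io.StringIO(
        "Year,Month,UniqueCarrier,ArrDelay\n1987,10,AA,5\n1987,10,UA,NA\n"
    )
    load_years(conn, {1987: sample})
    print(conn.execute("SELECT COUNT(*) FROM flights_all").fetchone()[0])  # prints 2
```

With the real files, each value in csv_by_year would be an open file
handle (for example, open(path, newline="")), and at over 100 million
rows you would insert in batches rather than building one list per year.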

The data sets can be accessed directly via NFS, but to make it
easier to access the data from Unix and Windows platforms (without
needing NFS mounts, etc.), we set up a SAS/Share server. This lets
clients on any hardware platform access the data with a simple
SAS LIBNAME statement.

We ran many SAS/Graph analyses. Some produced "static" interactive
web output (all graphs and drilldowns created ahead of time), and
others produced "dynamic" web output (each graph and drilldown
created on the fly via SAS/IntrNet).

To produce the final (winning) graphics, we did much of the data
preparation in Base SAS, using PROC SQL to summarize the data, and
then used SAS/Graph and SAS/IML Studio to create the graphics.
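
A summarization step of this kind typically boils down to a GROUP BY
query that collapses millions of flight rows into a few rows per year
and carrier. The sketch below expresses that idea with Python's
sqlite3 module rather than SAS; the flights table, the ArrDelay
column, and the 15-minute "late" cutoff are assumptions for
illustration, not the queries used for the poster.

```python
import sqlite3

# Hypothetical table modeled on the expo CSV layout:
#   flights(Year INTEGER, UniqueCarrier TEXT, ArrDelay INTEGER)
# ArrDelay is in minutes and stored as NULL when the arrival delay is missing.
SUMMARY_SQL = """
    SELECT Year,
           UniqueCarrier,
           COUNT(*)           AS n_flights,       -- every row, including missing delays
           AVG(ArrDelay)      AS mean_arr_delay,  -- AVG skips NULLs
           SUM(ArrDelay > 15) AS n_late           -- assumed 15-minute lateness cutoff
    FROM flights
    GROUP BY Year, UniqueCarrier
    ORDER BY Year, UniqueCarrier
"""


def summarize_delays(conn):
    """Collapse per-flight rows into one summary row per (Year, UniqueCarrier),
    the same kind of reduction a PROC SQL GROUP BY query performs."""
    return conn.execute(SUMMARY_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights (Year INTEGER, UniqueCarrier TEXT, ArrDelay INTEGER)"
    )
    conn.executemany(
        "INSERT INTO flights VALUES (?, ?, ?)",
        [(2008, "AA", 10), (2008, "AA", 30), (2008, "AA", None), (2008, "UA", -5)],
    )
    for row in summarize_delays(conn):
        print(row)
```

Running it prints (2008, 'AA', 3, 20.0, 1) and then
(2008, 'UA', 1, -5.0, 0): the AA flight with no arrival delay counts
toward n_flights but is left out of the mean, which is why the two
counts are kept separate.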

-----

Note: We did not actually use the bottom four samples in the poster
(for example, the GIF animation, which would be hard to put on a
poster), but I included them in case they are useful to anyone
analyzing this type of data.
