Click here to see the SAS code.
Click here to see the example.
---------------------------------------------------------------
I got all the mpg data (2007-1984) from:
fueleconomy.gov
Some years were in excel spreadsheets, some in csv's, and some
in column-delimited text files. I used import.sas to read in
all the data, and fill in missing values (such as 'Class') where
that seemed safe, and combined into a single dataset (mpg_data).
According to the "Gas Guzzler" faq on this page, the 'unadj' mpg
is only used for the gas guzzler tax. The adjusted mpg is what's
on the window-sticker (and is generally about 15% lower than the
unadjusted number - and is more realistic). 1984 only has the
unadjusted mpg - all years 1985-2007 have both unadjusted and
adjusted.
Only years 1998-2007 had the 'Class' (such as truck, suv, station
wagon, etc) values in the data. For earlier years, I tried to
"guess" the Class, and fill that value in, where it seemed
safe to do so (there are a lot of blank values though).
One of the main problems with this data is that it doesn't tell how
many of each vehicle were sold - therefore, when I plot the data,
a vehicle such as a Ferrari (which maybe only a handful were sold)
that gets 10mpg has the same visual 'weight' in the graphs as a
Camry (which 1000s were sold) that gets 30mpg+.
Here's how I got the 2007 data, for example:
Clicked the 2007 zip file.
Saved out the
2007_FE_guide_ALL_no_sales_May_01_2007.xlsm
(sas v9.2 can't import xlsm)
Brought up in Excel.
Saved as a 2005 file.
2007_FE_guide_ALL_no_sales_May_01_2007.xls
(sas 9.2 import wizard doesn't work with it - combination of sheet name,
and sas' required sheet name string-stuff exceeded 32 chars?)
Had to rename sheet from '2007_FE_guide_ALL_with_5_cycle_' to 'Sheet1'
I create the final dataset with "compress=yes" to save a lot of
disk space (only used 2Mb instead of 7Mb). But sas jobs that use
the data will run a little slower.
Back to Samples Index