This is a GIF animation of over 100 years of baseball data!

Each plot marker represents a team.
The left axis is the total attendance for the year.
The bottom axis is the percent of games won during that year.

Since stadium sizes and the US population have grown over the 100+ years,
I let the left axis auto-scale. I hard-code the bottom axis, though, since
the percent of games won should be "comparable" across years, and it is
therefore desirable to have all the years plotted on the same scale.

I let gplot calculate the regression line using "interpol=rl"
in the symbol statement.
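
For anyone curious what that regression line amounts to, here is a minimal
Python sketch. It is not the SAS code behind the animation. It assumes the
line is an ordinary least-squares fit of attendance on percent of games
won. The function name fit_line and the sample numbers are made up for
illustration and are not real baseball data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross-deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values for one season (NOT real data):
pct_won = [40.0, 50.0, 60.0]    # percent of games won, one entry per team
attendance = [1.0, 2.0, 3.0]    # total attendance in millions, one per team
slope, intercept = fit_line(pct_won, attendance)
```

A positive slope means the line rises to the right, i.e. teams with a
higher winning percentage sit higher on the attendance axis in that year.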

Disclaimer: I'm not a statistician! :)

This is not a professional/statistical analysis of the data,
but the GIF animation does seem to bear out a general correlation:
teams that win a higher percentage of their games tend to have
higher attendance.

Visually, in any given year, the data don't look that strongly correlated,
but over the 100-year span they do seem to show some correlation. If the
relationship were more "random", then the direction of the regression
lines would be more "random" too, and the animation would just look like
erratic lines pointing hither and yon ... but they (mostly) point in
generally the same direction year after year.
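
One rough way to put a number on "mostly point in the same direction" is
to fit a line for each year and count how many years have an upward slope.
The Python sketch below does that on a tiny made-up data set. The names
slope, share_of_upward_years and toy_seasons are hypothetical, the numbers
are not real baseball data, and this is an eyeball check rather than a
proper statistical test.

```python
def slope(points):
    """Least-squares slope for a list of (pct_won, attendance) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def share_of_upward_years(seasons):
    """Fraction of years whose fitted line slopes upward."""
    slopes = [slope(points) for points in seasons.values()]
    return sum(1 for s in slopes if s > 0) / len(slopes)


# Hypothetical data: year -> (percent won, attendance in millions) per team.
toy_seasons = {
    1901: [(40.0, 0.2), (50.0, 0.3), (60.0, 0.5)],  # slopes upward
    1902: [(45.0, 0.4), (50.0, 0.3), (55.0, 0.6)],  # slopes upward
    1903: [(42.0, 0.5), (50.0, 0.4), (58.0, 0.3)],  # slopes downward
}
share = share_of_upward_years(toy_seasons)
```

If the relationship were close to "random", a share near one half would be
expected; a share well above that fits the pattern seen in the animation.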

Perhaps teams that win a lot of games draw more fans, which
creates more demand for seats, and therefore larger stadiums 
are built for them ... which can hold more people, and that
produces higher attendance for the teams that consistently
win(?) - just one theory :)

It would also be interesting to plot how full the stadiums were (as a
percent of capacity) rather than just attendance, but I don't have that data.

