Friday, May 23, 2008
« Creating cross platform GUI's with IronR... | Main | It's time to give up on Twitter »
I have a small fantasy football related project that I'm going to be working on that requires access to the full 2008 NFL Schedule.  I decided to use Hpricot, and scrape the schedule from ESPN.  The CSV file I created with the output from the below script can be found here: nfl-schedule.csv (11.48 KB)  

Download script (parse_schedule.rb)

#!ruby
require 'rubygems'
require 'hpricot'
require 'open-uri'

class Game
   attr_accessor :date, :week, :away_team, :home_team, :time

   def to_s
      "#{@date} #{@time} #{@away_team} at #{@home_team}"
   end

   def to_csv
      "#{@week},#{@date.gsub(",", "")},#{@time},#{@away_team},#{@home_team}"
   end
end

def parse_games(doc)
   games = []
   doc.search("//table[@class='tablehead']//tr").each do |tr|
      @week = tr.search("/td/a").inner_html if(tr[:class] == 'stathead')
      @date = tr.at("td").inner_html if(tr[:class] == 'colhead')

      teams = []
      tr.at("td").search("a").each do |team|
         teams << team.inner_html
      end

      if(teams.size == 2)
         @time = tr.search("td:eq(1)").inner_html
         game = Game.new()
         game.date = @date
         game.week = @week
         game.time = @time
         game.away_team = teams[0]
         game.home_team = teams[1]
         games << game
      end
   end
   games
end

games = parse_games(Hpricot(open("http://sports.espn.go.com/nfl/schedule")))
games.each do |g|
   puts g.to_csv
end
puts "Total games: #{games.size}"