I'm working on a small fantasy football project that needs the full 2008 NFL schedule. I decided to use
Hpricot to scrape the
schedule from ESPN. The CSV file generated by the script below is available here:
nfl-schedule.csv (11.48 KB)
Download script (parse_schedule.rb)
#!ruby
require 'rubygems'
require 'hpricot'
require 'open-uri'

# One scheduled game; the fields are filled in by parse_games.
class Game
  attr_accessor :date, :week, :away_team, :home_team, :time

  def to_s
    "#{@date} #{@time} #{@away_team} at #{@home_team}"
  end

  # Commas are stripped from the date so it stays in a single CSV field.
  def to_csv
    "#{@week},#{@date.gsub(",", "")},#{@time},#{@away_team},#{@home_team}"
  end
end

# Walks the schedule table rows in order. The week and date appear in header
# rows, so the most recent values are carried forward to the game rows below.
def parse_games(doc)
  games = []
  week = nil
  date = nil
  doc.search("//table[@class='tablehead']//tr").each do |tr|
    first_cell = tr.at("td")
    next unless first_cell # skip rows that have no <td> cells

    week = tr.search("/td/a").inner_html if tr[:class] == 'stathead'
    date = first_cell.inner_html if tr[:class] == 'colhead'

    # A game row links to exactly two teams in its first cell: away, then home.
    teams = first_cell.search("a").map { |team| team.inner_html }
    next unless teams.size == 2

    game = Game.new
    game.date = date
    game.week = week
    game.time = tr.search("td:eq(1)").inner_html
    game.away_team = teams[0]
    game.home_team = teams[1]
    games << game
  end
  games
end

games = parse_games(Hpricot(open("http://sports.espn.go.com/nfl/schedule")))
games.each do |g|
  puts g.to_csv
end
puts "Total games: #{games.size}"
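
To use the schedule in the fantasy football project, the CSV has to be read back in. Here is a rough sketch of how that might look, assuming the column order that to_csv writes (week, date, time, away team, home team). The ScheduledGame struct, the load_games helper, and the sample rows are all made up for illustration; they are not part of the script above or copied from the real file.

```ruby
require 'csv'

# Hypothetical sample in the column order written by Game#to_csv.
# The team names, dates, and times are placeholders, not real schedule data.
SAMPLE = <<~DATA
  Week 1,Sun Sep 7,1:00 PM,Team A,Team B
  Week 1,Sun Sep 7,4:15 PM,Team C,Team D
  Week 2,Sun Sep 14,1:00 PM,Team B,Team C
  Total games: 3
DATA

# Illustrative record type (an assumption, not from the original script).
ScheduledGame = Struct.new(:week, :date, :time, :away_team, :home_team)

# Parses CSV text into ScheduledGame records. Rows that do not have exactly
# five fields (such as the trailing "Total games" line) are skipped.
def load_games(csv_text)
  CSV.parse(csv_text)
     .select { |row| row.size == 5 }
     .map { |row| ScheduledGame.new(*row) }
end

games = load_games(SAMPLE)
by_week = games.group_by(&:week)

by_week.each do |week, week_games|
  puts "#{week}: #{week_games.size} game(s)"
end
```

If the script's output is redirected straight into a file, the "Total games" summary line ends up in that file too, which is why the sketch filters on the field count. For a real file, something like load_games(File.read("nfl-schedule.csv")) should work the same way.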