How to Print URLs with Nokogiri

It has been a while since I last posted here, so I'll skip the excuses and get straight to the post.

I will show a way of printing URLs by building a Ruby script with Nokogiri.
What I want is to print the URL of every compressed file (.tar.gz) listed on a website. The links are contained inside the td (table data) cells of the tr (table rows) in the page's table.

require 'nokogiri'
require 'open-uri'

class Scraper
  def initialize(url)
    @url = url
  end

  def file_url
    # URI.open comes from open-uri; plain open no longer accepts URLs on Ruby 3.0+
    page = Nokogiri::HTML(URI.open(@url))
    rows = page.css('table tr')

    rows[1..-2].each do |row|
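      # Collect hrefs ending in .tar.gz; end_with? avoids treating the dots as regex wildcards
      hrefs = row.css('td a').map { |a| a['href'] }.compact
      hrefs = hrefs.select { |href| href.end_with?('.tar.gz') }.uniq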

      hrefs.each do |href|
        puts @url + href
      end
    end
  end
end

scrap = Scraper.new('https://cran.r-project.org/src/contrib/')

scrap.file_url
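
The script builds each address with `@url + href`, which works here because the base URL ends in a slash. As a hedged alternative, the standard library's `URI.join` resolves a relative href against the base the way a browser would. The helper name `absolute_urls` and the sample hrefs below are made up for illustration and are not part of the script above.

```ruby
require 'uri'

# Hypothetical helper: resolve each href against the base URL.
def absolute_urls(base, hrefs)
  hrefs.map { |href| URI.join(base, href).to_s }
end

base = 'https://cran.r-project.org/src/contrib/'
hrefs = ['abc_1.0.tar.gz', 'xyz_2.3.tar.gz'] # made-up file names

urls = absolute_urls(base, hrefs)
puts urls
```

Returning the list instead of printing it inside the loop would also make the result easier to reuse, for example to download the files later.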