HTML Table Parser

A couple of weeks ago, a friend asked me which programming language I would suggest for writing an HTML table parser. He had heard that Python would work well, and since I know PHP, I said PHP. I talked about how Drupal has the Feeds module, which I use to scrape some content from a website (I will write about this in another blog post). I thought this would be the best approach because Drupal can do anything. After getting more information from him, I learned that the Feeds module would be overkill and wouldn't solve this problem.

What he wants

He plays WoW (World of Warcraft) and is a leatherworker. He showed me a webpage that lists the goods he can make and what they cost. Since it takes considerable time to make anything, he wanted to make the most money (in-game money) possible. So he ended up copying and pasting all of the tables into Excel, then using an Excel formula to calculate profit and sort accordingly. He did this once and realized there had to be an easier way.

What to do

Looking at the page's source, I realized that there is a huge number of tables in the DOM. I'm assuming there is a reason for having so many tables, but they look unnecessary to me. My plan was to have PHP parse the tables, output the 3 columns I need (Item, Mats, and Price), calculate profit, and then have jQuery take care of sorting the data. I knew that I could likely use arrays to sort the information by profit, but I also knew that sorting is something jQuery can do.
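For comparison, the "arrays" route mentioned above can be sketched in a few lines of PHP. This is a minimal sketch, not the code used for this project: it assumes each parsed row is already an associative array with hypothetical keys `item`, `mats` (total material cost) and `price` (sale price), that both costs are plain integers in the same unit (for example copper), and that profit is simply price minus material cost. The helper names `addProfit` and `sortByProfit` are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: add a 'profit' key to each row.
 * Assumes profit = sale price - material cost (both integers, same unit).
 */
function addProfit(array $rows): array
{
    foreach ($rows as &$row) {
        $row['profit'] = $row['price'] - $row['mats'];
    }
    unset($row); // break the reference left behind by foreach
    return $rows;
}

/**
 * Hypothetical helper: sort rows by profit (highest first),
 * breaking ties by price (lowest first), like a secondary sort.
 */
function sortByProfit(array $rows): array
{
    usort($rows, function (array $a, array $b): int {
        // Arrays compare element by element: profit descending, then price ascending.
        return [$b['profit'], $a['price']] <=> [$a['profit'], $b['price']];
    });
    return $rows;
}

// Made-up sample data for illustration only.
$rows = addProfit([
    ['item' => 'Item A', 'mats' => 500, 'price' => 900],
    ['item' => 'Item B', 'mats' => 200, 'price' => 800],
    ['item' => 'Item C', 'mats' => 100, 'price' => 500],
]);

foreach (sortByProfit($rows) as $row) {
    printf("%-8s profit: %d\n", $row['item'], $row['profit']);
}
```

With this sample data, Item B (profit 600) comes first, and the tie between Item A and Item C (both 400) is broken by the lower price, so Item C is listed before Item A. Doing this in PHP fixes one sort order, whereas a client-side plugin lets the reader change it.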

What I did

It was pretty simple to create the table parser once I refreshed my knowledge on this. The code that parses the page is as follows:


You'll notice that I have `if ($cols->item(1)->nodeValue !== "95% CI") {` to make sure the td isn't "95% CI". Truth be told, I'm not sure why I need this line. All I know is that I was getting some output that I didn't want. I tried to find the cause in the page's source code, but I couldn't figure out what the issue was. The code worked, so I'm not too worried about it.
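For readers who want something runnable, here is a minimal sketch of this kind of parser using PHP's built-in `DOMDocument` and `DOMXPath`. It is not the original code. It assumes each data row is a `<tr>` whose first three direct `<td>` children hold the item name, material cost and price, in that order; the sample HTML and the `parseRows` name are hypothetical. Querying only direct `td` children (`./td`) keeps outer layout tables, which wrap whole inner tables in a single cell, from being read as data rows. The "95% CI" guard is kept in the same spirit as the line described above.

```php
<?php
/**
 * Minimal sketch of a DOM-based table parser (hypothetical, not the original).
 * Assumes data rows have at least three direct <td> cells: item, mats, price.
 */
function parseRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        // Only direct <td> children, so nested layout tables are not mixed in.
        $cols = $xpath->query('./td', $tr);
        // Skip header rows (<th> only) and rows with too few cells.
        if ($cols->length < 3) {
            continue;
        }
        // Drop rows whose second cell is the "95% CI" label.
        if (trim($cols->item(1)->nodeValue) === '95% CI') {
            continue;
        }
        $rows[] = [
            'item'  => trim($cols->item(0)->nodeValue),
            'mats'  => trim($cols->item(1)->nodeValue),
            'price' => trim($cols->item(2)->nodeValue),
        ];
    }
    return $rows;
}

// Made-up sample markup for illustration only.
$html = <<<HTML
<table>
  <tr><th>Item</th><th>Mats</th><th>Price</th></tr>
  <tr><td>Item A</td><td>500</td><td>900</td></tr>
  <tr><td>Summary</td><td>95% CI</td><td>-</td></tr>
  <tr><td>Item B</td><td>200</td><td>800</td></tr>
</table>
HTML;

print_r(parseRows($html));
```

The values come back as trimmed strings; how to turn them into numbers depends on how the real page formats its prices.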

This took care of getting the information that I needed, and now all I had left to do was sort it.

jQuery Tablesorter to the rescue

I found a page that showed me a perfect example of how Tablesorter works. What I really like about Tablesorter is that it lets you choose a secondary, or tertiary, sort by pressing Shift + click. This works perfectly for my example: you can sort by items there currently aren't any of and have a secondary sort by price.
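Tablesorter works on an ordinary HTML table, but it expects the header cells inside a `<thead>` and the data rows inside a `<tbody>`. Below is a minimal sketch of the PHP output side, assuming each row is an associative array with hypothetical keys `item`, `mats`, `price` and `profit`, and a hypothetical table id of `recipes`. On the page you would then initialize the plugin with something like `$("#recipes").tablesorter();` once jQuery and Tablesorter are loaded; a default multi-column sort can be set through the plugin's `sortList` option.

```php
<?php
/**
 * Hypothetical helper: render rows as a table that Tablesorter can sort.
 * Header cells go in <thead>, data rows in <tbody>.
 */
function renderTable(array $rows, string $id = 'recipes'): string
{
    $headers = ['Item', 'Mats', 'Price', 'Profit'];
    $keys    = ['item', 'mats', 'price', 'profit'];

    $html = '<table id="' . htmlspecialchars($id) . '">' . "\n<thead><tr>";
    foreach ($headers as $header) {
        $html .= '<th>' . $header . '</th>';
    }
    $html .= "</tr></thead>\n<tbody>\n";

    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($keys as $key) {
            // Escape scraped text before echoing it back into a page.
            $html .= '<td>' . htmlspecialchars((string) $row[$key]) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</tbody>\n</table>\n";
}

// Made-up sample data for illustration only.
echo renderTable([
    ['item' => 'Item A', 'mats' => 500, 'price' => 900, 'profit' => 400],
    ['item' => 'Item B', 'mats' => 200, 'price' => 800, 'profit' => 600],
]);
```

Keeping the PHP output unsorted and letting the plugin handle ordering matches the approach described in this post.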

To Summarize

  • I originally thought that Drupal would have been the best solution for this, but after looking more closely at the problem I realized that Drupal would be overkill for this application.
  • I kept my PHP simple by not sorting the rows
  • I used a jQuery plugin to take care of sorting the rows instead

I showed this to my friend, and he was blown away. It was exactly what he needed. This was a fun project for me because I got to use my PHP knowledge to write custom code instead of always using Drupal, where I don't get to write custom code often.

See it in action at