HTML Table Parser

A couple of weeks ago, a friend asked me which programming language I would suggest for writing an HTML table parser. He had heard that Python would work well, but since I know PHP, I said PHP. I talked about how Drupal has the Feeds module, which I use to scrape some content from a website (I will write about this in another post). I thought this would be the best approach because Drupal can do anything. After getting more information from him, I learned that the Feeds module would be overkill and wouldn't solve this problem.

What he wants

He plays WoW (World of Warcraft), and he is a leatherworker. He showed me a webpage (https://theunderminejournal.com/category.php?realm=H-Stormreaver&categor...) that shows which goods he can make and at what cost. Since it takes considerable time to make anything, he wanted to make the most money (in-game money) possible. So he ended up copying and pasting all of the tables into Excel and using an Excel formula to calculate profit and sort accordingly. He did this once and realized there had to be an easier way.

What to do

Looking at the page source of https://theunderminejournal.com/category.php?realm=H-Stormreaver&categor..., I realized that there is a huge number of tables in the DOM. I'm assuming there is a reason for having so many tables, but it looks unnecessary to me. My plan was to have PHP parse the tables, spit out the 3 columns I need (Item, Mats, & Price), calculate profit, and then have jQuery take care of sorting the data. I could probably have sorted the rows by profit in PHP using arrays, but sorting is something that jQuery can do for me.
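To make the "calculate profit" step concrete, here is a minimal sketch. It is written in Python rather than the PHP I used, and it assumes profit is simply Price minus Mats, which is a simplification. The Item class, the item names, and the numbers are all made up for illustration; none of them come from the site.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    mats: float   # cost of materials, in gold (hypothetical values below)
    price: float  # selling price, in gold (hypothetical values below)

    @property
    def profit(self) -> float:
        # Assumption: profit is price minus material cost and nothing else.
        return self.price - self.mats


# Made-up rows standing in for the Item, Mats, and Price columns.
items = [
    Item("Example Boots", mats=120.0, price=150.0),
    Item("Example Gloves", mats=80.0, price=140.0),
    Item("Example Belt", mats=200.0, price=190.0),
]

# Most profitable item first.
by_profit = sorted(items, key=lambda item: item.profit, reverse=True)

for item in by_profit:
    print(f"{item.name}: {item.profit:+.2f}")
```

An item that costs more to make than it sells for ends up with a negative profit and sorts to the bottom.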

What I did

Creating the table parser was pretty simple once I refreshed my knowledge on this. The code that parses the page is as follows:

 

You'll notice that I have if ($cols->item(1)->nodeValue !== "95% CI") { to make sure the second td in a row (item(1) is zero-indexed) isn't "95% CI". Truth be told, I'm not sure why I need this line. All I know is that I was getting some output that I didn't want. I tried to find this in the source code, but I couldn't figure out what the issue was. The code worked, so I'm not too worried about it.
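For readers who want to see the general shape of this kind of parser, here is a rough sketch of the same idea: walk every row, collect its td cells, and skip rows whose second cell is "95% CI". It is in Python using only the standard library, not the PHP from this post. The TableRows class and the SAMPLE markup are hypothetical, and the sample row containing "95% CI" is only there to exercise the guard, not a claim about what the real page contains. The sketch also does not handle nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical markup; the real page is far larger and structured differently.
SAMPLE = """
<table>
  <tr><th>Item</th><th>Mats</th><th>Price</th></tr>
  <tr><td>Example Boots</td><td>120</td><td>150</td></tr>
  <tr><td>Mean</td><td>95% CI</td><td>n/a</td></tr>
  <tr><td>Example Gloves</td><td>80</td><td>140</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE)

# Keep rows with at least two <td> cells whose second cell is not "95% CI".
wanted = [row for row in parser.rows if len(row) >= 2 and row[1] != "95% CI"]

for row in wanted:
    print(row)
```

The header row contains only th cells, so it comes back as an empty list and is dropped by the length check.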

This took care of getting the information that I needed. Now all that was left was to sort it.

jQuery Table Sorter to the rescue

I found http://blog.teamtreehouse.com/how-to-code-sortable-tabular-data-with-jquery, which showed me a perfect example of how http://tablesorter.com/ works. What I really like about Tablesorter is that it allows you to choose a secondary, or tertiary, sort by pressing Shift + click. This works perfectly for my example, because you can sort first by items that currently have none available and then use price as a secondary sort.
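If the idea of a secondary sort is unfamiliar, the sketch below shows the concept in plain Python. It is not Tablesorter's API and not code from the demo. The rows, the "number listed" column, and the choice of descending price are all assumptions made for illustration.

```python
# Hypothetical rows: (item, number currently listed, price in gold).
rows = [
    ("Example Boots", 3, 150.0),
    ("Example Gloves", 0, 140.0),
    ("Example Belt", 0, 190.0),
    ("Example Cloak", 1, 90.0),
]

# Primary key: number listed, ascending, so unlisted items come first.
# Secondary key: price, descending, used only to break ties on the primary key.
ordered = sorted(rows, key=lambda row: (row[1], -row[2]))

for name, listed, price in ordered:
    print(f"{name:15} listed={listed} price={price:.0f}")
```

The two items with nothing listed come first, and between them the higher price wins, which is the same effect as clicking one column header and then Shift + clicking a second one.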

To Summarize

  • I originally thought that Drupal would have been the best solution for this, but after looking more at the problem I realized that Drupal would be overkill for this application.
  • I kept my PHP simple by not sorting the rows.
  • I used a jQuery plugin to take care of sorting the rows instead.

I showed this to my coworker, and he was blown away. It was exactly what he needed. This was a fun project for me because I got to use my PHP knowledge to write some custom code instead of always using Drupal, where I don't often get to write custom code.

See it in action at http://natemillin.com/demos/wow