Using MagpieRSS to scrape blog headlines to HTML

posted by ian grant on March 25, 2006 at 9:43 am | in digital art hacks, general, net art, web 2.0 | 1 comment

This walkthrough assumes you have access to a server running PHP and the ability to change permissions on directories.

Step One: Get MagpieRSS. Head to SourceForge and grab the latest copy of the excellent MagpieRSS.

Step Two: Read the docs. Quickstart: set up a directory on the web server that looks a bit like this:

[Screenshot: directory layout]

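Set the permissions of the “cache” directory to 777 (world-writable) so that the web server can write cached feeds into it. You may be able to get away with more restrictive permissions, for example if the web server user owns the directory.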
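Before reaching for 777, it can be worth asking PHP whether it can already write to the directory. The snippet below is a minimal sketch, not part of MagpieRSS: `ensure_cache_dir()` is a hypothetical helper name, and the location of the cache directory (next to the script) is an assumption you should adjust to your own layout.

```php
<?php
// Hypothetical helper (not part of MagpieRSS): make sure a cache
// directory exists and that the current PHP process can write to it.
function ensure_cache_dir($dir, $mode = 0755)
{
    // Create the directory if it is missing (the mode is subject to the umask).
    if (!is_dir($dir) && !@mkdir($dir, $mode, true)) {
        return false;
    }
    // If PHP cannot write to it yet, try applying the requested mode.
    if (!is_writable($dir)) {
        @chmod($dir, $mode);
        clearstatcache();
    }
    return is_writable($dir);
}

// Assumption: the cache directory sits next to this script.
$cacheDir = dirname(__FILE__) . '/cache';

if (ensure_cache_dir($cacheDir)) {
    echo "OK: PHP can write to $cacheDir\n";
} else {
    echo "PHP cannot write to $cacheDir; try looser permissions (775, then 777)\n";
}
```

If this reports OK with 755, there is no need to open the directory up any further. As far as I recall, MagpieRSS also reads a `MAGPIE_CACHE_DIR` constant if you define it before calling `fetch_rss()`, which lets you point the cache somewhere else; check the docs that ship with your copy to confirm.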

Step Three: Use the code below as a starting point for exploration. You can see the results of it here.

There are several lines you can comment or uncomment to see the object MagpieRSS returns. The current example is set to return the results of a Blogger feed. With some extra code one can detect the feed type and provide summaries accordingly… that is to come.


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<?php
require_once('magpierss/rss_fetch.inc');
// the @ suppresses errors
// change the URL to the blog atom / rss feed.
// If the feed is not atom but RSS some of the
// item names will be different - one will need
// to check. The info is in the 'channel' array.

$rss = @fetch_rss( 'http://internetandnetworkart.blogspot.com/atom.xml' );
// $rss = @fetch_rss( 'http://ellington.tvu.ac.uk/dev/?feed=rss2' );

// dump the object to the screen to study the structure magpie returns
echo '<pre>';
print_r($rss);
echo '</pre>';
// end dump

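// fetch_rss() returns false on failure; stop here rather than
// read properties from something that is not an object
if (!$rss) {
   die('Could not fetch or parse the feed.');
}

$channel = $rss->channel;
echo "Blog Title: " . $channel['title'];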

// display links to recent blog entries:

echo "Latest blog additions:\n";
foreach ($rss->items as $item) {

   $href = $item['link'];
   $title = $item['title'];
   $author = $item['author_name'];
   $created = $item['created'];
   $content = $item['atom_content'];

   // link each headline back to its post, then print the entry body
   echo "<p><a href=\"$href\">$title</a> created by $author on $created</p>\n
   $content\n";
}

echo '

';

?>
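As a first stab at the feed detection mentioned above, here is a rough sketch that looks at each item and picks field names depending on whether it looks like Atom or RSS. It is only an illustration: `summarise_item()` is a made-up helper, the Atom keys are the ones used in the listing above, and the RSS keys (`description`, `pubdate`, `dc` / `creator`) are my assumption about what MagpieRSS produces for an RSS 2.0 feed, so dump the object with `print_r()` and check them against your own feed. The sample arrays stand in for `$rss->items` so the snippet runs without MagpieRSS.

```php
<?php
// Hypothetical helper: map one Magpie-style item array onto a common
// set of fields, whichever feed format it came from.
function summarise_item(array $item)
{
    // Items from the Atom feed above carry an 'atom_content' key.
    $isAtom = isset($item['atom_content']);

    if ($isAtom) {
        $author  = isset($item['author_name']) ? $item['author_name'] : '';
        $date    = isset($item['created']) ? $item['created'] : '';
        $content = $item['atom_content'];
    } else {
        // Assumed RSS 2.0 keys; verify against a print_r() dump.
        $author  = isset($item['dc']['creator']) ? $item['dc']['creator'] : '';
        $date    = isset($item['pubdate']) ? $item['pubdate'] : '';
        $content = isset($item['description']) ? $item['description'] : '';
    }

    return array(
        'type'    => $isAtom ? 'atom' : 'rss',
        'title'   => isset($item['title']) ? $item['title'] : '',
        'link'    => isset($item['link']) ? $item['link'] : '',
        'author'  => $author,
        'date'    => $date,
        'content' => $content,
    );
}

// Hand-built sample items standing in for $rss->items.
$items = array(
    array('title' => 'An Atom post', 'link' => 'http://example.com/a',
          'author_name' => 'ian', 'created' => '2006-03-25',
          'atom_content' => 'Atom body'),
    array('title' => 'An RSS post', 'link' => 'http://example.com/b',
          'dc' => array('creator' => 'ian'), 'pubdate' => 'Sat, 25 Mar 2006',
          'description' => 'RSS body'),
);

foreach ($items as $item) {
    $s = summarise_item($item);
    echo "[{$s['type']}] {$s['title']} by {$s['author']} on {$s['date']}\n";
}
```

In the real script you would call `summarise_item($item)` inside the `foreach ($rss->items as $item)` loop and echo the returned fields instead of reading the Atom keys directly.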

The sample files can be downloaded here:

magpierss_blog_scrape.zip

1 comment

  1. Hey, awesome site! :) check out SimplePie for working with RSS moving forward. Magpie doesn’t seem to have moved forward in a long time.

    comment by steve cooley — March 20, 2007 #


(cc) ian grant some rights reserved