Life after Google Reader, day 2

Having exported the Shared/Starred items from Google Reader, I considered several ways of keeping them around, and eventually decided to put them right here (see “Reading Lists” in the main navigation bar). My reasons were:

  • This site is unlikely to disappear overnight, since I am paying WordPress to host it.
  • Simple, structured HTML is not much harder for computers to parse than JSON, and is easier for humans to use.
  • Someone else might be interested in the stuff that I find interesting.

First step was to trim down shared.json and starred.json, and flatten them into a comma-separated list. For this I used Google Refine, which, coincidentally, is another tool that Google no longer develops. (It is being continued by volunteers as an open-source project OpenRefine). The only attributes I kept were

  • Title
  • Hyperlink
  • Author(s)
  • Feed name
  • Timestamp
  • Summary

Springer feeds gave me more headache than anything else in this process. They bundled the authors, abstract, and other information into a single string, within which they were divided only by presentational markup. I cleaned up some of Springer-published entries, but also deleted many more than I kept. “Sorry, folks – your paper looks interesting, but it’s in Springer.”

Compared to the clean-up, conversion to HTML (in a Google Spreadsheet) took no time at all. I split the list into chunks 2008-09, 2010-11, 2012, and 2013, the latter being updated periodically. Decided not to do anything about LaTeX markup; much of it would not compile anyway.

Last question: how to keep the 2013 list up-to-date, given that Netvibes offers no export feature for marked items? jQuery to the rescue:

Now with a freehand circle!
Now with a freehand circle!

The script parses the contents of Read Later tab, extracts the items mentioned above, and wraps them into the same HTML format as the existing reading list.

$(document).ready(function() {
  function exportSaved() {
  var html='';
  $('.hentry.readlater').each(function() {
    var title = $('.entry-innerTitle', this).html().split('<span')[0];
    var hyperref = $('.entry-innerTitle', this).attr('href');
    var summary = ($('.entry-content', this).html()+'').replace('\n',' ');
    var timestamp = $('.entry-innerPublished', this).attr('time');
    var feed = $('.entry-innerFeedName', this).html();
    var author = ($('.author', this).html()+'').slice(5);
    var date = new Date(timestamp*1000);
    var month = ('0'+(date.getMonth()+1)).slice(-2);
    var day = ('0'+date.getDate()).slice(-2);
    var readableDate = date.getFullYear()+'-'+month+'-'+day;
        html = html+"<div class='list-item'><h4 class='title'><strong><a href='"+hyperref+"'>"+title+"</a></strong></h4><h4 class='author'>"+author+"</h4><h5 class='source'>"+feed+" : "+readableDate+"</h5><div class='summary' style='font-size:80%;line-height:140%;'>"+summary+"</div></div>";  
    });
 console.log(html);
   }
   $('export').insertAfter($('#switch-view')).click(function() { exportSaved(); });
}); 

The script accesses the page via an extension in Google Chrome. It labels HTML elements with classes for the convenience of future parsing, not for styling. I had to use inline styles to keep WordPress happy. Dates are formatted according to ISO 8601, making it possible to quickly jump to any month using in-page search for “yyyy-mm”. Which would not work with mm/dd/yyyy, of course.

Unfortunately, some publishers still insert the authors’ names in the summary. Unlike arXiv, which serves reasonably structured feeds, albeit with unnecessary hyperlinks in the author field. While I am unhappy with such things (and the state of the web in general), my setup works for me, and this concludes my two-post detour from mathematics.

xkcd 743: Infrastructures

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s