Life after Google Reader, day 2

Having exported the Shared/Starred items from Google Reader, I considered several ways of keeping them around, and eventually decided to put them right here (see “Reading Lists” in the main navigation bar). My reasons were:

  • This site is unlikely to disappear overnight, since I am paying WordPress to host it.
  • Simple, structured HTML is not much harder for computers to parse than JSON, and is easier for humans to use.
  • Someone else might be interested in the stuff that I find interesting.

First step was to trim down shared.json and starred.json, and flatten them into a comma-separated list. For this I used Google Refine, which, coincidentally, is another tool that Google no longer develops. (It is being continued by volunteers as an open-source project OpenRefine). The only attributes I kept were

  • Title
  • Hyperlink
  • Author(s)
  • Feed name
  • Timestamp
  • Summary

Springer feeds gave me more headache than anything else in this process. They bundled the authors, abstract, and other information into a single string, within which they were divided only by presentational markup. I cleaned up some of Springer-published entries, but also deleted many more than I kept. “Sorry, folks – your paper looks interesting, but it’s in Springer.”

Compared to the clean-up, conversion to HTML (in a Google Spreadsheet) took no time at all. I split the list into chunks 2008-09, 2010-11, 2012, and 2013, the latter being updated periodically. Decided not to do anything about LaTeX markup; much of it would not compile anyway.

Last question: how to keep the 2013 list up-to-date, given that Netvibes offers no export feature for marked items? jQuery to the rescue:

Now with a freehand circle!
Now with a freehand circle!

The script parses the contents of Read Later tab, extracts the items mentioned above, and wraps them into the same HTML format as the existing reading list.

$(document).ready(function() {
  function exportSaved() {
  var html='';
  $('.hentry.readlater').each(function() {
    var title = $('.entry-innerTitle', this).html().split('<span')[0];
    var hyperref = $('.entry-innerTitle', this).attr('href');
    var summary = ($('.entry-content', this).html()+'').replace('\n',' ');
    var timestamp = $('.entry-innerPublished', this).attr('time');
    var feed = $('.entry-innerFeedName', this).html();
    var author = ($('.author', this).html()+'').slice(5);
    var date = new Date(timestamp*1000);
    var month = ('0'+(date.getMonth()+1)).slice(-2);
    var day = ('0'+date.getDate()).slice(-2);
    var readableDate = date.getFullYear()+'-'+month+'-'+day;
        html = html+"<div class='list-item'><h4 class='title'><strong><a href='"+hyperref+"'>"+title+"</a></strong></h4><h4 class='author'>"+author+"</h4><h5 class='source'>"+feed+" : "+readableDate+"</h5><div class='summary' style='font-size:80%;line-height:140%;'>"+summary+"</div></div>";  
   $('export').insertAfter($('#switch-view')).click(function() { exportSaved(); });

The script accesses the page via an extension in Google Chrome. It labels HTML elements with classes for the convenience of future parsing, not for styling. I had to use inline styles to keep WordPress happy. Dates are formatted according to ISO 8601, making it possible to quickly jump to any month using in-page search for “yyyy-mm”. Which would not work with mm/dd/yyyy, of course.

Unfortunately, some publishers still insert the authors’ names in the summary. Unlike arXiv, which serves reasonably structured feeds, albeit with unnecessary hyperlinks in the author field. While I am unhappy with such things (and the state of the web in general), my setup works for me, and this concludes my two-post detour from mathematics.

xkcd 743: Infrastructures

Life after Google Reader

My first reaction to the news was to grab my data and remove Google Reader from bookmarks. I am not in the habit of clinging to doomed platforms. The downloaded zip archive has a few files with self-explanatory names. For example, subscriptions.xml is the list of subscriptions, which can be imported, for example, to Netvibes. Which is what I did, having found the Reader mode of Netvibes a reasonably close approximation. Some of the keyboard shortcuts work the same way (J and K), but most are different:

Netvibes reader shortcuts
Netvibes reader shortcuts

There is no native Android app, and the webapp does not play well with Opera Mobile. At least it works with the stock Android browser:

Netvibes version of “star” is “mark to real later”. But apparently, there is no way to export the list of such items. (Export feeds returns only the list of feeds.) The list of items of interest is more important to me than my collection of feeds.

Besides, I already have two sizable lists of such items, courtesy of Google: shared.json, from back when Google Reader had sensible sharing features; and starred.json from recent years. I guess my next step will be to work out a way to merge this data and keep it up-to-date and under my control in the future.