Add a dateformat option to customize how dates are parsed #48

Closed
jamessharp wants to merge 1 commit into danmactough:master from ortoo:master

Conversation

@jamessharp

Hi Dan - I'm UK based and have come across a couple of feeds that have UK date formats (dd/mm/yyyy). So I've added a dateformat option to use when this is the case.

I've updated the README, so hopefully I don't need to explain any further here...

James

@danmactough
Owner

Thanks for the thorough pull request, but the case you're trying to address is an invalid feed format. All date-times in RSS must conform to RFC 822. Date-times in Atom feeds must conform to RFC 3339. It's not a matter of i18n/l10n -- that feed is just invalid.
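To make the distinction concrete, here is a rough sketch of telling the two conforming shapes apart from a dd/mm/yyyy string. The function name `classifyFeedDate` and both patterns are assumptions made for illustration; they are deliberately simplified and do not implement the full RFC 822 or RFC 3339 grammars.

```javascript
// Simplified patterns, for illustration only (not the full RFC grammars).
// RFC 822 style as used in RSS, e.g. "Wed, 02 Oct 2002 13:00:00 GMT"
var RFC822_LIKE = /^(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+)?\d{1,2}\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{2}(?:\d{2})?\s+\d{2}:\d{2}(?::\d{2})?\s+(?:[A-Z]{1,3}|[+-]\d{4})$/;
// RFC 3339 style as used in Atom, e.g. "2002-10-02T13:00:00Z"
var RFC3339_LIKE = /^\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:[Zz]|[+-]\d{2}:\d{2})$/;

// Hypothetical helper: report which shape a feed date string has
function classifyFeedDate(value) {
  var s = String(value).trim();
  if (RFC822_LIKE.test(s)) return 'rfc822';
  if (RFC3339_LIKE.test(s)) return 'rfc3339';
  return 'nonconforming';
}

console.log(classifyFeedDate('Wed, 02 Oct 2002 13:00:00 GMT'));
console.log(classifyFeedDate('2002-10-02T13:00:00Z'));
console.log(classifyFeedDate('02/10/2002 13:00:00'));
```

A dd/mm/yyyy value matches neither pattern, which is why it falls outside what an RSS or Atom parser is required to understand.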

I am inclined not to implement this.

@jamessharp
Author

Even though it's spec-invalid, it still exists out in the wild, and a lot of the time the feed owner isn't going to fix their end. Obviously you can't address every invalid case, but this one is probably going to be the most common case of spec non-conformity. The fix seems harmless, makes node-feedparser that little bit more useful, and saves me from maintaining a fork... 👍

Just my two cents though - I'll now shut up.

@danmactough
Owner

> exists out in the wild

Do you have some links? When I checked the feed linked in your example, it's a valid feed (although it's only got one article from nearly three years ago).

@jamessharp
Author

Here is where I got the example from. What with it also containing the wrong link, it seems a bit crap overall, so I'm happy for you to bin the pull request and I'll keep this feature in my fork only (unfortunately I have to work with this feed and there is almost zero chance of getting it fixed...). I've come across similar ones, but only on the same site, not anywhere else so far.

@danmactough
Owner

@jamessharp I wouldn't want to maintain a fork for this either. Plus you'll miss all the bug fixes! 😄

Luckily, you don't need to maintain a fork -- you can just override the handleMeta and handleItem methods to "massage" those dates into shape, like so:

var feedparser = require('feedparser')
  , moment = require('moment');

var feed = 'http://www.esinet.norfolk.gov.uk/cadmin/ecourier/govmi.xml';
var fmt = 'DD/MM/YYYY HH:mm:ss';

// Save the original methods to apply after we're done massaging
var handleMeta = feedparser.prototype.handleMeta;
var handleItem = feedparser.prototype.handleItem;

// Replace node[key]['#'] with an ISO 8601 string, but only when moment can
// parse it with fmt, so an unparseable value is never overwritten
function massageDate(node, key) {
  if (node[key] && node[key]['#']) {
    var parsed = moment(node[key]['#'], fmt);
    if (parsed.isValid()) node[key]['#'] = parsed.toJSON();
  }
}

// Override the methods to massage those bad dates
feedparser.prototype.handleMeta = function (node, type, options) {
  massageDate(node, 'pubdate');
  massageDate(node, 'lastbuilddate');
  // Now that we've fixed up the malformed dates, apply the original method
  return handleMeta.apply(this, arguments);
};

feedparser.prototype.handleItem = function (node, type, options) {
  massageDate(node, 'pubdate');
  // see handleMeta, above -- ditto here
  return handleItem.apply(this, arguments);
};

feedparser.parseUrl(feed).on('complete', function (meta, articles) {
  console.log(meta.title);
  console.log(meta.date);
  articles.forEach(function (article) {
    console.log('%s - %s', article.title, article.pubdate);
  });
  console.log('Woot!');
});
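
If pulling in moment just for this feels heavy, the date massaging itself can be sketched without any dependency. This is only a sketch under stated assumptions: `ukDateToIso` is a hypothetical helper name, and it treats the feed's times as UTC because the dd/mm/yyyy format carries no zone (moment, by contrast, would parse them as local time).

```javascript
// Hypothetical helper: convert "DD/MM/YYYY HH:mm:ss" (time optional) to an
// ISO 8601 string. Assumption: times are treated as UTC, since the format
// carries no zone information.
function ukDateToIso(value) {
  var m = /^(\d{2})\/(\d{2})\/(\d{4})(?:\s+(\d{2}):(\d{2}):(\d{2}))?$/.exec(String(value).trim());
  if (!m) return value; // not the UK format, so leave it untouched
  var day = +m[1], month = +m[2], year = +m[3];
  var h = +(m[4] || 0), min = +(m[5] || 0), s = +(m[6] || 0);
  if (h > 23 || min > 59 || s > 59) return value;
  var d = new Date(Date.UTC(year, month - 1, day, h, min, s));
  // Reject impossible dates such as 31/02/2012, which Date would roll over
  if (d.getUTCFullYear() !== year || d.getUTCMonth() !== month - 1 || d.getUTCDate() !== day) {
    return value;
  }
  return d.toISOString();
}

console.log(ukDateToIso('25/12/2012 09:30:00'));
console.log(ukDateToIso('Wed, 02 Oct 2002 13:00:00 GMT'));
```

Returning the input unchanged when it does not match means already-valid RFC 822 dates pass straight through, so the same helper could be applied to every date node in the overrides above.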
