|Agent Source Token||
|Subject Coverage||Blog posts mentioned in RSS and Atom feeds on
|Object Coverage||All DOIs, Landing Page URLs, plain-text DOIs.|
|Data Contributor||Curators of RSS and Atom feed aggregators. Authors of blog posts.|
|Data Origin||RSS and Atom feeds, and the blog posts they point to.|
|Freshness||Every few hours.|
|Identifies||Linked DOIs, unlinked DOIs, Landing Page URLs|
|License||Creative Commons CC0 1.0 Universal (CC0 1.0)|
|Looks in||HTML of webpages (mostly blog posts) linked to from RSS and Atom feeds.|
|Produces Evidence Records||Yes|
|Produces relation types||
|Updates or deletions||None expected|
What it is
Links from blogs and other content with a newsfeed.
What it does
The Agent has a list of RSS and Atom feeds. It monitors each one for links to blog posts. If a blog post links to Registered Content or mentions DOIs in its text, those links and DOIs are extracted into Events.
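As a rough illustration of the extraction step described above, the following minimal sketch separates linked DOIs (found in `href` attributes) from unlinked, plain-text DOIs in a page's HTML. It is not the Agent's actual implementation: the `find_dois` function, the `DoiFinder` class and the DOI regular expression are assumptions made for this example, and matching Landing Page URLs is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

# Approximate DOI pattern. This is an assumption for illustration,
# not the expression used by the Agent.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


class DoiFinder(HTMLParser):
    """Collects linked DOIs from hrefs and unlinked DOIs from page text."""

    def __init__(self):
        super().__init__()
        self.linked = []
        self.unlinked = []
        self._link_depth = 0  # greater than 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        self._link_depth += 1
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        if parsed.netloc in ("doi.org", "dx.doi.org"):
            self.linked.append(unquote(parsed.path.lstrip("/")))

    def handle_endtag(self, tag):
        if tag == "a":
            self._link_depth = max(0, self._link_depth - 1)

    def handle_data(self, data):
        # Simplification: text inside any link is skipped, so only
        # DOIs in plain running text count as unlinked.
        if self._link_depth:
            return
        for match in DOI_PATTERN.finditer(data):
            # Drop trailing punctuation that belongs to the sentence.
            self.unlinked.append(match.group(0).rstrip(".,;)"))


def find_dois(html):
    """Return (linked, unlinked) lists of DOIs found in an HTML string."""
    finder = DoiFinder()
    finder.feed(html)
    finder.close()
    return finder.linked, finder.unlinked
```

For example, calling `find_dois` on a paragraph that links to `https://doi.org/10.5555/12345678` and also mentions `10.5555/87654321` in its text would report the first as linked and the second as unlinked.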
Where data comes from
The Agent consults the newsfeed-list Artifact. On a regular basis it retrieves the Artifact, reads each feed it lists, then follows the link to every blog post or page mentioned in those feeds. Data sources:
- The newsfeed-list Artifact, curated by Crossref.
- The content of each newsfeed. Each newsfeed may be operated by a different organisation.
- The content of the individual blog posts.
Example RSS feeds include:
- ScienceSeeker blog aggregator
- ScienceBlogging blog aggregator
Content to follow.
On a regular basis (approximately every hour) the Newsfeed Agent starts a scan. Each scan:
- It retrieves the most recent version of the
- It scans over every RSS feed.
- It passes the URL of every blog post to the Percolator.
- Includes batches of
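The feed-reading part of a scan can be sketched as follows. This is a minimal example under stated assumptions, not the Agent's code: `post_urls` is a hypothetical helper that takes the XML of one RSS 2.0 or Atom feed and returns the blog post URLs it lists. Retrieving the Artifact and handing URLs to the Percolator are outside its scope. Any summary or description in the feed is ignored, since only the linked pages are consulted.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feed elements.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def post_urls(feed_xml):
    """Return the post URLs listed in an RSS 2.0 or Atom feed document."""
    root = ET.fromstring(feed_xml)
    urls = []

    # RSS 2.0: <rss><channel><item><link>URL</link></item></channel></rss>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            urls.append(link.strip())

    # Atom: <feed><entry><link href="URL" rel="alternate"/></entry></feed>
    # A missing rel attribute is treated as "alternate".
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                urls.append(link.get("href"))

    return urls


RSS_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Post</title><link>https://blog.example.com/post-1</link>
<description>Summary mentioning 10.5555/12345678</description></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><entry>
<link href="https://blog.example.com/post-2"/></entry></feed>"""
```

Each URL returned by `post_urls(RSS_SAMPLE)` or `post_urls(ATOM_SAMPLE)` would then be fetched and inspected in a later step.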
Edits / Deletion
- Events may be edited if they are found to be faulty, e.g. if they refer to non-existent DOIs.
- Links to blog posts are followed. If a summary of the blog post is included in the RSS feed, it is not consulted.
- RSS feeds may be taken offline.
- RSS feeds may contain incomplete data.
- RSS feeds may update too quickly for the Agent to keep up.
- Publisher sites may block the Event Data Bot from collecting Landing Pages.
- Publisher sites may prevent the Event Data Bot from collecting Landing Pages with robots.txt.
- Blog sites may block the Event Data Bot from collecting Landing Pages.
- Blog sites may prevent the Event Data Bot from collecting Landing Pages with robots.txt.
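To illustrate how a robots.txt file can stop a crawler from collecting a page, the sketch below uses Python's standard `urllib.robotparser` module. The user agent name `ExampleBot`, the `allowed` helper and the sample rules are hypothetical. The actual user agent string of the Event Data Bot is not stated here.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: ExampleBot is kept out of /private/,
# every other crawler may fetch anything.
ROBOTS = """User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""
```

With these rules, `allowed(ROBOTS, "ExampleBot", "https://blog.example.com/private/post")` is false, while the same bot may fetch pages outside `/private/`. A well-behaved crawler skips any disallowed page, so no Events can be extracted from it.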