Archivers

Archivers are responsible for fetching content from external sources and transforming it into Inkfeed's internal data model.

Data Model

Article

Represents a single piece of content:

  • title — Article title
  • url — Original URL
  • content_html — Full article content as HTML
  • Additional source-specific fields (comments, score, etc.)

GroupResult

A logical grouping of articles from one fetch:

  • display_name — Human-readable name for this group
  • articles — List of Article objects
  • cache_dir — Local directory for cached assets (images)

ArchiveResult

The top-level result from an archiver:

  • groups — List of GroupResult objects
  • source_config — Reference back to the source configuration
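
The three types above could be modeled as plain dataclasses. This is a minimal sketch, not Inkfeed's actual definitions: the field types, the `extra` dictionary for source-specific fields, and the use of `Any` for the source configuration are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Article:
    title: str
    url: str
    content_html: str
    # Assumption: source-specific fields (comments, score, ...) are kept
    # in a free-form dictionary. The real model may use dedicated fields.
    extra: dict[str, Any] = field(default_factory=dict)


@dataclass
class GroupResult:
    display_name: str
    articles: list[Article]
    cache_dir: Path


@dataclass
class ArchiveResult:
    groups: list[GroupResult]
    # Assumption: stand-in for the real SourceConfig type.
    source_config: Any


# Build one result by hand to show how the pieces nest.
article = Article(
    title="Example",
    url="https://example.com/post",
    content_html="<p>Hello</p>",
    extra={"score": 42},
)
group = GroupResult(
    display_name="Front page",
    articles=[article],
    cache_dir=Path("cache/front-page"),
)
result = ArchiveResult(groups=[group], source_config={"name": "example"})
```

An ArchiveResult holds groups, each group holds articles, and each group carries its own cache directory.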

Base Class

All archivers inherit from a common base defined in archiver/base.py:

class Archiver:
    def __init__(self, source: SourceConfig, output_dir: Path):
        ...

    def run(self, *, max_workers: int, max_retries: int) -> ArchiveResult:
        ...
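
The bare `*` in the `run` signature makes `max_workers` and `max_retries` keyword-only. The sketch below illustrates that calling convention. `SourceConfig`, `ArchiveResult`, and `EmptyArchiver` are hypothetical stand-ins defined here only so the example runs on its own; the method bodies are assumptions too.

```python
from pathlib import Path


class SourceConfig:
    """Hypothetical stand-in for the real source configuration."""

    def __init__(self, name: str, type: str):
        self.name = name
        self.type = type


class ArchiveResult:
    """Hypothetical stand-in for the real result type."""

    def __init__(self, groups, source_config):
        self.groups = groups
        self.source_config = source_config


class Archiver:
    def __init__(self, source: SourceConfig, output_dir: Path):
        # Assumption: the base class simply stores its arguments.
        self.source = source
        self.output_dir = output_dir

    def run(self, *, max_workers: int, max_retries: int) -> ArchiveResult:
        raise NotImplementedError


class EmptyArchiver(Archiver):
    """Illustrative subclass that fetches nothing."""

    def run(self, *, max_workers: int, max_retries: int) -> ArchiveResult:
        return ArchiveResult(groups=[], source_config=self.source)


archiver = EmptyArchiver(SourceConfig("demo", "rss"), Path("output"))

# Both limits must be passed by name.
result = archiver.run(max_workers=4, max_retries=2)

# Passing them positionally raises TypeError because of the bare `*`.
try:
    archiver.run(4, 2)
except TypeError:
    positional_call_rejected = True
else:
    positional_call_rejected = False
```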

Built-in Archivers

HackerNewsArchiver

Fetches stories from the Hacker News API (hackernews.py).

  • Uses the official HN Firebase API
  • Fetches top stories, then retrieves each story's details
  • Optionally fetches full article content via readability extraction
  • Optionally fetches comment threads with configurable depth
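
The two-step fetch (top story IDs first, then each story's details) can be sketched against the public HN Firebase endpoints. This is not the code in hackernews.py: the function names, the `limit` parameter, and the injectable `get` callable are assumptions, and readability extraction and comment fetching are left out.

```python
import json
from typing import Any, Callable
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url: str) -> Any:
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def fetch_top_stories(limit: int, get: Callable[[str], Any] = fetch_json) -> list[dict]:
    """Return details for the first `limit` top stories.

    `get` is injectable so the function can be exercised without network access.
    """
    story_ids = get(f"{HN_API}/topstories.json")[:limit]
    return [get(f"{HN_API}/item/{story_id}.json") for story_id in story_ids]


# Offline usage with canned responses in place of real HTTP calls.
fake_responses = {
    f"{HN_API}/topstories.json": [101, 102, 103],
    f"{HN_API}/item/101.json": {"id": 101, "title": "First", "score": 10},
    f"{HN_API}/item/102.json": {"id": 102, "title": "Second", "score": 7},
}
stories = fetch_top_stories(2, get=fake_responses.__getitem__)
```

Calling `fetch_top_stories(10)` without the `get` argument would issue real requests, one per story plus one for the ID list.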

KagiNewsArchiver

Fetches stories from Kagi News (kaginews.py).

  • Fetches stories across multiple configurable categories
  • Supports language filtering
  • Extracts article content from linked pages

RSSArchiver

Generic RSS/Atom feed archiver (rss.py).

  • Parses any standard RSS or Atom feed via feedparser
  • Optionally fetches and extracts full article content
  • Works with any valid feed URL

Archiver Registration

Archivers are registered in main.py via the ARCHIVER_MAP:

ARCHIVER_MAP = {
    "hackernews": HackerNewsArchiver,
    "kaginews": KagiNewsArchiver,
    "rss": RSSArchiver,
}

The lookup tries the source name as the key first and falls back to the source type when no entry matches. This means:

  • A source named hackernews gets the HackerNewsArchiver
  • A source named my_blog with type = "rss" gets the RSSArchiver
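
The name-then-type lookup can be sketched as follows. The helper `resolve_archiver` and its error handling are hypothetical, and the archiver classes are empty placeholders; the actual logic in main.py may differ.

```python
class HackerNewsArchiver:
    """Placeholder for the real class."""


class KagiNewsArchiver:
    """Placeholder for the real class."""


class RSSArchiver:
    """Placeholder for the real class."""


ARCHIVER_MAP = {
    "hackernews": HackerNewsArchiver,
    "kaginews": KagiNewsArchiver,
    "rss": RSSArchiver,
}


def resolve_archiver(name: str, source_type: str) -> type:
    """Look up by source name first, then fall back to source type."""
    archiver_cls = ARCHIVER_MAP.get(name) or ARCHIVER_MAP.get(source_type)
    if archiver_cls is None:
        # Assumption: an unknown source is reported as an error.
        raise ValueError(f"no archiver for source {name!r} of type {source_type!r}")
    return archiver_cls


by_name = resolve_archiver("hackernews", "")
by_type = resolve_archiver("my_blog", "rss")
```

Under this reading, a source whose name happens to match a map key always gets that archiver, whatever its type says.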

Adding a New Archiver

  1. Create a new file in inkfeed/archiver/ (e.g., reddit.py)
  2. Implement a class whose constructor accepts (source, output_dir) and whose run() method takes the keyword-only max_workers and max_retries arguments and returns an ArchiveResult
  3. Register it in ARCHIVER_MAP in main.py
  4. Add configuration options to config.py if needed
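
The steps above can be sketched end to end. Everything here is illustrative: the data classes and `ARCHIVER_MAP` are local stand-ins for Inkfeed's real ones, `RedditArchiver` and its `fetch_posts` helper are hypothetical, and the post data is hard-coded where a real archiver would call a remote service.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any


# Stand-ins for Inkfeed's data model so the sketch runs on its own.
@dataclass
class Article:
    title: str
    url: str
    content_html: str


@dataclass
class GroupResult:
    display_name: str
    articles: list
    cache_dir: Path


@dataclass
class ArchiveResult:
    groups: list
    source_config: Any


class RedditArchiver:
    """Step 2: accepts (source, output_dir) and exposes run()."""

    def __init__(self, source: Any, output_dir: Path):
        self.source = source
        self.output_dir = output_dir

    def fetch_posts(self) -> list[dict]:
        # Hypothetical helper returning placeholder data.
        return [{"title": "Hello", "url": "https://example.com/hello", "body": "<p>Hi</p>"}]

    def run(self, *, max_workers: int, max_retries: int) -> ArchiveResult:
        # max_workers and max_retries are accepted but unused in this sketch.
        articles = [
            Article(title=post["title"], url=post["url"], content_html=post["body"])
            for post in self.fetch_posts()
        ]
        group = GroupResult(
            display_name="reddit",
            articles=articles,
            cache_dir=self.output_dir / "cache",
        )
        return ArchiveResult(groups=[group], source_config=self.source)


# Step 3: register the class (stand-in for the map in main.py).
ARCHIVER_MAP = {"reddit": RedditArchiver}

archiver = ARCHIVER_MAP["reddit"]({"name": "reddit"}, Path("output"))
result = archiver.run(max_workers=1, max_retries=0)
```

The class here does not inherit from the base Archiver; it only matches the constructor and run() shape that step 2 asks for. Subclassing the base class may be preferable in the real codebase.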