Archivers

Archivers are responsible for fetching content from external sources and transforming it into Inkfeed's internal data model.

Data Model

`Article`

Represents a single piece of content:

- `title`: article title
- `url`: original URL
- `content_html`: full article content as HTML
- additional source-specific fields (comments, score, etc.)

`GroupResult`

A logical grouping of articles from one fetch:

- `display_name`: human-readable name for this group
- `articles`: list of `Article` objects
- `cache_dir`: local directory for cached assets (images)

`ArchiveResult`

The top-level result returned by an archiver:

- `groups`: list of `GroupResult` objects
- `source_config`: reference back to the source configuration
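The three types nest: an `ArchiveResult` holds `GroupResult`s, and each `GroupResult` holds `Article`s. A minimal sketch of that nesting as Python dataclasses, assuming plain dataclasses; Inkfeed's real definitions may differ. The `extra` dict (a catch-all for source-specific fields) and typing `source_config` as `Any` are illustrative choices, not the project's actual fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Article:
    title: str
    url: str
    content_html: str
    # Hypothetical catch-all for source-specific data (comments, score, ...).
    extra: dict[str, Any] = field(default_factory=dict)


@dataclass
class GroupResult:
    display_name: str
    articles: list[Article]
    cache_dir: Path  # where downloaded images for this group are stored


@dataclass
class ArchiveResult:
    groups: list[GroupResult]
    source_config: Any  # a SourceConfig in Inkfeed; left untyped in this sketch


# Example: one source producing a single group with one article.
result = ArchiveResult(
    groups=[
        GroupResult(
            display_name="Hacker News",
            articles=[Article("Example", "https://example.com", "<p>Hi</p>", {"score": 42})],
            cache_dir=Path("output/hackernews/cache"),
        )
    ],
    source_config=None,
)
total = sum(len(g.articles) for g in result.groups)
```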

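Base Class

All archivers inherit from a common base class defined in `archiver/base.py`:

```python
from __future__ import annotations  # SourceConfig/ArchiveResult annotations are not evaluated here

from pathlib import Path


class Archiver:
    def __init__(self, source: SourceConfig, output_dir: Path):
        ...

    # max_workers and max_retries are keyword-only (note the bare *).
    def run(self, *, max_workers: int, max_retries: int) -> ArchiveResult:
        ...
```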
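The base class fixes only the contract; how a concrete archiver uses `max_workers` and `max_retries` is up to it. One plausible pattern, assumed here rather than taken from Inkfeed's code, is to fan per-item fetches out over a thread pool and retry each failed fetch a bounded number of times with exponential backoff. `fetch_with_retries` and `fetch_all` are hypothetical helper names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_with_retries(fetch: Callable[[T], R], item: T, max_retries: int,
                       backoff: float = 0.5) -> R:
    """Call fetch(item), retrying up to max_retries extra times on any exception."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return fetch(item)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            time.sleep(backoff * 2 ** attempt)  # exponential backoff between tries
    raise AssertionError("unreachable")


def fetch_all(fetch: Callable[[T], R], items: list[T], *, max_workers: int,
              max_retries: int, backoff: float = 0.5) -> list[R]:
    """Fetch every item concurrently; results keep the order of `items`.

    If an item still fails after its retries, that exception propagates.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(
            lambda it: fetch_with_retries(fetch, it, max_retries, backoff), items))
```

`ThreadPoolExecutor.map` yields results in input order, so the caller can zip them back to the original story list without extra bookkeeping.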

Built-in Archivers

`HackerNewsArchiver`

Fetches stories from the Hacker News API (`hackernews.py`).

- Uses the official HN Firebase API
- Fetches the top stories, then retrieves each story's details
- Optionally fetches full article content via readability extraction
- Optionally fetches comment threads with configurable depth
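A minimal sketch of the top-stories and comment-depth steps against the official API. The endpoints (`/v0/topstories.json`, `/v0/item/<id>.json`) and item fields (`title`, `url`, `score`, `kids`, `by`, `text`, `deleted`, `dead`) are the documented HN API; the helper names, the returned dict shape, and the reading of "depth" as levels of replies (depth 1 = top-level comments only) are assumptions. Readability extraction is left out. The `fetch` parameter is injectable so the logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

HN_API = "https://hacker-news.firebaseio.com/v0"
Fetch = Callable[[str], Any]


def get_json(path: str) -> Any:
    """GET <HN_API>/<path>.json and decode it (the API returns null for unknown ids)."""
    with urllib.request.urlopen(f"{HN_API}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def top_story_ids(limit: int, fetch: Fetch = get_json) -> list[int]:
    # topstories holds up to 500 ids (it may include job posts).
    return list(fetch("topstories") or [])[:limit]


def _comments(kid_ids: list[int], depth: int, fetch: Fetch) -> list[dict]:
    # depth = how many more levels of replies to descend into.
    if depth <= 0:
        return []
    out = []
    for kid in kid_ids:
        item: Optional[dict] = fetch(f"item/{kid}")
        if not item or item.get("deleted") or item.get("dead"):
            continue  # skip removed comments
        out.append({
            "by": item.get("by", ""),
            "text": item.get("text", ""),  # the API delivers comment text as HTML
            "replies": _comments(item.get("kids", []), depth - 1, fetch),
        })
    return out


def fetch_story(story_id: int, comment_depth: int, fetch: Fetch = get_json) -> dict:
    item = fetch(f"item/{story_id}")
    return {
        "title": item.get("title", ""),
        # Text posts (e.g. Ask HN) have no url; fall back to the discussion page.
        "url": item.get("url") or f"https://news.ycombinator.com/item?id={story_id}",
        "score": item.get("score", 0),
        "comments": _comments(item.get("kids", []), comment_depth, fetch),
    }
```

Each comment costs one request, so deep threads grow quickly; this is where a bounded depth and the worker pool from the base class pay off.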

`KagiNewsArchiver`

Fetches stories from Kagi News (`kaginews.py`).

- Fetches stories across multiple configurable categories
- Supports language filtering
- Extracts article content from linked pages
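The Kagi News data format is not described on this page, so the following is only a sketch of the category and language selection step, applied to stories that have already been fetched. The input shape (a mapping from category name to story dicts carrying a `language` code) and the idea of one result group per category are assumptions, not Kagi's actual schema; `select_stories` is a hypothetical name.

```python
from typing import Optional


def select_stories(
    by_category: dict[str, list[dict]],
    categories: list[str],
    language: Optional[str] = None,
) -> dict[str, list[dict]]:
    """Keep only the configured categories (in config order), optionally filtered by language.

    Hypothetical shape: each story dict has a "language" key such as "en".
    """
    selected: dict[str, list[dict]] = {}
    for name in categories:
        stories = by_category.get(name, [])  # unknown categories yield an empty group
        if language is not None:
            stories = [s for s in stories if s.get("language") == language]
        selected[name] = stories
    return selected
```

Each key of the result could then become the `display_name` of one `GroupResult`.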

`RSSArchiver`

Generic RSS/Atom feed archiver (`rss.py`).

- Parses any standard RSS or Atom feed via feedparser
- Optionally fetches and extracts full article content
- Works with any valid feed URL
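feedparser normalizes RSS and Atom into one entry shape, which is what lets this archiver stay generic. A minimal sketch of the parsing step, mapping each entry onto the `Article` fields; `entry_to_article` and `parse_feed` are hypothetical names, and the optional full-content extraction is left out.

```python
def entry_to_article(entry) -> dict:
    """Map one feedparser entry (a dict subclass) to Article-like fields."""
    # feedparser exposes Atom <content> and RSS <content:encoded> as a list of
    # {"value": ..., "type": ...} dicts; <description>/<summary> lands in "summary".
    contents = entry.get("content") or []
    html = contents[0].get("value", "") if contents else entry.get("summary", "")
    return {
        "title": entry.get("title", "(untitled)"),
        "url": entry.get("link", ""),
        "content_html": html,
    }


def parse_feed(url: str) -> list[dict]:
    # Third-party (pip install feedparser); imported here so entry_to_article
    # can be used without it.
    import feedparser

    feed = feedparser.parse(url)
    # bozo is set for malformed feeds; feedparser often still recovers entries.
    if feed.bozo and not feed.entries:
        raise ValueError(f"could not parse feed {url!r}: {feed.bozo_exception}")
    return [entry_to_article(e) for e in feed.entries]
```

Preferring `content` over `summary` matters because many feeds put only a teaser in the summary; when both are thin, the optional full-content fetch is what fills `content_html`.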

Archiver Registration

Archivers are registered in `main.py` via `ARCHIVER_MAP`:

```python
ARCHIVER_MAP = {
    "hackernews": HackerNewsArchiver,
    "kaginews": KagiNewsArchiver,
    "rss": RSSArchiver,
}
```

The lookup uses the source name first, falling back to the source type. This means:

- A source named `hackernews` gets the `HackerNewsArchiver`
- A source named `my_blog` with `type = "rss"` gets the `RSSArchiver`
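The name-then-type lookup fits in a few lines. `resolve_archiver` is a hypothetical helper, not the code in `main.py`, and raising `KeyError` for an unmatched source is an assumption.

```python
from typing import Optional


def resolve_archiver(name: str, source_type: Optional[str],
                     archiver_map: dict[str, type]) -> type:
    """Pick an archiver class: exact source name first, then the source's type."""
    if name in archiver_map:
        return archiver_map[name]
    if source_type is not None and source_type in archiver_map:
        return archiver_map[source_type]
    raise KeyError(f"no archiver for source {name!r} (type={source_type!r})")
```

Because the name is checked first, a source named `hackernews` keeps its dedicated archiver even if its `type` names a different one.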

Adding a New Archiver

1. Create a new file in `inkfeed/archiver/` (e.g., `reddit.py`).
2. Implement a class that accepts `(source, output_dir)` and has a `run(*, max_workers, max_retries)` method returning an `ArchiveResult`.
3. Register it in `ARCHIVER_MAP` in `main.py`.
4. Add configuration options to `config.py` if needed.
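Putting the steps together, a minimal sketch of a new archiver. Everything here is hypothetical: `StaticListArchiver`, the `items` config field, the `"static"` map key, and the `<output_dir>/<name>/cache` layout are illustrative, and the dataclasses are stand-ins for Inkfeed's real types so the block runs on its own. In Inkfeed the class would subclass the base `Archiver`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape
from pathlib import Path
from typing import Any


# Minimal stand-ins for Inkfeed's real types, so the sketch runs on its own.
@dataclass
class Article:
    title: str
    url: str
    content_html: str


@dataclass
class GroupResult:
    display_name: str
    articles: list[Article]
    cache_dir: Path


@dataclass
class ArchiveResult:
    groups: list[GroupResult]
    source_config: Any


@dataclass
class SourceConfig:  # hypothetical: the real configuration lives in config.py
    name: str
    type: str
    items: list[dict] = field(default_factory=list)


class StaticListArchiver:
    """Toy archiver (step 2): turns items listed in the config into articles, no network."""

    def __init__(self, source: SourceConfig, output_dir: Path):
        self.source = source
        self.output_dir = output_dir

    def run(self, *, max_workers: int, max_retries: int) -> ArchiveResult:
        # A real archiver would use max_workers/max_retries for its fetches.
        cache_dir = self.output_dir / self.source.name / "cache"
        articles = [
            Article(it["title"], it["url"], f"<p>{escape(it.get('summary', ''))}</p>")
            for it in self.source.items
        ]
        group = GroupResult(self.source.name, articles, cache_dir)
        return ArchiveResult(groups=[group], source_config=self.source)


# Step 3: register under a source name or type.
ARCHIVER_MAP = {"static": StaticListArchiver}
```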
