Archivers¶
Archivers are responsible for fetching content from external sources and transforming it into Inkfeed's internal data model.
Data Model¶
Article¶
Represents a single piece of content:
- title: Article title
- url: Original URL
- content_html: Full article content as HTML
- Additional source-specific fields (comments, score, etc.)
GroupResult¶
A logical grouping of articles from one fetch:
- display_name: Human-readable name for this group
- articles: List of Article objects
- cache_dir: Local directory for cached assets (images)
ArchiveResult¶
The top-level result from an archiver:
- groups: List of GroupResult objects
- source_config: Reference back to the source configuration
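The three types above could be modelled as plain dataclasses. This is a minimal sketch, not the project's actual definitions: the field types are inferred from the descriptions, the extra dictionary for source-specific fields is a hypothetical name, and Any stands in for the real SourceConfig type.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Article:
    title: str
    url: str
    content_html: str
    # Source-specific fields (comments, score, ...). The name `extra` is
    # hypothetical; the real model may use dedicated attributes instead.
    extra: dict[str, Any] = field(default_factory=dict)


@dataclass
class GroupResult:
    display_name: str
    articles: list[Article]
    cache_dir: Path  # local directory for cached assets such as images


@dataclass
class ArchiveResult:
    groups: list[GroupResult]
    source_config: Any  # stand-in for the real SourceConfig type


# Build one result by hand to show how the pieces nest.
article = Article(
    title="Example",
    url="https://example.com/post",
    content_html="<p>Hello</p>",
    extra={"score": 42},
)
group = GroupResult(
    display_name="Front page",
    articles=[article],
    cache_dir=Path("cache/front-page"),
)
result = ArchiveResult(groups=[group], source_config=None)
```

An ArchiveResult holds groups, and each group holds its articles, so a consumer can walk result.groups and then group.articles without knowing which archiver produced them.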
Base Class¶
All archivers inherit from a common base defined in archiver/base.py:
class Archiver:
    def __init__(self, source: SourceConfig, output_dir: Path):
        ...

    def run(self, *, max_workers: int, max_retries: int) -> ArchiveResult:
        ...
Built-in Archivers¶
HackerNewsArchiver¶
Fetches stories from the Hacker News API (hackernews.py).
- Uses the official HN Firebase API
- Fetches top stories, then retrieves each story's details
- Optionally fetches full article content via readability extraction
- Optionally fetches comment threads with configurable depth
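The first two steps (list the top stories, then fetch each story's details) might look roughly like this against the public HN Firebase endpoints. It is a sketch using only the standard library, not the code in hackernews.py: the function names and the injectable get parameter are assumptions made for illustration, and content extraction and comment fetching are left out.

```python
from __future__ import annotations

import json
from typing import Any, Callable
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url: str) -> Any:
    """Download and decode one JSON document."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def fetch_top_stories(
    limit: int, get: Callable[[str], Any] = fetch_json
) -> list[dict]:
    """Fetch the top story IDs, then the details of the first `limit` stories."""
    story_ids = get(f"{HN_API}/topstories.json")[:limit]
    stories = []
    for story_id in story_ids:
        item = get(f"{HN_API}/item/{story_id}.json")
        # Skip IDs that resolve to no item (the API answers null for those).
        if item:
            stories.append(item)
    return stories
```

Calling fetch_top_stories(10) performs one request for the ID list plus one per story. Passing a different get callable makes the function easy to exercise without network access.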
KagiNewsArchiver¶
Fetches stories from Kagi News (kaginews.py).
- Fetches stories across multiple configurable categories
- Supports language filtering
- Extracts article content from linked pages
RSSArchiver¶
Generic RSS/Atom feed archiver (rss.py).
- Parses any standard RSS or Atom feed via feedparser
- Optionally fetches and extracts full article content
- Works with any valid feed URL
Archiver Registration¶
Archivers are registered in main.py via the ARCHIVER_MAP:
ARCHIVER_MAP = {
    "hackernews": HackerNewsArchiver,
    "kaginews": KagiNewsArchiver,
    "rss": RSSArchiver,
}
The lookup uses the source name first, falling back to the source type. This means:
- A source named hackernews gets the HackerNewsArchiver
- A source named my_blog with type = "rss" gets the RSSArchiver
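The name-first, type-second lookup can be sketched as follows. The helper name resolve_archiver and the ValueError raised when nothing matches are assumptions for illustration; main.py may structure the lookup and its error handling differently. The empty classes are placeholders so the snippet runs on its own.

```python
# Placeholder classes so the sketch is self-contained.
class HackerNewsArchiver: ...
class KagiNewsArchiver: ...
class RSSArchiver: ...


ARCHIVER_MAP = {
    "hackernews": HackerNewsArchiver,
    "kaginews": KagiNewsArchiver,
    "rss": RSSArchiver,
}


def resolve_archiver(name: str, source_type: str) -> type:
    """Look up an archiver class by source name, then by source type."""
    archiver_cls = ARCHIVER_MAP.get(name) or ARCHIVER_MAP.get(source_type)
    if archiver_cls is None:
        # Hypothetical failure mode; the real code may handle this differently.
        raise ValueError(
            f"No archiver registered for source {name!r} (type {source_type!r})"
        )
    return archiver_cls
```

Under this scheme a source whose name matches a key never consults its type, which is why a source named hackernews needs no explicit type.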
Adding a New Archiver¶
- Create a new file in inkfeed/archiver/ (e.g., reddit.py)
- Implement a class that accepts (source, output_dir) and has a run() method returning ArchiveResult
- Register it in ARCHIVER_MAP in main.py
- Add configuration options to config.py if needed
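Put together, the steps above might produce a skeleton like the one below. Everything here is illustrative: RedditArchiver is a hypothetical class that returns a placeholder article instead of fetching anything, and the dataclasses are local stand-ins for the real data model so the snippet runs on its own. In the project itself the class would presumably subclass Archiver from archiver/base.py and import the real types.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any


# Local stand-ins for the project's data model (assumed shapes).
@dataclass
class Article:
    title: str
    url: str
    content_html: str


@dataclass
class GroupResult:
    display_name: str
    articles: list[Article]
    cache_dir: Path


@dataclass
class ArchiveResult:
    groups: list[GroupResult]
    source_config: Any


class RedditArchiver:
    """Hypothetical archiver showing the expected constructor and run() shape."""

    def __init__(self, source: Any, output_dir: Path):
        self.source = source
        self.output_dir = output_dir

    def run(self, *, max_workers: int, max_retries: int) -> ArchiveResult:
        # A real implementation would fetch and parse posts here, using
        # max_workers and max_retries; this sketch ignores both and returns
        # a single placeholder article.
        articles = [
            Article(
                title="Placeholder post",
                url="https://example.com/post",
                content_html="<p>Placeholder</p>",
            )
        ]
        group = GroupResult(
            display_name="reddit",
            articles=articles,
            cache_dir=self.output_dir / "reddit",
        )
        return ArchiveResult(groups=[group], source_config=self.source)


# Registration: in the project this entry would be added to the existing
# ARCHIVER_MAP in main.py rather than defining a new dictionary.
ARCHIVER_MAP = {"reddit": RedditArchiver}
```

Note that run() takes keyword-only arguments, so a caller writes run(max_workers=4, max_retries=2) rather than passing the values positionally.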