Read-link command

Convert a URL into clean markdown using GitHub raw-content fast paths, direct markdown detection, and Firecrawl fallback scraping.

Basic usage

webctx read-link https://github.com/amxv/webctx/blob/main/README.md

read-link returns a markdown document. If a title can be found, the output starts with an H1 containing that title, followed by the original URL and then the extracted content.
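The output layout described above can be sketched roughly as follows. This is only an illustration: the function name format_document, the blank-line separators, and the bare-URL line are assumptions, not webctx's actual formatting code.

```python
from typing import Optional


def format_document(url: str, content: str, title: Optional[str] = None) -> str:
    """Assemble the output: optional H1 title, then the source URL, then the content."""
    parts = []
    if title:
        # The H1 is only emitted when a title was found.
        parts.append(f"# {title}")
    parts.append(url)
    parts.append(content.strip())
    # Assumption: sections are separated by a single blank line.
    return "\n\n".join(parts) + "\n"
```

When no title is available, the sketch simply starts with the URL line.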

GitHub fast path

For GitHub file URLs, webctx converts the page URL into a raw GitHub URL before using any scraping provider.

Repository root URLs are treated as README requests. If README.md is not found, webctx also tries readme.md, Readme.md, and README.

Tree URLs (directory listings) are not treated as files, so they skip this fast path and fall through to the direct markdown and Firecrawl paths.
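A minimal sketch of the URL rewriting described in this section, under stated assumptions. The helper name github_raw_candidates is hypothetical, and using HEAD as the ref for repository roots is an assumption about how the default branch might be resolved; webctx may do this differently. Branch names that contain a slash are ambiguous in blob URLs and are not handled here.

```python
from typing import List
from urllib.parse import urlparse

# README names tried in order for repository root URLs (from the text above).
README_CANDIDATES = ["README.md", "readme.md", "Readme.md", "README"]
RAW_HOST = "https://raw.githubusercontent.com"


def github_raw_candidates(url: str, default_ref: str = "HEAD") -> List[str]:
    """Return raw-content URLs to try for a GitHub URL, or [] if the fast path does not apply."""
    parsed = urlparse(url)
    if parsed.netloc not in ("github.com", "www.github.com"):
        return []
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) == 2:
        # Repository root: treat as a README request.
        owner, repo = parts
        return [f"{RAW_HOST}/{owner}/{repo}/{default_ref}/{name}" for name in README_CANDIDATES]
    if len(parts) >= 5 and parts[2] == "blob":
        # File URL: /owner/repo/blob/<ref>/<path...>
        owner, repo, _, ref, *path = parts
        return [f"{RAW_HOST}/{owner}/{repo}/{ref}/{'/'.join(path)}"]
    # Tree URLs and anything else fall through to the other paths.
    return []
```

A caller would fetch each candidate in order and stop at the first one that succeeds.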

Direct markdown path

For direct markdown-style URLs, webctx checks whether a .md document is available. If the given URL does not end in .md, it tries the same URL with .md appended.

The HEAD response must look like markdown or plain text and report enough content length to be useful. If both checks pass, webctx fetches the markdown directly and derives the title from the first # heading when possible.
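The checks in this section might look something like the sketch below. All names are hypothetical, the accepted media types are a guess at what "looks like markdown or plain text" means, and MIN_CONTENT_LENGTH is a placeholder because the source does not state the real threshold.

```python
import re
from typing import Optional

# Placeholder threshold; the actual minimum used by webctx is not documented here.
MIN_CONTENT_LENGTH = 100


def markdown_candidate(url: str) -> str:
    """Return the URL itself if it ends in .md, otherwise the same URL with .md appended."""
    if url.lower().endswith(".md"):
        return url
    # Assumption: a trailing slash is dropped before appending the extension.
    return url.rstrip("/") + ".md"


def head_looks_like_markdown(content_type: str, content_length: Optional[int]) -> bool:
    """Decide from HEAD response headers whether a direct fetch is worthwhile."""
    media_type = content_type.split(";")[0].strip().lower()
    is_text = media_type in ("text/markdown", "text/x-markdown", "text/plain")
    return is_text and content_length is not None and content_length >= MIN_CONTENT_LENGTH


def title_from_markdown(markdown: str) -> Optional[str]:
    """Return the text of the first level-one (#) heading, if any."""
    match = re.search(r"^#[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    return match.group(1) if match else None
```

The heading search is deliberately simple: it does not skip fenced code blocks, so a # comment inside a code sample could be mistaken for a title.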

Firecrawl fallback

When the fast paths do not work, webctx uses Firecrawl:

endpoint: https://api.firecrawl.dev/v2/scrape
formats: markdown
onlyMainContent: true
skipTlsVerification: true
blockAds: true
removeBase64Images: true
maxAge: 600000 (milliseconds, i.e. 10 minutes)

The request excludes common non-content tags such as scripts, styles, navigation, footers, headers, asides, SVGs, images, and ad selectors.
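As a rough illustration, the settings listed above could be assembled into a JSON request body like this. The option values come from the list; the url and excludeTags field names and the exact selector list are assumptions about the request shape, not a copy of what webctx sends. Authentication headers are omitted.

```python
from typing import Any, Dict

FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v2/scrape"

# Illustrative selectors only; the real exclusion list is not given in this document.
EXCLUDE_TAGS = [
    "script", "style", "nav", "footer", "header", "aside", "svg", "img",
    ".ad", ".ads", ".advertisement",
]


def build_scrape_payload(url: str) -> Dict[str, Any]:
    """Build the JSON body for a scrape request using the documented settings."""
    return {
        "url": url,                       # assumed field name for the target page
        "formats": ["markdown"],
        "onlyMainContent": True,
        "skipTlsVerification": True,
        "blockAds": True,
        "removeBase64Images": True,
        "maxAge": 600000,                 # cache window in milliseconds (10 minutes)
        "excludeTags": list(EXCLUDE_TAGS),  # assumed field name for tag exclusion
    }
```

The body would then be sent as a POST to FIRECRAWL_SCRAPE_URL with an API key.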

PDF handling

If the URL ends in .pdf, webctx asks Firecrawl to use the PDF parser.
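One way to sketch the PDF check is shown below. The helper names are hypothetical, and the parsers field is an assumption about how the PDF parser is requested from Firecrawl; check the Firecrawl API reference before relying on it. The sketch also compares only the URL path, so a query string after .pdf is ignored, which may differ from webctx's own check.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def is_pdf_url(url: str) -> bool:
    """Return True when the URL path ends in .pdf (case-insensitive)."""
    return urlparse(url).path.lower().endswith(".pdf")


def with_pdf_parser(payload: Dict[str, Any], url: str) -> Dict[str, Any]:
    """Return a copy of the payload that requests PDF parsing for PDF URLs."""
    if not is_pdf_url(url):
        return payload
    # Assumed field name and value; not confirmed by this document.
    return {**payload, "parsers": ["pdf"]}
```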

Rate limiting

Firecrawl scrape requests pass through a process-local queue with a token bucket limiter. The bucket starts with 10 tokens and refills one token every six seconds, which allows a burst of 10 requests followed by a sustained rate of 10 requests per minute. The queue keeps scrape calls serialized so agent workflows do not burst into Firecrawl.
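The limiter described above can be sketched as a small token bucket. This is not webctx's implementation: the class and method names are hypothetical, the clock is injectable purely to make the behavior easy to test, and the serialized queue around the limiter is left out.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket with a fixed capacity that gains one token per refill interval."""

    def __init__(
        self,
        capacity: int = 10,
        refill_interval: float = 6.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_interval = refill_interval
        self.clock = clock
        self.tokens = capacity          # the bucket starts full
        self.last_refill = clock()

    def _refill(self) -> None:
        # Add one token for every whole interval elapsed, capped at capacity.
        elapsed = self.clock() - self.last_refill
        new_tokens = int(elapsed // self.refill_interval)
        if new_tokens > 0:
            self.tokens = min(self.capacity, self.tokens + new_tokens)
            self.last_refill += new_tokens * self.refill_interval

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the bucket is empty."""
        self._refill()
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False

    def seconds_until_next_token(self) -> float:
        """Return how long a caller should wait before a token is available."""
        self._refill()
        if self.tokens > 0:
            return 0.0
        return self.refill_interval - (self.clock() - self.last_refill)
```

A serialized worker would call try_acquire before each scrape and sleep for seconds_until_next_token when it returns False.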