How to download an entire subreddit
He wrote a clean script to pull every post in a subreddit, ran it, and got back exactly 1,000. The sub had 80,000. The wall he hit is the first thing to understand before you start.
Two very different jobs
Downloading "an entire subreddit" is actually two different tasks, and the route you take depends on which one you mean. If you want the recent posts (roughly the newest thousand, plus whatever arrives going forward), you use a bulk downloader on the live API, and it works well. If you want the full history, meaning everything since the subreddit was created, the live API cannot give it to you, and you have to go to an archive instead. Conflating these is why so many people start with a script and end up frustrated.
The reason is a hard limit built into Reddit itself, and it is the single most important thing to understand before you write any code. So before the how-to, the wall.
The 1,000-post wall
Reddit's API and listings will only ever return about 1,000 items for any given view. Sort a subreddit by new, top, or hot, and page through it as far as you can, and you will hit roughly 1,000 posts and then the well runs dry — no matter how many hundreds of thousands the subreddit actually contains. This is not a rate limit you can wait out or a bug you can fix; it is a deliberate cap on how deep any listing goes. The practical consequence is decisive: you cannot download a subreddit's full history through the live API or any tool built on it, full stop. The top 1,000 by some sort order, yes. Everything, no. The moment your goal is "all of it" rather than "the recent slice," you stop reaching for the API and start reaching for an archive. Almost every "I tried to download a subreddit and only got 1,000 posts" story traces back to not knowing this up front.
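To see why a perfectly correct script still stops at about 1,000, it helps to model the behaviour. The sketch below is a simulation, not a call to Reddit: `make_capped_listing` is a hypothetical stand-in for a listing endpoint, and its cursor is a plain offset rather than the item identifiers a real listing hands back. The pagination loop itself is the standard one: keep following the cursor until the server stops giving you one.

```python
from typing import Callable, Iterator, Optional

# (post ids on this page, cursor for the next page or None)
Page = tuple[list[str], Optional[str]]


def paginate(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Follow the 'after' cursor until the listing stops returning one."""
    after: Optional[str] = None
    while True:
        items, after = fetch_page(after)
        yield from items
        if after is None or not items:
            break


def make_capped_listing(total_posts: int, cap: int = 1000, page_size: int = 100):
    """Simulated listing: the sub holds total_posts, but only `cap` are reachable."""
    visible = [f"post{i}" for i in range(min(total_posts, cap))]

    def fetch_page(after: Optional[str]) -> Page:
        # Simplification: the cursor is an offset, not a real item identifier.
        start = 0 if after is None else int(after)
        chunk = visible[start:start + page_size]
        next_start = start + len(chunk)
        cursor = str(next_start) if next_start < len(visible) else None
        return chunk, cursor

    return fetch_page


collected = list(paginate(make_capped_listing(total_posts=80_000)))
print(len(collected))  # 1000, even though the simulated sub holds 80,000
```

Nothing in `paginate` is wrong. The loop ends because the listing runs out of cursor, which is why no amount of retrying or waiting changes the count.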
The three routes
The clean mental model: live tools for recent and forward-looking, archives for history. If you need both — a complete back-catalog and new posts as they arrive — you combine them: an archive dump for the past, a bulk downloader running on a schedule for the present.
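If you do combine the two, the overlap between the archive and the live pull needs deduplicating. A minimal sketch, assuming each post is a dict carrying `id` and `created_utc` fields (the function name and the "live copy wins" rule are illustrative choices, not something either tool prescribes):

```python
def merge_posts(archive_posts, live_posts):
    """Combine archive and live records, keyed on post id.

    Where both sources hold the same post, the live copy replaces the
    archived one, so scores and edits reflect the current state.
    """
    merged = {post["id"]: post for post in archive_posts}
    for post in live_posts:
        merged[post["id"]] = post
    # Oldest first, so the result reads as one continuous timeline.
    return sorted(merged.values(), key=lambda p: p["created_utc"])


archive = [
    {"id": "a", "created_utc": 100, "score": 5},
    {"id": "b", "created_utc": 200, "score": 7},
]
live = [
    {"id": "b", "created_utc": 200, "score": 9},
    {"id": "c", "created_utc": 300, "score": 1},
]
timeline = merge_posts(archive, live)
print([p["id"] for p in timeline])  # ['a', 'b', 'c']
```

Letting the live copy win is a judgment call. If your project cares about what a post said when it was first archived, flip the order in which the two lists are applied.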
Route A: recent posts with a bulk downloader
1. Register a Reddit app. Create a script-type app in your Reddit account settings to get a client ID and secret. This is the OAuth credential the downloader authenticates with, the same free-tier access the pricing guide describes.
2. Install the downloader. BDFR is a Python tool; install it with pip. It runs from the command line and has three modes: download (media and content), archive (post and comment metadata as JSON), and clone (both at once).
3. Point it at the subreddit. Run it against the subreddit with your chosen sort and limit. Remember the cap: you will get up to roughly 1,000 posts per sort order, plus their comment trees, not the entire history.
4. Let it handle rate limits. A good bulk downloader paces itself against the API limits and can resume if interrupted, so a large pull does not fail halfway. Leave it running rather than babysitting it.
5. Check what you got. Confirm the output (JSON files, media, or both depending on mode) and sanity-check the post count against what you expected, remembering the ceiling. If you needed more than the recent slice, that is your cue to switch to Route B.
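The steps above can be sketched as a small Python wrapper that assembles the BDFR command line and then counts what landed on disk. Treat it as a starting point under assumptions: the `--subreddit`, `--sort`, and `--limit` option names are the ones I believe BDFR accepts, so check them against `bdfr --help` for your installed version. The subreddit name, the output directory, and the idea that archive mode leaves `.json` files you can count are all illustrative.

```python
import shlex
from pathlib import Path

BDFR_MODES = {"download", "archive", "clone"}


def build_bdfr_command(mode, out_dir, subreddit, sort="new", limit=1000):
    """Assemble a BDFR invocation as an argument list (nothing is executed)."""
    if mode not in BDFR_MODES:
        raise ValueError(f"mode must be one of {sorted(BDFR_MODES)}")
    return [
        "bdfr", mode, str(out_dir),
        "--subreddit", subreddit,
        "--sort", sort,
        "--limit", str(limit),  # asking for more than ~1,000 will not return more
    ]


def count_files(out_dir, pattern="*.json"):
    """Count output files under out_dir, for the post-run sanity check."""
    return sum(1 for _ in Path(out_dir).rglob(pattern))


cmd = build_bdfr_command("archive", "./out", "AskHistorians")
print(shlex.join(cmd))
# To run it for real: import subprocess; subprocess.run(cmd, check=True)
```

After a run, `count_files("./out")` gives you the number to compare against the ceiling. A count close to 1,000 on a large subreddit usually means you hit the cap, not the end of the subreddit.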
Route B: full history from an archive
1. Decide browser or bulk. For a single subreddit and moderate needs, the Arctic Shift download tool lets you specify a subreddit and date range and download through the browser, no code. For many subreddits or the whole corpus, go to the bulk dumps instead.
2. For bulk, get the Academic Torrents dumps. The historical Reddit corpus is published as per-subreddit, compressed NDJSON files on Academic Torrents, covering roughly 2005 through 2025. Download the file for your subreddit, or the full set if you need many.
3. Decompress and parse. The dumps are zstandard-compressed NDJSON. Use the open-source parsing scripts written for them to turn the files into something you can query or load. This is the step that needs a little developer comfort.
4. Filter to what you need. A full dump is large and contains everything. Filter by date, author, or keyword down to the slice your project actually uses before you load it into analysis tooling.
5. Verify coverage. Archives are not perfect mirrors. Spot-check a period and post you know to confirm completeness before you treat the dump as the whole truth, especially for the most recent months and removed content.
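The decompress-and-filter steps can be sketched in a few lines if you would rather not depend on someone else's script. This is a minimal version under assumptions: it needs the third-party `zstandard` package, it assumes each line is a JSON object with `created_utc`, `title`, and `selftext` fields, and the large `max_window_size` is there because dumps of this kind are typically compressed with a long window. The file name in the usage comment is hypothetical.

```python
import io
import json
from datetime import datetime, timezone


def read_zst_lines(path):
    """Stream text lines out of a zstandard-compressed NDJSON file."""
    import zstandard  # third-party: pip install zstandard

    with open(path, "rb") as fh:
        # Assumption: the dump needs a larger window than the library default.
        reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
        yield from io.TextIOWrapper(reader, encoding="utf-8")


def filter_posts(lines, start, end, keyword=None):
    """Yield posts created in [start, end), optionally matching a keyword."""
    start_ts, end_ts = start.timestamp(), end.timestamp()
    needle = keyword.lower() if keyword else None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            post = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip damaged lines rather than abort a long run
        if not isinstance(post, dict):
            continue
        created = float(post.get("created_utc") or 0)
        if not (start_ts <= created < end_ts):
            continue
        if needle:
            text = f"{post.get('title', '')} {post.get('selftext', '')}".lower()
            if needle not in text:
                continue
        yield post


# Usage against a real dump (hypothetical file name):
# lines = read_zst_lines("subreddit_submissions.zst")
# Small in-memory demo so the filter can be tried without a download:
jan = datetime(2024, 1, 1, tzinfo=timezone.utc)
feb = datetime(2024, 2, 1, tzinfo=timezone.utc)
demo_lines = [
    json.dumps({"id": "a1", "title": "Pricing question", "selftext": "",
                "created_utc": datetime(2024, 1, 15, tzinfo=timezone.utc).timestamp()}),
    json.dumps({"id": "b2", "title": "Old thread", "selftext": "pricing",
                "created_utc": datetime(2023, 6, 1, tzinfo=timezone.utc).timestamp()}),
    "not json at all",
]
print([p["id"] for p in filter_posts(demo_lines, jan, feb, keyword="pricing")])
```

Everything streams line by line, so memory use stays flat even when the decompressed file is far larger than your RAM. Filter first, then load only the survivors into your analysis tooling.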
How much disk, how long
Scale surprises people in both directions. A recent-slice pull with a bulk downloader — 1,000 posts and their comments — is modest: megabytes to low gigabytes depending on media, and minutes to a couple of hours with rate limiting. That is a laptop-and-an-afternoon job. The full-history dumps are a different animal. A single busy subreddit's complete history can be many gigabytes compressed, and the entire corpus across the top tens of thousands of subreddits runs to hundreds of gigabytes or more. That is a real storage and processing commitment, not a casual download.
So size your ambition honestly before you start. "I want to study the last six months of one subreddit" and "I want the complete history of one subreddit" and "I want a hundred subreddits to train a model" are three projects with very different hardware and time bills. The middle one is usually a single archive file; the last one is a weekend of downloading and a machine with room to spare.
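For the live-API route you can put a rough floor on the time bill with arithmetic. The sketch below assumes one request per listing page and one per comment tree, and the `requests_per_minute` default is a placeholder, not Reddit's published limit; plug in whatever your access tier actually allows. Media downloads and very large comment trees add requests on top, so read the result as a lower bound.

```python
import math


def estimate_minutes(posts, page_size=100, requests_per_minute=60, fetch_comments=True):
    """Lower-bound runtime in minutes for a recent-slice pull."""
    listing_requests = math.ceil(posts / page_size)
    comment_requests = posts if fetch_comments else 0  # one per comment tree
    return (listing_requests + comment_requests) / requests_per_minute


minutes = estimate_minutes(1000)
print(f"{minutes:.1f} minutes at the assumed rate")
```

Under those assumptions a capped 1,000-post pull with comments is 1,010 requests, which is why the recent slice fits comfortably in an afternoon.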
Honest caveats
- The 1,000-post cap is non-negotiable on the live API — if you need more than that per subreddit, you need an archive. No tool or trick gets around it.
- Archives lag and have gaps — they are weeks to months behind the present and imperfect on removed content. They are wrong for anything time-sensitive.
- Storage is a real constraint at full scale — full-history dumps are large; confirm you have the disk before you start a multi-hundred-gigabyte download.
- Deleted and removed content lives in the dumps — full-history archives often retain posts users later deleted, which carries the ethical weight covered below.
- Comment trees multiply everything — "the whole subreddit including every comment" is far larger and slower than just the posts; be clear about which you need.
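The "lag and gaps" caveat is the easiest one to check mechanically: bucket the posts you extracted by month and look for holes. A minimal sketch, assuming each post is a dict with a `created_utc` epoch timestamp (the function names are illustrative):

```python
from collections import Counter
from datetime import datetime, timezone


def posts_per_month(posts):
    """Count posts per (year, month), using UTC timestamps."""
    counts = Counter()
    for post in posts:
        created = datetime.fromtimestamp(float(post["created_utc"]), tz=timezone.utc)
        counts[(created.year, created.month)] += 1
    return counts


def missing_months(counts):
    """Months between the first and last observed month that hold no posts."""
    if not counts:
        return []
    (year, month), last = min(counts), max(counts)
    gaps = []
    while (year, month) <= last:
        if (year, month) not in counts:
            gaps.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return gaps


def ts(y, m, d):
    """Helper for the demo: epoch seconds for a UTC date."""
    return datetime(y, m, d, tzinfo=timezone.utc).timestamp()


demo = [{"created_utc": ts(2023, 11, 3)}, {"created_utc": ts(2023, 11, 20)},
        {"created_utc": ts(2024, 2, 9)}]
print(missing_months(posts_per_month(demo)))  # [(2023, 12), (2024, 1)]
```

An empty month is not proof of a gap, since a quiet subreddit can genuinely go silent, but on an active community it tells you which period to spot-check by hand.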
The ethics of a full archive
Holding a complete copy of a subreddit is more sensitive than reading it live, because the dump freezes things people may have since deleted and lets you process them at scale. The responsible defaults: use the archive for aggregate analysis — trends, counts, themes — not to build profiles of or resurface specific users; do not re-publish deleted or removed content as if it were live; and if you are doing institutional research, your ethics board or IRB has rules about exactly this that override any convenience. A subreddit archive is a powerful research input and a genuine privacy liability at the same time. Treat it as both. None of this is legal advice; the legality guide covers the broader picture.
If you want the findings, not the files
Downloading a subreddit is the right move when you genuinely need to own the raw corpus — training a model, archiving for preservation, running custom large-scale processing. But if the real goal is to understand what a subreddit is saying — the recurring complaints, the sentiment, the topics that dominate — a few hundred gigabytes of NDJSON is a long way around. rawneed works from a plain-English question, gathers the relevant threads, classifies them into structured fields, and returns a ranked report with sources, no download or parser required. If you need the files, the routes above are exactly right. If you need the picture the files contain, that is the shortcut.
Frequently asked questions
How do I download an entire subreddit?
It depends whether you mean recent posts or the full history. For recent posts, use a bulk downloader like BDFR on the official API — but it can only return about 1,000 posts per sort order because of Reddit's listing cap. For the complete history, you cannot use the live API; you download the subreddit's data from a Pushshift-lineage archive instead, either through the Arctic Shift download tool in a browser or the bulk dumps on Academic Torrents.
Why can I only get 1,000 posts from a subreddit?
Because Reddit caps every listing (new, top, hot, or via the API) at roughly 1,000 items. It is a deliberate limit, not a bug or a rate limit you can wait out. No tool built on the live API can exceed it. The only way to get a subreddit's full history beyond that 1,000 is through a historical archive like Arctic Shift or the Academic Torrents dumps, which were built by collecting posts continuously as they were published rather than by paging back through listings after the fact.
What is the best tool to download a subreddit?
For recent posts and comments, BDFR (the bulk-downloader-for-reddit) is the standard — it authenticates via OAuth and offers download, archive, and clone modes. For full history, use Arctic Shift's download tool for a single subreddit through the browser, or the Academic Torrents Pushshift dumps with a parsing script for bulk offline work. Match the tool to recent-versus-historical.
Can I download all comments from a subreddit?
For recent threads, yes — a bulk downloader pulls comment trees along with posts, within the ~1,000-post listing cap. For the complete comment history, you need the archives: the Academic Torrents dumps include per-subreddit comment files, and Arctic Shift can return comments through its API or download tool. Full comment history is much larger than posts alone, so budget storage accordingly.
How much storage do I need to download a subreddit?
A recent-slice download of one subreddit is modest — megabytes to a few gigabytes. A single busy subreddit's complete compressed history can be many gigabytes, and the full multi-subreddit corpus runs to hundreds of gigabytes or more. Decide whether you need the recent slice, one subreddit's full history, or many before you start, because the storage bill differs by orders of magnitude.
Is downloading a subreddit legal?
Downloading public subreddit data for personal research sits in a low-risk zone, but holding a full archive raises specific issues: Reddit's terms restrict commercial use and redistribution, and dumps often contain content users later deleted. Aggregate analysis is generally fine; re-publishing deleted content or building profiles of individuals is not, and commercial use needs permission. See the dedicated legality guide, and note this is not legal advice.