Methodology

How the pipeline actually works

No black box. Here is exactly how a plain-English claim becomes a ranked, sourced report — the steps, the scoring, the cost, and how we test that it holds up.

From a claim to a report in six steps

1. State the hypothesis
You write your claim in plain English, e.g. “indie founders struggle to get their first 100 customers.”

2. Clarify the scope
A short set of AI-generated clarifying questions sharpens who you mean and what counts, so the search does not drift.

3. Suggest & tier subreddits
Candidate subreddits are validated against r/<sub>/about.json: dead or private communities and those with fewer than 500 members are dropped, and the rest are tiered Bullseye / Decent / Off-topic.

4. Mine real search queries
Queries are built from the actual title phrases people post in your chosen subs, with frequency counts, so you search the way your audience writes, not the way you guess they do.

5. Set the pipeline knobs
Choose how many threads to classify and how many run in parallel; a live cost estimate updates as you adjust, so there are no surprises.

6. Review & launch
Watch live logs, a running cost meter, and a per-thread status grid as the run executes. You can stop and resume a run at any time.
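
To make step 4 concrete, here is a minimal Python sketch of mining title phrases with frequency counts. It is an illustration under assumptions, not the pipeline's actual code: the function names, the stopword list, the phrase lengths, and the `min_count` threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; the real pipeline's filtering is not documented here.
STOPWORDS = {"a", "an", "the", "to", "of", "for", "and", "or", "in", "on", "is", "my", "i", "me"}


def title_phrases(title: str, n: int) -> list[str]:
    """Return every run of n consecutive words in a title, lowercased."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def mine_queries(titles, sizes=(2, 3), min_count=2, top=10):
    """Count how many titles contain each phrase and return the most frequent ones."""
    counts = Counter()
    for title in titles:
        seen = set()  # count a phrase once per title so one post cannot dominate
        for n in sizes:
            for phrase in title_phrases(title, n):
                words = phrase.split()
                # Skip phrases that start or end with a stopword (assumed heuristic).
                if words[0] in STOPWORDS or words[-1] in STOPWORDS:
                    continue
                seen.add(phrase)
        counts.update(seen)
    # Sort by descending count, then alphabetically, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(p, c) for p, c in ranked if c >= min_count][:top]


titles = [
    "How do I get my first 100 customers?",
    "Struggling to land first 100 customers",
    "First 100 customers took me a year",
]
for phrase, count in mine_queries(titles):
    print(count, phrase)
```

Counting each phrase once per title, rather than once per occurrence, keeps the frequency closer to "how many people phrase it this way", which is what the step is trying to capture.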

What each thread is scored on

Every thread is classified into a fixed schema so the fields actually aggregate across hundreds of posts instead of becoming unique per-thread prose:

pain_signal — 0–100 intensity of the frustration in the thread
wtp_tier — willingness to pay, bucketed high / medium / low / none
tools_mentioned[] — the products and services named in the discussion
sentiment_toward_tools — positive / negative / mixed / neutral, per tool
primary_use_case — market research, lead gen, brand monitoring, content ideation, or other
relevance_score — 0–10 match between the thread and your claim
key_quotes, best_quote_from_OP, best_quote_from_top_reply — verbatim, with links back to source
summary — a one-line, plain-English gist of the thread
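
One way to picture the fixed schema is as a typed record that rejects out-of-range values. The sketch below is an assumption-laden illustration: the field names come from the list above, but the class name, the exact enum spellings (for example `lead_gen`), the integer type for `relevance_score`, and the choice to store quotes as plain strings without their source links are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

WTP_TIERS = {"high", "medium", "low", "none"}
SENTIMENTS = {"positive", "negative", "mixed", "neutral"}
# Assumed spellings for the five use cases named in the schema.
USE_CASES = {"market_research", "lead_gen", "brand_monitoring", "content_ideation", "other"}


@dataclass
class ThreadClassification:
    pain_signal: int                        # 0-100
    wtp_tier: str                           # one of WTP_TIERS
    tools_mentioned: list[str]
    sentiment_toward_tools: dict[str, str]  # tool name -> one of SENTIMENTS
    primary_use_case: str                   # one of USE_CASES
    relevance_score: int                    # 0-10
    key_quotes: list[str]
    best_quote_from_OP: str
    best_quote_from_top_reply: str
    summary: str

    def __post_init__(self) -> None:
        if not 0 <= self.pain_signal <= 100:
            raise ValueError("pain_signal must be between 0 and 100")
        if not 0 <= self.relevance_score <= 10:
            raise ValueError("relevance_score must be between 0 and 10")
        if self.wtp_tier not in WTP_TIERS:
            raise ValueError(f"unknown wtp_tier: {self.wtp_tier}")
        if self.primary_use_case not in USE_CASES:
            raise ValueError(f"unknown primary_use_case: {self.primary_use_case}")
        for tool, sentiment in self.sentiment_toward_tools.items():
            if tool not in self.tools_mentioned:
                raise ValueError(f"sentiment given for unlisted tool: {tool}")
            if sentiment not in SENTIMENTS:
                raise ValueError(f"unknown sentiment: {sentiment}")


def wtp_breakdown(threads: list[ThreadClassification]) -> Counter:
    """Aggregate willingness-to-pay tiers across many classified threads."""
    return Counter(t.wtp_tier for t in threads)
```

Because every field is bucketed or bounded, an aggregate such as `wtp_breakdown` is a one-line count rather than a text-analysis problem.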

What a run costs and how long it takes

Stage | Time | Cost
Fetch threads | 2–3 min | Free (public JSON)
Filter (rule-based prune) | < 1 sec | Free
Classify with Gemini | 10–15 min for ~300 threads | $0.13–0.40
Render the report | < 1 sec | Free

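Cost scales with comments per thread: a ~300-thread run costs ~$0.13 at five comments each and ~$0.30–0.40 at fifty to a hundred. Gemini 2.5 Flash is billed at $0.30 per million input tokens and $2.50 per million output tokens, and each thread takes roughly 11 seconds to classify. At that rate 300 threads would take about 55 minutes one at a time, so the 10–15 minute figure reflects threads being classified in parallel.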
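
The arithmetic behind these figures can be sketched in a few lines of Python. The prices and the 11-second figure come from the paragraph above; the per-thread token counts and the parallelism values are hypothetical placeholders, not measurements, and the timing model assumes threads run in equal-length batches.

```python
import math

INPUT_USD_PER_MTOK = 0.30   # Gemini 2.5 Flash input price quoted above
OUTPUT_USD_PER_MTOK = 2.50  # Gemini 2.5 Flash output price quoted above


def classify_cost_usd(threads: int, input_tokens_per_thread: int,
                      output_tokens_per_thread: int) -> float:
    """Estimated classification cost for a run, in US dollars."""
    input_cost = threads * input_tokens_per_thread / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = threads * output_tokens_per_thread / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


def wall_clock_minutes(threads: int, seconds_per_thread: float = 11.0,
                       parallel: int = 1) -> float:
    """Rough run time, assuming `parallel` threads are classified per batch."""
    batches = math.ceil(threads / parallel)
    return batches * seconds_per_thread / 60


# Hypothetical token counts: 1,000 input and 200 output tokens per thread.
print(f"${classify_cost_usd(300, 1_000, 200):.2f}")
for workers in (1, 4, 5):
    print(workers, "parallel:", wall_clock_minutes(300, parallel=workers), "min")
```

Under these assumptions, four or five parallel classifications bring 300 threads into the 10–15 minute range. The cost depends only on token volume, which is why comments per thread, and not parallelism, drives the bill.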

Built to survive Reddit’s API

Every thread comes from Reddit’s public JSON endpoints — no OAuth, no API key, no quota approval to wait on.

Runs are resumable and tolerate rate-limiting: a 429 or a dropped connection mid-run is caught and retried, so progress is not lost. A top-N cap lets you bound exactly how many threads get classified, which is the main cost lever.
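
The retry-and-resume behaviour described above can be sketched as follows. This is a simplified illustration, not the pipeline's implementation: `RateLimited`, `with_retries`, `run_resumable`, the exponential backoff schedule, and the JSON checkpoint file are all assumptions, and the caller is expected to supply a `fetch` function that raises `RateLimited` on an HTTP 429.

```python
import json
import time
from pathlib import Path


class RateLimited(Exception):
    """Raised by a fetcher when the server answers HTTP 429 (hypothetical helper)."""


def with_retries(fetch, url, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on 429s and dropped connections with backoff."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (RateLimited, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...


def run_resumable(thread_ids, classify, checkpoint: Path, top_n: int):
    """Classify at most top_n threads, skipping any already saved in the checkpoint."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for thread_id in thread_ids[:top_n]:
        if thread_id in done:
            continue  # finished in an earlier run, so no repeat cost
        done[thread_id] = classify(thread_id)
        checkpoint.write_text(json.dumps(done))  # persist after every thread
    return done
```

Writing the checkpoint after every thread trades a little I/O for the guarantee that an interrupted run loses at most the thread in flight, and slicing with `top_n` before any work happens is what keeps the cap a hard bound on spend.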

That public-JSON foundation is also why the pipeline is not exposed to the API-pricing changes that have shut down other Reddit research tools.

How we test that it holds up

We do not just assert the pipeline works — we measure the parts that could quietly fail, with scripts anyone can re-run.

When we needed to know whether the wizard’s AI-suggested subreddits could be trusted, we probed 100 suggestions across ten domains: 90 were live, public communities and only one was a hallucinated name. When we needed to know whether exact-phrase Reddit search beats loose keyword matching, we ran a quoted-vs-tokenized A/B and found roughly three in four multi-word phrases return enough exact-match results to use directly.

Both tests live as reproducible scripts, and both carry caveats we keep in view — small samples, and a search ranker that shifts over time. We publish those caveats next to the numbers rather than burying them.
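
For readers who want to picture the quoted-vs-tokenized comparison, here is a hedged Python sketch of how the two query variants might be built against Reddit's public search endpoint. The function names and the `min_hits` threshold are hypothetical; the source says only that a phrase needs "enough" exact-match results, without giving a number.

```python
from urllib.parse import urlencode


def search_url(subreddit: str, phrase: str, quoted: bool, limit: int = 25) -> str:
    """Build a public search.json URL, wrapping the phrase in quotes for exact match."""
    query = f'"{phrase}"' if quoted else phrase
    params = {"q": query, "restrict_sr": 1, "limit": limit}
    return f"https://www.reddit.com/r/{subreddit}/search.json?{urlencode(params)}"


def usable_share(quoted_hits: dict[str, int], min_hits: int = 5) -> float:
    """Fraction of phrases whose quoted search returned at least min_hits results."""
    if not quoted_hits:
        return 0.0
    usable = sum(1 for hits in quoted_hits.values() if hits >= min_hits)
    return usable / len(quoted_hits)


print(search_url("startups", "first 100 customers", quoted=True))
print(search_url("startups", "first 100 customers", quoted=False))
```

The hit counts passed to `usable_share` would come from fetching each quoted URL and counting the returned posts; that step is omitted here because, as the caveats above note, the search ranker's results shift over time.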

Maintained by Bhupendra Singh Chauhan · Founder, Reddit Research Pipeline.

Validate what people actually say, not what you wish they would.
