How I Built an Automated Research Digest for a Small Team
Most small teams say they want to stay close to the market. The constraint is attention.
At Klysera, the market conversation we care about moves across several places: X threads, engineering blogs, founder essays, hiring reports, and research feeds.
People write about AI-native engineering teams. They talk about changing hiring signals, startup operating systems, and what technical leaders now expect from early engineers.
Some of those signals later become polished reports; by then, the conversation has already moved.
Our content writer was tracking part of this manually. She had accounts to monitor, keywords to search, and conversations to find for content distribution.
Every day, she opened X and checked profiles. She searched terms, copied links, and tried to separate useful posts from noise.
She asked whether automation could remove the daily manual pass.
Within a day, I had a working prototype. Within a week, the prototype turned into Klysera Research Signal.
It became a weekday digest that monitors X conversations, pulls from research feeds, ranks the useful items, and sends one email to the team at 5pm WAT.
The system is small: about 600 lines of Python across three files. The useful part was the product judgment around it.
Internal automation works when it respects the team's existing habits. For us, that meant Playwright for collection, RSS for research, and email for delivery.
Why X Was The First Source
X is messy, but it is still where a lot of early market signal appears.
For our work at Klysera, that signal includes conversations about engineering hiring, AI-native teams, technical leadership, startup talent, and how companies evaluate software engineers.
A hiring report gives structure later; an operator thread often shows the pressure while it is still forming.
That made X the first source to automate. The problem was access.
The official API was not a good fit for this use case. It was too expensive for a small internal workflow, and the limits did not match the kind of daily monitoring we needed.
I used Playwright instead.
Playwright let the script run a real Chromium browser. It loaded JavaScript-rendered pages and reused a saved browser session from a one-time manual login.
Each scheduled run picked up the saved session file and operated as the logged-in account.
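A minimal sketch of that session reuse, assuming Playwright's sync API for Python. The file name x_session.json, the function names, and the login URL are illustrative assumptions, not the names used in the actual script.

```python
from pathlib import Path

SESSION_FILE = Path("x_session.json")  # hypothetical file name


def save_session_once(login_url: str = "https://x.com/login") -> None:
    """One-time manual login: open a visible browser, then save cookies and storage."""
    from playwright.sync_api import sync_playwright  # third-party, imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto(login_url)
        input("Log in in the browser window, then press Enter here...")
        # Writes cookies and local storage to disk for later runs.
        context.storage_state(path=str(SESSION_FILE))
        browser.close()


def open_logged_in_page(p, url: str):
    """Scheduled runs: start Chromium with the saved session and load a page.

    `p` is the object yielded by `sync_playwright()`.
    """
    if not SESSION_FILE.exists():
        raise FileNotFoundError(
            f"No saved session at {SESSION_FILE}; run save_session_once() first"
        )
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(storage_state=str(SESSION_FILE))
    page = context.new_page()
    page.goto(url)
    return browser, page
```

The saved file holds live credentials for the account, so it belongs outside version control.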
The basic collection loop was simple:
- open an account profile or search page
- wait for posts to render
- extract post text, links, timestamps, and engagement signals
- scroll
- repeat until the run hits the configured limit
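The loop above can be sketched independently of the browser. In this version the two browser actions are passed in as callables, so the names read_visible and scroll, the "url" key used for de-duplication, and the default limits are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List


def collect_posts(
    read_visible: Callable[[], Iterable[Dict]],
    scroll: Callable[[], None],
    limit: int = 20,
    max_scrolls: int = 10,
) -> List[Dict]:
    """Read rendered posts, scroll, and repeat until `limit` unique posts are found."""
    seen: Dict[str, Dict] = {}
    for _ in range(max_scrolls):
        for post in read_visible():
            # The same post is usually still on screen after a scroll,
            # so keep only the first copy of each URL.
            seen.setdefault(post["url"], post)
            if len(seen) >= limit:
                return list(seen.values())
        scroll()
    return list(seen.values())
```

With Playwright, read_visible would wrap a locator query over the rendered posts and scroll would wrap a call such as page.mouse.wheel(0, 2000). The max_scrolls cap stops the run when a timeline has fewer posts than the limit.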
Extraction was straightforward; filtering carried most of the engineering weight.
X timelines include replies, reposts, promoted content, and algorithmic recommendations.
Keyword search has a different noise problem. Job board bots and generic career accounts often post engineering-related phrases with no useful signal behind them.
The first version used three filters:
- account posts had to match the handle we requested
- keyword results needed a minimum engagement threshold
- known spam accounts were excluded by a blocklist
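A rough sketch of those three filters as plain functions. The post fields (author, likes, reposts, replies) and the choice to measure engagement as their sum are assumptions; the original script may define engagement differently.

```python
from typing import Dict, Iterable, List


def _normalize(handle: str) -> str:
    """Compare handles without the leading @ and without case."""
    return handle.lstrip("@").lower()


def filter_account_posts(posts: Iterable[Dict], requested_handle: str) -> List[Dict]:
    """Keep only posts written by the requested account (drops reposts and recommendations)."""
    wanted = _normalize(requested_handle)
    return [p for p in posts if _normalize(p["author"]) == wanted]


def filter_keyword_results(
    posts: Iterable[Dict], min_engagement: int, blocklist: Iterable[str]
) -> List[Dict]:
    """Drop blocklisted authors, then drop results below the engagement threshold."""
    blocked = {_normalize(h) for h in blocklist}
    kept = []
    for post in posts:
        if _normalize(post["author"]) in blocked:
            continue
        # Assumed engagement score: likes + reposts + replies.
        engagement = post.get("likes", 0) + post.get("reposts", 0) + post.get("replies", 0)
        if engagement >= min_engagement:
            kept.append(post)
    return kept
```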
A standard run across 9 accounts and 8 keyword searches produced output like this (excerpt; not every account and keyword line is shown):
-- Account Timelines ------------------
@GergelyOrosz ... 7 posts
@lethain ... 3 posts
@patio11 ... 2 posts
@swyx ... 4 posts
@eladgil ... 1 posts
@hunterwalk ... 2 posts
@sriramk ... 3 posts
@shreyas ... 2 posts
-- Keyword Search ---------------------
'AI engineer hiring' ... 4 results
'AI native engineer' ... 3 results
'what makes a great engineer' ... 5 results
'engineering team AI' ... 3 results
'junior engineer hiring' ... 2 results
'CTO hiring engineer' ... 3 results
Fetching research feeds...
8 research items
Done. 24 account posts + 31 keyword results after filtering.
That output was enough to prove the collection layer. The next question was how the team would actually consume it.
Why I Chose Email Over A Dashboard
The first interface was a dashboard. It had tabs, filters, categories, and a clean card layout.
It looked like a product. That was the problem.
A dashboard asks the team to build a new habit. Someone has to remember the URL, open it, scan it, and repeat that behavior tomorrow.
For a small team already switching between Slack, docs, research, customer conversations, and product work, a new destination is easy to ignore.
Email was a better product surface because it already had distribution.
The digest arrived where the team already worked. It came at a predictable time and did not ask anyone to remember another tool.
Once I chose email, the design constraints changed.
HTML email is still a constrained environment. Many clients strip external stylesheets. Modern layout support is inconsistent. Table-based structure remains the safest option.
The design had to be simple, readable, and resistant to broken rendering.
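As an illustration of those constraints, here is a small sketch that renders items into a table-based layout with inline styles only. The function names, colors, and the 600-pixel width are assumptions, not the template the digest actually uses.

```python
from html import escape
from typing import Dict, List


def render_item_row(title: str, url: str, source: str) -> str:
    """One digest item as a table row; all styling is inline."""
    return (
        '<tr><td style="padding:12px 0;border-bottom:1px solid #e5e5e5;'
        'font-family:Arial,sans-serif;font-size:15px;line-height:1.4;">'
        f'<a href="{escape(url, quote=True)}" '
        f'style="color:#1a0dab;text-decoration:none;">{escape(title)}</a>'
        f'<br><span style="color:#666666;font-size:13px;">{escape(source)}</span>'
        "</td></tr>"
    )


def render_email(heading: str, items: List[Dict[str, str]]) -> str:
    """Wrap the rows in a fixed-width table centered inside a full-width table."""
    rows = "".join(render_item_row(**item) for item in items)
    return (
        '<table role="presentation" width="100%" cellpadding="0" cellspacing="0" border="0">'
        '<tr><td align="center">'
        '<table role="presentation" width="600" cellpadding="0" cellspacing="0" border="0">'
        '<tr><td style="font-family:Arial,sans-serif;font-size:20px;'
        f'font-weight:bold;padding:16px 0;">{escape(heading)}</td></tr>'
        f"{rows}"
        "</table></td></tr></table>"
    )
```

Escaping every title and URL matters here because the text comes from scraped posts and feeds, not from a trusted source.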
The design decision that mattered most was hierarchy.
Most automated digests flatten everything into a list of links. That makes the tool technically complete and editorially weak.
A useful digest makes a decision about what deserves attention first.
The final email structure used four sections:
- Research items from RSS feeds
- One "Thread to watch" from X
- Account timeline highlights grouped by handle
- Keyword search results grouped by query
Research leads the email because it requires deeper reading. The X section follows with the strongest account post first, then the rest of the monitored conversation.
The structure gives the team a quick path: read the top item, skim the rest, save what matters.
That hierarchy kept the email from becoming another link dump. The team did not need every item to carry the same weight; it needed the digest to make the first decision.
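One way to express that hierarchy in code is sketched below. It assumes each account post carries a numeric engagement score and a handle, that each keyword result carries its query, and that the "Thread to watch" is simply the highest-scoring account post. Those are illustrative choices, not a description of the actual ranking.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_digest(
    research: List[Dict], account_posts: List[Dict], keyword_results: List[Dict]
) -> Dict:
    """Arrange items into the four sections, strongest account post first."""
    ranked = sorted(account_posts, key=lambda p: p["engagement"], reverse=True)
    thread_to_watch: Optional[Dict] = ranked[0] if ranked else None

    # Remaining account posts are grouped by handle.
    by_handle = defaultdict(list)
    for post in ranked[1:]:
        by_handle[post["handle"]].append(post)

    # Keyword results are grouped by the query that found them.
    by_query = defaultdict(list)
    for result in keyword_results:
        by_query[result["query"]].append(result)

    return {
        "research": list(research),
        "thread_to_watch": thread_to_watch,
        "accounts": dict(by_handle),
        "keywords": dict(by_query),
    }
```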
The Research Layer
X covered the live conversation. RSS covered slower, more durable analysis.
I added 15 sources across the following categories:
- Engineering Leadership: The Pragmatic Engineer, lethain.com, GitHub Blog, Stack Overflow Blog, and Hacker News
- Strategy: a16z, First Round Review, Paul Graham, and Y Combinator Blog
- Talent: Indeed Hiring Lab and LinkedIn Talent Blog
- Research: MIT Technology Review
- Design: Nielsen Norman Group, Smashing Magazine, and Figma Blog
The RSS layer used feedparser, which handles RSS 1.0, RSS 2.0, and Atom without per-feed parsing logic.
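A short sketch of what that looks like with feedparser. The item shape (source, title, link, published) and the function names are assumptions for illustration; feedparser.parse, the entries list, and the published_parsed and updated_parsed fields are part of the library.

```python
from calendar import timegm
from datetime import datetime, timezone
from typing import Dict, List


def entry_to_item(entry, source_name: str) -> Dict:
    """Convert one feed entry into a plain dict with a UTC publish time."""
    # feedparser exposes parsed dates as UTC time tuples; some feeds only set `updated`.
    parsed = entry.get("published_parsed") or entry.get("updated_parsed")
    published = (
        datetime.fromtimestamp(timegm(parsed), tz=timezone.utc) if parsed else None
    )
    return {
        "source": source_name,
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        "published": published,
    }


def fetch_feed(url: str, source_name: str) -> List[Dict]:
    """Fetch and normalize a feed; the same call covers RSS and Atom."""
    import feedparser  # third-party, imported lazily

    feed = feedparser.parse(url)
    return [entry_to_item(entry, source_name) for entry in feed.entries]
```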
The sources lived in a YAML file with four fields: name, URL, category, and priority.
sources:
  - name: "The Pragmatic Engineer"
    url: "https://newsletter.pragmaticengineer.com/feed"
    category: "Engineering"
    priority: high
  - name: "Indeed Hiring Lab"
    url: "https://www.hiringlab.org/feed/"
    category: "Talent"
    priority: high
The priority field gave the digest a simple editorial rule. High-priority sources appeared first; within each priority tier, items sorted by recency.
This kept the feed useful without building a complex ranking system.
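The rule is small enough to sketch in a few lines. Only the "high" tier appears in the source file shown above; the "medium" and "low" tiers and the item fields here are assumed for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Lower number sorts first. Tiers other than "high" are assumptions.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def rank_research(items: List[Dict]) -> List[Dict]:
    """High-priority sources first; newest first within each priority tier."""
    # Python's sort is stable, so sorting by recency and then by tier
    # preserves the recency order inside each tier.
    by_recency = sorted(items, key=lambda item: item["published"], reverse=True)
    return sorted(
        by_recency,
        key=lambda item: PRIORITY_ORDER.get(item["priority"], len(PRIORITY_ORDER)),
    )
```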
On one run, the research section surfaced 8 items.
It pulled two pieces from The Pragmatic Engineer on self-modifying software and AI operating systems. It also pulled three from Indeed Hiring Lab on labor market data.
The same run included one piece from lethain.com on hypergrowth stages and two from Nielsen Norman Group on chatbot design and user panels.
The research layer added only a few seconds to the job. It also made the digest feel less like a social media scraper and more like a daily market briefing.
What The System Actually Does
The weekday job runs at 5pm WAT.
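How the job is triggered is not covered here, so the following is only a sketch of the schedule rule itself: given any timezone-aware moment, compute the next weekday 5pm in West Africa Time. The function name next_run is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# West Africa Time is UTC+1 year-round (no daylight saving).
WAT = timezone(timedelta(hours=1), "WAT")


def next_run(after: datetime) -> datetime:
    """Return the next Monday-to-Friday 17:00 WAT strictly after `after`.

    `after` must be timezone-aware.
    """
    local = after.astimezone(WAT)
    candidate = datetime.combine(local.date(), time(17, 0), tzinfo=WAT)
    if candidate <= local:
        candidate += timedelta(days=1)
    # weekday() is 5 for Saturday and 6 for Sunday.
    while candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate
```

Pinning the rule to a fixed UTC+1 offset avoids depending on the server's local timezone.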
Each run performs the following steps:
- Load the saved browser session for X.
- Visit monitored account profiles.
- Run configured keyword searches.
- Extract posts and engagement signals.
- Filter out replies, unrelated accounts, low-signal keyword results, and blocked accounts.
- Fetch research items from the RSS source list.
- Rank and group the results.
- Render the HTML email.
- Send the digest through Gmail.
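The last two steps can be sketched with the standard library alone. This version assumes Gmail is reached over SMTP with an app password read from environment variables; the variable names GMAIL_USER and GMAIL_APP_PASSWORD are hypothetical, and the real script may send through a different Gmail interface.

```python
import os
import smtplib
from email.message import EmailMessage
from typing import List


def build_message(
    sender: str, recipients: List[str], subject: str, html_body: str, text_body: str
) -> EmailMessage:
    """Build a multipart message: plain-text fallback plus the HTML digest."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(text_body)
    msg.add_alternative(html_body, subtype="html")
    return msg


def send_via_gmail(msg: EmailMessage) -> None:
    """Send over Gmail's SMTP endpoint using credentials from the environment."""
    user = os.environ["GMAIL_USER"]  # hypothetical variable name
    password = os.environ["GMAIL_APP_PASSWORD"]  # hypothetical variable name
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(user, password)
        smtp.send_message(msg)
```

The plain-text part gives clients that block HTML something readable instead of an empty message.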
The whole system stayed intentionally small. It did not need a database, a dashboard, or a long-running server.
For this workflow, a scheduled script and an email template were enough.
The run log stayed plain. It recorded timestamps, account counts, keyword counts, research item counts, and a final completion line.
That was enough for the first version because the workflow had one job and one delivery surface.
What I Would Improve
The weakest part is session management for X.
The saved browser session works until the cookie expires. When that happens, the job fails.
A better version detects authentication failure, pauses the X collection step, and sends an alert instead of leaving the error in a log file.
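A possible shape for that improvement is sketched below. It assumes an expired session shows up as a redirect to a login page, and the login paths listed are guesses about X's URL layout that would need checking. The collect and send_alert callables are placeholders for the real collection step and alert channel.

```python
from typing import Callable, List
from urllib.parse import urlparse

# Assumed login paths; verify against the site's actual redirect behavior.
LOGIN_PATHS = ("/login", "/i/flow/login")


class SessionExpired(Exception):
    """Raised when the browser lands on a login page instead of content."""


def ensure_logged_in(current_url: str) -> None:
    """Raise SessionExpired if the page URL is a login path."""
    path = urlparse(current_url).path
    if any(path == p or path.startswith(p + "/") for p in LOGIN_PATHS):
        raise SessionExpired(f"redirected to {path}")


def collect_x_or_alert(
    collect: Callable[[], List], send_alert: Callable[[str], None]
) -> List:
    """Run the X step; on an expired session, alert and return no posts."""
    try:
        return collect()
    except SessionExpired as exc:
        send_alert(f"X session expired: {exc}")
        return []
```

Returning an empty list lets the rest of the run continue, so the research section can still go out while the session is being renewed.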
The RSS layer also needs adaptive lookback. Some sources publish daily; others publish once every few weeks.
If the digest returns too few high-quality research items, the script should extend the lookback window for slower sources instead of leaving the section thin.
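A sketch of one way adaptive lookback could work. It treats any source with nothing in the base window as a "slower source" and widens the window only for those. The window sizes, the min_items threshold, and the item fields are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Sequence


def select_with_adaptive_lookback(
    items: List[Dict],
    now: datetime,
    min_items: int = 5,
    base_days: int = 1,
    extended_days: Sequence[int] = (3, 7, 21),
) -> List[Dict]:
    """Take recent items, then widen the window for quiet sources if the section is thin."""

    def within(days: int, pool: List[Dict]) -> List[Dict]:
        cutoff = now - timedelta(days=days)
        return [item for item in pool if item["published"] >= cutoff]

    fresh = within(base_days, items)
    active_sources = {item["source"] for item in fresh}
    # Sources with nothing in the base window are treated as slow publishers.
    slow_pool = [item for item in items if item["source"] not in active_sources]

    extra: List[Dict] = []
    for days in extended_days:
        if len(fresh) + len(extra) >= min_items:
            break
        extra = within(days, slow_pool)
    return fresh + extra
```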
A small provenance record would improve the next version. Each item should carry its source, timestamp, extraction path, filter reason, and delivery status.
If a post appears in the email, the team needs a clear path back to where it came from and why it passed the filter.
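The record described above could be as simple as a dataclass. The field names follow the five attributes listed; the example values and the JSON serialization are assumptions about how it might be stored.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class Provenance:
    """Where an item came from and why it reached the email."""

    source: str            # e.g. an account handle or feed name
    timestamp: datetime    # when the item was published
    extraction_path: str   # e.g. "account_timeline", "keyword_search", "rss"
    filter_reason: str     # which rule let the item through
    delivery_status: str = "pending"

    def to_json(self) -> str:
        """Serialize to one JSON line, suitable for appending to a log file."""
        record = asdict(self)
        record["timestamp"] = self.timestamp.isoformat()
        return json.dumps(record)
```

Writing one line per item keeps the record compatible with the no-database design: a flat file is enough to trace any link in the email back to its source.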
The Practical Lesson
This build gave me a simple operating rule for internal tools: fit the team's behavior before adding a new surface.
For Klysera Research Signal, the practical decisions shaped the system:
- Use Playwright because the source was JavaScript-rendered and the API did not fit the budget.
- Use RSS because durable research belongs beside live social signal.
- Use email because the team already checks it.
- Add hierarchy because a digest has to make choices for the reader.
The stack was simple. The product judgment mattered more.
If you are building a similar workflow, start with the behavior you want to support.
Then pick the smallest system that collects the signal, filters the noise, and delivers the result where people already pay attention.