Agentic coding

Stop asking AI questions.
Start building AI workflows.

A practical field guide to coding agents, skills, plugins, subagents, personal workflows, and feedback loops.

agents · skills · plugins · subagents · autoresearch
JT • May 2026
01 / 32
The reframe

“Coding” is the misleading part.

The important part is that these agents can operate computers.

what people hear

AI writes code

A narrow developer assistant.

what is actually happening
reads files
writes docs
runs CLIs
browses web
calls APIs
repeats workflows

A coding agent is a computer-operating agent.

02 / 32
Minimal definition

An agent = model + harness.

The model thinks. The harness gives it a body, tools, and a workspace.

Model

Reasoning, language, planning, judgment.

Harness

Memory, tool calling, permissions, files, terminal, browser.

Workspace

The computer environment where work actually changes.

GPT5.5 + Codex · Opus 4.7 + Claude Code · Gemma4 + OpenClaw · Gemma4 + Hermes
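
A minimal sketch of the split, with a stub standing in for the model (no real product's API is assumed):

  # Toy agent loop: the model decides, the harness acts on the workspace.
  import subprocess

  def model(observation: str) -> dict:
      # Model: judgment. A real harness would call an LLM API here.
      if "TODO" in observation:
          return {"tool": "shell", "cmd": "echo TODO handled"}
      return {"tool": "done"}

  def run_agent():
      observation = "TODO: tidy the notes"  # imagine this was read from a file
      while True:
          action = model(observation)
          if action["tool"] == "done":
              return
          # Harness: gives the model a body -- here, a shell.
          result = subprocess.run(action["cmd"], shell=True,
                                  capture_output=True, text=True)
          observation = result.stdout  # feed the result back to the model

  run_agent()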
03 / 32
What they can do

Agents become useful when they can act.

Even basic file operations are already a big deal.

Read

Open files, notes, docs, transcripts, codebases, logs.

Write

Create summaries, reports, code, markdown pages, emails.

Create

Generate scripts, folders, databases, workflows, artifacts.

Delete / edit

Clean up files, refactor code, remove stale work — with permission boundaries.

The leap is from chatbot to operator.

04 / 32
Tooling stack

Tools let agents do things.

Files are the baseline. CLIs and MCPs extend the agent’s reach.

File tools

Read, write, create, delete, search. This is the agent manipulating the workspace directly.

CLIs

Command-line tools compactly expose tested functions: scrape, query, transform, deploy, analyze.

MCPs

Connect agents to other systems. Think of MCPs as APIs designed for agents.
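
A minimal server sketch, assuming the official MCP Python SDK (the `mcp` package and its FastMCP helper); the tool itself is a made-up example:

  from mcp.server.fastmcp import FastMCP

  mcp = FastMCP("notes")

  @mcp.tool()
  def search_notes(query: str) -> str:
      """Return note lines that mention the query."""
      hits = [line.strip() for line in open("notes.md") if query in line]
      return "\n".join(hits) or "no matches"

  if __name__ == "__main__":
      mcp.run()  # serves MCP over stdio; the agent connects as a client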

05 / 32
The problem

More tools can make agents worse.

Access is not free. Every always-loaded tool competes for limited attention.

Loaded with intent

skill: company-research
tool: exa-search
file: account.md
Actual task has room to breathe.

Loaded with everything

mcp: calendar schema
mcp: 38 unused tools
old logs from failed run
stale assumption
irrelevant API docs
half-decision from 40 messages ago
more tool descriptions...
Actual task squeezed into leftovers.

Tool bloat becomes context rot. Context rot becomes bad judgment.
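
A back-of-envelope sketch of the cost (the schemas and the rough 4-chars-per-token rule are illustrative, not measurements):

  import json

  tool_schemas = [
      {"name": f"calendar_tool_{i}",
       "description": "what this tool does, when to call it, caveats " * 5,
       "parameters": {"type": "object",
                      "properties": {"id": {"type": "string"}}}}
      for i in range(38)  # the 38 unused tools above
  ]

  chars = sum(len(json.dumps(s)) for s in tool_schemas)
  print(f"~{chars // 4} tokens spent before the actual task begins")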

06 / 32
The response

MCPs → CLIs → Skills

The useful abstraction moves upward: from access, to execution, to judgment.

Layer 1
MCPs

Expose systems and APIs to the agent.

Layer 2
CLIs

Expose compact, testable actions.

Layer 3
Skills

Expose procedure: when to use tools, what good looks like, and how to verify.

Skills are not another hand. They are the operating procedure.

07 / 32
Skills

A skill is an SOP for an agent.

Not magic. Usually just markdown that tells the agent how to perform a task.

What a skill can contain

  • When to use this workflow
  • Step-by-step procedure
  • Tools, CLIs, MCPs to use
  • Expected output format
  • Pitfalls and verification checks
  • Optional scripts packaged with the skill
  # Daily research digest skill
  When the user asks for an AI research digest:
  1. Collect sources from X / RSS / papers
  2. Filter for robotics, CV, frontier labs, OSS models, agentic coding
  3. Summarize key claims
  4. Separate signal from hype
  5. Save the output as markdown
08 / 32
Skills + tools

A skill routes the agent to the right tools.

The skill is the judgment layer. CLIs, MCPs, APIs, files, and scripts are action surfaces.

company-research/SKILL.md
  1. Search recent news with Exa MCP
  2. Use LinkedIn/search CLI for leaders
  3. Save account intelligence to markdown
  4. Draft outreach with cited claims
  5. Verify sources before output
routes to
CLIs

Compact local actions.

MCPs

External systems.

Files

Persistent workspace.

Scripts

Reusable helpers.

Tools do the work. Skills decide the workflow.

09 / 32
Progressive disclosure

Skills reduce context bloat by loading in stages.

Tiny summaries stay visible. Full workflow instructions enter context only when needed.

1. Skill index

Short descriptions are always available.

shopping-search
company-research
health-brief

2. Match request

“Find the best deal for this product.”

chosen: shopping-search

3. Load SOP

Full markdown, tools, pitfalls, scripts, and verification checks load only now.

Capability stays available without stuffing the context window.
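
A sketch of the staging (the matching here is crude keyword overlap; a real agent uses the model's judgment):

  from pathlib import Path

  SKILLS = {  # stage 1: tiny index, always in context
      "shopping-search": "find the best deal for a product",
      "company-research": "research an account before outreach",
      "health-brief": "summarize daily recovery from health data",
  }

  def pick_skill(request: str) -> str:
      # stage 2: match the request against the short descriptions
      words = set(request.lower().split())
      return max(SKILLS, key=lambda n: len(words & set(SKILLS[n].split())))

  def load_sop(name: str) -> str:
      # stage 3: only now does the full SOP enter the context window
      return (Path.home() / ".claude/skills" / name / "SKILL.md").read_text()

  print(pick_skill("Find the best deal for this product"))  # shopping-search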

10 / 32
Plugins

A plugin is a bundle of skills.

One skill teaches one workflow. A plugin packages a whole operating system of workflows.

Skill

One reusable SOP for a specific task.

Plugin

A group of skills packaged together around a domain.

Example

Superpowers packages software-development practices into agent skills.

https://github.com/obra/superpowers

11 / 32
Case study

Superpowers encodes an SDLC.

It helps the agent follow a software-development lifecycle instead of vibe-coding.

Brainstorm

Clarify questions, requirements, codebase, research.

Design spec

Overall structure, functionality, system behavior.

Implementation plan

Granular libraries, code changes, task breakdown.

Subagents execute

Work through the tasklist with focused contexts.

Methodology beats vibes.

12 / 32
Architecture pattern

Subagent-driven development

Your main chat becomes the orchestrator. Subagents do focused work in fresh contexts.

Orchestrator Agent

Reads the spec, decomposes tasks, assigns work, reviews summaries.

delegates necessary context

Task A

Fresh context
Clear goal
Independent work

Task B

Parallel if possible
Blocked if needed
Spec-driven

Task C

Uses only relevant files
Less context rot
Less drift

Review

Checks output
Returns evidence
Compresses findings

summaries return, not every implementation detail
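
A sketch of the shape, with a stub where a real harness would spawn subagent sessions:

  from concurrent.futures import ThreadPoolExecutor

  def run_subagent(task: str, context: list[str]) -> str:
      # Fresh context: only the spec and files this task needs.
      # A real harness starts a new agent session here.
      return f"{task}: done, using {len(context)} files"

  tasks = {
      "implement parser": ["spec.md", "parser.py"],
      "review edge cases": ["spec.md", "tests/"],
      "check docs": ["README.md"],
  }

  with ThreadPoolExecutor() as pool:  # parallel if possible
      summaries = list(pool.map(run_subagent, tasks, tasks.values()))

  for line in summaries:  # the orchestrator reviews summaries only
      print(line)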
13 / 32
Why subagents matter

Subagents are context hygiene.

Not just parallel processing. They keep the main agent from turning into a junk drawer.

One giant chat

logs + failed attempts + stale assumptions

Orchestrated work

Worker A

Implemented parser. Returned diff + tests.

Worker B

Reviewed edge cases. Returned risks only.

Worker C

Checked docs. Returned citations.

main thread sees summaries, not the mess
14 / 32
Try other workflows

Superpowers is one version of the pattern.

Other plugins also try to encode the SDLC into agent workflows.

Get Shit Done

GSD packages an opinionated build workflow.

github.com/gsd-build/get-shit-done

BMAD Method

Breakthrough Method for Agile Development.

github.com/bmad-code-org/BMAD-METHOD

oh-my-claude-code

Another ecosystem of Claude Code workflows and conventions.

github.com/yeachan-heo/oh-my-claudecode

15 / 32
Beyond software development

Skills are bottled expertise.

Once the pattern clicks, a skill becomes a way to package domain judgment.

Trading

Research and execution workflows.

Startup advice

Office hours, design reviews, YC-style judgment.

Sales prep

Account research and stakeholder intelligence.

Health

Daily recovery interpretation from personal data.

The interesting question becomes: whose judgment can be packaged?

16 / 32
Personal journey

Building your own skills is not that difficult.

The hard part is not markdown. The hard part is noticing your own workflows.

Take one repeated task. Do it once with the agent. Then ask the agent to package the workflow as a reusable skill.

Processes

What steps do I follow?

Preferences

How do I want this done?

Best practices

What mistakes should the agent avoid?
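
A skeletal template to hand the agent (every angle-bracketed piece is a placeholder):

  # <task-name> skill
  Use when: <the request or situation that should trigger this>
  Steps:
  1. <gather inputs: files, URLs, CLIs, MCPs>
  2. <do the work, tool by tool>
  3. <format the output: markdown report, table, email>
  Avoid: <your known pitfalls>
  Verify: <how to check the result before returning it>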

17 / 32
Example 1

Daily health monitoring skill

A personal WHOOP-like recovery brief built from my own health data.

Input

Path to my iCloud health database.

Agent work

Claude Code built a Python script and SQLite DB to import and analyze sleep, HR, and exercise.

Output

A daily recovery report: how recovered I am, and how hard I should push exercise today.
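
A hedged sketch of the brief's core query; the table and column names are hypothetical, not the schema Claude Code actually generated:

  import sqlite3

  conn = sqlite3.connect("health.db")
  sleep_h, rhr = conn.execute(
      "SELECT avg(duration_h), avg(resting_hr) FROM sleep_sessions "
      "WHERE date >= date('now', '-7 day')"
  ).fetchone()  # assumes the last week has data

  # Toy heuristic: more sleep and a lower resting HR mean push harder.
  score = max(0, min(100, int(sleep_h / 8 * 60 + (70 - rhr))))
  advice = "push hard" if score > 70 else "go easy" if score < 40 else "moderate"
  print(f"Recovery {score}/100 -> {advice} today")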

18 / 32
Example 2

Daily X / Twitter research skill

Turning my content diet into an automated research brief.

My interests

AI research, robotics, computer vision, frontier labs, OSS models, agentic coding.

Access pattern

The agent used my already-logged-in browser session by extracting session cookies like ct0 and auth_token.

Daily output

Scrape posts I care about, summarize them, and separate signal from noise.
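
The shape of that access pattern, sketched with `requests` (cookie values and endpoints omitted; the csrf-header pairing reflects how X's web API typically works):

  import requests

  session = requests.Session()
  session.cookies.set("ct0", "<from your browser>", domain=".x.com")
  session.cookies.set("auth_token", "<from your browser>", domain=".x.com")
  # X's web API expects the csrf header to mirror the ct0 cookie.
  session.headers["x-csrf-token"] = session.cookies["ct0"]
  # ...from here, call the same endpoints the logged-in page uses.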

Important caveat: browser cookies and auth tokens are sensitive. Treat agent access like privileged operator access.

19 / 32
Example 3

Shopping research orchestrator

One shopping task becomes four platform-specific search skills plus an orchestrator.

AliExpress skill
+
Shopee skill
+
Carousell skill
+
Lazada skill

Teach each platform

Use Playwright CLI to open the site, search for a product, and capture useful results.

Package each workflow

Once it worked, ask Claude Code to save it as a platform search skill.

Orchestrate

Create a higher-level skill that spawns four subagents to run the platform skills in parallel.
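
One platform skill's core action, sketched with Playwright's Python API (the selectors are placeholders, not the real sites'):

  from playwright.sync_api import sync_playwright

  def search_platform(url: str, query: str) -> list[str]:
      with sync_playwright() as p:
          browser = p.chromium.launch(headless=False)  # headed: watch it work
          page = browser.new_page()
          page.goto(url)
          page.fill("input[type=search]", query)  # placeholder selector
          page.keyboard.press("Enter")
          page.wait_for_load_state("networkidle")
          titles = page.locator(".result-title").all_text_contents()  # placeholder
          browser.close()
          return titles

  # The orchestrator skill runs four of these in parallel subagents.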

20 / 32
Example 4

Salesforce business relations sales skills

A sales research workflow for preparing better client outreach.

Company research

Understand the account, business model, news, and strategic priorities.

C-suite research

Find leaders, public statements, LinkedIn/news signals, and pain points.

Product fit

Map client problems and tech stack to relevant Salesforce products.

Outbound assets

Whitepaper proposal and tailored email draft for each C-suite stakeholder.

21 / 32
Example 5

Corporate banking RM prep skills

Same workflow pattern, different domain vocabulary.

Before meetings

Research the client account, leadership, business context, and current problems.

Tailor to the role

The skills are similar to Salesforce sales research, but adapted for OCBC corporate banking workflows.

The general pattern

Domain research → stakeholder intelligence → pain points → meeting strategy.

Skills travel across domains when the workflow shape is similar.

22 / 32
Milestone

Karpathy’s autoresearch

Agentic coding applied to machine-learning experimentation.

Goal

Reduce validation loss / improve evaluation score.

Playground

The agent can modify the training program and run experiments.

Feedback

An evaluator grades whether the experiment worked.

https://github.com/karpathy/autoresearch

23 / 32
The three-file contract

Autoresearch works because the game is well-defined.

The agent gets freedom inside the playground, but the evaluator keeps score.

program.md

Human direction.
Research priorities, constraints, taste, and what to try next.

train.py

Agent playground.
The implementation the agent is allowed to rewrite and run.

eval.py

Immutable score.
The external judge that turns vague “better” into measurable feedback.

The eval turns exploration into a game.
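
The outer loop, sketched; the file names follow the slide, and the convention that eval.py prints a single float is an assumption:

  import subprocess

  def run(cmd: list[str]) -> str:
      return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

  best = float("-inf")
  for attempt in range(5):
      # (the agent rewrites train.py here, guided by program.md)
      run(["python", "train.py"])
      score = float(run(["python", "eval.py"]))  # the immutable scorekeeper
      if score > best:
          best = score
          run(["git", "commit", "-am", f"score {score:.3f}"])  # keep what works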

24 / 32
Hackathon example

AWS Agentic AI Hackathon: autoresearch with a scoreboard.

We won first place by treating the agent like an experiment runner, not a magic chatbot.

AWS provided the playground

Pet store chatbot, RAG PDFs, and Lambda tool calls already wired up.

Our variables

System prompt, guardrails, deployment config, and a few tool/format experiments.

The score function

A hidden eval dashboard scored robustness. Max score: 1000.

Eval score over iterations (max possible: 1000; peak +250 pts over baseline):

baseline 350 → prompt v1 500 → tool experiment 250 → guardrails tuned 600 → final pass 550

That is autoresearch: give the agent a playground, a score, and iteration loops.

25 / 32
What actually happened

The human became the “PhD advisor.”

The agent ran experiments. The human shaped the problem, supplied context, and judged what to try next.

Human

  • Frames the problem
  • Feeds context and constraints
  • Interprets score changes
  • Kills clever ideas that regress

Agent

  • Reads context folder
  • Proposes changes
  • Deploys and tests
  • Reports evidence back

The loop was: propose → deploy → score → adjust.

26 / 32
The deeper lesson

The new skill is designing feedback loops.

Problem formulation, constraints, and scoring functions become the human leverage point.

27 / 32
Tip 1

Stop treating one chat as memory.

A chat is a workspace with context limits. A skill is the reusable package.

One forever chat

old preference
failed run
random log
half-decision
stale assumption
new task squeezed in

Reusable skill

Do the task once. Package the workflow. Reuse it with new variables.

28 / 32
Tip 2

For browser agents, use a headed browser.

If you want to see what the agent is doing, ask it to drive a live browser through playwright-cli.

Invisible automation

Clicks, waits, failures, retries — and you only see logs after the fact.

live headed session
search
click
debug
teach workflow

Use it for web QA, shopping research, form flows, and platform-specific browser skills.

29 / 32
Tip 3

Start with a strong CLAUDE.md file.

This is the project manual loaded into every coding-agent session.

CLAUDE.md
  1. Project structure
  2. Coding conventions
  3. Preferred libraries
  4. Testing patterns
  5. Architectural decisions
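
A skeletal sketch (every line is a placeholder to replace with your project's reality):

  # CLAUDE.md
  ## Structure: src/ app code, tests/ pytest suites, scripts/ one-off tools
  ## Conventions: type hints everywhere, small modules, no global state
  ## Libraries: httpx over requests, pydantic for validation
  ## Testing: `pytest -q` must pass before any commit
  ## Decisions: log to stdout only; see decisions.md for why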

Without it

The agent re-learns your project from scratch every session.

Recommended example

Andrej Karpathy's CLAUDE.md
github.com/forrestchang/andrej-karpathy-skills

30 / 32
Tip 4

If you need memory, write files.

Agent memory is increasingly file-based: markdown, specs, wikis, and project instructions.

file-based memory cabinet
CLAUDE.md
AGENTS.md
account notes
LLM Wiki
research pages
decisions.md

Memory that survives the chat has to live somewhere.
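
A tiny sketch of the habit (decisions.md is from the cabinet above; the helper itself is hypothetical):

  from datetime import date
  from pathlib import Path

  def remember(decision: str, path: str = "decisions.md") -> None:
      # Append, never overwrite: the file is the memory.
      entry = f"- {date.today().isoformat()}: {decision}\n"
      with Path(path).open("a") as f:
          f.write(entry)

  remember("Use SQLite, not Postgres, until one file stops being enough")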

31 / 32
Tip 5 + closing challenge

Scope your skills deliberately.

Some skills should be global. Some should live only inside a project repo.

Global

Saved in places like ~/.claude/skills. Useful everywhere.

Project

Saved inside a repo, e.g. .claude/, when the workflow belongs to that codebase.

Shared

Repo folders like .claude, .codex, .openclaw make workflows portable.

The winners will be the people who understand their workflows deeply enough to teach them to machines.
32 / 32