Agentic coding

Stop asking AI questions.
Start building AI workflows.

A practical field guide to coding agents, skills, plugins, subagents, personal workflows, and feedback loops.

agents · skills · plugins · subagents · autoresearch
JT • May 2026
01 / 32
The reframe

“Coding” is the misleading part.

The important part is that these agents can operate computers.

what people hear

AI writes code

A narrow developer assistant.

what is actually happening
reads files
writes docs
runs CLIs
browses web
calls APIs
repeats workflows

A coding agent is a computer-operating agent.

02 / 32
Minimal definition

An agent = model + harness.

The model thinks. The harness gives it a body, tools, and a workspace.

Model

Reasoning, language, planning, judgment.

Harness

Memory, tool calling, permissions, files, terminal, browser.

Workspace

The computer environment where work actually changes.

GPT5.5 + Codex · Opus 4.7 + Claude Code · Gemma4 + OpenClaw · Gemma4 + Hermes
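
A minimal sketch of the split, with a stub standing in for the model (no real product's API is assumed):

  # Toy agent loop: the model decides, the harness acts on the workspace.
  import subprocess

  def model(observation: str) -> dict:
      # Model: judgment. A real harness would call an LLM API here.
      if "TODO" in observation:
          return {"tool": "shell", "cmd": "echo TODO handled"}
      return {"tool": "done"}

  def run_agent():
      observation = "TODO: tidy the notes"  # imagine this was read from a file
      while True:
          action = model(observation)
          if action["tool"] == "done":
              return
          # Harness: gives the model a body -- here, a shell.
          result = subprocess.run(action["cmd"], shell=True,
                                  capture_output=True, text=True)
          observation = result.stdout  # feed the result back to the model

  run_agent()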
03 / 32
What they can do

Agents become useful when they can act.

Even basic file operations are already a big deal.

Read

Open files, notes, docs, transcripts, codebases, logs.

Write

Create summaries, reports, code, markdown pages, emails.

Create

Generate scripts, folders, databases, workflows, artifacts.

Delete / edit

Clean up files, refactor code, remove stale work — with permission boundaries.

The leap is from chatbot to operator.

04 / 32
Tooling stack

Tools let agents do things.

Files are the baseline. CLIs and MCPs extend the agent’s reach.

File tools

Read, write, create, delete, search. This is the agent manipulating the workspace directly.

CLIs

Command-line tools compactly expose tested functions: scrape, query, transform, deploy, analyze.

MCPs

Connect agents to other systems. Think of MCPs as APIs designed for agents.
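
A minimal server sketch, assuming the official MCP Python SDK (the `mcp` package and its FastMCP helper); the tool itself is a made-up example:

  from mcp.server.fastmcp import FastMCP

  mcp = FastMCP("notes")

  @mcp.tool()
  def search_notes(query: str) -> str:
      """Return note lines that mention the query."""
      hits = [line.strip() for line in open("notes.md") if query in line]
      return "\n".join(hits) or "no matches"

  if __name__ == "__main__":
      mcp.run()  # serves MCP over stdio; the agent connects as a client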

05 / 32
The problem

More tools can make agents worse.

Access is not free. Every always-loaded tool competes for limited attention.

Loaded with intent

skill: company-research
tool: exa-search
file: account.md
Actual task has room to breathe.

Loaded with everything

mcp: calendar schema
mcp: 38 unused tools
old logs from failed run
stale assumption
irrelevant API docs
half-decision from 40 messages ago
more tool descriptions...
Actual task squeezed into leftovers.

Tool bloat becomes context rot. Context rot becomes bad judgment.
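
A back-of-envelope sketch of the cost (the schemas and the rough 4-chars-per-token rule are illustrative, not measurements):

  import json

  tool_schemas = [
      {"name": f"calendar_tool_{i}",
       "description": "what this tool does, when to call it, caveats " * 5,
       "parameters": {"type": "object",
                      "properties": {"id": {"type": "string"}}}}
      for i in range(38)  # the 38 unused tools above
  ]

  chars = sum(len(json.dumps(s)) for s in tool_schemas)
  print(f"~{chars // 4} tokens spent before the actual task begins")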

06 / 32
The response

MCPs → CLIs → Skills

The useful abstraction moves upward: from access, to execution, to judgment.

Layer 1
MCPs

Expose systems and APIs to the agent.

Layer 2
CLIs

Expose compact, testable actions.

Layer 3
Skills

Expose procedure: when to use tools, what good looks like, and how to verify.

Skills are not another hand. They are the operating procedure.

07 / 32
Skills

A skill is an SOP for an agent.

Not magic. Usually just markdown that tells the agent how to perform a task.

What a skill can contain

  • When to use this workflow
  • Step-by-step procedure
  • Tools, CLIs, MCPs to use
  • Expected output format
  • Pitfalls and verification checks
  • Optional scripts packaged with the skill
  # Daily research digest skill
  When the user asks for an AI research digest:
  1. Collect sources from X / RSS / papers
  2. Filter for robotics, CV, frontier labs, OSS models, agentic coding
  3. Summarize key claims
  4. Separate signal from hype
  5. Save the output as markdown
08 / 32
Skills + tools

A skill routes the agent to the right tools.

The skill is the judgment layer. CLIs, MCPs, APIs, files, and scripts are action surfaces.

company-research/SKILL.md
  1. Search recent news with Exa MCP
  2. Use LinkedIn/search CLI for leaders
  3. Save account intelligence to markdown
  4. Draft outreach with cited claims
  5. Verify sources before output
routes to
CLIs

Compact local actions.

MCPs

External systems.

Files

Persistent workspace.

Scripts

Reusable helpers.

Tools do the work. Skills decide the workflow.

09 / 32
Progressive disclosure

Skills reduce context bloat by loading in stages.

Tiny summaries stay visible. Full workflow instructions enter context only when needed.

1. Skill index

Short descriptions are always available.

shopping-search
company-research
health-brief

2. Match request

“Find the best deal for this product.”

chosen: shopping-search

3. Load SOP

Full markdown, tools, pitfalls, scripts, and verification checks load only now.

Capability stays available without stuffing the context window.
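
A sketch of the staging (the matching here is crude keyword overlap; a real agent uses the model's judgment):

  from pathlib import Path

  SKILLS = {  # stage 1: tiny index, always in context
      "shopping-search": "find the best deal for a product",
      "company-research": "research an account before outreach",
      "health-brief": "summarize daily recovery from health data",
  }

  def pick_skill(request: str) -> str:
      # stage 2: match the request against the short descriptions
      words = set(request.lower().split())
      return max(SKILLS, key=lambda n: len(words & set(SKILLS[n].split())))

  def load_sop(name: str) -> str:
      # stage 3: only now does the full SOP enter the context window
      return (Path.home() / ".claude/skills" / name / "SKILL.md").read_text()

  print(pick_skill("Find the best deal for this product"))  # shopping-search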

10 / 32
Plugins

A plugin is a bundle of skills.

One skill teaches one workflow. A plugin packages a whole operating system of workflows.

Skill

One reusable SOP for a specific task.

Plugin

A group of skills packaged together around a domain.

Example

Superpowers packages software-development practices into agent skills.

https://github.com/obra/superpowers

11 / 32
Case study

Superpowers encodes an SDLC.

It helps the agent follow a software-development lifecycle instead of vibe-coding.

Brainstorm

Clarify questions, requirements, codebase, research.

Design spec

Overall structure, functionality, system behavior.

Implementation plan

Granular libraries, code changes, task breakdown.

Subagents execute

Work through the tasklist with focused contexts.

Methodology beats vibes.

12 / 32
Architecture pattern

Subagent-driven development

Your main chat becomes the orchestrator. Subagents do focused work in fresh contexts.

Orchestrator Agent

Reads the spec, decomposes tasks, assigns work, reviews summaries.

delegates necessary context

Task A

Fresh context
Clear goal
Independent work

Task B

Parallel if possible
Blocked if needed
Spec-driven

Task C

Uses only relevant files
Less context rot
Less drift

Review

Checks output
Returns evidence
Compresses findings

summaries return, not every implementation detail
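
A sketch of the shape, with a stub where a real harness would spawn subagent sessions:

  from concurrent.futures import ThreadPoolExecutor

  def run_subagent(task: str, context: list[str]) -> str:
      # Fresh context: only the spec and files this task needs.
      # A real harness starts a new agent session here.
      return f"{task}: done, using {len(context)} files"

  tasks = {
      "implement parser": ["spec.md", "parser.py"],
      "review edge cases": ["spec.md", "tests/"],
      "check docs": ["README.md"],
  }

  with ThreadPoolExecutor() as pool:  # parallel if possible
      summaries = list(pool.map(run_subagent, tasks, tasks.values()))

  for line in summaries:  # the orchestrator reviews summaries only
      print(line)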
13 / 32
Why subagents matter

Subagents are context hygiene.

Not just parallel processing. They keep the main agent from turning into a junk drawer.

One giant chat

logs + failed attempts + stale assumptions

Orchestrated work

Worker A

Implemented parser. Returned diff + tests.

Worker B

Reviewed edge cases. Returned risks only.

Worker C

Checked docs. Returned citations.

main thread sees summaries, not the mess
14 / 32
Try other workflows

Superpowers is one version of the pattern.

Other plugins also try to encode the SDLC into agent workflows.

Get Shit Done

GSD packages an opinionated build workflow.

github.com/gsd-build/get-shit-done

BMAD Method

Breakthrough Method for Agile Development.

github.com/bmad-code-org/BMAD-METHOD

oh-my-claude-code

Another ecosystem of Claude Code workflows and conventions.

github.com/yeachan-heo/oh-my-claudecode

15 / 32
Beyond software development

Skills are bottled expertise.

Once the pattern clicks, a skill becomes a way to package domain judgment.

Trading

Research and execution workflows.

Startup advice

Office hours, design reviews, YC-style judgment.

Sales prep

Account research and stakeholder intelligence.

Health

Daily recovery interpretation from personal data.

The interesting question becomes: whose judgment can be packaged?

16 / 32
Personal journey

Building your own skills is not that difficult.

The hard part is not markdown. The hard part is noticing your own workflows.

Take one repeated task. Do it once with the agent. Then ask the agent to package the workflow as a reusable skill.

Processes

What steps do I follow?

Preferences

How do I want this done?

Best practices

What mistakes should the agent avoid?
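
A skeletal template to hand the agent (every angle-bracketed piece is a placeholder):

  # <task-name> skill
  Use when: <the request or situation that should trigger this>
  Steps:
  1. <gather inputs: files, URLs, CLIs, MCPs>
  2. <do the work, tool by tool>
  3. <format the output: markdown report, table, email>
  Avoid: <your known pitfalls>
  Verify: <how to check the result before returning it>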

17 / 32
Example 1

Daily health monitoring skill

A personal WHOOP-like recovery brief built from my own health data.

Input

Path to my iCloud health database.

Agent work

Claude Code built a Python script and SQLite DB to import and analyze sleep, HR, and exercise.

Output

A daily recovery report: how recovered I am, and how hard I should push exercise today.
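
A hedged sketch of the brief's core query; the table and column names are hypothetical, not the schema Claude Code actually generated:

  import sqlite3

  conn = sqlite3.connect("health.db")
  sleep_h, rhr = conn.execute(
      "SELECT avg(duration_h), avg(resting_hr) FROM sleep_sessions "
      "WHERE date >= date('now', '-7 day')"
  ).fetchone()  # assumes the last week has data

  # Toy heuristic: more sleep and a lower resting HR mean push harder.
  score = max(0, min(100, int(sleep_h / 8 * 60 + (70 - rhr))))
  advice = "push hard" if score > 70 else "go easy" if score < 40 else "moderate"
  print(f"Recovery {score}/100 -> {advice} today")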

18 / 32
Example 2

Daily X / Twitter research skill

Turning my content diet into an automated research brief.

My interests

AI research, robotics, computer vision, frontier labs, OSS models, agentic coding.

Access pattern

The agent used my already-logged-in browser session by extracting session cookies like ct0 and auth_token.

Daily output

Scrape posts I care about, summarize them, and separate signal from noise.
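
The shape of that access pattern, sketched with `requests` (cookie values and endpoints omitted; the csrf-header pairing reflects how X's web API typically works):

  import requests

  session = requests.Session()
  session.cookies.set("ct0", "<from your browser>", domain=".x.com")
  session.cookies.set("auth_token", "<from your browser>", domain=".x.com")
  # X's web API expects the csrf header to mirror the ct0 cookie.
  session.headers["x-csrf-token"] = session.cookies["ct0"]
  # ...from here, call the same endpoints the logged-in page uses.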

Important caveat: browser cookies and auth tokens are sensitive. Treat agent access like privileged operator access.

19 / 32
Example 3

Shopping research orchestrator

One shopping task becomes four platform-specific search skills plus an orchestrator.

AliExpress skill
+
Shopee skill
+
Carousell skill
+
Lazada skill

Teach each platform

Use Playwright CLI to open the site, search for a product, and capture useful results.

Package each workflow

Once it worked, ask Claude Code to save it as a platform search skill.

Orchestrate

Create a higher-level skill that spawns four subagents to run the platform skills in parallel.
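
One platform skill's core action, sketched with Playwright's Python API (the selectors are placeholders, not the real sites'):

  from playwright.sync_api import sync_playwright

  def search_platform(url: str, query: str) -> list[str]:
      with sync_playwright() as p:
          browser = p.chromium.launch(headless=False)  # headed: watch it work
          page = browser.new_page()
          page.goto(url)
          page.fill("input[type=search]", query)  # placeholder selector
          page.keyboard.press("Enter")
          page.wait_for_load_state("networkidle")
          titles = page.locator(".result-title").all_text_contents()  # placeholder
          browser.close()
          return titles

  # The orchestrator skill runs four of these in parallel subagents.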

20 / 32
Example 4

Salesforce business relations sales skills

A sales research workflow for preparing better client outreach.

Company research

Understand the account, business model, news, and strategic priorities.

C-suite research

Find leaders, public statements, LinkedIn/news signals, and pain points.

Product fit

Map client problems and tech stack to relevant Salesforce products.

Outbound assets

Whitepaper proposal and tailored email draft for each C-suite stakeholder.

21 / 32
Example 5

Corporate banking RM prep skills

Same workflow pattern, different domain vocabulary.

Before meetings

Research the client account, leadership, business context, and current problems.

Tailor to the role

The skills are similar to Salesforce sales research, but adapted for OCBC corporate banking workflows.

The general pattern

Domain research → stakeholder intelligence → pain points → meeting strategy.

Skills travel across domains when the workflow shape is similar.

22 / 32
Milestone

Karpathy’s autoresearch

Agentic coding applied to machine-learning experimentation.

Goal

Reduce validation loss / improve evaluation score.

Playground

The agent can modify the training program and run experiments.

Feedback

An evaluator grades whether the experiment worked.

https://github.com/karpathy/autoresearch

23 / 32
The three-file contract

Autoresearch works because the game is well-defined.

The agent gets freedom inside the playground, but the evaluator keeps score.

program.md

Human direction.
Research priorities, constraints, taste, and what to try next.

train.py

Agent playground.
The implementation the agent is allowed to rewrite and run.

eval.py

Immutable score.
The external judge that turns vague “better” into measurable feedback.

The eval turns exploration into a game.
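
The outer loop, sketched; the file names follow the slide, and the convention that eval.py prints a single float is an assumption:

  import subprocess

  def run(cmd: list[str]) -> str:
      return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

  best = float("-inf")
  for attempt in range(5):
      # (the agent rewrites train.py here, guided by program.md)
      run(["python", "train.py"])
      score = float(run(["python", "eval.py"]))  # the immutable scorekeeper
      if score > best:
          best = score
          run(["git", "commit", "-am", f"score {score:.3f}"])  # keep what works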

24 / 32
Hackathon example

AWS Agentic AI Hackathon: autoresearch with a scoreboard.

We won first place by treating the agent like an experiment runner, not a magic chatbot.

AWS provided the playground

Pet store chatbot, RAG PDFs, and Lambda tool calls already wired up.

Our variables

System prompt, guardrails, deployment config, and a few tool/format experiments.

The score function

A hidden eval dashboard scored robustness. Max score: 1000.

Eval score over iterations (max possible: 1000; peak +250 pts over baseline):

baseline 350 → prompt v1 500 → tool experiment 250 → guardrails tuned 600 → final pass 550

That is autoresearch: give the agent a playground, a score, and iteration loops.

25 / 32
What actually happened

The human became the “PhD advisor.”

The agent ran experiments. The human shaped the problem, supplied context, and judged what to try next.

Human

  • Frames the problem
  • Feeds context and constraints
  • Interprets score changes
  • Kills clever ideas that regress

Agent

  • Reads context folder
  • Proposes changes
  • Deploys and tests
  • Reports evidence back

The loop was: propose → deploy → score → adjust.

26 / 32
The deeper lesson

The new skill is designing feedback loops.

Problem formulation, constraints, and scoring functions become the human leverage point.

27 / 32
Tip 1

Stop treating one chat as memory.

A chat is a workspace with context limits. A skill is the reusable package.

One forever chat

old preference
failed run
random log
half-decision
stale assumption
new task squeezed in

Reusable skill

Do the task once. Package the workflow. Reuse it with new variables.

28 / 32
Tip 2

For browser agents, use a headed browser.

If you want to see what the agent is doing, ask it to drive a live browser through playwright-cli.

Invisible automation

Clicks, waits, failures, retries — and you only see logs after the fact.

live headed session
search
click
debug
teach workflow

Use it for web QA, shopping research, form flows, and platform-specific browser skills.

29 / 32
Tip 3

Start with a strong CLAUDE.md file.

This is the project manual loaded into every coding-agent session.

CLAUDE.md
  1. Project structure
  2. Coding conventions
  3. Preferred libraries
  4. Testing patterns
  5. Architectural decisions
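
A skeletal sketch (every line is a placeholder to replace with your project's reality):

  # CLAUDE.md
  ## Structure: src/ app code, tests/ pytest suites, scripts/ one-off tools
  ## Conventions: type hints everywhere, small modules, no global state
  ## Libraries: httpx over requests, pydantic for validation
  ## Testing: `pytest -q` must pass before any commit
  ## Decisions: log to stdout only; see decisions.md for why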

Without it

The agent re-learns your project from scratch every session.

Recommended example

Andrej Karpathy's CLAUDE.md
github.com/forrestchang/andrej-karpathy-skills

30 / 32
Tip 4

If you need memory, write files.

Agent memory is increasingly file-based: markdown, specs, wikis, and project instructions.

file-based memory cabinet
CLAUDE.md
AGENTS.md
account notes
LLM Wiki
research pages
decisions.md

Memory that survives the chat has to live somewhere.
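
A tiny sketch of the habit (decisions.md is from the cabinet above; the helper itself is hypothetical):

  from datetime import date
  from pathlib import Path

  def remember(decision: str, path: str = "decisions.md") -> None:
      # Append, never overwrite: the file is the memory.
      entry = f"- {date.today().isoformat()}: {decision}\n"
      with Path(path).open("a") as f:
          f.write(entry)

  remember("Use SQLite, not Postgres, until one file stops being enough")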

31 / 32
Tip 5 + closing challenge

Scope your skills deliberately.

Some skills should be global. Some should live only inside a project repo.

Global

Saved in places like ~/.claude/skills. Useful everywhere.

Project

Saved inside a repo, e.g. .claude/, when the workflow belongs to that codebase.

Shared

Repo folders like .claude, .codex, .openclaw make workflows portable.

The winners will be the people who understand their workflows deeply enough to teach them to machines.
32 / 32