Stop asking AI questions.
Start building AI workflows.
A practical field guide to coding agents, skills, plugins, subagents, personal workflows, and feedback loops.
“Coding” is the misleading part.
The important part is that these agents can operate computers.
AI writes code
A narrow developer assistant.
A coding agent is a computer-operating agent.
An agent = model + harness.
The model thinks. The harness gives it a body, tools, and a workspace.
Model
Reasoning, language, planning, judgment.
Harness
Memory, tool calling, permissions, files, terminal, browser.
Workspace
The computer environment where work actually changes.
Agents become useful when they can act.
Even basic file operations are already a big deal.
Read
Open files, notes, docs, transcripts, codebases, logs.
Write
Create summaries, reports, code, markdown pages, emails.
Create
Generate scripts, folders, databases, workflows, artifacts.
Delete / edit
Clean up files, refactor code, remove stale work — with permission boundaries.
The leap is from chatbot to operator.
Tools let agents do things.
Files are the baseline. CLIs and MCPs extend the agent’s reach.
File tools
Read, write, create, delete, search. This is the agent manipulating the workspace directly.
CLIs
Command-line tools expose compact, tested functions: scrape, query, transform, deploy, analyze.
MCPs
Connect agents to other systems. Think of MCPs as APIs designed for agents.
More tools can make agents worse.
Access is not free. Every always-loaded tool competes for limited attention.
Loaded with intent
Loaded with everything
Tool bloat becomes context rot. Context rot becomes bad judgment.
MCPs → CLIs → Skills
The useful abstraction moves upward: from access, to execution, to judgment.
MCPs expose systems and APIs to the agent.
CLIs expose compact, testable actions.
Skills expose procedure: when to use tools, what good looks like, and how to verify.
Skills are not another hand. They are the operating procedure.
A skill is an SOP for an agent.
Not magic. Usually just markdown that tells the agent how to perform a task.
What a skill can contain
- When to use this workflow
- Step-by-step procedure
- Tools, CLIs, MCPs to use
- Expected output format
- Pitfalls and verification checks
- Optional scripts packaged with the skill
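To make that concrete, here is a minimal sketch of a skill file. The YAML frontmatter follows Claude Code's SKILL.md format; the workflow reuses the sales-research tools this guide mentions (Exa MCP, a LinkedIn search CLI), and every path and threshold is illustrative rather than prescriptive.

```markdown
---
name: sales-account-research
description: Research a company and its leaders before outreach. Use when the user asks to prep an account or draft outbound email.
---

# Sales account research

## Procedure
1. Search recent news about the company (Exa MCP).
2. Find leaders and their public statements via the LinkedIn search CLI.
3. Save account intelligence to `accounts/<company>.md`.
4. Draft outreach with cited claims.

## Pitfalls and verification
- Every claim must link to a source.
- Flag news older than 90 days as possibly stale.
```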
A skill routes the agent to the right tools.
The skill is the judgment layer. CLIs, MCPs, APIs, files, and scripts are action surfaces. A sales-research skill, for example, might route like this:
- Search recent news with Exa MCP
- Use LinkedIn/search CLI for leaders
- Save account intelligence to markdown
- Draft outreach with cited claims
- Verify sources before output
CLIs: compact local actions.
MCPs: external systems.
Files: persistent workspace.
Scripts: reusable helpers.
Tools do the work. Skills decide the workflow.
Skills reduce context bloat by loading in stages.
Tiny summaries stay visible. Full workflow instructions enter context only when needed.
1. Skill index
Short descriptions are always available.
2. Match request
“Find the best deal for this product.”
chosen: shopping-search
3. Load SOP
Full markdown, tools, pitfalls, scripts, and verification checks load only now.
Capability stays available without stuffing the context window.
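The real staging lives inside the harness, but the idea fits in a few lines of Python. This sketch (the paths and the frontmatter parsing are hypothetical simplifications) keeps only one-line descriptions resident and reads a full SOP off disk only after a match:

```python
from pathlib import Path

SKILLS_DIR = Path.home() / ".claude" / "skills"

def load_index() -> dict[str, str]:
    """Always-resident index: skill name -> one-line description (frontmatter only)."""
    index = {}
    for skill_md in SKILLS_DIR.glob("*/SKILL.md"):
        for line in skill_md.read_text().splitlines():
            if line.startswith("description:"):
                index[skill_md.parent.name] = line.removeprefix("description:").strip()
                break
    return index

def load_sop(name: str) -> str:
    """Loaded on demand: the full markdown SOP enters context only after a match."""
    return (SKILLS_DIR / name / "SKILL.md").read_text()

index = load_index()               # tiny, always in context
# ...model matches "find the best deal" -> "shopping-search"...
sop = load_sop("shopping-search")  # full instructions load only now
```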
A plugin is a bundle of skills.
One skill teaches one workflow. A plugin packages a whole operating system of workflows.
Skill
One reusable SOP for a specific task.
Plugin
A group of skills packaged together around a domain.
Example
Superpowers packages software-development practices into agent skills.
https://github.com/obra/superpowers
Superpowers encodes an SDLC.
It helps the agent follow a software-development lifecycle instead of vibe-coding.
Brainstorm
Clarify questions, requirements, codebase, research.
Design spec
Overall structure, functionality, system behavior.
Implementation plan
Granular libraries, code changes, task breakdown.
Subagents execute
Work through the tasklist with focused contexts.
Methodology beats vibes.
Subagent-driven development
Your main chat becomes the orchestrator. Subagents do focused work in fresh contexts.
Orchestrator Agent
Reads the spec, decomposes tasks, assigns work, reviews summaries.
Task A: fresh context, clear goal, independent work.
Task B: parallel if possible, blocked if needed, spec-driven.
Task C: uses only relevant files, less context rot, less drift.
Review: checks output, returns evidence, compresses findings.
Subagents are context hygiene.
Not just parallel processing. They keep the main agent from turning into a junk drawer.
One giant chat accumulates everything. Orchestrated work returns compressed reports:
- Implemented parser. Returned diff + tests.
- Reviewed edge cases. Returned risks only.
- Checked docs. Returned citations.
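The pattern itself is harness-agnostic. Below is a minimal Python sketch of the shape: `run_subagent` is a hypothetical stand-in for your harness's subagent tool, and each task carries only its goal and the files it needs.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str, files: list[str]) -> str:
    # Hypothetical stand-in for the harness's subagent tool:
    # fresh context, only the listed files, returns a compressed summary.
    return f"[done] {task} ({len(files)} files in context)"

tasks = [
    ("Implement the parser per spec section 3", ["src/parser.py", "spec.md"]),
    ("Review edge cases; return risks only", ["src/parser.py"]),
    ("Check docs; return citations", ["docs/api.md"]),
]

# Independent tasks run in parallel, each in a fresh, minimal context.
with ThreadPoolExecutor() as pool:
    summaries = pool.map(lambda t: run_subagent(*t), tasks)

# The orchestrator keeps only the compressed findings, not full transcripts.
for summary in summaries:
    print(summary)
```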
Superpowers is one version of the pattern.
Other plugins are also trying to encode SDLC into agent workflows.
Get Shit Done
GSD packages an opinionated build workflow.
github.com/gsd-build/get-shit-done
BMAD Method
Breakthrough Method for Agile Development.
github.com/bmad-code-org/BMAD-METHOD
oh-my-claude-code
Another ecosystem of Claude Code workflows and conventions.
github.com/yeachan-heo/oh-my-claudecode
Skills are bottled expertise.
Once the pattern clicks, a skill becomes a way to package domain judgment.
Trading
Research and execution workflows.
Startup advice
Office hours, design reviews, YC-style judgment.
Sales prep
Account research and stakeholder intelligence.
Health
Daily recovery interpretation from personal data.
The interesting question becomes: whose judgment can be packaged?
Building your own skills is not that difficult.
The hard part is not markdown. The hard part is noticing your own workflows.
Processes
What steps do I follow?
Preferences
How do I want this done?
Best practices
What mistakes should the agent avoid?
Daily health monitoring skill
A personal WHOOP-like recovery brief built from my own health data.
Input
Path to my iCloud health database.
Agent work
Claude Code built a Python script and SQLite DB to import and analyze sleep, HR, and exercise.
Output
A daily recovery report: how recovered I am, and how hard I should push exercise today.
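The analysis script itself can be small. This sketch shows the shape, with a made-up schema and made-up thresholds; the real tables and cutoffs are whatever the import script created and the data justifies.

```python
import sqlite3

def recovery_brief(db_path: str) -> str:
    """Tiny recovery heuristic. Table and column names here are hypothetical."""
    con = sqlite3.connect(db_path)
    (sleep_hours,) = con.execute(
        "SELECT hours FROM sleep ORDER BY date DESC LIMIT 1").fetchone()
    (resting_hr,) = con.execute(
        "SELECT resting_hr FROM heart ORDER BY date DESC LIMIT 1").fetchone()
    (baseline,) = con.execute(
        "SELECT AVG(resting_hr) FROM "
        "(SELECT resting_hr FROM heart ORDER BY date DESC LIMIT 30)").fetchone()
    # Invented thresholds: enough sleep and a resting HR near baseline
    # means push harder; short sleep or elevated HR means back off.
    if sleep_hours >= 7 and resting_hr <= baseline * 1.05:
        return "Recovered: push hard today."
    return "Under-recovered: keep exercise light."

print(recovery_brief("health.db"))
```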
Daily X / Twitter research skill
Turning my content diet into an automated research brief.
My interests
AI research, robotics, computer vision, frontier labs, OSS models, agentic coding.
Access pattern
The agent used my already-logged-in browser session by extracting session variables like ct0 and auth_token.
Daily output
Scrape posts I care about, summarize them, and separate signal from noise.
Important caveat: browser cookies and auth tokens are sensitive. Treat agent access like privileged operator access.
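As an illustration only, with values redacted and the same caveat applied, reusing an existing session in Playwright's Python API looks roughly like this:

```python
from playwright.sync_api import sync_playwright

# ct0 and auth_token come from an already-logged-in browser session.
# Treat them like passwords: never commit them, never paste them into chats.
COOKIES = [
    {"name": "auth_token", "value": "<redacted>", "domain": ".x.com", "path": "/"},
    {"name": "ct0",        "value": "<redacted>", "domain": ".x.com", "path": "/"},
]

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    context.add_cookies(COOKIES)     # reuse the logged-in session
    page = context.new_page()
    page.goto("https://x.com/home")  # lands logged in, no password in code
    # ...scrape posts, summarize, save to markdown...
    browser.close()
```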
Shopping research orchestrator
One shopping task becomes four platform-specific search skills plus an orchestrator.
Teach each platform
Use Playwright CLI to open the site, search for a product, and capture useful results.
Package each workflow
Once it worked, ask Claude Code to save it as a platform search skill.
Orchestrate
Create a higher-level skill that spawns four subagents to run the platform skills in parallel.
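The orchestrator can itself be a one-page skill. A sketch, reusing the shopping-search name from earlier; the four platform-skill names are hypothetical:

```markdown
---
name: shopping-search
description: Find the best deal for a product by searching four platforms in parallel.
---

1. Spawn four subagents, one per platform skill:
   amazon-search, ebay-search, carousell-search, lazada-search.
2. Each subagent returns: product name, price, shipping, URL.
3. Merge the results into one table, sorted by total cost.
4. Flag any listing whose details could not be verified.
```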
Salesforce business relations sales skills
A sales research workflow for preparing better client outreach.
Company research
Understand the account, business model, news, and strategic priorities.
C-suite research
Find leaders, public statements, LinkedIn/news signals, and pain points.
Product fit
Map client problems and tech stack to relevant Salesforce products.
Outbound assets
Whitepaper proposal and tailored email draft for each C-suite stakeholder.
Corporate banking RM prep skills
Same workflow pattern, different domain vocabulary.
Before meetings
Research the client account, leadership, business context, and current problems.
Tailor to the role
The skills are similar to Salesforce sales research, but adapted for OCBC corporate banking workflows.
The general pattern
Domain research → stakeholder intelligence → pain points → meeting strategy.
Skills travel across domains when the workflow shape is similar.
Karpathy’s autoresearch
Agentic coding applied to machine-learning experimentation.
Goal
Reduce validation loss / improve evaluation score.
Playground
The agent can modify the training program and run experiments.
Feedback
An evaluator grades whether the experiment worked.
https://github.com/karpathy/autoresearch
Autoresearch works because the game is well-defined.
The agent gets freedom inside the playground, but the evaluator keeps score.
program.md
Human direction.
Research priorities, constraints, taste, and what to try next.
train.py
Agent playground.
The implementation the agent is allowed to rewrite and run.
eval.py
Immutable score.
The external judge that turns vague “better” into measurable feedback.
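Those three files are the repo's actual split; the sketch below only illustrates the role eval.py plays. The metric name and file interface here are invented:

```python
# eval.py: the immutable judge. The agent may rewrite train.py freely,
# but never this file; its score is the only feedback that counts.
import json
import subprocess

def score() -> float:
    # Run the agent-owned training program in a fresh process...
    subprocess.run(["python", "train.py"], check=True)
    # ...then read the metric it wrote out. Lower validation loss is better.
    with open("metrics.json") as f:
        return json.load(f)["val_loss"]

if __name__ == "__main__":
    print(f"val_loss={score():.4f}")  # the scoreboard the agent plays against
```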
The eval turns exploration into a game.
AWS Agentic AI Hackathon: autoresearch with a scoreboard.
We won first place by treating the agent like an experiment runner, not a magic chatbot.
AWS provided the playground
Pet store chatbot, RAG PDFs, and Lambda tool calls already wired up.
Our variables
System prompt, guardrails, deployment config, and a few tool/format experiments.
The score function
A hidden eval dashboard scored robustness. Max score: 1000.
Chart: eval score rising over iterations, out of a maximum possible 1000.
That is autoresearch: give the agent a playground, a score, and iteration loops.
The human became the “PhD advisor.”
The agent ran experiments. The human shaped the problem, supplied context, and judged what to try next.
Human
- Frames the problem
- Feeds context and constraints
- Interprets score changes
- Kills clever ideas that regress
Agent
- Reads context folder
- Proposes changes
- Deploys and tests
- Reports evidence back
The loop was: propose → deploy → score → adjust.
The new skill is designing feedback loops.
Problem formulation, constraints, and scoring functions become the human leverage point.
Task
What game are we playing?
Agent action
Generate, deploy, post, test.
Environment
Users, evals, markets, codebase.
Measurement
Loss, CTR, conversion, score.
Human judgment
Interpret signal. Adjust constraints.
Next run
Better prompt, tool, or workflow.
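The loop has the same shape in every domain. A generic Python sketch, where all four functions are placeholders for your own agent, environment, and metric:

```python
import random

# Placeholder stand-ins for your real agent, environment, and metric.
def propose_initial(): return {"prompt": "v0"}
def propose_change(s): return {"prompt": s["prompt"] + "+"}  # agent action
def deploy(candidate): return candidate                      # push to users/evals/markets
def measure(result): return random.random()                  # loss, CTR, conversion, score

best = propose_initial()
best_score = measure(deploy(best))
for _ in range(10):                  # propose -> deploy -> score -> adjust
    candidate = propose_change(best)
    s = measure(deploy(candidate))
    if s > best_score:               # human judgment gates what survives
        best, best_score = candidate, s
print(best_score, best)
```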
Stop treating one chat as memory.
A chat is a workspace with context limits. A skill is the reusable package.
One forever chat
Reusable skill
Do the task once. Package the workflow. Reuse it with new variables.
For browser agents, use a headed browser.
If you want to see what the agent is doing, ask it to drive a live browser through playwright-cli.
Invisible automation
Clicks, waits, failures, retries — and you only see logs after the fact.
Use it for web QA, shopping research, form flows, and platform-specific browser skills.
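In Playwright's Python API the switch is one flag: headless=False opens a live window, and slow_mo stretches each action so you can watch it. The target site and selector below are just examples:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # headless=False gives you a visible browser window;
    # slow_mo=250 pauses 250 ms between actions so each step is watchable.
    browser = p.chromium.launch(headless=False, slow_mo=250)
    page = browser.new_page()
    page.goto("https://example.com")
    page.click("text=More information")  # selector is site-specific
    browser.close()
```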
Start with a strong CLAUDE.md file.
This is the project manual loaded into every coding-agent session.
- Project structure
- Coding conventions
- Preferred libraries
- Testing patterns
- Architectural decisions
Without it
The agent re-learns your project from scratch every session.
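A sketch of the shape; every line below is a hypothetical example, and yours should describe your real project:

```markdown
# CLAUDE.md

## Structure
- src/ application code, tests/ pytest suites, scripts/ one-off tooling.

## Conventions
- Python 3.12, ruff for lint and format, type hints on public functions.

## Testing
- Run `pytest -q` before declaring a task done; never skip failing tests.

## Decisions
- SQLite over Postgres until we need multi-user; keep modules import-light.
```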
Recommended example
Andrej Karpathy's CLAUDE.md
github.com/forrestchang/andrej-karpathy-skills
If you need memory, write files.
Agent memory is increasingly file-based: markdown, specs, wikis, and project instructions.
Memory that survives the chat has to live somewhere.
Scope your skills deliberately.
Some skills should be global. Some should live only inside a project repo.
Global
Saved in places like ~/.claude/skills. Useful everywhere.
Project
Saved inside a repo, e.g. .claude/, when the workflow belongs to that codebase.
Shared
Repo folders like .claude, .codex, .openclaw make workflows portable.