Taming the RNG: Building Skills For Your AI Agent
- Written by John
- Feb 17th, 2026
We all know that LLMs (Large Language Models) are somewhat inconsistent and require a little bit of steering. It’s in their nature: give an LLM the same context twice and you’ll often get two different outputs. At their core they are nothing more than large, glorified text predictors that happen to be pretty good at reasoning. This might come as news to you, but yes, LLMs simply attempt to predict the next word, and they are VERY good at it.
Historically, you would combat that inconsistency through prompt engineering, honing your “perfect” prompt. Creating a “perfect” prompt is difficult for the average person, especially when it comes to coding. This is where “Skills” come in: they help ensure the outputs you get from LLMs are as consistent as they can be. However, they aren’t a silver bullet.
Using Agentic CLIs
You might be thinking, ‘What is an agentic CLI?’ For something to be agentic, it needs to make its own decisions. Because agentic CLIs integrate with LLMs, they can understand the current context and the goal, and they will attempt to complete that goal even when unexpected scenarios crop up along the way.
Here are a few examples of agentic CLIs: opencode, Gemini CLI and Claude Code. There are plenty of positives to using them: they integrate tightly with your local code and files, and they do a lot of heavy lifting for you, like running commands or writing code. The main downside is that they can be very inconsistent if you don’t provide enough context and guardrails for the LLM. Remember, LLMs are predictors with the freedom to “generate” their responses; they are not deterministic tools.
Improving Consistency
Traditionally, you’d try to improve the consistency of LLM outputs by writing a set of global or project instructions to give the agentic CLI and LLM the context and guardrails you require. However, this is hit and miss, and it becomes very difficult to write instructions for every kind of scenario (documentation, code reviews, etc.).
This is where agentic CLI skills come in.
Each agentic CLI has its own skills implementation: Claude Code, opencode, ChatGPT and Gemini CLI. A skill works much like an instruction file, except the instructions are very specific and are only triggered by a certain phrase or keyword. That means that for tasks where you want repeatable output, like documentation, you can put enough guardrails in place to get the same or very similar result every time, removing some of that generative freedom from the LLM.
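To make that concrete, here’s a minimal sketch of what a skill file tends to look like, based on the SKILL.md format that Claude Code and opencode use at the time of writing; the exact directory layout and frontmatter fields vary between CLIs, so treat the paths and field names below as assumptions and check your tool’s documentation:
```markdown
<!-- e.g. .claude/skills/code-review/SKILL.md or .opencode/skill/code-review/SKILL.md -->
---
name: code-review
description: Conducts a structured code review of the current codebase. Use when asked for a code review, quality audit or security review.
---

# Code Review

<!-- The specific, step-by-step instructions and output format go here,
     along the lines of the generated skill shown later in this post. -->
```
In the implementations I’ve tried, the description is what the agent matches your request against, so it effectively doubles as the skill’s trigger.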
Getting Started With Skills
Ironically, I have found that the best way to create skills is to use an LLM. LLMs are fantastic at generating sets of instructions for other LLMs or systems. To get started, I use a prompt like this:
I’d like to create a [skill](https://opencode.ai/docs/skills) that conducts a code review of the current codebase. It needs to adhere to the best practices of the language. Focus on code quality, code maintainability, performance and security. The skill can also use the Context7 MCP server to confirm and validate language best practices and implementations. For the output, include the following headers: 1. Executive Summary (1-2 paragraphs), 2. Key Points (include a description of each point in 1-2 sentences), 3. Identified Issues (include a description of each point in 1-2 sentences; include a detailed description of the code, file including line number(s)), 4. Actions (include a description of 1-2 sentences of what needs fixing and how to fix it). For the Key Points, academically grade each item (A+, A, A-, B+, etc). For the Identified Issues, apply a criticality rating to indicate how severe an issue is (Critical, High, Medium, Low). For the Actions, determine when the action needs to be actioned or started by; use the Scrum/Agile terminology (Now, Next Sprint, Backlog, etc). Output the report into Markdown.
Usually, there is a bit of back and forth with the LLM to agree on what the exact output will look like; you’ll be asked to clarify details, so be as specific as you need to be to guide the LLM to the desired output. You can ask it to print the skill to the console instead of creating the file straight away, so you can review and verify the skill’s contents before anything is written to disk.
You’ll end up with something that looks like this:
## Review Process
### Step 1: Language Detection
Identify all programming languages in the codebase by examining file extensions and project configuration files (e.g., `package.json`, `requirements.txt`, `Cargo.toml`, `go.mod`, `pom.xml`).
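If a quick programmatic survey helps, a small helper along these lines can map common configuration files to languages (illustrative only; the mapping and paths are assumptions to adapt per project):
```python
from pathlib import Path

# Illustrative mapping of well-known configuration files to the language
# they usually indicate; extend it to match the project being reviewed.
CONFIG_FILE_LANGUAGES = {
    "package.json": "JavaScript/TypeScript",
    "requirements.txt": "Python",
    "pyproject.toml": "Python",
    "Cargo.toml": "Rust",
    "go.mod": "Go",
    "pom.xml": "Java",
}

def detect_languages(root: str = ".") -> set[str]:
    """Return the languages suggested by config files found under ``root``."""
    found = set()
    for config_name, language in CONFIG_FILE_LANGUAGES.items():
        if any(Path(root).rglob(config_name)):
            found.add(language)
    return found

if __name__ == "__main__":
    print(sorted(detect_languages()))
```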
### Step 2: Context7 Validation
For each detected language, query Context7 MCP to retrieve:
- Current best practices and coding standards
- Common anti-patterns to avoid
- Security guidelines specific to the language
- Performance optimisation techniques
**Context7 Library IDs for Reference:**
- Python: `/websites/python_3_10` or `/python/cpython`
- JavaScript: `/javascript-tutorial/en.javascript.info`
- Go: `/golangci/golangci-lint`
- Rust: `/rust-lang/rust` or `/websites/doc_rust-lang_book`
- Java: `/websites/geeksforgeeks_java` or `/google/guava`
### Step 3: Codebase Analysis
Systematically examine:
- Code structure and organisation
- Naming conventions and readability
- Error handling and edge cases
- Security vulnerabilities
- Performance bottlenecks
- Test coverage and quality
- Documentation completeness
### Step 4: Report Generation
Compile findings into the four-section report format.
## Output Format
### 1. Executive Summary
Provide 1-2 concise paragraphs covering:
- Overview of the codebase (languages, size, purpose)
- Overall quality assessment (e.g., "Codebase demonstrates solid fundamentals with room for improvement in error handling and documentation")
- Key strengths and primary concerns
- Overall grade recommendation
### 2. Key Points
List important observations about the codebase, assigning an academic grade to each:
**Academic Grading Scale:**
- **A+**: Exceptional, exemplary code that exceeds all standards
- **A**: Excellent, exceeds expectations, minimal improvements needed
- **A-**: Very good, meets high standards with minor opportunities
- **B+**: Good, meets expectations, some areas for enhancement
- **B**: Acceptable, adequate quality, noticeable improvements needed
- **B-**: Below average, functional but has issues
- **C+**: Fair, significant issues that impact maintainability
- **C**: Poor, major refactoring required
- **D**: Very poor, critical issues throughout
- **F**: Failing, unacceptable quality, requires complete rework
**Format:**
Grade - Category: Brief Title
1-2 sentence description of the observation and its impact
### 3. Identified Issues
List all issues grouped by criticality. Each issue must include:
**Criticality Ratings:**
- **Critical**: Security vulnerabilities, data loss risks, broken core functionality, legal/compliance violations
- **High**: Significant maintainability impact, major performance degradation, missing critical error handling
- **Medium**: Code quality issues, missing documentation, minor performance concerns, inconsistent patterns
- **Low**: Style preferences, minor optimisations, nitpicks and suggestions
**Issue Format:**
Criticality - Focus Area(s): Issue Title
- Location: file/path.ext:line-number(s)
- Description: Detailed description of 1-2 sentences explaining the issue and its impact
- Code Context: Brief snippet or explanation of the problematic code
### 4. Actions
Provide actionable remediation steps for each identified issue:
**Agile Timelines:**
- **Now**: Block merge, fix immediately before proceeding
- **This Sprint**: Must complete within current sprint iteration
- **Next Sprint**: Schedule for upcoming sprint, plan accordingly
- **Backlog**: Tech debt or nice-to-have, address when capacity allows
**Action Format:**
Timeline - Issue Title
- What to Fix: 1-2 sentence description of what needs to be changed
- How to Fix: Specific guidance on implementation, referencing best practices
- Effort Estimate: Small/Medium/Large
## Focus Areas
Every issue must be tagged with one or more focus areas:
1. **Code Maintainability**: Readability, structure, documentation, complexity, ease of future changes
2. **Security**: Vulnerabilities, secrets exposure, authentication, authorisation, data protection
3. **Performance**: Efficiency, resource usage, scalability, response times, bottlenecks
4. **Code Quality**: Best practices, design patterns, consistency, error handling, testability
## Language-Specific Guidelines
### Python
**Key Standards:**
- PEP 8 compliance (naming conventions, line length, imports)
- Type hints (PEP 484) for function signatures
- Docstrings following PEP 257
- Use of context managers (`with` statements)
- Exception hierarchy best practices
**Common Issues:**
- Missing type hints (Medium)
- Using bare `except:` clauses (High)
- Mutable default arguments (Critical)
- Hardcoded secrets in code (Critical)
- SQL injection vulnerabilities (Critical)
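For illustration, a short hypothetical snippet (not taken from any real project) showing two of these pitfalls next to their safer alternatives:
```python
import logging

logger = logging.getLogger(__name__)

# Mutable default argument (Critical): the default list is created once,
# so the same object is shared between every call to the function.
def add_item_bad(item, items=[]):
    items.append(item)
    return items

# Safer: default to None and build a fresh list per call.
def add_item_good(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

print(add_item_bad("a"), add_item_bad("b"))    # ['a', 'b'] ['a', 'b'] (same shared list)
print(add_item_good("a"), add_item_good("b"))  # ['a'] ['b']

# Bare except (High): silently swallows every error, including typos.
try:
    port = int("not-a-number")
except:  # noqa: E722
    port = 8080

# Safer: catch only the exception you expect, and log the failure.
try:
    port = int("not-a-number")
except ValueError as exc:
    logger.warning("Invalid port, falling back to 8080: %s", exc)
    port = 8080
```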
### JavaScript/TypeScript
**Key Standards:**
- TypeScript strict mode conventions
- Use `const` by default, `let` when needed, avoid `var`
- Async/await over promise chains
- Optional chaining (`?.`) and nullish coalescing (`??`)
- Proper error handling with try/catch
**Common Issues:**
- Using `any` type excessively (Medium)
- Missing error handling in async functions (High)
- XSS vulnerabilities from unsanitised input (Critical)
- Memory leaks from event listeners (High)
- Callback hell instead of async/await (Medium)
### Go
**Key Standards:**
- Effective Go guidelines
- Use of `gofmt` for formatting
- Error handling: return errors, don't panic
- Small, focused interfaces
- Context usage for cancellation
**Common Issues:**
- Ignoring error returns (High)
- Using panic for normal error flow (Medium)
- Race conditions (Critical)
- Goroutine leaks (High)
- Not using context for cancellation (Medium)
### Rust
**Key Standards:**
- Ownership and borrowing rules
- Error handling with `Result` and `Option`
- Unsafe code minimisation
- Idiomatic naming conventions
- Documentation with `rustdoc`
**Common Issues:**
- Unnecessary `unsafe` blocks (Critical)
- Ignoring `Result` types with `unwrap()` (High)
- Cloning when borrowing would suffice (Medium)
- Blocking operations in async code (High)
### Java
**Key Standards:**
- SOLID principles adherence
- Proper exception hierarchy
- Resource management (try-with-resources)
- Immutable objects where possible
- Java 8+ features (streams, optionals)
**Common Issues:**
- Raw type usage without generics (Medium)
- SQL injection through string concatenation (Critical)
- Not closing resources properly (High)
- Null pointer exceptions from missing null checks (Medium)
## Examples
### Example Key Point (A- Grade)
A- - Code Maintainability: Clean Architecture
The project demonstrates good separation of concerns with well-defined module boundaries. Minor improvements could be made by extracting utility functions into a dedicated utilities module.
### Example Identified Issue (Critical)
Critical - Security: SQL Injection Vulnerability
- Location: app/database.py:45-48
- Description: User input is directly concatenated into SQL query without parameterisation, allowing attackers to inject malicious SQL commands.
- Code Context: `query = "SELECT * FROM users WHERE id = " + user_id`
### Example Action (Now Timeline)
Now - SQL Injection Vulnerability
- What to Fix: Replace string concatenation in database queries with parameterised queries using SQLAlchemy or prepared statements.
- How to Fix: Use `cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))` instead of string formatting. Validate all user inputs against expected formats.
- Effort Estimate: Small
## Review Checklist
Before finalising the report, verify:
- [ ] Executive summary captures overall assessment in 1-2 paragraphs
- [ ] All Key Points have academic grades assigned (A+ to F)
- [ ] All Identified Issues have Focus Area(s) tagged
- [ ] All Identified Issues have Criticality ratings assigned
- [ ] All Actions have Agile timelines (Now/This Sprint/Next Sprint/Backlog)
- [ ] Issue descriptions are clear with file paths and line numbers
- [ ] Action recommendations are specific and reference best practices
- [ ] Language-specific standards from Context7 have been validated
- [ ] Report is formatted in Markdown with proper headings and structure
## Guidelines for Academic Grading
**Grade Distribution Guidelines:**
- Most codebases should receive mostly B to A- grades
- Reserve A+ for truly exceptional code
- Use C and below only when significant issues exist
- Consider context: a startup MVP might warrant different standards than enterprise production code
**When to Upgrade/Downgrade:**
- Upgrade if code demonstrates exceptional patterns or innovative solutions
- Downgrade if issues are systemic or indicate lack of understanding of language fundamentals
- Consider the severity and impact of issues when assigning grades
## Criticality Assignment Rules
- Most issues should be **Medium** or **Low**
- Reserve **Critical** for genuine blockers only (security, data loss, broken functionality)
- Use **High** for issues that significantly impact the codebase or user experience
- When in doubt, downgrade the criticality
## Timeline Assignment Rules
- **Critical** issues → **Now**
- **High** issues → **This Sprint**
- **Medium** issues → **Next Sprint**
- **Low** issues → **Backlog**
- Consider team capacity and sprint goals when assigning timelines
The LLM won’t get it right the first time. Review the skill and amend it accordingly; you’ll find it takes a few iterations, and a few uses of the skill, to pin down how you want certain items formatted. For example, if there isn’t one already, add a `## Guidelines` section to act as a set of global guidelines. You can add things like your preferred output language, or a rule that ordered and bulleted lists end with a period (.), to this section.
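As a rough sketch (the exact rules are whatever you need them to be), such a section might look like this:
```markdown
## Guidelines
- Write all output in UK English.
- End every ordered or bulleted list item with a period (.).
- Always output the final report in Markdown, using the headings defined above.
```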
To summarise, Agent Skills are the bridge between LLM creativity and the consistency developers actually need. They shine on repeatable tasks, triggering when required and nudging the LLM towards the same output every time. Why don’t you give it a whirl and create your own?