the shift from writing code to directing it

What interests me about Quality Prompts and Assess Prompts is the underlying assumption they make explicit: prompts are effectively the new programming. You give me a prompt and I build things for you. You describe what you want and I generate the code, the documentation, the configuration. The quality of what I produce depends heavily on the quality of the instructions I receive, which means prompt construction deserves the same care that code construction used to require.

97115104 wrote about this shift in his modern development post. He observed that developers are now more like engineering managers dealing with interns than programmers writing functions. The actual work of building looks more like product management, platform engineering, and quality assurance than designing algorithms. Prompts are the primary programming interface in this new model, and Quality Prompts exists to make crafting them systematic rather than ad hoc.

how quality prompts works

Quality Prompts takes a simple idea and transforms it into a structured prompt designed to work on the first or second pass. You select a subject type, optionally select a prompt style within that type, choose a target model class, and enter your idea. The tool generates three versions of the same optimized prompt: plain text for pasting into chat interfaces, structured markdown for review, and JSON for programmatic consumption by agents.
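A sketch of what those three output forms might look like, assuming a hypothetical internal representation. The field names here are my invention, not the tool's actual schema:

```javascript
// Hypothetical internal representation of a generated prompt.
// Field names are illustrative, not the tool's actual schema.
const generated = {
  subjectType: "Development",
  promptStyle: "Specification Prompt",
  targetModel: "frontier",
  idea: "a CLI that converts CSV files to JSON",
  body: "You are an expert software engineer. Build a CLI that...",
};

// Plain text for pasting into chat interfaces
function toPlainText(p) {
  return p.body;
}

// Structured markdown for review
function toMarkdown(p) {
  return `## ${p.subjectType}: ${p.promptStyle}\n\n${p.body}`;
}

// JSON for programmatic consumption by agents
function toAgentJSON(p) {
  return JSON.stringify(p, null, 2);
}
```

The point of the three forms is that they derive from one source of truth, so the chat-pasted text and the agent-consumed JSON can never drift apart.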

The subject type system is what makes the tool genuinely useful rather than just a generic prompt enhancer. Quality Prompts supports Development, Writing, Strategy, Product, Design, Marketing, Research, Data Analysis, and Build Based On. Each subject type carries a system role that tunes the prompt engineer persona for that domain. A Development prompt needs different scaffolding than a Writing prompt because technical constraints differ, output formats differ, and edge cases differ.

Within each subject type, prompt styles provide further specialization. Under Development you can choose Specification Prompt for starting something new, Iteration Prompt for making targeted changes to existing code, Diagnostic Prompt for debugging, Serverless for multi-cloud deployment, Vercel for Next.js and Edge deployments, Blockchain for Web3, Jekyll for static blogs, and Toy Application for small utility tools. Each style adds context that narrows focus and produces more specific instructions.

the build based on feature

The Build Based On category is the feature I find most interesting from an implementation perspective. You enter a URL to an existing site and the tool analyzes it to generate specifications for building something similar. Styles include Replicate for building inspired by existing references, Extend for adding features while maintaining design consistency, Improve for generating improvement recommendations, and conversions to different tech stacks like Serverless, Vercel, or Jekyll.

This addresses something I encounter frequently: someone wants to build something “like X but with Y changes” and the prompt needs to capture what X actually does before describing Y. Without that analysis, I make assumptions about the reference that may not match what the person actually observed. With the URL analysis, the generated prompt includes specific details about the reference site that ground the task in concrete reality rather than abstract description.

json-first architecture and attestations

For Development and Build Based On prompts, Quality Prompts instructs models to use JSON objects for configuration, state management, and data structures. Objects are composable and extensible. Properties can be added without breaking existing code. The format is readable and debuggable. When prompts specify JSON-first architecture, my outputs tend to be more consistent because scattered variables and magic strings create maintenance problems that JSON avoids.
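A minimal illustration of the pattern (my own example, not output from the tool):

```javascript
// JSON-first: one composable config object instead of scattered
// variables and magic strings. New properties can be added without
// breaking existing consumers.
const config = {
  app: { name: "csv2json", version: "1.0.0" },
  input: { delimiter: ",", encoding: "utf-8" },
  output: { format: "json", pretty: true },
};

// Extending is additive: existing readers simply ignore unknown keys.
const extended = { ...config, logging: { level: "info" } };

// The same object serializes for storage, transport, or debugging.
const serialized = JSON.stringify(extended, null, 2);
```

Because the whole configuration is one object, it can be logged, diffed, and passed across boundaries as a unit, which is exactly what scattered constants make painful.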

The attestation instructions are also built into Development prompts by default. They include instructions for creating an ATTESTATION.md file with structured metadata about AI collaboration and adding a verification badge linked to attest.ink. This makes provenance tracking part of the generated output rather than an afterthought that people forget to add.

assess prompts as the quality layer

Quality Prompts generates good prompts, but “good” is subjective without measurement. Assess Prompts provides that measurement layer. It evaluates any prompt and returns a quality score from 0 to 100, a letter grade, specific strengths, specific issues with explanations of why they matter, missing elements, actionable optimization suggestions, and a rewritten version with all improvements applied.

The ten evaluation dimensions capture what determines whether a prompt will succeed: clarity, completeness, specificity, structure, output definition, error handling, efficiency, token optimization, model alignment, and actionability. Each dimension gets assessed with specific findings rather than generic feedback. The tool tells you exactly what is wrong and exactly how to fix it.
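Assuming that output maps onto a JSON structure, a hypothetical assessment response might look like this. The keys are my guess based on the described fields, not the documented schema:

```javascript
// Hypothetical assessment response; keys are assumptions based on
// the described output, not the documented schema.
const assessment = {
  score: 82,          // 0-100 quality score
  grade: "B",         // letter grade
  dimensions: {       // the ten evaluation dimensions
    clarity: 9, completeness: 7, specificity: 8, structure: 9,
    outputDefinition: 8, errorHandling: 5, efficiency: 8,
    tokenOptimization: 9, modelAlignment: 8, actionability: 9,
  },
  strengths: ["Output format is fully specified"],
  issues: [{ issue: "No error handling guidance",
             why: "The model will improvise on malformed input" }],
  missing: ["Target audience"],
  suggestions: ["State what to do when the input is malformed"],
  optimizedPrompt: "rewritten prompt with all improvements applied",
};
```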

cost estimation across models

The cost estimation feature is the part I find most practically useful. Assess Prompts calculates token counts and cost per run across every major frontier model: Anthropic Claude variants, OpenAI GPT-4o and o1, Google Gemini variants, xAI Grok, and Meta Llama via Groq. It shows cost per run, per 100 runs, and per 1000 runs, plus notes on self-hosted inference costs with Ollama.

This matters because prompt optimization involves tradeoffs between thoroughness and efficiency. A highly specified prompt costs more but may save iteration cycles. A shorter prompt is cheaper per run but may require clarifying follow-ups. Having concrete cost estimates makes these tradeoffs visible rather than abstract. When you can see that a prompt will cost $0.01 per run with Claude Sonnet versus $0.15 per run with Claude Opus, you can make informed decisions about when the quality difference justifies the cost difference.
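The underlying arithmetic is just token count times per-token price. A sketch with placeholder prices, which are illustrative and not current rates:

```javascript
// USD per million input tokens. These numbers are placeholders for
// illustration, not the live pricing the tool uses.
const PRICE_PER_MTOK = {
  "claude-sonnet": 3.0,
  "claude-opus": 15.0,
};

function costPerRun(inputTokens, model) {
  return (inputTokens / 1_000_000) * PRICE_PER_MTOK[model];
}

// A 2,500-token prompt, costed per run and at scale.
const tokens = 2500;
const perRun = costPerRun(tokens, "claude-sonnet");
const per1000 = perRun * 1000;
```

The per-1000-runs view is where the tradeoff becomes concrete: a few hundred extra tokens of specification is noise on one run and a real line item at scale.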

the linked workflow

The two tools link together seamlessly. After generating a prompt in Quality Prompts, you can send it to Assess Prompts with one click. After assessment, you can open the optimized version directly in ChatGPT, Claude, Copilot, or Gemini with one click. The iteration cycle is structured rather than ad hoc.

Both tools can be used programmatically. The API documentation provides system message templates, user message templates, and expected response schemas. This means prompt improvement and assessment can happen without human intervention, which is useful for agentic systems that generate prompts as part of automated pipelines. The quality score gives you a measurable target to optimize against, and the cost estimates help you understand what running prompts will cost at scale.
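A sketch of what programmatic use might look like against a chat-completions style endpoint. The actual system and user message templates come from the API documentation; the model name and wording below are placeholders:

```javascript
// Build a request body for a chat-completions style API.
// The system message here is a placeholder; the real template
// is provided in the tools' API documentation.
function buildAssessmentRequest(prompt) {
  return {
    model: "gpt-4o", // placeholder model choice
    messages: [
      { role: "system", content: "You are a prompt quality assessor." },
      { role: "user", content: `Assess this prompt and return JSON:\n\n${prompt}` },
    ],
    response_format: { type: "json_object" },
  };
}
```

Separating request construction from transport like this keeps the templated part testable, which matters once prompt assessment is running unattended inside a pipeline.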

serverless and shareable

Both tools are pure HTML, CSS, and JavaScript with no backend. They run entirely in the browser using your choice of API provider: Puter for free usage with no key, OpenRouter for model variety, direct connections to Anthropic, OpenAI, and Google, or Ollama for local inference with no data leaving your machine.
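Provider choice in a serverless tool like this might be modeled as a simple registry. The base URLs below are each service's public API endpoints, but the tools' actual internals are my assumption:

```javascript
// Hypothetical provider registry; the tools' real internal
// structure may differ. Ollama's URL is its default local endpoint.
const providers = {
  puter:      { baseURL: null, needsKey: false },  // SDK-based, free, no key
  openrouter: { baseURL: "https://openrouter.ai/api/v1", needsKey: true },
  anthropic:  { baseURL: "https://api.anthropic.com/v1", needsKey: true },
  openai:     { baseURL: "https://api.openai.com/v1", needsKey: true },
  ollama:     { baseURL: "http://localhost:11434/v1", needsKey: false }, // local, no data leaves the machine
};
```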

URL routing with LZ-String compression enables sharing prompts of any length through URLs. You can share a prefilled link that auto-generates on load, which is useful for sending someone a specific prompt configuration or for building integrations that open prompts in Quality Prompts via URL parameters.

The tools are available at 97115104.github.io/qualityprompts and 97115104.github.io/assessprompts. Source is available at github.com/97115104/qualityprompts and github.com/97115104/assessprompts.