March 11, 2026

Hardcoded Credentials Detection: How to Find Secrets with Pattern and Entropy Analysis

What Are Hardcoded Credentials and Why Are They Dangerous?

Hardcoded credentials are API keys, tokens, passwords, and private keys embedded directly in source code as string literals. They are dangerous because version control systems preserve them in commit history indefinitely, automated bots scrape repositories for credential patterns within minutes of exposure, and a single leaked key can compromise cloud accounts, databases, and third-party services.

A developer who writes const apiKey = "sk-proj-abc123..." creates a credential that persists in every copy of the repository. Git stores the complete file contents for every commit, so git rm removes the key from the working tree but not from history. Anyone who clones the repository can extract the key from prior commits.

AI coding tools routinely generate hardcoded credentials because their training data contains millions of examples where developers hardcoded keys during testing. A Cursor or Copilot suggestion for an API integration frequently includes a realistic-looking key value, and the developer may not be able to tell whether it is a placeholder or a real credential.

The cost of a leaked credential scales with the service it accesses. AWS keys enable cryptocurrency mining that generates five-figure cloud bills. OpenAI keys enable inference abuse. GitHub tokens expose private repositories and CI/CD pipelines. Database passwords expose customer data.

How Does Pattern Matching Detect Known Credential Formats?

Pattern matching detects known credential formats using regex rules calibrated with confidence scores. Vibe Owl runs five pattern detectors: OpenAI API keys (sk- prefix, 95% confidence), GitHub tokens (five gh*_ prefix formats, 95%), AWS access key IDs (AKIA prefix, 90%), private key blocks (-----BEGIN...PRIVATE KEY-----, 100%), and generic secret assignments (70% base confidence).

Each detector runs on every file open, edit, and save event. The OpenAI detector matches sk-[A-Za-z0-9]{20,}, which covers legacy sk- keys. Project-scoped sk-proj- keys contain a second hyphen after proj, so matching them requires widening the character class to admit - and _ as well. The AWS detector matches AKIA[0-9A-Z]{16}, the standard 20-character access key format used across all AWS services.
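As a rough illustration of prefix-based matching, the sketch below applies the two regexes quoted above to a block of text. The DETECTORS table, the detector IDs, and the find_known_credentials function are hypothetical names chosen for this example, not Vibe Owl's internals.

```python
import re

# Hypothetical detector table: ID -> (compiled regex, base confidence).
# The two patterns and confidences are the ones quoted in the article.
DETECTORS = {
    "openai_api_key": (re.compile(r"sk-[A-Za-z0-9]{20,}"), 0.95),
    "aws_access_key_id": (re.compile(r"AKIA[0-9A-Z]{16}"), 0.90),
}


def find_known_credentials(text):
    """Return (detector_id, matched_text, base_confidence) for every match."""
    findings = []
    for detector_id, (pattern, confidence) in DETECTORS.items():
        for match in pattern.finditer(text):
            findings.append((detector_id, match.group(0), confidence))
    return findings
```

A real scanner would also record line and column positions so that findings can be shown as editor diagnostics.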

Confidence scoring maps directly to severity. Scores of 0.95 and above produce critical findings. Scores from 0.80 up to, but not including, 0.95 produce high findings. Context keywords in the surrounding text (secret, token, password, api_key, credential) add a 0.10 bonus. Sensitive file extensions (.env, .pem, .key) add another 0.10 bonus.
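The scoring rules can be sketched as two small functions. The function names, the case-insensitive keyword check, the cap at 1.0, and the "below high" label are assumptions made for this example; the article does not give the thresholds for medium and low.

```python
CONTEXT_KEYWORDS = ("secret", "token", "password", "api_key", "credential")
SENSITIVE_EXTENSIONS = (".env", ".pem", ".key")


def adjusted_confidence(base, surrounding_text, filename):
    """Apply the two 0.10 bonuses to a detector's base confidence."""
    score = base
    lowered = surrounding_text.lower()
    if any(keyword in lowered for keyword in CONTEXT_KEYWORDS):
        score += 0.10
    if filename.lower().endswith(SENSITIVE_EXTENSIONS):
        score += 0.10
    # Round to avoid float artifacts such as 0.7999999999999999,
    # then cap at 1.0 (the cap is an assumption).
    return min(round(score, 2), 1.0)


def severity(score):
    """Map a confidence score to the two severities the article defines."""
    if score >= 0.95:
        return "critical"
    if score >= 0.80:
        return "high"
    return "below high"  # medium/low thresholds are not stated in the text
```

Under these assumptions, an AWS key ID (base 0.90) found next to the word token would reach 1.0 and be reported as critical, while the same key with no context stays at high.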

The generic assignment detector catches credentials that lack recognizable prefixes. Variables named api_key, token, secret, or password that are assigned string values of 8 or more characters trigger at 70% base confidence. Overlapping findings are deduplicated: when a higher-confidence detector already covers the same location, the generic finding is suppressed.
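A minimal sketch of the generic detector and the overlap rule follows. The regex, the finding dictionaries, and both function names are assumptions for illustration; the actual rule may accept more variable-name and quoting variants.

```python
import re

# Assumed shape: a keyword-named variable, '=' or ':', then a quoted
# value of at least 8 characters with no quotes or whitespace inside.
GENERIC_ASSIGNMENT = re.compile(
    r"""\b(api_key|token|secret|password)\b\s*[:=]\s*["']([^"'\s]{8,})["']""",
    re.IGNORECASE,
)


def find_generic_assignments(text):
    """Return one finding per keyword-named assignment of an 8+ char string."""
    return [
        {
            "detector": "generic_assignment",
            "start": match.start(2),
            "end": match.end(2),
            "confidence": 0.70,
        }
        for match in GENERIC_ASSIGNMENT.finditer(text)
    ]


def overlaps(a, b):
    """True when two findings cover at least one common character."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def suppress_covered_generic(findings):
    """Drop generic findings overlapped by a higher-confidence finding."""
    kept = []
    for finding in findings:
        covered = finding["detector"] == "generic_assignment" and any(
            other is not finding
            and other["confidence"] > finding["confidence"]
            and overlaps(finding, other)
            for other in findings
        )
        if not covered:
            kept.append(finding)
    return kept
```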

How Does Entropy-Based Detection Find Credentials Without Known Patterns?

Entropy-based detection calculates Shannon entropy for string values that are 20 to 200 characters long and contain at least 10 unique characters. Shannon entropy measures randomness using the formula H = -Σ p_i × log2(p_i), where p_i is the relative frequency of the i-th distinct character in the string. Strings with entropy above 3.5 bits per character exhibit the high randomness characteristic of cryptographic keys and tokens.

Real credentials generated from cryptographic random sources exhibit entropy between 4.0 and 5.5 bits per character. English words and code identifiers exhibit entropy between 2.0 and 3.5 bits. The 3.5 threshold separates these distributions with minimal overlap, catching genuine credentials while passing through normal code strings.
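The entropy check described above can be sketched in a few lines. The function names are hypothetical; the length bounds, the unique-character minimum, and the 3.5 threshold are the values given in the text.

```python
import math
from collections import Counter


def shannon_entropy(value):
    """Shannon entropy of a string, in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_entropy_candidate(value, threshold=3.5):
    """Apply the length, unique-character, and entropy gates in order."""
    if not 20 <= len(value) <= 200:
        return False
    if len(set(value)) < 10:
        return False
    return shannon_entropy(value) > threshold
```

Because the entropy is computed from the string's own character frequencies, a string of n characters can never score above log2(n) bits, which is one reason very short strings are excluded up front.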

The entropy scanner filters known false-positive patterns. Strings containing localhost, example, test, dummy, or changeme are skipped regardless of their entropy score. Placeholder values like __REPLACE_ME__, YOUR_TOKEN_HERE, and change-me are also excluded from detection.
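A filter like this could be written as a simple substring check. Whether the real scanner compares case-insensitively, or matches placeholders as substrings rather than whole values, is not stated, so both behaviors here are assumptions, as is the function name.

```python
SKIP_SUBSTRINGS = ("localhost", "example", "test", "dummy", "changeme")
PLACEHOLDERS = ("__REPLACE_ME__", "YOUR_TOKEN_HERE", "change-me")


def is_known_false_positive(value):
    """True when the value contains a skip word or a known placeholder."""
    lowered = value.lower()  # case-insensitive matching is an assumption
    if any(word in lowered for word in SKIP_SUBSTRINGS):
        return True
    return any(placeholder.lower() in lowered for placeholder in PLACEHOLDERS)
```

Running this check before the entropy calculation keeps obvious fixtures, such as a local database URL, out of the findings list.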

Entropy detection catches credentials from providers for which Vibe Owl has no dedicated pattern rules, such as Stripe keys, Twilio tokens, SendGrid API keys, and custom authentication tokens. Any high-entropy string assigned to a variable in a configuration context triggers review. Secret scanning in VS Code combines pattern matching and entropy analysis to maximize detection coverage.

How Do You Remove Hardcoded Credentials from Code?

Removing hardcoded credentials requires extracting the value to an environment variable using the correct language syntax, adding the actual value to .env, syncing .env.example with a redacted placeholder, verifying .env is in .gitignore, and scanning git history to confirm the credential does not persist in prior commits.

Vibe Owl's quick-fix code action automates the extraction step. The Extract to env placeholder action replaces the hardcoded value with the correct environment variable syntax for 11 languages: process.env.API_KEY for JavaScript/TypeScript, os.environ["API_KEY"] for Python, os.Getenv("API_KEY") for Go, System.getenv("API_KEY") for Java, std::env::var("API_KEY") for Rust, and equivalent patterns for C#, Ruby, PHP, Shell, and Swift.
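To make the replacement concrete, the sketch below maps a language to the environment lookup listed above and swaps a quoted literal for it. The ENV_TEMPLATES table and both function names are hypothetical, and only the syntaxes spelled out in the text are included.

```python
# Environment-variable syntax per language, as listed in the article.
ENV_TEMPLATES = {
    "javascript": "process.env.{name}",
    "typescript": "process.env.{name}",
    "python": 'os.environ["{name}"]',
    "go": 'os.Getenv("{name}")',
    "java": 'System.getenv("{name}")',
    "rust": 'std::env::var("{name}")',
}


def env_reference(language, name):
    """Return the environment lookup expression for one language."""
    return ENV_TEMPLATES[language.lower()].format(name=name)


def extract_to_env(line, secret_literal, language, name):
    """Replace a quoted secret literal in a line with an env lookup."""
    return line.replace(secret_literal, env_reference(language, name))
```

For example, extract_to_env('api_key = "sk-abc"', '"sk-abc"', "python", "API_KEY") yields api_key = os.environ["API_KEY"]. The real value then belongs in .env, which must stay out of version control.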

The .env.example sync command ensures extracted variables are documented. Sensitive variables (names containing SECRET, KEY, TOKEN, or PASSWORD) receive redacted __REPLACE_ME__ placeholder values with guidance comments. The env file safety audit detects missing variables, hardcoded values in env files, and synchronization gaps between .env and .env.example.
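One way to picture the sync step is a function that rewrites .env content into .env.example content. This is a simplified sketch: the function names are hypothetical, guidance comments are omitted, and keeping non-sensitive values unchanged is an assumption.

```python
SENSITIVE_MARKERS = ("SECRET", "KEY", "TOKEN", "PASSWORD")


def example_line(name, value):
    """Redact the value when the variable name looks sensitive."""
    if any(marker in name.upper() for marker in SENSITIVE_MARKERS):
        return f"{name}=__REPLACE_ME__"
    return f"{name}={value}"  # assumption: non-sensitive values are kept


def build_env_example(env_text):
    """Turn .env content into .env.example content, line by line."""
    out = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Pass blank lines, comments, and non-assignments through untouched.
        if not line or line.startswith("#") or "=" not in line:
            out.append(raw)
            continue
        name, value = line.split("=", 1)
        out.append(example_line(name.strip(), value.strip()))
    return "\n".join(out)
```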

How Does the False-Positive Trainer Reduce Alert Fatigue?

The false-positive trainer uses local, machine-learning-style suppression to reduce repeated false positives. When a developer suppresses or downgrades a finding, the trainer records a fingerprint combining the detector ID, title, and evidence pattern. After the configured number of confirmations (2 for low severity, 5 for medium), similar findings are auto-suppressed.
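The confirmation logic amounts to a counter per fingerprint. The sketch below is an illustration only: the Trainer class and its methods are hypothetical, fingerprints are treated as opaque strings, and never auto-suppressing high or critical findings is an assumption, since the text gives thresholds only for low and medium.

```python
# Confirmations required before auto-suppression, per severity.
THRESHOLDS = {"low": 2, "medium": 5}


class Trainer:
    """Counts developer suppressions per finding fingerprint."""

    def __init__(self):
        self.confirmations = {}

    def record_suppression(self, fingerprint):
        """Register one manual suppression or downgrade."""
        count = self.confirmations.get(fingerprint, 0)
        self.confirmations[fingerprint] = count + 1

    def should_auto_suppress(self, fingerprint, severity):
        """True once the fingerprint has enough confirmations."""
        needed = THRESHOLDS.get(severity)
        if needed is None:
            return False  # assumption: high/critical are never auto-suppressed
        return self.confirmations.get(fingerprint, 0) >= needed
```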

Evidence fingerprinting normalizes the detected string: it is lowercased, digits are replaced with 9, letters a–f with a, letters g–z with x, and whitespace is removed. This normalization groups structurally similar strings together, so suppressing one instance of a test API key pattern suppresses future instances that match the same structure.
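The normalization is easy to express directly. The function name evidence_fingerprint is hypothetical, and leaving other characters (such as punctuation) unchanged is an assumption.

```python
def evidence_fingerprint(evidence):
    """Collapse a detected string to its structural shape."""
    out = []
    for ch in evidence.lower():
        if ch.isspace():
            continue  # whitespace is removed
        if ch in "0123456789":
            out.append("9")
        elif "a" <= ch <= "f":
            out.append("a")
        elif "g" <= ch <= "z":
            out.append("x")
        else:
            out.append(ch)  # assumption: other characters pass through
    return "".join(out)
```

With this mapping, "sk-Test 1234" and "zq-vems 5678" both reduce to xx-xaxx9999, so suppressing one would cover the other.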

The trainer stores up to 500 entries in .vibe-owl-learning.json within the workspace. Medium findings can be downgraded to low before reaching the full suppression threshold, reducing their visual prominence without removing them entirely. The Reset False-Positive Trainer Data command clears all learned patterns if the suppression behavior becomes too aggressive.

The allowlist provides manual suppression for specific locations. Adding a finding to .vibe-owl-allowlist.json requires a user-provided reason, creating an audit trail. Allowlisted findings are excluded from all scans, diagnostics, and the workspace health score. Preventing API key leaks requires balancing detection sensitivity with developer productivity, and the trainer and allowlist provide this balance.

How Does Workspace-Wide Scanning Find Credentials Across a Project?

Workspace-wide scanning runs pattern matching and entropy analysis against every file in the project, respecting exclude globs and file size limits. The Vibe Owl: Scan Workspace for Secrets command produces a consolidated report with findings grouped by file and severity, covering credentials that live scanning may not have processed because the files were never opened.

The workspace scan applies the same five pattern detectors and entropy analyzer used for live scanning. Files matching the default exclude globs (node_modules, .git, dist, build, lockfiles, and binary files) are skipped. Files exceeding the maximum size (default 512 KB) are also excluded. A cap of 120 findings per scan prevents output flooding in large projects.
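A workspace walk with these limits might look like the sketch below. It is deliberately simplified: only the four directory names are excluded (lockfile and binary detection are left out), the function names are hypothetical, and scan_file stands in for whatever per-file detector is used.

```python
import os

EXCLUDED_DIRS = {"node_modules", ".git", "dist", "build"}
MAX_FILE_BYTES = 512 * 1024  # default 512 KB limit from the article
MAX_FINDINGS = 120


def iter_scannable_files(root):
    """Yield file paths under root, skipping excluded dirs and large files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never enters them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) <= MAX_FILE_BYTES:
                    yield path
            except OSError:
                continue  # unreadable or vanished file


def scan_workspace(root, scan_file):
    """Collect findings from scan_file(path), stopping at the cap."""
    findings = []
    for path in iter_scannable_files(root):
        for finding in scan_file(path):
            findings.append(finding)
            if len(findings) >= MAX_FINDINGS:
                return findings
    return findings
```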

The workspace health score aggregates all credential findings into a 0–100 score. Critical findings (confidence >= 0.95) deduct 18 points each, high findings deduct 10, medium findings deduct 4, and low findings deduct 1. The VS Code security extension tracks this score over time with trend analysis, showing whether the project's credential hygiene is improving or degrading across successive audits.
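As a worked example of the arithmetic, the sketch below starts from 100 and subtracts the listed deduction for each finding. The function name is hypothetical, and clamping at 0 is an assumption inferred from the stated 0–100 range.

```python
# Points deducted per finding, by severity, as listed in the article.
DEDUCTIONS = {"critical": 18, "high": 10, "medium": 4, "low": 1}


def health_score(severities):
    """Compute the score from a list of finding severities."""
    penalty = sum(DEDUCTIONS[s] for s in severities)
    return max(0, 100 - penalty)  # floor at 0 is an assumption
```

A workspace with one finding of each severity would score 100 - (18 + 10 + 4 + 1) = 67 under these rules.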