html-structure-validate by aiskillstore
Validate HTML5 structure and basic syntax. BLOCKING quality gate - stops pipeline if validation fails. Ensures deterministic output quality.
Updated Jan 19, 2026, 04:39 AM
---
name: html-structure-validate
description: Validate HTML5 structure and basic syntax. BLOCKING quality gate - stops pipeline if validation fails. Ensures deterministic output quality.
---
# HTML Structure Validate Skill
## Purpose
This skill is a **BLOCKING quality gate** that ensures generated HTML meets minimum structural requirements. It is the **first deterministic validation** of probabilistic AI-generated output.
The skill checks:
- **HTML5 compliance** - Proper DOCTYPE, tags
- **Tag closure** - All tags properly closed
- **Required elements** - Meta tags, stylesheet links
- **Well-formedness** - Valid structure
If validation fails, the pipeline **STOPS** and triggers a hook to notify the user.
This enforces the principle: **Python validates, ensuring deterministic quality**.
## What to Do
1. **Load HTML file to validate**
- Read the `04_page_XX.html` file generated by the AI skill
- Verify file exists and is readable
- Confirm file is text (not binary)
2. **Run validation checks**
- Check HTML5 structure compliance
- Verify tag closure
- Validate head section
- Check required CSS link
- Validate page container structure
3. **Generate validation report**
- Document all checks performed
- List any errors found
- Note warnings (non-blocking)
- Record informational findings
4. **Save validation report** as JSON
- Save to: `output/chapter_XX/page_artifacts/page_YY/06_validation_structure.json`
- Include timestamp
- Include all check results
5. **Exit with appropriate code**
- Return 0 if VALID (continue pipeline)
- Return 1 if INVALID (STOP pipeline, trigger hook)
## Input Parameters
```
html_file: <str> - Path to 04_page_XX.html
output_dir: <str> - Directory for validation report
strict_mode: <bool> - If true, warnings also fail (default: false)
page_number: <int> - Page number (for reporting)
chapter: <int> - Chapter number (for reporting)
```
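As a minimal sketch of how `strict_mode` could map onto the exit codes described above (0 continues, 1 stops), assuming error and warning counts have already been collected. The function name `gate_exit_code` is hypothetical and not part of `validate_html.py`:

```python
def gate_exit_code(error_count: int, warning_count: int, strict_mode: bool = False) -> int:
    """Return 0 to continue the pipeline, 1 to stop it."""
    # Errors always block.
    if error_count > 0:
        return 1
    # In strict mode, warnings block as well.
    if strict_mode and warning_count > 0:
        return 1
    return 0
```

With the default `strict_mode=False`, a page with one warning and no errors still passes the gate.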
## Validation Checks
### Check 1: DOCTYPE Declaration
**Requirement**: File must start with proper DOCTYPE
```html
<!DOCTYPE html>
```
**Check**:
- [ ] File contains `<!DOCTYPE html>` (case-insensitive)
- [ ] DOCTYPE appears before any tags
- [ ] DOCTYPE is on the first line or near the beginning of the file
**Error if**: Missing or incorrect DOCTYPE
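One possible way to express this check is a pair of regular expressions. This is a sketch only; the helper name `check_doctype` is an assumption, and the real tool may test this differently:

```python
import re

# Matches the HTML5 DOCTYPE only; legacy DOCTYPEs with a public identifier do not match.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)
# Matches the start of any element tag such as <html or <p (but not <! or </).
TAG_RE = re.compile(r"<[a-zA-Z]")


def check_doctype(html: str) -> bool:
    """True if an HTML5 DOCTYPE is present and precedes the first tag."""
    doctype = DOCTYPE_RE.search(html)
    if doctype is None:
        return False
    first_tag = TAG_RE.search(html)
    return first_tag is None or doctype.start() < first_tag.start()
```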
### Check 2: HTML Tags
**Requirement**: Proper `<html>` opening and closing tags
```html
<html lang="en">
...
</html>
```
**Checks**:
- [ ] `<html>` tag present
- [ ] `</html>` closing tag present
- [ ] Tags are properly paired
- [ ] No unclosed `<html>` tags
**Error if**: Missing either tag or improperly paired
### Check 3: Head Section
**Requirement**: Complete `<head>` section with metadata
```html
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>...</title>
<link rel="stylesheet" href="../../styles/main.css">
</head>
```
**Checks**:
- [ ] `<head>` and `</head>` tags present
- [ ] `<meta charset="UTF-8">` present
- [ ] `<meta name="viewport">` present (warning if missing)
- [ ] `<title>` tag with content present
- [ ] CSS `<link>` tag present with href attribute
**Error if**: Missing charset, title, or CSS link
**Warning if**: Missing viewport meta tag
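The head checks could be sketched with Python's standard `html.parser` module. The class and function names (`HeadInspector`, `check_head`) and the message strings are illustrative assumptions, not the actual internals of `validate_html.py`:

```python
from html.parser import HTMLParser


class HeadInspector(HTMLParser):
    """Collects the metadata found between <head> and </head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.charset = None
        self.has_viewport = False
        self.has_stylesheet = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif not self.in_head:
            return
        elif tag == "meta":
            if "charset" in a:
                self.charset = a["charset"]
            if (a.get("name") or "").lower() == "viewport":
                self.has_viewport = True
        elif tag == "title":
            self.in_title = True
        elif tag == "link":
            rels = (a.get("rel") or "").lower().split()
            if "stylesheet" in rels and a.get("href"):
                self.has_stylesheet = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_head(html: str):
    """Return (errors, warnings) for the head section."""
    p = HeadInspector()
    p.feed(html)
    p.close()
    errors, warnings = [], []
    if (p.charset or "").lower() != "utf-8":
        errors.append('missing <meta charset="UTF-8">')
    if not p.title.strip():
        errors.append("missing or empty <title>")
    if not p.has_stylesheet:
        errors.append("missing stylesheet <link> with href")
    if not p.has_viewport:
        warnings.append("missing viewport meta tag")
    return errors, warnings
```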
### Check 4: Body Section
**Requirement**: Proper `<body>` tags with content
```html
<body>
<div class="page-container">
<main class="page-content">
...
</main>
</div>
</body>
```
**Checks**:
- [ ] `<body>` and `</body>` tags present
- [ ] `<div class="page-container">` present
- [ ] `<main class="page-content">` present inside container
- [ ] Body contains substantial content (> 100 bytes)
**Error if**: Missing tags or required container elements (`div.page-container`, `main.page-content`)
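The container requirement (a `main.page-content` nested inside a `div.page-container`) can be checked by tracking open elements on a stack. The sketch below assumes tag closure has already been verified separately; `BodyInspector` and `check_body` are hypothetical names:

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class BodyInspector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, class set) for each currently open element
        self.has_container = False
        self.main_in_container = False

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if tag == "div" and "page-container" in classes:
            self.has_container = True
        if tag == "main" and "page-content" in classes:
            # Is any open ancestor the page container?
            if any(t == "div" and "page-container" in c for t, c in self.stack):
                self.main_in_container = True
        if tag not in VOID:
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def check_body(html: str) -> list:
    p = BodyInspector()
    p.feed(html)
    p.close()
    errors = []
    if not p.has_container:
        errors.append('missing <div class="page-container">')
    if not p.main_in_container:
        errors.append('missing <main class="page-content"> inside the page container')
    return errors
```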
### Check 5: Tag Closure Validation
**Requirement**: All tags must be properly closed
**Checks for**:
- Unmatched opening tags (e.g., `<p>` without `</p>`)
- Improper nesting (e.g., `<p><h2>text</h2></p>`)
- Void elements handled correctly (e.g., `<br>` and `<img>` need no closing tag; the XHTML-style `<br/>` and `<img/>` forms are also accepted)
- Comment blocks properly formatted (`<!-- -->`)
**Validation method**:
- Parse HTML into tree structure
- Verify all nodes properly matched
- Check nesting doesn't violate HTML5 rules
**Error if**: Any unmatched or improperly nested tags
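A stack-based matcher is one common way to implement the "parse and verify all nodes matched" step. The sketch below uses the standard `html.parser` module and covers tag matching only; it does not enforce HTML5 content-model rules such as `<h2>` inside `<p>`. The names `ClosureChecker` and `find_closure_errors` are assumptions for illustration:

```python
from html.parser import HTMLParser

# Void elements never take a closing tag in HTML5.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ClosureChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, line) for each open element
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        line = self.getpos()[0]
        if tag not in [t for t, _ in self.stack]:
            self.errors.append(f"</{tag}> on line {line} has no matching opening tag")
            return
        # Everything opened after the matching tag was left unclosed.
        while self.stack:
            open_tag, open_line = self.stack.pop()
            if open_tag == tag:
                break
            self.errors.append(
                f"<{open_tag}> opened on line {open_line} is not closed before </{tag}>")


def find_closure_errors(html: str) -> list:
    checker = ClosureChecker()
    checker.feed(html)
    checker.close()
    errors = list(checker.errors)
    for tag, line in checker.stack:
        errors.append(f"<{tag}> opened on line {line} is never closed")
    return errors
```

Note that this sketch is stricter than the HTML5 parsing rules, which allow some end tags (such as `</p>` and `</li>`) to be omitted; that strictness matches the requirement stated above.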
### Check 6: Heading Tags (h1-h6)
**Requirement**: Valid heading hierarchy
```html
<h1>Chapter Title</h1>
<h2>Section Heading</h2>
<h3>Subsection</h3>
```
**Checks**:
- [ ] All heading tags properly closed
- [ ] First heading should be h1 (warning if not)
- [ ] Heading levels don't skip dramatically (h1 → h4 is suspicious)
- [ ] All headings have text content (not empty)
**Error if**: Heading tags improperly closed
**Warning if**: Suspicious hierarchy
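The hierarchy warnings could be sketched as follows. The threshold for a "dramatic" skip is not specified above, so `max_jump=1` is an assumption, as is treating empty headings as warnings; `heading_warnings` is a hypothetical helper name:

```python
import re

# Matches properly closed headings and captures the level and inner HTML.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>", re.IGNORECASE | re.DOTALL)


def heading_warnings(html: str, max_jump: int = 1) -> list:
    warnings = []
    levels = []
    for match in HEADING_RE.finditer(html):
        level = int(match.group(1))
        # Strip nested tags to see whether any text remains.
        text = re.sub(r"<[^>]+>", "", match.group(2)).strip()
        if not text:
            warnings.append(f"empty h{level} heading")
        levels.append(level)
    if levels and levels[0] != 1:
        warnings.append(f"first heading is h{levels[0]}, not h1")
    for prev, cur in zip(levels, levels[1:]):
        # Only downward jumps (h1 to h4) are suspicious; returning to h2 is normal.
        if cur - prev > max_jump:
            warnings.append(f"heading level jumps from h{prev} to h{cur}")
    return warnings
```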
### Check 7: Content Structure
**Requirement**: Meaningful content in page container
**Checks**:
- [ ] `<main class="page-content">` contains elements
- [ ] Content includes headings or paragraphs
- [ ] No completely empty content area
- [ ] Text nodes or elements present (> 100 words total)
**Error if**: No content or empty structure
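A rough sketch of the content check, counting elements and words inside `<main>`. For brevity it accepts any `<main>` element rather than requiring the `page-content` class, and the names `MainTextCounter` and `content_is_substantial` are assumptions:

```python
from html.parser import HTMLParser


class MainTextCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0     # greater than zero while inside <main>
        self.words = 0
        self.elements = 0

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.depth += 1
        elif self.depth:
            self.elements += 1

    def handle_endtag(self, tag):
        if tag == "main" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.words += len(data.split())


def content_is_substantial(html: str, min_words: int = 100) -> bool:
    """True if <main> holds at least one element and more than min_words words."""
    counter = MainTextCounter()
    counter.feed(html)
    counter.close()
    return counter.elements > 0 and counter.words > min_words
```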
### Check 8: List Integrity
**Requirement**: All lists properly structured
**Checks** for each `<ul>` or `<ol>`:
- [ ] List opening and closing tags matched
- [ ] List contains `<li>` elements
- [ ] All `<li>` tags properly closed
- [ ] Number of `<li>` opening tags equals number of `</li>` closing tags
- [ ] Nested `<ul>` and `<ol>` lists are properly closed
**Error if**: Empty lists or unmatched `<li>` tags
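The list checks can be sketched by counting items per open list. `ListInspector` and `list_errors` are illustrative names, and the message wording is an assumption:

```python
from html.parser import HTMLParser


class ListInspector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_lists = []  # item count for each currently open <ul>/<ol>
        self.li_open = 0
        self.li_close = 0
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.open_lists.append(0)
        elif tag == "li":
            self.li_open += 1
            if self.open_lists:
                self.open_lists[-1] += 1

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            if not self.open_lists:
                self.errors.append(f"</{tag}> without an open list")
            elif self.open_lists.pop() == 0:
                self.errors.append(f"empty <{tag}>")
        elif tag == "li":
            self.li_close += 1


def list_errors(html: str) -> list:
    p = ListInspector()
    p.feed(html)
    p.close()
    errors = list(p.errors)
    if p.open_lists:
        errors.append(f"{len(p.open_lists)} list(s) never closed")
    if p.li_open != p.li_close:
        errors.append(f"<li> opened {p.li_open} time(s) but closed {p.li_close} time(s)")
    return errors
```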
### Check 9: Image and Link Tags
**Requirement**: Images and links carry their required attributes, and void tags such as `<img>` are properly formatted
**Checks**:
- [ ] All `<img>` tags have `src` and `alt` attributes
- [ ] All `<a>` tags have valid `href` attributes
- [ ] Image paths don't have obvious errors (no broken syntax)
- [ ] Void tags such as `<img>` use proper syntax (`<img ...>` or `<img ... />`)
**Warning if**: Images missing alt text or links missing href
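A minimal sketch of the attribute checks, again using `html.parser`. It treats an empty `alt=""` as acceptable, since HTML allows that for decorative images; whether the real tool does the same is an assumption, as are the names used here:

```python
from html.parser import HTMLParser


class AttributeInspector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        line = self.getpos()[0]
        if tag == "img":
            if not a.get("src"):
                self.warnings.append(f"line {line}: <img> missing src")
            # Only a missing attribute is flagged; alt="" is allowed.
            if "alt" not in a:
                self.warnings.append(f"line {line}: <img> missing alt")
        elif tag == "a" and not a.get("href"):
            self.warnings.append(f"line {line}: <a> missing href")


def attribute_warnings(html: str) -> list:
    p = AttributeInspector()
    p.feed(html)
    p.close()
    return p.warnings
```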
### Check 10: Table Tags (if present)
**Requirement**: Proper table structure
**Checks**:
- [ ] `<table>`, `<tr>`, `<td>`, `<th>` tags properly nested
- [ ] All rows have consistent column counts
- [ ] Table headers and body properly structured
**Error if**: Malformed table structure
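Column consistency could be checked by summing cell widths per row, as in the sketch below. It honors `colspan` but ignores `rowspan`, which is a simplifying assumption, and it only detects cells that appear before any `<tr>` in their table; `TableInspector` and `table_errors` are hypothetical names:

```python
from html.parser import HTMLParser


class TableInspector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tables = []  # one list of row widths per currently open <table>
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            if self.tables:
                self.tables[-1].append(0)
            else:
                self.errors.append("<tr> outside <table>")
        elif tag in ("td", "th"):
            if self.tables and self.tables[-1]:
                span = dict(attrs).get("colspan") or "1"
                self.tables[-1][-1] += int(span) if span.isdigit() else 1
            else:
                self.errors.append(f"<{tag}> before any <tr>")

    def handle_endtag(self, tag):
        if tag == "table" and self.tables:
            widths = self.tables.pop()
            if len(set(widths)) > 1:
                self.errors.append(f"inconsistent column counts per row: {widths}")


def table_errors(html: str) -> list:
    p = TableInspector()
    p.feed(html)
    p.close()
    errors = list(p.errors)
    if p.tables:
        errors.append(f"{len(p.tables)} table(s) never closed")
    return errors
```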
## Validation Report Format
### Output: `06_validation_structure.json`
```json
{
"page": 16,
"book_page": 17,
"chapter": 2,
"validation_type": "structure",
"validation_timestamp": "2025-11-08T14:34:00Z",
"overall_status": "PASS",
"error_count": 0,
"warning_count": 1,
"checks_performed": [
{
"check_name": "DOCTYPE Declaration",
"status": "PASS",
"details": "Valid HTML5 DOCTYPE found"
},
{
"check_name": "HTML Tags",
"status": "PASS",
"details": "Proper <html> opening and closing tags"
},
{
"check_name": "Head Section",
"status": "PASS",
"details": "All required meta tags and title present"
},
{
"check_name": "Body Section",
"status": "PASS",
"details": "Body and content structure valid"
},
{
"check_name": "Tag Closure",
"status": "PASS",
"details": "All tags properly matched and closed"
},
{
"check_name": "Heading Hierarchy",
"status": "WARNING",
"details": "4 headings found; first heading is h2 rather than h1"
},
{
"check_name": "Content Structure",
"status": "PASS",
"details": "Main content area contains 245 words across 3 paragraphs"
},
{
"check_name": "List Integrity",
"status": "PASS",
"details": "1 list with 3 items, all properly formed"
},
{
"check_name": "Image Tags",
"status": "PASS",
"details": "No images on this page"
},
{
"check_name": "Table Tags",
"status": "PASS",
"details": "No tables on this page"
}
],
"errors": [],
"warnings": [
{
"check": "Heading Hierarchy",
"message": "First heading is h2, typically should be h1 for page opening",
"severity": "LOW"
}
],
"summary": {
"total_checks": 10,
"passed": 9,
"failed": 0,
"warnings": 1,
"html_valid": true,
"tags_matched": true,
"content_substantial": true
}
}
```
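Assembling a report in this shape might look like the sketch below. The `"FAIL"` status value is an assumption (the sample only shows passing output), `book_page` and the boolean summary flags are omitted because their derivation is not described here, and `build_report` is a hypothetical helper:

```python
import json
from datetime import datetime, timezone


def build_report(page: int, chapter: int, checks: list, errors: list, warnings: list) -> dict:
    """Combine per-check results into a single report dictionary."""
    passed = sum(1 for c in checks if c["status"] == "PASS")
    failed = sum(1 for c in checks if c["status"] == "FAIL")
    return {
        "page": page,
        "chapter": chapter,
        "validation_type": "structure",
        "validation_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "overall_status": "FAIL" if errors else "PASS",
        "error_count": len(errors),
        "warning_count": len(warnings),
        "checks_performed": checks,
        "errors": errors,
        "warnings": warnings,
        "summary": {
            "total_checks": len(checks),
            "passed": passed,
            "failed": failed,
            "warnings": len(warnings),
        },
    }


if __name__ == "__main__":
    checks = [{"check_name": "DOCTYPE Declaration", "status": "PASS",
               "details": "Valid HTML5 DOCTYPE found"}]
    print(json.dumps(build_report(16, 2, checks, [], []), indent=2))
```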
## Validation Rules
### PASS Criteria
- DOCTYPE present and valid
- All required tags (`html`, `head`, `body`, `main`, `div.page-container`) present
- All tags properly closed and matched
- Title tag with content
- CSS stylesheet link present
- Content structure valid
- No structural errors
### FAIL Criteria (BLOCKS PIPELINE)
- Missing DOCTYPE
- Missing required tags
- Unmatched or improperly nested tags
- Missing title or CSS link
- Empty content
- Malformed lists or tables
### WARNING (Logged but doesn't block)
- Missing viewport meta tag
- First heading is not h1
- Large heading jumps (h1 → h4)
- Missing alt text on images
- Missing href on links
## Implementation: Using Python Script
This validation is performed by the existing `validate_html.py` tool, run in **structure validation mode**:
```bash
cd Calypso/tools
# Validate single page HTML
python3 validate_html.py \
  ../output/chapter_02/page_artifacts/page_16/04_page_16.html \
  --output-json ../output/chapter_02/page_artifacts/page_16/06_validation_structure.json \
  --strict-structure
# Exit code:
# 0 = VALID (continue to next skill)
# 1 = INVALID (STOP pipeline)
```
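If the gate is driven from Python rather than the shell, the same invocation could be wrapped as below. This is a sketch: the two-digit zero padding of chapter and page numbers is inferred from the example paths, and `build_command` and `run_structure_gate` are hypothetical helpers, not part of the tool:

```python
import subprocess


def build_command(chapter: int, page: int, output_root: str = "../output") -> list:
    """Build the validate_html.py command line for one page."""
    page_dir = f"{output_root}/chapter_{chapter:02d}/page_artifacts/page_{page:02d}"
    return [
        "python3", "validate_html.py",
        f"{page_dir}/04_page_{page:02d}.html",
        "--output-json", f"{page_dir}/06_validation_structure.json",
        "--strict-structure",
    ]


def run_structure_gate(chapter: int, page: int, tools_dir: str = "Calypso/tools") -> bool:
    """Run the validator; True means exit code 0 (continue the pipeline)."""
    result = subprocess.run(build_command(chapter, page), cwd=tools_dir, check=False)
    return result.returncode == 0
```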
## Hook Integration
When validation **FAILS**:
```bash
# Trigger hook: .claude/hooks/validate-structure.sh
# Receives:
# - Page number
# - HTML file path
# - Validation report path
# - Error details
# Hook behavior:
# - Log failure with details
# - Save error report
# - Notify user
# - STOP pipeline (no further processing)
```
## Error Recovery
**If validation fails**:
1. User reviews validation report
2. User identifies issue in AI-generated HTML
3. Options:
- Fix HTML manually and re-validate
- Re-run AI generation with improved prompt
- Review source extraction data for errors
- Proceed with caution (expert override)
## Quality Metrics
Validation provides metrics:
- Percentage of checks passing
- Error severity levels
- Content size (word count, element count)
- Structure complexity
These metrics feed into final quality reports.
## Success Criteria
✓ Validation completes successfully
✓ All structural checks pass (0 errors)
✓ Validation report saved in JSON format
✓ Exit code 0 returned (or 1 if invalid)
✓ Clear error messages if validation fails
## Next Steps After PASS
If validation passes:
1. All pages of the chapter are processed through this gate
2. **Skill 4** (consolidate pages) merges individual page HTMLs
3. **Quality Gate 2** (semantic validate) checks semantic structure
4. Continue through validation pipeline
## Next Steps After FAIL
If validation fails:
1. **PIPELINE STOPS**
2. Hook `validate-structure.sh` triggered
3. User receives error report with details
4. User must fix issues and retry
## Design Notes
- This is the **first deterministic quality gate**
- Uses proven `validate_html.py` tool
- Catches structural issues before semantic analysis
- Provides clear, actionable error messages
- Essential for ensuring pipeline reliability
## Testing
To test structure validation:
```bash
# Test with known-good HTML
python3 validate_html.py ../output/chapter_01/chapter_01.html
# Should show: ✓ VALID
# Test with invalid HTML (if needed)
python3 validate_html.py broken_html.html
# Should show: ✗ INVALID with specific errors
```