coot-model-building by pemsley
Best Practices for Model-Building Tools and Refinement
Updated Jan 18, 2026, 09:04 PM
---
name: coot-model-building
description: "Best Practices for Model-Building Tools and Refinement"
---
# Key Lessons from Chain A Refinement Session
## Workflow Checklist - Follow This for Model-Building and Refinement
Best practices for fixing any issue:
1. ☐ Center on the interesting residue: `coot.set_go_to_atom_chain_residue_atom_name(chain, resno, "CA")`
   or on an interesting position: `coot.set_rotation_centre(x, y, z)`
2. ☐ Check current metrics (Rama/correlation/overlaps)
3. ☐ Make checkpoint if trying something experimental
4. ☐ Apply fix
5. ☐ Re-check metrics to confirm improvement
6. ☐ If worse, restore checkpoint
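The checklist can be read as a small control loop. The sketch below is only an illustration of that loop: `try_fix` and the five callables it takes are hypothetical names, not Coot functions. In a real session they would wrap calls such as `coot.make_backup_checkpoint()` and `coot.restore_to_backup_checkpoint()`.

```python
def try_fix(measure, make_checkpoint, apply_fix, restore, is_better):
    """Run one checkpointed fix attempt; return (kept, before, after).

    All five arguments are caller-supplied callables (hypothetical wrappers
    around whatever Coot calls you use for each step).
    """
    before = measure()              # step 2: metrics before the fix
    checkpoint = make_checkpoint()  # step 3: checkpoint before experimenting
    apply_fix()                     # step 4: apply the fix
    after = measure()               # step 5: re-check the same metrics
    kept = is_better(before, after)
    if not kept:
        restore(checkpoint)         # step 6: worse, so go back
    return kept, before, after


# Toy usage with stand-in callables (no Coot needed):
state = {"corr": 0.40}
kept, before, after = try_fix(
    measure=lambda: state["corr"],
    make_checkpoint=lambda: dict(state),
    apply_fix=lambda: state.update(corr=0.30),  # a "fix" that makes things worse
    restore=lambda saved: state.update(saved),
    is_better=lambda b, a: a > b,
)
print(kept, state["corr"])
```

Passing the steps in as callables keeps the order of operations in one place, so a fix cannot be applied without a "before" measurement and a checkpoint.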
## Refinement Best Practices
### 1. Make Checkpoints Before Model Changes
**CRITICAL: Always create a checkpoint before making significant model changes.**
Use `make_backup_checkpoint()` before any operation that might need to be reverted:
- Adding/deleting residues
- Adding ligands or waters
- Major refinement operations
- Any experimental model building
```python
# Create a named checkpoint before risky operation
checkpoint_idx = coot.make_backup_checkpoint(0, "before adding OXT")
# Try the operation
coot.add_OXT_to_residue(0, "A", 93, "")
result = coot.refine_residues_py(0, [["A", 93, ""]])
# Check if it worked - if not, restore.
# result_is_bad is a placeholder: set it from your own validation checks.
if result_is_bad:
    coot.restore_to_backup_checkpoint(0, checkpoint_idx)
```
**Why checkpoints are better than undo:**
- `apply_undo()` only steps back one operation at a time
- Checkpoints let you jump back to a specific point
- Named checkpoints are self-documenting
- Multiple checkpoints allow comparing different approaches
```python
# Compare two different approaches
checkpoint_before = coot.make_backup_checkpoint(0, "original state")
# Try approach 1
coot.auto_fit_best_rotamer(0, "A", 42, "", "", 1, 1, 0.01)
results = coot.refine_residues_py(0, [["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
# check_correlation is a placeholder for your own density-correlation check
score_approach1 = check_correlation(0, "A", 42)
checkpoint_approach1 = coot.make_backup_checkpoint(0, "after approach 1")
# Restore and try approach 2
coot.restore_to_backup_checkpoint(0, checkpoint_before)
coot.pepflip(0, "A", 42, "", "")
results = coot.refine_residues_py(0, [["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
score_approach2 = check_correlation(0, "A", 42)
# Keep the better result
if score_approach1 > score_approach2:
    coot.restore_to_backup_checkpoint(0, checkpoint_approach1)
```
## MANDATORY: Complete Validation Workflow
**CRITICAL: You MUST check ALL validation metrics before AND after EVERY fix.**
Fixing only one problem (e.g., Ramachandran) while leaving others (rotamer, density fit) is a FAILED fix. A residue is only "fixed" when ALL metrics are acceptable.
### Before fixing ANY residue:
1. **ALWAYS** center on it: `coot.set_go_to_atom_chain_residue_atom_name(chain, resno, "CA")`
2. **ALWAYS** check ALL of these:
- Ramachandran probability (from `all_molecule_ramachandran_score_py`)
- Rotamer score (from `rotamer_graphs_py`)
- Density correlation - both all-atom and side-chain (from `map_to_model_correlation_stats_per_residue_range_py`)
- Atom overlaps involving this residue (from `molecule_atom_overlaps_py`)
### After fixing ANY residue:
3. **ALWAYS** re-check ALL the same metrics. Sometimes residues/issues are just
not fixable (that's what makes refinement and model-building tricky).
4. **ONLY MOVE** on to the next residue/issue once you have tried to make all of
these true:
- Ramachandran probability > 0.02 (preferably > 0.1)
- Rotamer score > 1.0% (preferably > 5%)
- Density correlation > 0.7 (all-atom and side-chain, preferably > 0.8)
- No severe clashes (< 2.0 Å³ overlap volume)
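These minimum thresholds can be collected into a single check. The following is a minimal sketch, assuming the metrics have already been gathered into a plain dict; the key names, `THRESHOLDS`, and `failing_metrics` are illustrative choices, not part of Coot.

```python
# Minimum acceptable values, taken from the list above.
THRESHOLDS = {
    "rama_prob": 0.02,       # Ramachandran probability
    "rotamer_score": 1.0,    # rotamer score, in percent
    "all_atom_corr": 0.7,    # density correlation, all atoms
    "sidechain_corr": 0.7,   # density correlation, side chain
}
MAX_OVERLAP_VOLUME = 2.0     # cubic angstroms; at or above this is severe


def failing_metrics(metrics):
    """Return the names of the metrics that miss their threshold.

    `metrics` is a plain dict (a hypothetical layout, not a Coot return
    value). A missing or None value is skipped rather than reported.
    """
    failed = []
    for name, minimum in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value <= minimum:
            failed.append(name)
    overlap = metrics.get("worst_overlap")
    if overlap is not None and overlap >= MAX_OVERLAP_VOLUME:
        failed.append("worst_overlap")
    return failed


example = {"rama_prob": 0.30, "rotamer_score": 0.01,
           "all_atom_corr": 0.59, "sidechain_corr": 0.75,
           "worst_overlap": 0.4}
print(failing_metrics(example))  # ['rotamer_score', 'all_atom_corr']
```

An empty list means every recorded metric passed; anything else names what still needs work before moving on.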
### If ANY metric is still bad after your first fix:
5. **MUST** try additional fixes:
- Bad rotamer → `auto_fit_best_rotamer()`, then try following that up
with `refine_residues_py()` for that residue and its upstream and downstream
neighbours (if any).
- Poor density fit → try alternative rotamers, check for missing atoms
- Persistent clashes → refine with the addition of spatial neighbours found using
`residues_near_residue()`
6. **NEVER** declare a residue "fixed" based on only one metric improving
7. **ALWAYS** re-validate after each additional fix
### Acceptable Reasons to Stop (without perfect metrics):
- You've tried multiple approaches and documented them
- The best achievable metrics are still recorded
- You've created checkpoints to compare approaches
- You explain why the residue remains problematic (e.g., poor density, crystal contact)
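One way to keep that documentation is a small per-residue log of attempts. This is a sketch under assumptions: `Attempt` and `ResidueLog` are hypothetical names, and the numbers in the usage lines are made up for illustration, not taken from the session.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    """One documented fix attempt (a hypothetical record layout)."""
    description: str
    checkpoint: int        # index returned when the checkpoint was made
    all_atom_corr: float   # density correlation reached by this attempt
    notes: str = ""


@dataclass
class ResidueLog:
    residue: str
    attempts: list = field(default_factory=list)

    def record(self, attempt):
        self.attempts.append(attempt)

    def best(self):
        """Return the attempt with the highest correlation, or None."""
        return max(self.attempts, key=lambda a: a.all_atom_corr, default=None)


# Illustrative values only:
log = ResidueLog("A/41")
log.record(Attempt("pepflip + refine 40-43", checkpoint=3, all_atom_corr=0.62))
log.record(Attempt("auto-fit rotamer + refine 40-42", checkpoint=4,
                   all_atom_corr=0.68, notes="weak side-chain density"))
best = log.best()
print(f"{log.residue}: best was '{best.description}' (checkpoint {best.checkpoint})")
```

Storing the checkpoint index with each attempt means the best state can be restored later, and the notes field holds the explanation for why the residue remains problematic.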
## Example of CORRECT Workflow
```python
# 1. ALWAYS center on problem residue first
coot.set_go_to_atom_chain_residue_atom_name("A", 41, "CA")
# 2. Get ALL metrics BEFORE
# Element [5] of the Ramachandran result is the per-residue list
rama_data = [r for r in coot.all_molecule_ramachandran_score_py(0)[5]
             if r[1] == ['A', 41, '']][0]
rama_prob_before = rama_data[2]
rotamer_data = [r for r in coot.rotamer_graphs_py(0)
                if r[0] == 'A' and r[1] == 41][0]
rotamer_score_before = rotamer_data[3]
corr_data = [s for s in coot.map_to_model_correlation_stats_per_residue_range_py(0, "A", 1, 1, 0)[0]
             if s[0][1] == 41][0]
correlation_before = corr_data[1][1]
overlaps_before = [o for o in coot.molecule_atom_overlaps_py(0, 30)
                   if (o['atom-1-spec'][1:3] == ['A', 41] or
                       o['atom-2-spec'][1:3] == ['A', 41])]
print(f"BEFORE: Rama={rama_prob_before:.4f}, Rotamer={rotamer_score_before:.2f}%, Corr={correlation_before:.3f}, Clashes={len(overlaps_before)}")
# 3. Apply first fix (e.g., pepflip for backbone)
coot.pepflip(0, "A", 41, "", "")
coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
# 4. Check ALL metrics AFTER first fix
rama_prob_after = [r for r in coot.all_molecule_ramachandran_score_py(0)[5]
                   if r[1] == ['A', 41, '']][0][2]
rotamer_score_after = [r for r in coot.rotamer_graphs_py(0)
                       if r[0] == 'A' and r[1] == 41][0][3]
correlation_after = [s for s in coot.map_to_model_correlation_stats_per_residue_range_py(0, "A", 1, 1, 0)[0]
                     if s[0][1] == 41][0][1][1]
print(f"AFTER: Rama={rama_prob_after:.4f}, Rotamer={rotamer_score_after:.2f}%, Corr={correlation_after:.3f}")
# 5. If rotamer or correlation still bad, DON'T STOP - fix them!
if rotamer_score_after < 1.0:
    print("Rotamer still bad - trying auto_fit_best_rotamer")
    coot.auto_fit_best_rotamer(0, "A", 41, "", "", 1, 1, 0.01)
    coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""]])
# 6. ALWAYS re-check after additional fixes
rotamer_score_final = [r for r in coot.rotamer_graphs_py(0)
                       if r[0] == 'A' and r[1] == 41][0][3]
correlation_final = [s for s in coot.map_to_model_correlation_stats_per_residue_range_py(0, "A", 1, 1, 0)[0]
                     if s[0][1] == 41][0][1][1]
print(f"FINAL: Rotamer={rotamer_score_final:.2f}%, Corr={correlation_final:.3f}")
# 7. Only NOW can you move to the next residue
```
## Example of WRONG Workflow (DO NOT DO THIS)
```python
# ❌ WRONG: Checking only Ramachandran
coot.pepflip(0, "A", 41, "", "")
coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""]])
rama_after = coot.all_molecule_ramachandran_score_py(0)[5][39][2]
print(f"Ramachandran improved to {rama_after}")
# MOVES ON without checking rotamer or density fit - WRONG!
# ❌ WRONG: Not centering on residue
# Goes straight to fix without set_go_to_atom_chain_residue_atom_name()
# ❌ WRONG: Not checking metrics before the fix
# How do you know if it improved if you don't know what it was before?
# ❌ WRONG: Declaring success with bad rotamer
rama = 0.30 # Good!
rotamer = 0.0001 # TERRIBLE!
correlation = 0.59 # POOR!
print("Residue fixed!") # NO IT ISN'T!
```
## Why This Matters
A residue with:
- ✅ Good Ramachandran (0.30)
- ❌ Terrible rotamer (0.01%)
- ❌ Poor density fit (0.59)
is NOT fixed. The side chain is clearly wrong. The backbone geometry might be OK, but the model is still incorrect.
**ALL metrics must be acceptable before moving on.**
### 2. Extend Selection Around Problem Residues
**Don't refine problem residues in isolation** - include neighboring residues for context.
- ❌ **Bad:** `refine_residues_py(0, [["A", 41, ""]])` - Often fails to correct the model
- ✅ **Good:** `refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])`
**Recommended approach:**
- For single problem residue: include ±1 or ±2 neighbors
- For consecutive problem residues: include ±1 neighbor on each end
- Larger regions (±3-4 residues) can sometimes help severe issues
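The ±N selection can be built programmatically rather than typed out. A minimal sketch, assuming contiguous residue numbering with no gaps or insertion codes; `residue_window` and its `first`/`last` clamping parameters are hypothetical helpers, not Coot functions.

```python
def residue_window(chain_id, problem_resnos, pad=1, first=None, last=None):
    """Build a residue-spec list covering the problem residues plus `pad`
    neighbours on each side, in the [chain, resno, ins_code] form used by
    the refine_residues_py() calls in this document.

    `first` and `last` optionally clamp the window to the chain's range
    (the caller supplies them).
    """
    start = min(problem_resnos) - pad
    end = max(problem_resnos) + pad
    if first is not None:
        start = max(start, first)
    if last is not None:
        end = min(end, last)
    return [[chain_id, resno, ""] for resno in range(start, end + 1)]


print(residue_window("A", [41, 42]))
# [['A', 40, ''], ['A', 41, ''], ['A', 42, ''], ['A', 43, '']]
```

The resulting list can be passed directly as the second argument of `coot.refine_residues_py(imol, ...)`.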
**Example from session:**
- Residues 41-42 had severe Ramachandran outliers
- Refining just 41-42 failed
- Refining 40-43 succeeded: Residue 41 improved from p=0.00004 to p=0.308
**Neighboring Residues:**
- You can use coot.residues_near_residue() to find residues that are close in space
but distant in sequence, so that they can be added to the residue selection for
refinement.
### 3. Include Spatial Neighbours, Not Just Sequence Neighbours
**Critical insight:** Residues that are close in 3D space affect each other during refinement, even if they're far apart in sequence.
Coot's refinement includes spatially neighbouring atoms in the non-bonded contact interactions, but only the selected residues can move during minimization. If a nearby (but unselected) residue is in the wrong position, it will "push" your selected residues away via non-bonded contact penalties - potentially pushing them out of correct density to avoid the clash with the incorrectly-placed neighbour.
**Diagnostic workflow:**
1. **Check for clashes** after refinement:
```python
overlaps = coot.molecule_atom_overlaps_py(0, 50)
for o in overlaps:
    spec1, spec2 = o['atom-1-spec'], o['atom-2-spec']
    vol = o['overlap-volume']
    if vol > 0.5:  # Significant clash
        print(f"{spec1[1]}/{spec1[2]} {spec1[4]} - {spec2[1]}/{spec2[2]} {spec2[4]}: {vol:.2f}")
```
2. **If a problem residue clashes with a distant residue**, fix the distant residue first:
```python
# Example: A/2 has poor correlation (0.13) and clashes with A/89
# First fix A/89:
coot.auto_fit_best_rotamer(0, "A", 89, "", "", 1, 1, 0.01)
results = coot.refine_residues_py(0, [["A", 88, ""], ["A", 89, ""], ["A", 90, ""]])
# Then re-refine A/2 INCLUDING A/89 as a spatial neighbour:
results = coot.refine_residues_py(0, [["A", 1, ""], ["A", 2, ""], ["A", 3, ""], ["A", 89, ""]])
# A/2 correlation improved: 0.13 → 0.81
```
**Why this matters:**
- Coot's refinement "feels" spatial neighbours via non-bonded contact terms
- But only selected residues can move during minimization
- A/89 was in a wrong position (correlation 0.050) and pushing A/2 away
- A/2 moved out of its correct density to reduce the non-bonded penalty with A/89
- Fixing A/89 first put it in the right place, so it no longer pushed A/2 incorrectly
**Real example:**
```
Before: A/2 correlation = 0.131, A/89 correlation = 0.050
A/2 CA ↔ A/89 CZ clash: 1.06 Å³
After fixing A/89 first, then refining together:
A/2 correlation = 0.805, A/89 correlation = 0.928
No clash
```
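Finding which residues to add as spatial neighbours can be done from the overlap list itself. The sketch below assumes the overlap entries are dicts shaped like those used in this document; `clash_partners` is a hypothetical helper, and the sample data is hand-made rather than real Coot output.

```python
def clash_partners(overlaps, chain_id, resno, min_volume=0.5):
    """Return {(chain, resno): worst overlap volume} for residues that
    clash with the given residue.

    Assumes each entry has 'atom-1-spec' and 'atom-2-spec' with the chain
    at index 1 and the residue number at index 2, plus 'overlap-volume'.
    """
    target = (chain_id, resno)
    partners = {}
    for o in overlaps:
        a = tuple(o['atom-1-spec'][1:3])
        b = tuple(o['atom-2-spec'][1:3])
        if o['overlap-volume'] < min_volume or target not in (a, b):
            continue
        other = b if a == target else a
        if other != target:  # ignore clashes within the residue itself
            partners[other] = max(partners.get(other, 0.0), o['overlap-volume'])
    return partners


# Hand-made sample data (not real Coot output):
sample = [
    {'atom-1-spec': [0, 'A', 2, '', ' CA ', ''],
     'atom-2-spec': [0, 'A', 89, '', ' CZ ', ''], 'overlap-volume': 1.06},
    {'atom-1-spec': [0, 'A', 10, '', ' O  ', ''],
     'atom-2-spec': [0, 'A', 11, '', ' N  ', ''], 'overlap-volume': 0.8},
]
partners = clash_partners(sample, "A", 2)
specs = [["A", 1, ""], ["A", 2, ""], ["A", 3, ""]]
specs += [[c, r, ""] for c, r in sorted(partners)]
print(specs)
```

Here the selection for A/2 gains A/89 automatically, mirroring the manual selection in the example above.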
### 4. Iterative Refinement Strategy
Sometimes multiple rounds of refinement with different selections help:
1. **First pass:** Refine larger region to establish general geometry
2. **Second pass:** Refine smaller region to fine-tune specific problem
3. **Check validation** after each step
4. **Restore checkpoint** if results get worse
**Example workflow:**
```python
# Create checkpoint first!
checkpoint = coot.make_backup_checkpoint(0, "before iterative refinement")
# First: larger region
results_1 = coot.refine_residues_py(0, [["A", i, ""] for i in range(40, 44)])
check_validation()  # placeholder for your own metric check. Did it help?
# Second: targeted refinement
results_2 = coot.refine_residues_py(0, [["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
check_validation()  # Better or worse?
# If worse, revert (skip this call when the results improved):
coot.restore_to_backup_checkpoint(0, checkpoint)
```
### 5. Measure Before and After
**Always validate changes objectively** using:
- Ramachandran probabilities
- Density correlation (all-atom and side-chain)
- Geometry statistics
```python
def check_residue_validation(imol, chain_id, resno):
    """Check both Ramachandran and density correlation"""
    # Get Ramachandran
    rama_data = coot.all_molecule_ramachandran_score_py(imol)
    residue_data = rama_data[5]
    rama_score = None
    for r in residue_data:
        if r[1][0] == chain_id and r[1][1] == resno:
            rama_score = r[2]
            break
    # Get density correlation
    corr_data = coot.map_to_model_correlation_stats_per_residue_range_py(
        imol, chain_id, 1, 1, 1
    )
    all_atom_corr = None
    sidechain_corr = None
    for r in corr_data[0]:
        if r[0][1] == resno:
            all_atom_corr = r[1][1]
            break
    for r in corr_data[1]:
        if r[0][1] == resno:
            sidechain_corr = r[1][1]
            break
    return {
        'residue': resno,
        'rama_prob': rama_score,
        'all_atom_corr': all_atom_corr,
        'sidechain_corr': sidechain_corr
    }
# Usage with checkpoint
checkpoint = coot.make_backup_checkpoint(0, "before refinement test")
before = check_residue_validation(0, "A", 41)
coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
coot.accept_moving_atoms_py()
after = check_residue_validation(0, "A", 41)
# Compare and decide
if after['all_atom_corr'] > before['all_atom_corr']:
    # Keep it!
    pass
else:
    # Revert to checkpoint
    coot.restore_to_backup_checkpoint(0, checkpoint)
```
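The comparison above looks only at all-atom correlation and would fail if a value were `None`. A slightly more defensive sketch follows, assuming dicts shaped like the return value of `check_residue_validation()`; `changed_metrics` is a hypothetical helper, and the correlation values in the usage lines are illustrative.

```python
def changed_metrics(before, after,
                    keys=("rama_prob", "all_atom_corr", "sidechain_corr")):
    """Split the metrics into (improved, worsened) lists of names.

    A value of None on either side is skipped. Higher is treated as
    better for every key.
    """
    improved, worsened = [], []
    for key in keys:
        b, a = before.get(key), after.get(key)
        if b is None or a is None:
            continue
        if a > b:
            improved.append(key)
        elif a < b:
            worsened.append(key)
    return improved, worsened


# Illustrative values (the correlations are made up):
before = {"residue": 41, "rama_prob": 0.00004,
          "all_atom_corr": 0.71, "sidechain_corr": None}
after = {"residue": 41, "rama_prob": 0.308,
         "all_atom_corr": 0.69, "sidechain_corr": None}
improved, worsened = changed_metrics(before, after)
print("improved:", improved, "worsened:", worsened)
```

Reporting both lists makes a trade-off visible, such as a better Ramachandran probability bought with a worse density fit, instead of hiding it behind a single comparison.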
### 6. Use Checkpoints and Undo Liberally
**Don't be afraid to revert changes:**
- `make_backup_checkpoint()` / `restore_to_backup_checkpoint()` - for jumping back to a specific state
- `apply_undo()` - for stepping back one operation at a time
Use checkpoints when:
- Starting a new model-building task
- About to try something experimental
- Before a series of related operations
Use undo when:
- The last single operation made things worse
- Quick single-step revert needed
### 7. Auto-fit Rotamer for Side-chain Issues
**For poor side-chain density correlation**, try `auto_fit_best_rotamer()` first:
```python
# Create checkpoint first
checkpoint = coot.make_backup_checkpoint(0, "before rotamer fitting")
# Check if it's a side-chain problem
validation = check_residue_validation(0, "A", 89)
if validation['sidechain_corr'] < 0.5:
    # Try auto-fit rotamer
    score = coot.auto_fit_best_rotamer(0, "A", 89, "", "", 1, 1, 0.01)
    if score > 0:  # Positive score is good
        # Check improvement
        after = check_residue_validation(0, "A", 89)
        if after['sidechain_corr'] > validation['sidechain_corr']:
            # Success! (e.g., 0.034 → 0.900)
            pass
        else:
            coot.restore_to_backup_checkpoint(0, checkpoint)
    else:
        # Negative score means failure
        coot.restore_to_backup_checkpoint(0, checkpoint)
```
### 8. Set Refinement to Synchronous Mode
**Always call this at the start** to make refinement complete immediately:
```python
coot.set_refinement_immediate_replacement(1)
```
Without this, refinement may be asynchronous and difficult to control programmatically.
### 9. Navigate to Residue Before Working
**Bring residue to screen center** so you can watch the refinement:
```python
coot.set_go_to_atom_molecule(0)
coot.set_go_to_atom_chain_residue_atom_name("A", 41, "CA")
```
This helps with:
- Visual inspection of the problem
- Seeing the refinement in real-time
- Verifying the result makes geometric sense
### 10. Flipping Peptides
If a residue's Ramachandran probability is poor, try using `coot.pepflip(imol, chain_id, res_no, ins_code, alt_conf)` followed by a refinement of the residues in the extended region.
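The flip-then-refine step can be wrapped in one function. This is a sketch, not a definitive recipe: `pepflip_and_refine` is a hypothetical helper, the Coot module is passed in as `coot_api` so the example can run outside Coot, and `FakeCoot` is a stand-in that only records the calls made.

```python
def pepflip_and_refine(coot_api, imol, chain_id, resno, pad=1):
    """Flip the peptide at `resno`, then refine it together with `pad`
    neighbours on each side. Returns whatever refine_residues_py() returns.

    Insertion code and alt conf are assumed to be empty strings.
    """
    coot_api.pepflip(imol, chain_id, resno, "", "")
    specs = [[chain_id, r, ""] for r in range(resno - pad, resno + pad + 1)]
    return coot_api.refine_residues_py(imol, specs)


class FakeCoot:
    """Stand-in used only to show the calls made (not the real module)."""
    def __init__(self):
        self.calls = []

    def pepflip(self, imol, chain_id, res_no, ins_code, alt_conf):
        self.calls.append(("pepflip", chain_id, res_no))

    def refine_residues_py(self, imol, specs):
        self.calls.append(("refine", [s[1] for s in specs]))
        return None


fake = FakeCoot()
pepflip_and_refine(fake, 0, "A", 41)
print(fake.calls)  # [('pepflip', 'A', 41), ('refine', [40, 41, 42])]
```

Inside Coot, the same function would be called with the real module, for example `pepflip_and_refine(coot, 0, "A", 41, pad=2)`, with validation before and after as described earlier.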
### 11. Flipping a Side-chain's Terminal Chi Angle
If the rotamer score is poor, try using `coot.do_180_degree_side_chain_flip()` to improve it. This is occasionally useful.
## Key Takeaway
**Context matters in refinement.** Including neighboring residues provides the geometric and density context needed for refinement algorithms to find better solutions, especially for severe outliers.
**Always checkpoint before changes.** Use `make_backup_checkpoint()` before any significant model modification so you can easily revert if needed.