SAWNERGY

SAWNERGY Documentation

A Python 3.11+ toolkit that converts molecular dynamics (MD) trajectories into residue interaction networks (RINs), samples random and self-avoiding walks (RW/SAW), trains skip-gram embeddings (PureML or PyTorch backends), and visualizes both networks and embeddings. All artifacts are Zarr v3 archives stored as compressed .zip files with rich metadata so every stage can be reproduced.


Requirements

Installation

pip install sawnergy
# If you want the PyTorch backend, install torch separately:
# pip install torch
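
Because torch is an optional extra, a script may want to check for it before requesting model_base="torch". The following is a minimal standard-library sketch; module_available is a hypothetical helper introduced here for illustration, not part of SAWNERGY.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


# Decide up front whether the PyTorch backend can be requested.
has_torch = module_available("torch")
print("PyTorch backend available:", has_torch)
```

If the check prints False, install torch as shown above or use the PureML backend instead.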

Small visual example (constructed fully from trajectory and topology files)

[Figures: RIN; Embedding]

More visual examples:

Animated Temporal Residue Interaction Network of Full Length p53 Protein

[Figure: RIN_animation]

Residue Interaction Network of Full Length p53 Protein (on the right) and its Embedding (on the left)

[Figure: Embedding_vs_RIN]


End-to-End Quick Start (make sure cpptraj is discoverable)

New here? This is the shortest possible runnable script. Point it at your own topology/trajectory pair or at the bundled p53 example (see Quick-start MD example below).

from pathlib import Path
import logging
import torch  # optional, only when using model_base="torch"

from sawnergy.logging_util import configure_logging
from sawnergy.rin import RINBuilder
from sawnergy.walks import Walker
from sawnergy.embedding import Embedder

configure_logging("./logs", file_level=logging.WARNING, console_level=logging.INFO, force=True)

# 1) Build a Residue Interaction Network archive
rin_path = Path("RIN_demo.zip")
rin_builder = RINBuilder()  # auto-locates cpptraj
rin_builder.build_rin(
    topology_file="system.prmtop",
    trajectory_file="trajectory.nc",
    molecule_of_interest=1,
    frame_range=(1, 100),          # inclusive, 1-based
    frame_batch_size=10,           # processed in 10-frame batches
    prune_low_energies_frac=0.85,   # per-row quantile pruning
    include_attractive=True,
    include_repulsive=False,
    output_path=rin_path,
)

# 2) Sample random / self-avoiding walks
walks_path = Path("WALKS_demo.zip")
with Walker(rin_path, seed=123) as walker:
    walker.sample_walks(
        walk_length=16,
        walks_per_node=100,
        include_attractive=True,
        include_repulsive=False,
        time_aware=False,
        output_path=walks_path,
        in_parallel=False,      # required kwarg; set True only under a main-guard
    )

# 3) Train per-frame embeddings
embedder = Embedder(walks_path, seed=999)
emb_path = embedder.embed_all(
    RIN_type="attr",              # "attr" or "repuls"
    using="merged",               # "RW" | "SAW" | "merged"
    num_epochs=20,
    negative_sampling=True,      # SG (False) or SGNS (True)
    num_negative_samples=10,
    window_size=5,
    device="cuda" if torch.cuda.is_available() else "cpu",
    model_base="torch",
    kind="in",                    # stored embedding kind
    output_path="EMBEDDINGS_demo.zip",
)
print("Embeddings written to", emb_path)
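
The prune_low_energies_frac comment in the script mentions per-row quantile pruning. One way to picture that step is sketched below using only the standard library. Everything here is an assumption made for illustration: prune_row and normalize_row are hypothetical helpers, and the rule shown (keep entries at or above the row's frac-quantile of absolute energy, then normalize the survivors into transition probabilities) is not necessarily what RINBuilder does internally.

```python
def prune_row(row, frac):
    """Zero out entries whose magnitude is below the row's `frac` quantile."""
    if not row:
        return []
    ranked = sorted(abs(x) for x in row)
    # Linear interpolation between the two nearest ranks.
    pos = frac * (len(ranked) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ranked) - 1)
    threshold = ranked[lo] + (ranked[hi] - ranked[lo]) * (pos - lo)
    return [x if abs(x) >= threshold else 0.0 for x in row]


def normalize_row(row):
    """Turn surviving magnitudes into probabilities that sum to 1."""
    total = sum(abs(x) for x in row)
    if total == 0:
        return [0.0] * len(row)
    return [abs(x) / total for x in row]


energies = [1.0, 2.0, 3.0, 4.0, 5.0]       # toy interaction energies for one residue
pruned = prune_row(energies, frac=0.5)     # threshold is the median, 3.0
print(pruned)                              # [0.0, 0.0, 3.0, 4.0, 5.0]
print(normalize_row(pruned))               # probabilities over the kept neighbours
```

A higher fraction such as 0.85 keeps only the strongest interactions in each row, which makes the resulting network sparser.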

MD Trajectory + Topology
          │
          ▼
      RINBuilder 
          │   →  RIN archive (.zip/.zarr) → Visualizer (display/animate RINs)
          ▼
        Walker
          │   →  Walks archive (RW/SAW per frame)
          ▼
       Embedder
          │   →  Embedding archive (frame × vocab × dim)
          ▼
     Downstream ML

Each stage consumes the archive produced by the previous one. Metadata embedded in the archives ensures frame order,
node indexing, and RNG seeds stay consistent across the toolchain.
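
To make the Walker stage more concrete, here is a small standard-library sketch of how a random walk and a self-avoiding walk differ when sampled from one frame's transition matrix. It is an illustration under assumptions, not the Walker implementation: sample_walk is a hypothetical function, node ids are 0-based for simplicity, and a self-avoiding walk simply stops early when it reaches a dead end.

```python
import random


def sample_walk(transitions, start, walk_length, rng, self_avoiding=False):
    """Sample a walk of up to `walk_length` steps (walk_length + 1 nodes)."""
    walk = [start]
    visited = {start}
    for _ in range(walk_length):
        weights = list(transitions[walk[-1]])
        if self_avoiding:
            # Forbid revisiting nodes by zeroing their transition weight.
            weights = [0.0 if j in visited else w for j, w in enumerate(weights)]
        if sum(weights) <= 0:
            break  # dead end: no admissible neighbour left
        nxt = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        walk.append(nxt)
        visited.add(nxt)
    return walk


# Toy 4-node network: every node connects equally to every other node.
toy = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
rng = random.Random(123)
print(sample_walk(toy, start=0, walk_length=3, rng=rng))                      # RW: may revisit nodes
print(sample_walk(toy, start=0, walk_length=3, rng=rng, self_avoiding=True))  # SAW: 4 distinct nodes
```

Seeding the generator, as the seed argument does in the quick-start script, is what makes the sampled walks repeatable.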

Biophysical intuition (why these steps exist)

Quick-start MD example

A minimal dataset is included in example_MD_for_quick_start/ on GitHub so that you can run the full SAWNERGY pipeline immediately.

See example_MD_for_quick_start/brief_description.md.

Five-minute “first run” for newcomers

  1. Install: pip install sawnergy (and pip install torch if you want GPU training). Confirm cpptraj -h works in your shell or set CPPTRAJ=/path/to/cpptraj.
  2. Download the p53 quick-start folder and cd into the repo root.
  3. Run the quick-start script above with topology_file="example_MD_for_quick_start/p53_DBD.prmtop" and trajectory_file="example_MD_for_quick_start/p53_DBD.nc".
  4. Inspect the outputs: RIN_demo.zip, WALKS_demo.zip, and EMBEDDINGS_demo.zip (all are Zarr-in-zip archives). Try sawnergy.visual.Visualizer("RIN_demo.zip").build_frame(1, show=True) to see the network.
  5. Swap in your own trajectory/topology once you’ve seen the expected behavior.
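
Step 1 can also be checked from Python before building a RIN. The sketch below is a hedged illustration using only the standard library: find_cpptraj is a hypothetical helper, and the lookup order it uses (the CPPTRAJ environment variable first, then PATH) is an assumption rather than a description of how RINBuilder locates the binary.

```python
import os
import shutil
from pathlib import Path


def find_cpptraj(env=None):
    """Return a path to cpptraj, or None if it cannot be found."""
    env = os.environ if env is None else env
    override = env.get("CPPTRAJ")
    # Honour an explicit CPPTRAJ=/path/to/cpptraj if it points at a file.
    if override and Path(override).is_file():
        return override
    # Otherwise search the directories listed in PATH.
    return shutil.which("cpptraj", path=env.get("PATH"))


location = find_cpptraj()
print(location or "cpptraj not found: install it or set CPPTRAJ")
```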

Archive Layouts (Zarr v3 in .zip)

All archives are Zarr v3 groups and can be opened directly with sawnergy.sawnergy_util.ArrayStorage (read via mode="r"; write/append via mode="w"/"a"). Archives compressed as .zip are read-only; to create or append, use the .zarr directory form or a temporary store that is compressed afterwards.
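
For a quick look at an archive without any SAWNERGY or Zarr imports, the zip can be inspected with the standard library alone. This is a minimal sketch: Zarr v3 stores each node's metadata in a zarr.json document, and the code assumes the root group's zarr.json sits at the top level of the zip. read_root_attrs and list_children are hypothetical helpers, not SAWNERGY functions.

```python
import json
import zipfile


def read_root_attrs(archive_path):
    """Return the root-group attributes of a zipped Zarr v3 store."""
    with zipfile.ZipFile(archive_path) as zf:
        # Assumption: the root group's metadata is at the top level of the zip.
        root_meta = json.loads(zf.read("zarr.json"))
    return root_meta.get("attributes", {})


def list_children(archive_path):
    """Names of direct children of the root that carry their own zarr.json."""
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(
            name.split("/")[0]
            for name in zf.namelist()
            if name.count("/") == 1 and name.endswith("/zarr.json")
        )


# Example usage once an archive exists:
# print(read_root_attrs("RIN_demo.zip"))
# print(list_children("RIN_demo.zip"))
```

For reading the arrays themselves, ArrayStorage remains the intended entry point.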

Archive: RIN
  Core datasets (name → shape, dtype):
    • ATTRACTIVE_transitions (T, N, N) float32 (opt)
    • REPULSIVE_transitions (T, N, N) float32 (opt)
    • ATTRACTIVE_energies (T, N, N) float32 (opt; pre-normalized)
    • REPULSIVE_energies (T, N, N) float32 (opt)
    • COM (T, N, 3) float32
  Key root attrs:
    • com_name="COM"
    • molecule_of_interest
    • frame_range (tuple or None)
    • frame_batch_size
    • prune_low_energies_frac
    • attractive_transitions_name / repulsive_transitions_name (may be None)
    • attractive_energies_name / repulsive_energies_name (may be None)
    • time_created

Archive: Walks
  Core datasets (name → shape, dtype):
    • ATTRACTIVE_RWs (T, N·num_RWs, L+1) uint16 (opt)
    • REPULSIVE_RWs (T, N·num_RWs, L+1) uint16 (opt)
    • ATTRACTIVE_SAWs (T, N·num_SAWs, L+1) uint16 (opt)
    • REPULSIVE_SAWs (T, N·num_SAWs, L+1) uint16 (opt)
  Key root attrs:
    • seed
    • num_workers
    • in_parallel
    • batch_size_nodes
    • num_RWs
    • num_SAWs
    • node_count
    • time_stamp_count
    • walk_length
    • walks_per_node
    • dataset name attrs for each channel (may be None)
    • walks_layout="time_leading_3d"
    • time_created

Archive: Embeddings
  Core datasets (name → shape, dtype):
    • FRAME_EMBEDDINGS (T, N, D) float32
  Key root attrs:
    • frame_embeddings_name
    • time_stamp_count
    • node_count
    • embedding_dim
    • model_base
    • embedding_kind (`"in"

T equals the number of frame batches produced by RINBuilder (i.e., frame_range swept in frame_batch_size steps). Walk node ids are 1-based in storage; embedding training converts them to 0-based internally.
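
As a worked example of the two statements above, the sketch below computes T for the quick-start settings and shifts stored node ids to 0-based indices. Both helpers are hypothetical and use only the standard library; treating a trailing partial batch as one extra time stamp is an assumption.

```python
import math


def expected_time_axis(frame_range, frame_batch_size):
    """Number of frame batches T for an inclusive, 1-based frame_range."""
    first, last = frame_range
    n_frames = last - first + 1
    # Assumption: a trailing partial batch still counts as one time stamp.
    return math.ceil(n_frames / frame_batch_size)


def to_zero_based(walks):
    """Shift stored 1-based node ids to 0-based indices."""
    return [[node - 1 for node in walk] for walk in walks]


T = expected_time_axis((1, 100), 10)   # quick-start settings
print(T)                               # 10
stored = [[1, 4, 2], [3, 1, 5]]        # toy walks as stored (1-based ids)
print(to_zero_based(stored))           # [[0, 3, 1], [2, 0, 4]]
```

With walk_length=16, each stored walk holds L+1 = 17 node ids, which matches the last axis of the walk datasets.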


Stage Reference

RINBuilder (sawnergy.rin.RINBuilder)

Walker (sawnergy.walks.Walker)

Embedder (sawnergy.embedding.Embedder)
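
As background for the window_size argument used in the quick-start script, skip-gram training generally turns each walk into (center, context) pairs. The sketch below shows that general idea with a symmetric window; skipgram_pairs is a hypothetical helper, and the exact windowing used by Embedder may differ.

```python
def skipgram_pairs(walk, window_size):
    """Return (center, context) pairs within `window_size` steps of each node."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


walk = [0, 3, 1, 4]  # toy walk with 0-based node ids
print(skipgram_pairs(walk, window_size=1))
# [(0, 3), (3, 0), (3, 1), (1, 3), (1, 4), (4, 1)]
```

Residues that appear close together in many walks end up sharing many pairs, which is what pulls their embeddings together.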

Backends:

Visualizers

Example code

from sawnergy.visual import Visualizer

v = Visualizer("./RIN_demo.zip")
v.build_frame(1,
    node_colors="rainbow",
    displayed_nodes="ALL",
    displayed_pairwise_attraction_for_nodes="DISPLAYED_NODES",
    displayed_pairwise_repulsion_for_nodes="DISPLAYED_NODES",
    show_node_labels=True,
    show=True
)

Visualizer lazily loads datasets and works even in headless environments (falls back to the Agg backend).

from sawnergy.embedding import Visualizer

viz = Visualizer("./EMBEDDINGS_demo.zip", normalize_rows=True)
viz.build_frame(1, show=True)

Utilities


Practical Notes


Project Structure

├── sawnergy/
│   ├── rin/           # RINBuilder and cpptraj integration helpers
│   ├── walks/         # Walker class and shared-memory utilities
│   ├── embedding/     # Embedder + SG/SGNS backends (PureML / PyTorch)
│   ├── visual/        # Visualizer and palette utilities
│   │
│   ├── logging_util.py
│   └── sawnergy_util.py
│
└── README.md

Minimal API Cheatsheet

All functions raise informative ValueError/RuntimeError when inputs are inconsistent (e.g., missing walks, out-of-range frame ids, invalid quantiles). Attributes recorded in each archive are intended to be sufficient to reproduce downstream stages without additional bookkeeping.
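
The kind of input checking described above can be pictured with a short sketch. These validators are hypothetical and written only for illustration; the accepted ranges (for example, treating None as "all frames" and allowing quantiles in [0, 1]) are assumptions, not SAWNERGY's actual rules or messages.

```python
def validate_frame_range(frame_range, total_frames):
    """Check an inclusive, 1-based (first, last) pair against the trajectory length."""
    if frame_range is None:
        return (1, total_frames)  # assumption: None means every frame
    first, last = frame_range
    if not (1 <= first <= last <= total_frames):
        raise ValueError(
            f"frame_range {frame_range} must satisfy "
            f"1 <= first <= last <= {total_frames}"
        )
    return (first, last)


def validate_quantile(frac):
    """Check that a pruning fraction is a valid quantile."""
    if not (0.0 <= frac <= 1.0):
        raise ValueError(f"quantile must lie in [0, 1], got {frac}")
    return float(frac)


print(validate_frame_range((1, 100), total_frames=100))  # (1, 100)
print(validate_quantile(0.85))                           # 0.85
```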