SAWNERGY


A toolkit for transforming molecular dynamics (MD) trajectories into rich graph representations, sampling
random and self-avoiding walks, learning node embeddings, and visualizing residue interaction networks (RINs). SAWNERGY
keeps the full workflow — from cpptraj output to skip-gram embeddings (node2vec approach) — inside Python, backed by efficient Zarr-based archives and optional GPU acceleration.


Installation

pip install sawnergy

Optional: for GPU training, install PyTorch separately (e.g., pip install torch).
Note: RIN building requires cpptraj (AmberTools). Ensure it is discoverable via $PATH or the CPPTRAJ environment variable. The easiest route is usually to install AmberTools via Conda and activate that environment; SAWNERGY will then locate the cpptraj executable on its own.
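
If RIN building fails, it can help to check up front whether cpptraj is discoverable. The following is a minimal sketch using only the standard library. The helper name find_cpptraj and the lookup order (CPPTRAJ first, then $PATH) are assumptions made for illustration; the README does not state the order SAWNERGY uses internally.

```python
import os
import shutil
from typing import Mapping, Optional


def find_cpptraj(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a path to a cpptraj executable, or None if none is found.

    Hypothetical helper: checks the CPPTRAJ variable first, then PATH.
    This order is an assumption, not SAWNERGY's documented behavior.
    """
    env = os.environ if env is None else env

    # 1. An explicit override through the CPPTRAJ environment variable.
    override = env.get("CPPTRAJ")
    if override and os.path.isfile(override) and os.access(override, os.X_OK):
        return override

    # 2. Otherwise search the directories listed in PATH.
    return shutil.which("cpptraj", path=env.get("PATH"))


if __name__ == "__main__":
    location = find_cpptraj()
    print(location or "cpptraj not found; install AmberTools or set CPPTRAJ")
```

Running this inside the activated Conda environment should print the path of the AmberTools cpptraj binary if the installation worked.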


UPDATES:

v1.1.2 — What’s new:

v1.1.1 — What’s new:

v1.1.0 — What’s new:

v1.0.9 — What’s new:

v1.0.8 — What’s new:

v1.0.7 — What’s new:


Why SAWNERGY?


Pipeline at a Glance

MD Trajectory + Topology
          │
          ▼
      RINBuilder 
          │   →  RIN archive (.zip/.zarr) → Visualizer (display/animate RINs)
          ▼
        Walker
          │   →  Walks archive (RW/SAW per frame)
          ▼
       Embedder
          │   →  Embedding archive (frame × vocab × dim)
          ▼
     Downstream ML

Each stage consumes the archive produced by the previous one. Metadata embedded in the archives ensures frame order,
node indexing, and RNG seeds stay consistent across the toolchain.
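
To make the Walker stage of the diagram more concrete, here is a minimal sketch of how a random walk (RW) and a self-avoiding walk (SAW) differ when sampled from a single frame's transition matrix. It is not SAWNERGY's implementation: the function sample_walk is hypothetical, node IDs are 0-based here for simplicity (the archives use 1-based IDs), and stopping early at a dead end is an assumption.

```python
import random
from typing import List, Sequence


def sample_walk(
    transitions: Sequence[Sequence[float]],
    start: int,
    walk_length: int,
    rng: random.Random,
    self_avoiding: bool = False,
) -> List[int]:
    """Sample one walk of up to `walk_length` steps (0-based node IDs)."""
    walk = [start]
    visited = {start}
    for _ in range(walk_length):
        weights = list(transitions[walk[-1]])
        if self_avoiding:
            # Zero out nodes already on the path so they cannot be revisited.
            weights = [0.0 if j in visited else w for j, w in enumerate(weights)]
        if sum(weights) <= 0.0:
            break  # dead end: no admissible next node (assumed behavior)
        nxt = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        walk.append(nxt)
        visited.add(nxt)
    return walk


if __name__ == "__main__":
    # Toy row-stochastic matrix for a fully connected 4-node graph.
    P = [[0.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
    rng = random.Random(123)
    print("RW: ", sample_walk(P, start=0, walk_length=3, rng=rng))
    print("SAW:", sample_walk(P, start=0, walk_length=3, rng=rng, self_avoiding=True))
```

A walk of L steps contains L + 1 nodes, which matches the L+1 trailing dimension of the walk datasets described under Archive Layouts.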


Small visual example (constructed fully from trajectory and topology files)

[Figures: RIN, Embedding]


Core Components

sawnergy.rin.RINBuilder

sawnergy.visual.Visualizer

sawnergy.walks.Walker

sawnergy.embedding.Embedder

Supporting Utilities


Archive Layouts

Each archive is listed with its key datasets (name → shape, dtype) and its important attributes (root attrs).

RIN
  Key datasets:
    • ATTRACTIVE_transitions → (T, N, N), float32
    • REPULSIVE_transitions → (T, N, N), float32 (optional)
    • ATTRACTIVE_energies → (T, N, N), float32 (optional)
    • REPULSIVE_energies → (T, N, N), float32 (optional)
    • COM → (T, N, 3), float32
  Important attributes:
    • time_created (ISO)
    • com_name = "COM"
    • molecule_of_interest (int)
    • frame_range = (start, end) inclusive
    • frame_batch_size (int)
    • prune_low_energies_frac (float in [0,1])
    • attractive_transitions_name / repulsive_transitions_name (dataset names or None)
    • attractive_energies_name / repulsive_energies_name (dataset names or None)

Walks
  Key datasets:
    • ATTRACTIVE_RWs → (T, N·num_RWs, L+1), int32 (optional)
    • REPULSIVE_RWs → (T, N·num_RWs, L+1), int32 (optional)
    • ATTRACTIVE_SAWs → (T, N·num_SAWs, L+1), int32 (optional)
    • REPULSIVE_SAWs → (T, N·num_SAWs, L+1), int32 (optional)
    Note: node IDs are 1-based.
  Important attributes:
    • time_created (ISO)
    • seed (int)
    • rng_scheme = "SeedSequence.spawn_per_batch_v1"
    • num_workers (int)
    • in_parallel (bool)
    • batch_size_nodes (int)
    • num_RWs / num_SAWs (ints)
    • node_count (N)
    • time_stamp_count (T)
    • walk_length (L)
    • walks_per_node (int)
    • attractive_RWs_name / repulsive_RWs_name / attractive_SAWs_name / repulsive_SAWs_name (dataset names or None)
    • walks_layout = "time_leading_3d"

Embeddings
  Key datasets:
    • FRAME_EMBEDDINGS → (T, N, D), float32
  Important attributes:
    • created_at (ISO)
    • frame_embeddings_name = "FRAME_EMBEDDINGS"
    • time_stamp_count = T
    • node_count = N
    • embedding_dim = D
    • model_base = "torch" or "pureml"
    • embedding_kind = `"in"
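
Two details of these layouts are easy to trip over: the shape of a walk dataset and the 1-based node IDs. The sketch below spells both out in plain Python. The helper names are hypothetical, and two assumptions go beyond what the layouts state: that row i - 1 of a frame's embedding matrix corresponds to node i, and that walk datasets contain no padding values.

```python
from typing import List, Tuple


def expected_walks_shape(T: int, N: int, walks_per_node_of_kind: int, L: int) -> Tuple[int, int, int]:
    """Shape of one walk dataset: (frames, nodes * walks of that kind, L + 1)."""
    return (T, N * walks_per_node_of_kind, L + 1)


def walk_to_rows(walk: List[int]) -> List[int]:
    """Convert 1-based node IDs from a walk into 0-based row indices.

    Assumes the walk holds only valid node IDs (no padding values) and
    that embedding row i - 1 belongs to node i.
    """
    if any(node < 1 for node in walk):
        raise ValueError("expected 1-based node IDs, found a value below 1")
    return [node - 1 for node in walk]


if __name__ == "__main__":
    # 100 frames, 50 residues, 75 random walks per node, walk_length = 16.
    print(expected_walks_shape(T=100, N=50, walks_per_node_of_kind=75, L=16))
    print(walk_to_rows([1, 5, 2]))
```

The 0-based indices returned by walk_to_rows can be used to index the N axis of FRAME_EMBEDDINGS for the same frame, provided the assumptions above hold for your archive.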

Notes


Quick Start

from pathlib import Path
from sawnergy.logging_util import configure_logging
from sawnergy.rin import RINBuilder
from sawnergy.walks import Walker
from sawnergy.embedding import Embedder

import logging
configure_logging("./logs", file_level=logging.WARNING, console_level=logging.INFO)

# 1. Build a Residue Interaction Network archive
rin_path = Path("./RIN_demo.zip")
rin_builder = RINBuilder()
rin_builder.build_rin(
    topology_file="system.prmtop",
    trajectory_file="trajectory.nc",
    molecule_of_interest=1,
    frame_range=(1, 100),
    frame_batch_size=10,
    prune_low_energies_frac=0.85,
    output_path=rin_path,
    include_attractive=True,
    include_repulsive=False
)

# 2. Sample walks from the RIN
walker = Walker(rin_path, seed=123)
walks_path = Path("./WALKS_demo.zip")
walker.sample_walks(
    walk_length=16,
    walks_per_node=100,
    saw_frac=0.25,
    include_attractive=True,
    include_repulsive=False,
    time_aware=False,
    output_path=walks_path,
    in_parallel=False
)
walker.close()

# 3. Train embeddings per frame (PyTorch backend)
import torch

embedder = Embedder(walks_path, seed=999)
embeddings_path = embedder.embed_all(
    RIN_type="attr",
    using="merged",
    num_epochs=10,
    negative_sampling=False,
    window_size=4,
    device="cuda" if torch.cuda.is_available() else "cpu",
    model_base="torch",
    output_path="./EMBEDDINGS_demo.zip"
)
print("Embeddings written to", embeddings_path)

For the PureML backend, set model_base="pureml" and pass the optimizer / scheduler classes inside model_kwargs.
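
For intuition about what window_size controls, the sketch below generates the (center, context) training pairs a skip-gram model would see for one walk. It illustrates the general node2vec-style technique, not SAWNERGY's internals: skipgram_pairs is a hypothetical helper, and the symmetric fixed-size window is an assumption.

```python
from typing import Iterator, List, Tuple


def skipgram_pairs(walk: List[int], window_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (center, context) pairs for nodes within `window_size` steps."""
    for i, center in enumerate(walk):
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not its own context
                yield center, walk[j]


if __name__ == "__main__":
    # A walk with walk_length=16 has 17 nodes (1-based IDs, as in the archives).
    walk = list(range(1, 18))
    pairs = list(skipgram_pairs(walk, window_size=4))
    print(len(pairs), "pairs; first few:", pairs[:4])
```

Larger windows pair each node with more distant neighbors along the walk, which increases the number of training pairs per walk.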


Visualization

from sawnergy.visual import Visualizer

v = Visualizer("./RIN_demo.zip")
v.build_frame(1,
    node_colors="rainbow",
    displayed_nodes="ALL",
    displayed_pairwise_attraction_for_nodes="DISPLAYED_NODES",
    displayed_pairwise_repulsion_for_nodes="DISPLAYED_NODES",
    show_node_labels=True,
    show=True
)

Visualizer lazily loads datasets and works even in headless environments (falls back to the Agg backend).

Embedding archives have their own Visualizer, imported from sawnergy.embedding:

from sawnergy.embedding import Visualizer

viz = Visualizer("./EMBEDDINGS_demo.zip", normalize_rows=True)
viz.build_frame(1, show=True)

Advanced Notes


Project Structure

├── sawnergy/
│   ├── rin/           # RINBuilder and cpptraj integration helpers
│   ├── walks/         # Walker class and shared-memory utilities
│   ├── embedding/     # Embedder + SG/SGNS backends (PureML / PyTorch)
│   ├── visual/        # Visualizer and palette utilities
│   │
│   ├── logging_util.py
│   └── sawnergy_util.py
│
└── README.md

Acknowledgments

SAWNERGY builds on the AmberTools cpptraj ecosystem, NumPy, Matplotlib, Zarr, and PyTorch (for GPU acceleration if necessary; PureML is available by default).
Big thanks to the upstream communities whose work makes this toolkit possible.