I’m an undergraduate from Ukraine studying Computer Science and Mathematics at Wesleyan University in Middletown, CT. I split my research time between ThayerLab at Wesleyan and BonhamLab at Tufts University School of Medicine. My work sits at the intersection of machine learning, software engineering, and computational biology. At ThayerLab, I develop methods that turn molecular dynamics simulations into machine-learning-ready representations, enabling large-scale analysis of protein allostery. At BonhamLab, I build deep-learning and graph-based models for functional annotation of proteins in the human gut microbiome.
Toolkit for transforming molecular dynamics (MD) trajectories into rich graph representations, sampling random and self-avoiding walks, learning node embeddings, and visualising residue interaction networks (RINs). Sawnergy keeps the full workflow -- from cpptraj output to skip-gram embeddings -- inside Python.
SAWNERGY is a high-performance toolkit for turning molecular-dynamics trajectories into machine-learning–ready graph data. The project is open-source and available through . Documentation is hosted on my website.
SAWNERGY builds residue interaction networks (RINs) from cpptraj outputs, extracts attractive and repulsive interaction channels, prunes and normalizes them into transition matrices, samples massive random and self-avoiding walks in shared memory, and trains skip-gram/SGNS embeddings for every frame. The entire pipeline is reproducible, Zarr-backed, and designed for HPC workloads. The result: clean, compact vector representations of protein conformational dynamics -- ready for downstream tasks like structural clustering, frame classification, and functional prediction.
The short clip above shows the residue interaction network of the full-length p53 protein (on the right), constructed and visualized by SAWNERGY, next to its embedding (on the left), learned from interaction patterns and also visualized by SAWNERGY. Note how closely the two images resemble each other, even though the right one plots the x, y, z coordinates of the amino acid residues while the left one is a network embedding derived entirely from interaction data, with no spatial information.
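To make the walk-sampling step more concrete, here is a minimal sketch of a self-avoiding walk over a row-normalized transition matrix. It is not SAWNERGY's actual API or implementation: the function names `normalize_rows` and `self_avoiding_walk` and the toy 4-residue weight matrix are assumptions made up for illustration, and the sketch uses only the Python standard library rather than shared-memory arrays.

```python
import random


def normalize_rows(weights):
    """Turn non-negative interaction weights into row-stochastic probabilities."""
    matrix = []
    for row in weights:
        total = sum(row)
        # A row with no interactions stays all-zero (a dead end for the walker).
        matrix.append([w / total if total > 0 else 0.0 for w in row])
    return matrix


def self_avoiding_walk(transition, start, max_len, rng):
    """Sample one walk that never revisits a node and stops when it gets stuck."""
    walk = [start]
    visited = {start}
    while len(walk) < max_len:
        row = transition[walk[-1]]
        # Only unvisited neighbours with non-zero probability are eligible.
        candidates = [j for j, p in enumerate(row) if p > 0 and j not in visited]
        if not candidates:
            break
        # random.choices renormalizes the remaining probabilities for us.
        nxt = rng.choices(candidates, weights=[row[j] for j in candidates], k=1)[0]
        walk.append(nxt)
        visited.add(nxt)
    return walk


# Hypothetical toy network: symmetric interaction weights between 4 residues.
toy_weights = [
    [0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 3.0, 1.0],
    [1.0, 3.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 0.0],
]
transition = normalize_rows(toy_weights)
rng = random.Random(0)  # fixed seed so the sampled walks are reproducible
walks = [self_avoiding_walk(transition, start=s, max_len=4, rng=rng) for s in range(4)]
```

Walks like these play the role of "sentences" for a skip-gram model, with residues as the "words".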
Transparent, NumPy-only deep learning framework for teaching, small-scale projects, prototyping, and reproducible experiments. No CUDA, no giant dependency tree. Batteries included: VJP autograd, layers, activations, losses, optimizers, Zarr checkpoints, and more!
PureML is a lightweight deep-learning framework built entirely on NumPy. It’s designed to feel instantly familiar to PyTorch users while staying transparent, minimal, and easy to read. The library includes a complete autodiff engine, modular layers, activations, losses, optimizers, data utilities, and a clean neural-network API — everything you need to prototype models without extra overhead.
It’s fast to learn, simple to extend, and built for teaching, experimenting, and understanding models at every level rather than relying on a black box. PureML is open-source and available through . Documentation can be found on my website or through PyPI.
Unlike TensorFlow and PyTorch, which rely on massive multi-language codebases, PureML is intentionally 100% NumPy-based and fully transparent. Despite that simplicity, it remains very fast thanks to extensive vectorization throughout the system. It’s ideal for rapid prototyping, education, and situations where you want full clarity into every stage of the computation -- and the codebase is straightforward to read, modify, and extend.
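For readers curious what a NumPy-only VJP autograd can look like, here is a small sketch of the general technique. It is not PureML's code or API: the `Tensor` class, its `vjp` attribute, and the three supported operations are assumptions written for this illustration, and it handles only 2-D matrix products.

```python
import numpy as np


class Tensor:
    """Hypothetical minimal tensor: a value, a gradient, and a VJP closure."""

    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=float)
        self.grad = np.zeros_like(self.data)
        self.parents = parents
        self.vjp = lambda upstream: ()  # leaves have nothing to propagate

    def __matmul__(self, other):
        out = Tensor(self.data @ other.data, parents=(self, other))
        # VJP of C = A @ B for upstream gradient G: dA = G @ B.T, dB = A.T @ G
        out.vjp = lambda g: (g @ other.data.T, self.data.T @ g)
        return out

    def relu(self):
        out = Tensor(np.maximum(self.data, 0.0), parents=(self,))
        # Gradient passes through only where the input was positive.
        out.vjp = lambda g: (g * (self.data > 0),)
        return out

    def sum(self):
        out = Tensor(self.data.sum(), parents=(self,))
        # Every element contributes with weight 1 to the sum.
        out.vjp = lambda g: (g * np.ones_like(self.data),)
        return out

    def backward(self):
        # Build a topological order so each node is processed after its consumers.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = np.ones_like(self.data)
        for node in reversed(order):
            for parent, g in zip(node.parents, node.vjp(node.grad)):
                parent.grad = parent.grad + g


x = Tensor([[1.0, -2.0, 3.0], [0.5, 0.0, -1.0]])
w = Tensor([[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]])
loss = (x @ w).relu().sum()
loss.backward()
```

After `loss.backward()`, `x.grad` and `w.grad` hold the gradients of the scalar loss with respect to each input, computed with nothing but vectorized NumPy operations.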
A practical pipeline for mining UniProt, cleaning protein annotations, and turning biology into machine-learnable features. This project was developed during my summer internship at BonhamLab, Tufts University School of Medicine (2025). Documentation is available on this website (see `docs` above), and the code is on my GitHub (also accessible through the docs).
M2F (microbiome-to-function) is a modular toolkit that turns messy UniProt records into clean, machine-learning-ready data. It automates large-scale UniProtKB mining with efficient, rate-limited REST calls, cleans and normalizes free-text annotations using targeted regex extraction, and encodes both sequences and ontology terms into meaningful numerical representations. All results are stored compactly in a single Zarr ZipStore, making datasets easy to share and reconstruct. The entire pipeline -- from HUMAnN outputs or accession IDs to tidy feature tables -- is fully reproducible and designed for seamless use in ML models. Code is available at this repo.
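As an illustration of the regex-based cleaning step, here is a rough sketch of how UniProt-style free text could be tidied. The patterns and the function names `clean_annotation` and `extract_ec_numbers` are assumptions for this example, not M2F's actual rules; they rely only on the fact that UniProt free-text comments often carry evidence tags such as `{ECO:...}` and inline `(PubMed:...)` citations.

```python
import re

# Evidence tags such as {ECO:0000269|PubMed:1234567}
EVIDENCE_TAG = re.compile(r"\s*\{ECO:[^}]*\}")
# Inline citations such as (PubMed:1234567, PubMed:7654321)
PUBMED_REF = re.compile(r"\s*\(PubMed:\d+(?:,\s*PubMed:\d+)*\)")
# EC numbers such as "EC 3.2.1.23" or "EC=3.2.1.-" (dash marks an unassigned level)
EC_NUMBER = re.compile(r"\bEC[ :=]\s*(\d+(?:\.(?:\d+|-)){3})")


def clean_annotation(text):
    """Strip evidence tags and citations, then collapse leftover whitespace."""
    text = EVIDENCE_TAG.sub("", text)
    text = PUBMED_REF.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def extract_ec_numbers(text):
    """Return every EC number mentioned in the text, in order of appearance."""
    return EC_NUMBER.findall(text)


# Made-up example record in the style of a UniProt function comment.
raw = (
    "Hydrolyzes terminal beta-D-galactose residues "
    "(PubMed:1234567, PubMed:7654321). {ECO:0000269|PubMed:1234567}"
)
cleaned = clean_annotation(raw)
ec_numbers = extract_ec_numbers("Beta-galactosidase (EC 3.2.1.23); also EC=3.2.1.-")
```

Here `cleaned` keeps only the descriptive sentence, and `ec_numbers` collects the enzyme classes as plain strings that can be encoded later.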
Developed a pipeline to construct residue interaction networks from MD trajectories and sample random walks, enabling amino acid co-occurrence analysis and comparison between systems (e.g., mutant vs. wild-type) to characterize mutation- and effector-induced changes in allosteric signaling.
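The co-occurrence comparison can be sketched roughly as follows. This is a toy illustration rather than the code behind the poster: the function names, the window size, and the residue labels (`R12`, `K45`, `D78`, `E90`) are all made up for the example.

```python
from collections import Counter


def cooccurrence_counts(walks, window=2):
    """Count unordered residue pairs that appear within `window` steps in a walk."""
    counts = Counter()
    for walk in walks:
        for i, a in enumerate(walk):
            for b in walk[i + 1 : i + 1 + window]:
                # Sort the pair so (a, b) and (b, a) land in the same bucket.
                counts[tuple(sorted((a, b)))] += 1
    return counts


def cooccurrence_difference(counts_a, counts_b):
    """Per-pair difference counts_a - counts_b (missing pairs count as 0)."""
    pairs = set(counts_a) | set(counts_b)
    return {pair: counts_a[pair] - counts_b[pair] for pair in pairs}


# Hypothetical walks over residue labels for two systems.
wild_type_walks = [["R12", "K45", "D78"], ["R12", "D78", "E90"]]
mutant_walks = [["R12", "K45", "E90"], ["K45", "E90", "D78"]]

wild_type = cooccurrence_counts(wild_type_walks)
mutant = cooccurrence_counts(mutant_walks)
diff = cooccurrence_difference(mutant, wild_type)
```

Pairs with a large positive or negative entry in `diff` are the ones whose co-occurrence changes most between the two systems, which makes them candidates for a closer look.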
This poster, created for the Summer 2024 research symposium at Wesleyan, represents the starting point of my work at ThayerLab. Since then, I've made significant progress and refined the project's direction. See my project named "SAWNERGY".
Implemented a web-based real-time tracking system for reusable food containers, including holder details, pickup/dropoff times and locations, etc. The system uses dynamic QR codes in dining halls to facilitate easy pickup and dropoff via an associated iOS app. The project was started as a sustainability initiative for the Wesleyan University dining system.
My three friends and I developed the entire system from scratch over an intense 24-hour period for a hackathon held at Wesleyan. We recorded this demo video literally 10 minutes before the projects were judged, which is why the quality isn't the best. That said, focusing all our effort on project development rather than cosmetics paid off: we won the internship prize track!
(link to the hackathon)
Developed an album cover classifier using ResNet-50 (a pre-trained CNN) and transfer learning. The network's architecture was slightly adjusted, and multiple layers were unfrozen. It was then fine-tuned on a dataset of approximately 26,000 labeled album covers scraped from the internet. The model achieved 74% accuracy in classifying rap, country, jazz, and classical albums.
I worked on this project as part of my audio-visual machine learning class. Code as well as the final write-up can be found here.
I added paper pre-prints (see the "papers" tab). For some reason, viewing the PDFs on a phone causes some scrolling issues, but if you click download (there's no auto-download, don't worry), it will look fine and you can see the paper in full :)
Me and the ThayerLab (not everybody is present in the picture)
New poster I presented at the 26th Annual Molecular Biophysics Retreat at Wesleyan University on October 29 and at the ACS Northeast Regional Discussion at Worcester State University on November 1, 2025.
Hello again! Very excited to let you all know that I have been working on two Python packages which are out now and YOU CAN ACCESS THE DOCUMENTATION ON MY WEBSITE! Just go to the docs page (see above) :)
Hey! I'm excited to share that I have given my website a major update! I now have an admin panel through which I can make posts and add items to my portfolio and such :)
I will be sharing some relevant updates about my work here every now and then.
Hey! I have finally finished my website!