Achyuthan Sivasankar


Adaptive Computation · MoE Systems · Efficient Architectures

## About

I build and study adaptive routing in sparse neural architectures — from function-basis routing in KAN layers to expert-collapse dynamics in large MoE language models.

## Research highlights

- +1,722 · mean FSD lead (steps) before grokking, 9 configs
- +6.8% · over MLP on CIFAR-100 (KAN-Multi)
- 99.67% · SWELL-KW stress-detection accuracy (NUS)
- MoE-Bench · open-source toolkit, Apache 2.0
- arXiv 2606.12966 · grokking paper
- JEPA · AD-LiST-JEPA on Waymo LiDAR (NYU lab)

## Research interests

- PyTorch · JAX · Hugging Face
- Mechanistic interpretability · Grokking
- MoE architectures · KAN layers
- LoRA / PEFT · RAG · LLM fine-tuning
- Self-supervised learning · Computer Vision
- Python · Docker · AWS · FastAPI

## Education & affiliations

- MS Computer Science · New York University · Graduating Fall 2027 · NYC
- Research Assistant, Prof. Anna Choromanska's Lab · NYU · AD-LiST-JEPA · Waymo Open Dataset · May 2026 – Present
- Research Intern, Prof. Sunil Chandran · IISc Bangalore · GNN routing · EEG / BCI · Jun – Aug 2024
- AI Research Intern · National University of Singapore · Dec 2023 – Feb 2024

## Currently

MS CS student at NYU · Graduating Fall 2027 · Research Assistant in Prof. Anna Choromanska's lab · Targeting PhD programs in core ML

## Research: Active Projects

### Circuit Synchronization Precedes Generalization: Fourier Structure in Grokking Transformers

arXiv preprint

Introduces the Frequency Synchronization Degree (FSD), a permutation-tested metric for Fourier circuit synchronisation that requires no prior knowledge of the circuit. FSD predicts grokking 500–3,000 steps in advance across nine modular-addition configs (mean lead: +1,722 steps). A weight-decay intervention at the FSD ceiling provides causal evidence that Phase 2 is regularisation, not computation: the delay to grokking scales as ∆t ∝ 1/λ, where λ is the weight-decay strength, across three primes.

9/9 configs FSD leads grokking

+1,722 mean lead steps

∆t ∝ 1/λ causal timing law

R² ≥ 0.99 cross-prime fit

arXiv:2606.12966
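Because ∆t ∝ 1/λ is linear in 1/λ, one simple way to test such a law is a least-squares fit through the origin, ∆t ≈ c/λ, scored by R². The sketch below assumes exactly that form; `fit_inverse_law` is a hypothetical helper, the λ and ∆t values in the usage line are made-up illustrative numbers rather than results from the paper, and the paper's own fitting procedure may differ.

```python
def fit_inverse_law(lams, dts):
    """Fit dt = c / lam by least squares (a line through the origin in x = 1/lam).

    Returns (c, r2), with R^2 computed against the mean of dts.
    """
    xs = [1.0 / lam for lam in lams]
    c = sum(x * y for x, y in zip(xs, dts)) / sum(x * x for x in xs)
    mean_dt = sum(dts) / len(dts)
    ss_res = sum((y - c * x) ** 2 for x, y in zip(xs, dts))
    ss_tot = sum((y - mean_dt) ** 2 for y in dts)
    return c, 1.0 - ss_res / ss_tot


# Illustrative numbers only: halving the weight decay doubles the delay.
c, r2 = fit_inverse_law([0.5, 1.0, 2.0], [2000.0, 1000.0, 500.0])
print(c, r2)  # 1000.0 1.0
```

Fitting in 1/λ rather than in log space keeps the constant c directly interpretable as "delay at unit weight decay"; a log-log fit with slope −1 is an equivalent alternative check.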

### KAN-Multi: Adaptive Multi-Basis Routing with Emergent Specialization

Manuscript in prep

A statistics-driven routing layer that selects among 6 function bases (Fourier, Wavelet, Chebyshev, RBF, Rational, Sigmoid) using distributional statistics of its input (mean, variance, skewness, kurtosis, entropy, sparsity), with zero routing supervision. Emergent specialization is tracked through per-layer Shannon entropy dynamics over training.

+6.8% over MLP on CIFAR-100

+1.2% on Adult tabular (p < 0.001)

92.42% CIFAR-10 (ResNet-18 + KAN head)

5–6× theoretical FLOP reduction

NeurIPS FinML Workshop 2026 (target)
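A minimal, stdlib-only sketch of the routing idea, assuming a single linear router over the six input statistics followed by a softmax over the six bases. The histogram-based entropy estimate, the |x| < eps definition of sparsity, and the names `input_statistics`, `route` and `routing_entropy` are illustrative assumptions, not the KAN-Multi implementation; in a trained layer the router weights would be learned from the task loss alone.

```python
import math
import statistics

# The six bases named in the KAN-Multi description; the router below is hypothetical.
BASES = ["Fourier", "Wavelet", "Chebyshev", "RBF", "Rational", "Sigmoid"]


def input_statistics(x, bins=8, eps=1e-6):
    """Return [mean, variance, skewness, excess kurtosis, entropy, sparsity] of a 1-D input.

    Assumptions: entropy is a fixed-bin histogram estimate (in nats) and
    sparsity is the fraction of entries with |x_i| < eps.
    """
    n = len(x)
    mean = statistics.fmean(x)
    var = statistics.pvariance(x, mu=mean)
    std = math.sqrt(var) or 1.0  # constant input: avoid dividing by zero
    skew = sum((v - mean) ** 3 for v in x) / (n * std**3)
    kurt = sum((v - mean) ** 4 for v in x) / (n * std**4) - 3.0
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in x:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    entropy = -sum(c / n * math.log(c / n) for c in counts if c)
    sparsity = sum(abs(v) < eps for v in x) / n
    return [mean, var, skew, kurt, entropy, sparsity]


def route(stats, weights, bias):
    """Hypothetical linear router: one logit per basis, then a numerically stable softmax."""
    logits = [sum(w * s for w, s in zip(row, stats)) + b for row, b in zip(weights, bias)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_entropy(probs):
    """Shannon entropy (nats) of a routing distribution; lower means more specialised."""
    return -sum(p * math.log(p) for p in probs if p > 0)


x = [0.0, 0.5, -0.5, 2.0, 0.0, 1.0]
stats = input_statistics(x)
weights = [[0.0] * 6 for _ in BASES]  # untrained router: all-zero weights
probs = route(stats, weights, bias=[0.0] * 6)
print(dict(zip(BASES, probs)), routing_entropy(probs))
```

With all-zero weights the router is uniform and its entropy is log 6; tracking how far each layer's routing entropy falls below that ceiling during training is one way to quantify emergent specialization.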

### MoE-Bench: Open Benchmark for Expert Collapse in Sparse MoE LLMs

Shipped · Apache 2.0

Pip-installable toolkit for measuring routing entropy, expert utilisation, and expert collapse across OLMoE, JetMoE, and Qwen1.5-MoE. Includes domain-matched prompts (math/code/general), bootstrap confidence intervals, LoRA training with an L_div loss, MMLU evaluation, and a Gradio demo. Key finding: collapse is architecture-dependent, not universal.

50% layer-collapse on OLMoE (math)

0% collapse on JetMoE / Qwen

+0.33pp MMLU (no degradation)

3 validated architectures

GitHub: github.com/Achyuthan-S/moe-bench
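The core routing metrics can be sketched in a few lines. This assumes routing decisions for one layer are available as a list of per-token top-k expert ids; the 0.5 normalised-entropy threshold for calling a layer "collapsed" and all function names are hypothetical, not MoE-Bench's actual API or collapse criterion.

```python
import math
import random
from collections import Counter


def expert_utilisation(assignments, num_experts):
    """Share of routed slots each expert receives; `assignments` holds one list of expert ids per token."""
    counts = Counter(e for token in assignments for e in token)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no routing decisions")
    return [counts[i] / total for i in range(num_experts)]


def normalised_entropy(util):
    """Routing entropy divided by log(num_experts): 1.0 is uniform, 0.0 is a single expert."""
    h = -sum(p * math.log(p) for p in util if p > 0)
    return h / math.log(len(util))


def is_collapsed(assignments, num_experts, threshold=0.5):
    # Hypothetical collapse criterion: normalised routing entropy below `threshold`.
    return normalised_entropy(expert_utilisation(assignments, num_experts)) < threshold


def bootstrap_ci(assignments, num_experts, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for normalised entropy, resampling tokens with replacement."""
    rng = random.Random(seed)
    samples = sorted(
        normalised_entropy(
            expert_utilisation(rng.choices(assignments, k=len(assignments)), num_experts)
        )
        for _ in range(n_boot)
    )
    return samples[int(alpha / 2 * n_boot)], samples[int((1 - alpha / 2) * n_boot) - 1]


balanced = [[0, 1], [2, 3]] * 50  # top-2 routing spread evenly over 4 experts
collapsed = [[0]] * 100           # every token sent to expert 0
print(is_collapsed(balanced, 4), is_collapsed(collapsed, 4))  # False True
print(bootstrap_ci(balanced, 4))
```

Resampling whole tokens (not individual expert slots) keeps each token's top-k choices together, which is the natural unit when prompts are drawn per domain.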

### AD-LiST-JEPA: Self-Supervised LiDAR Perception for Autonomous Driving

NYU Lab · In progress

Research assistant in Prof. Anna Choromanska's lab (NYU), working with postdoc Haoran Zhu on AD-LiST-JEPA, a Joint Embedding Predictive Architecture for automotive LiDAR object detection and occupancy completion/forecasting (OCF) on the Waymo Open Dataset. The SSL pretraining pipeline is complete; downstream OCF finetuning experiments are running.

Waymo Open Dataset

30-epoch SSL pretrain complete

OCF finetuning in progress

### AutoMoE: Meta-Learning Expert Topologies for Robotic Manipulation

Research Project

An evolutionary REINFORCE meta-learner searches a discrete space of MoE topologies (1–3 tiers × 2–5 experts per tier), removing the need for hand-designed architectures. The discovered topology (3,3,2) reaches 100% task success with 11.5M parameters vs. 23.3M for a fixed Hierarchical MoE: 2× parameter efficiency at equal performance.

2× parameter efficiency

100% task success (ML10)

Meta-World ML10 validated
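A stripped-down sketch of the search: it enumerates the 1–3 tier × 2–5 experts-per-tier space (84 ordered topologies under this reading) and runs plain REINFORCE with a moving-average baseline over a softmax policy. The evolutionary component is omitted, and `toy_reward` is a hypothetical stand-in for the real objective (for example ML10 task success traded off against parameter count).

```python
import itertools
import math
import random


def topology_space(max_tiers=3, min_experts=2, max_experts=5):
    """Every topology as a tuple of experts per tier, e.g. (3, 3, 2)."""
    sizes = range(min_experts, max_experts + 1)
    return [t for n in range(1, max_tiers + 1) for t in itertools.product(sizes, repeat=n)]


def reinforce_search(space, reward_fn, steps=2000, lr=0.1, seed=0):
    """Plain REINFORCE over a categorical policy on `space`; returns the most probable topology."""
    rng = random.Random(seed)
    logits = [0.0] * len(space)
    baseline = 0.0
    for _ in range(steps):
        top = max(logits)
        weights = [math.exp(z - top) for z in logits]
        total = sum(weights)
        probs = [w / total for w in weights]
        i = rng.choices(range(len(space)), weights=probs)[0]
        advantage = reward_fn(space[i]) - baseline
        baseline += 0.1 * advantage  # moving-average baseline reduces gradient variance
        # Gradient of log softmax(logits)[i] w.r.t. the logits is one_hot(i) - probs.
        for j, p in enumerate(probs):
            logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - p)
    return space[max(range(len(space)), key=logits.__getitem__)]


def toy_reward(topology):
    # Hypothetical reward: pretend only (3, 3, 2) solves every task.
    return 1.0 if topology == (3, 3, 2) else 0.0


space = topology_space()
print(len(space), reinforce_search(space, toy_reward))
```

A real reward would be noisy and expensive (one meta-training run per sampled topology), which is where a baseline and population-style, evolutionary sampling earn their keep.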

## Open Source: GitHub Contributions


Notable upstream contributions

vLLM · NVIDIA NeMo AutoModel

- NVIDIA NeMo/Automodel #2732 · merged · Resolve tie_word_embeddings top-level-first to match Hugging Face tying
- vllm-project/vllm #44795 · merged · Fix nightly Docker ImportError: AnthropicOutputConfig
- NVIDIA NeMo/Automodel #2601 · merged · Re-tie lm_head to active embed_tokens on Gemma4 MoE path
- NVIDIA NeMo/Automodel #2709 · merged · Cherry-pick #2601 into r0.5.0 release branch

## Experience: Research & Industry

### Research Assistant, Prof. Anna Choromanska's Lab

NYU

May 2026 – Present

Working on self-supervised representation learning for autonomous driving with postdoc Haoran Zhu. Contributing to AD-LiST-JEPA on the Waymo Open Dataset; completed full SSL pretraining pipeline and running OCF downstream finetuning experiments.

PyTorch · LiDAR · JEPA · Waymo · NYU HPC

### Research Intern, Prof. Sunil Chandran

IISc Bangalore

Jun – Aug 2024

Applied GNN-based graph-theoretic optimisation to large-scale RF mesh networks (1k+ nodes), improving routing efficiency by 42% over a shortest-path baseline. Improved EEG classification for neurodegenerative-disease diagnosis by 18% over a CNN baseline on a 500k-signal BCI dataset.

GNN · Graph Theory · EEG · BCI

### AI Research Intern

NUS Singapore

Dec 2023 – Feb 2024

Built real-time multimodal stress-detection system achieving 99.67% accuracy on SWELL-KW (prior SOTA: ~97%). Prototyped VR-driven therapeutic platform integrating generative AI, real-time emotion recognition, and dynamic music recommendation (126k+ tracks).

Multimodal ML · Real-time Systems · Generative AI

## Publications & Patents: Research Output

arXiv

Circuit Synchronization Precedes Generalization: Causal Evidence from Fourier Structure in Grokking Transformers

First author · NYU · arXiv:2606.12966 · Jun 2026

Prep

KAN-Multi: Adaptive Multi-Basis Routing with Emergent Specialization

First author · Targeting NeurIPS FinML Workshop 2026

Toolkit

MoE-Bench: Open Benchmark for Expert Collapse and Routing Efficiency in Sparse MoE LLMs

First author · Open-source · Apache 2.0 · github.com/Achyuthan-S/moe-bench

Patent

ML-Enhanced Intrusion Detection for Blackhole Attacks in IoT RPL Networks

VIT IPRTT Cell, India · 2025

IEEE

Video Steganography with AES-256/RSA-4096 · CNN Acoustic Event Detection · Retail ML Demand Forecasting

3 IEEE publications · 2024

## Contact

Let's talk research.

I'm open to research collaborations, PhD inquiries, and ML engineering roles. If you're working on adaptive computation, efficient architectures, or autonomous perception — reach out.

Currently based in New York City · MS CS at NYU · Graduating Fall 2027

achyuthan.sivasankar@gmail.com

github.com/Achyuthan-S

linkedin.com/in/achyuthan-s

achyuthan-s.github.io

MS CS · NYU · Graduating Fall 2027
