Research Projects
My research focuses on secure, high-performance branch and memory dependence predictors, compiler and hardware co-design, defenses against transient execution attacks, secure memory, and FPGA-based acceleration.
SSMR: Statically Detecting Speculation Safe Memory Regions
Designed a hardware-software co-design defense against Spectre and Meltdown in which a modified load-store unit detects unsafe speculative memory operations based on their target addresses.
- Implemented the defense in the cycle-level gem5 simulator.
- Extended Clang/LLVM with custom RISC-V instructions.
- Designed LLVM analysis and transformation passes to identify expected memory operations at compile time.
- Evaluated performance using custom microbenchmarks and SPEC CPU2017 workloads.
- Incurred approximately 7% performance overhead.
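As a rough illustration of an address-based check like the one described above, the sketch below models a load-store unit that lets a speculative load issue only when its target address falls inside a region marked speculation-safe. The `SafeRegionTable` class, the `lsu_decision` function, the example address ranges, and the policy of delaying unsafe loads are all assumptions made for illustration; they are not taken from the SSMR design.

```python
from bisect import bisect_right


class SafeRegionTable:
    """Hypothetical table of [start, end) address ranges marked speculation-safe."""

    def __init__(self, regions):
        # Regions are assumed to be non-overlapping; sort them for binary search.
        self.regions = sorted(regions)
        self.starts = [start for start, _ in self.regions]

    def is_safe(self, addr):
        # Find the last region whose start is <= addr, then check its upper bound.
        i = bisect_right(self.starts, addr) - 1
        if i < 0:
            return False
        start, end = self.regions[i]
        return start <= addr < end


def lsu_decision(table, addr, speculative):
    """Return "issue" or "delay" for a load (assumed policy, for illustration)."""
    if not speculative or table.is_safe(addr):
        return "issue"
    return "delay"


# Example: two hypothetical safe regions.
table = SafeRegionTable([(0x1000, 0x2000), (0x8000, 0x9000)])
decisions = [
    lsu_decision(table, 0x1800, speculative=True),   # inside a safe region
    lsu_decision(table, 0x2000, speculative=True),   # just past the region end
    lsu_decision(table, 0x2000, speculative=False),  # non-speculative load
]
```

In this toy model, a non-speculative load always issues, while a speculative load outside every safe region is held back until it is no longer speculative.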
SCPC: Securing Cross Process Collision Based Transient Attacks
Built a functional model of AMD's Speculative Store Bypass Predictor and integrated it into gem5 to evaluate a virtualization-based defense that isolates prediction structures across processes and privilege levels.
- Applied the same defense to the TAGE-SC-L branch predictor to demonstrate generalizability.
- Introduced self-invalidation to prevent resource monopolization in predictor structures.
- Designed selective sharing policies for safe sharing across benign processes.
- Created synthetic workloads to stress predictor structures and identify bottlenecks.
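To make the idea of isolating predictor state across processes and privilege levels concrete, the sketch below uses simple domain tagging: each entry records the (process ID, privilege level) pair that trained it and is ignored by every other domain. Tagging is a stand-in chosen for brevity, not the virtualization-based mechanism of SCPC, and the `TaggedPredictor` class, its table size, and its default prediction are hypothetical.

```python
class TaggedPredictor:
    """Toy direct-mapped predictor whose entries are tagged with a domain."""

    def __init__(self, size=1024):
        self.size = size
        self.entries = {}  # index -> (domain, prediction)

    def _index(self, pc):
        # Different PCs can collide on the same index, as in a real table.
        return pc % self.size

    def predict(self, pc, domain, default=False):
        entry = self.entries.get(self._index(pc))
        # An entry trained by another domain is treated as a miss.
        if entry is None or entry[0] != domain:
            return default
        return entry[1]

    def update(self, pc, domain, outcome):
        self.entries[self._index(pc)] = (domain, outcome)


predictor = TaggedPredictor()
victim = (1, "user")     # hypothetical (process ID, privilege level)
attacker = (2, "user")

# The attacker trains an entry that collides with the victim's PC.
predictor.update(0x40, attacker, True)
victim_prediction = predictor.predict(0x40, victim)
attacker_prediction = predictor.predict(0x40, attacker)
```

Here the victim falls back to the default prediction instead of consuming state trained by the attacker, which is the property a cross-process collision defense needs to provide.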
Baobab Merkle Tree for Efficient Secure Memory
Co-designed and implemented the Baobab Merkle Tree, a secure memory integrity structure that improves upon the Bonsai Merkle Tree by reducing metadata overhead while preserving replay attack protection and data integrity.
- Designed an on-chip counter memoization table for efficient counter sharing.
- Reduced metadata overhead by 2x to 4x.
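For readers unfamiliar with integrity trees, the sketch below shows a generic binary Merkle tree over counter blocks and how a replayed (stale) counter changes the root hash. It is a textbook construction, not the Baobab or Bonsai layout; the `merkle_root` function, the SHA-256 hash, and the byte-string counter encoding are assumptions for illustration.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Compute the root of a binary Merkle tree over a list of byte strings."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_hash(leaf) for leaf in leaves]
    while len(level) > 1:
        # Duplicate the last node when a level has an odd number of nodes.
        if len(level) % 2:
            level.append(level[-1])
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical per-block write counters stored in off-chip memory.
current = [b"block0:ctr=5", b"block1:ctr=2", b"block2:ctr=9"]
trusted_root = merkle_root(current)  # kept on-chip

# An attacker replays an old counter value for block1.
replayed = [b"block0:ctr=5", b"block1:ctr=1", b"block2:ctr=9"]
replay_detected = merkle_root(replayed) != trusted_root
```

Because only the root must be stored on-chip, the size of the off-chip tree is the metadata overhead that designs in this space aim to reduce.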
Autoprune: A Stochastic Candidate Pruning Strategy for Souper
Developed a machine-learning-based pruning strategy for Souper, a synthesis-driven superoptimization compiler, to reduce compilation time by pruning optimization candidates in LLVM intermediate representation.
- Created a custom dataset for candidate pruning.
- Trained machine learning models in Python.
- Deployed inference into Souper's C++ codebase using ONNX Runtime.
SPAR-2: A SIMD Processor Array for Machine Learning in IoT Devices
Helped design and build a custom array processor for FPGA-based IoT edge devices to accelerate matrix operations for machine learning workloads.
- Implemented the array processor as a processor-in-memory design.
- Used 16,384 processing elements for concurrent computation.
- Achieved a 24.51x speedup over an HLS-based design and a 1.75x speedup over similar custom designs.