"When something is important enough, you do it even if the odds are not in your favor."
krxgu@macbook-pro:~
whoami
Krish Gupta
cat role.json
{
  "role": "HPC + Compilers Engineer",
  "focus": ["Parallel Runtimes", "Profiling", "Correctness"],
  "stack": ["LLVM/Flang", "C++23", "CUDA", "MPI"]
}

Krish Gupta

Passionate about parallel computing, optimizing hardware, and building efficient AI infrastructure.

"How can we make inference hardware that is 10, 20, 50, 1,000 times more efficient than what we have today?"
— Jeff Dean (Google Chief Scientist)

Selected Engineering

GPU Roofline Benchmark

Performance Engineering

A cross-platform benchmarking tool that identifies whether a workload is bandwidth-bound or compute-bound on diverse hardware (CPUs, plus GPUs via CUDA and Metal).

  • Built micro-benchmarks to measure attainable peaks: SAXPY and STREAM Triad for memory bandwidth, SGEMM for compute throughput.
  • Automated device selection and roofline plot generation.
  • Supports heterogeneous backends (CUDA, Metal, OpenMP).
C++17 CUDA Python CMake

Slurm Mini-Cluster

HPC Operations & Infrastructure

Local simulation of a multi-node HPC cluster for job scheduling and parallel workload validation.

  • Deployed a multi-node Ubuntu cluster with Slurm and Munge authentication.
  • Validated scheduling policies with real MPI/OpenMP workloads.
  • Wrote an operations runbook covering job triage and service recovery.
Linux Slurm MPI Bash

GNU Radio 4.0 Expansion (GSoC '25)

Signal Processing & C++23

Ported and modernized legacy DSP blocks to the new modular 4.0 architecture, enabling high-throughput signal chains.

  • Implemented the Math, Analog, and Digital block families using C++23 template-registry patterns.
  • Achieved roughly 2x throughput over the legacy blocks in targeted benchmarks.
  • Added 75+ GoogleTest unit tests and established CI regression baselines.
C++23 SIMD GoogleTest

Building

ai-agent-guardrails ↗

Safety middleware for autonomous agents. Enforces deterministic output constraints and behavioral bounds for LLM-driven actions.

Reliability
vercel-fluid-mvp ↗

Request-coalescing backend. Uses Redis-based locking so concurrent identical requests share a single upstream LLM call, cutting redundant calls and improving tail latency.

Performance

Open Source

LLVM / Flang (OpenMP) ↗

Contributing to the Flang frontend for OpenMP support, focused on semantic correctness and diagnostics.

  • Fixed OpenMP atomic operations on complex types.
  • Added FileCheck regression coverage.
GNU Radio 4.0 ↗

Core contributor to the next-gen SDR runtime (GSoC '25).

  • Migrated 4,000+ lines of code to C++23.
  • Optimized memory handling for block buffers.

Notes

View all on Medium →

Inspiration

"Be a net contributor to society."

— Elon Musk

"I’m convinced that about half of what separates the successful entrepreneurs from the non-successful ones is pure perseverance."

— Steve Jobs (1995)

"Our greatest weakness lies in giving up. The most certain way to succeed is always to try just one more time."

— Thomas Edison