Brian Won (Injong)

Computer Science student at the University of Toronto, focused on ML inference and low-level systems optimization

About

Hey! I'm Brian, a recent CS grad from UofT. I'm interested in ML inference and low-level network systems.

My focus ranges from inference optimization to large-scale ML infrastructure, with a strong interest in understanding how technical performance impacts AI system reliability and user experience.

Technical Skills

Languages

  • Python (OOP, multithreading)
  • C/C++
  • Java
  • JavaScript
  • SQL
  • HTML/CSS

Frameworks & Tools

  • React, Next.js
  • Express, Node.js
  • Git, Docker
  • REST APIs
  • Linux, XML

Skills

  • Deep Learning Inference
  • Latency Optimization
  • Distributed Systems
  • Linear Programming
  • Network Security

Selected Projects

Mini ML Inference Serving System
FastAPI · Python · Docker · Redis
2025
  • HTTP Inference API: Built a production-style inference service that accepts prediction requests over HTTP and returns structured outputs with end-to-end latency tracking.
  • Queueing + Backpressure: Implemented an asyncio-based request queue with overload protection (HTTP 429 responses + rate limits) to prevent tail-latency collapse during traffic spikes.
  • Batched Inference Engine: Added dynamic batching (max batch size + max wait window) to improve throughput while controlling p95/p99 latency tradeoffs.
  • Metrics + Benchmarks: Exposed latency/throughput metrics (p50/p95/p99, QPS, queue wait) and ran benchmarks comparing single vs batched serving before/after optimizations.
  • Autoscaling Simulation: Implemented CPU/QPS-based worker scaling logic and evaluated stability (hysteresis) to avoid oscillation under variable load.
  • Dockerized Deployment: Packaged the server into Docker with env-configurable batching/worker parameters for reproducible local runs and load testing.
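
A minimal sketch of the queueing and dynamic batching ideas listed above, assuming a bounded asyncio.Queue as the request buffer. This is an illustration only, not the project's actual code: the names try_enqueue, collect_batch, max_batch_size, and max_wait_s are hypothetical. A full queue rejects new work (where a server would typically answer with HTTP 429), and a batch closes either when it is full or when the wait window expires.

```python
import asyncio


def try_enqueue(queue: asyncio.Queue, item) -> bool:
    """Backpressure: refuse new work when the bounded queue is full.

    Hypothetical helper; a caller would map False to an HTTP 429 response.
    """
    try:
        queue.put_nowait(item)
        return True
    except asyncio.QueueFull:
        return False


async def collect_batch(queue: asyncio.Queue, max_batch_size: int, max_wait_s: float) -> list:
    """Dynamic batching: wait for one item, then gather more until the
    batch is full or the wait window (max_wait_s) has elapsed."""
    batch = [await queue.get()]  # block until at least one request arrives
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # wait window closed
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break  # no further request arrived in time
    return batch
```

A larger max_wait_s tends to produce fuller batches and higher throughput, at the cost of extra queue wait for the first request in each batch, which is the p95/p99 tradeoff the batching bullet refers to.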

Experience

2025

University of Toronto

Developed a contest platform supporting concurrent users for nationwide programming challenges, improving platform scalability.

2024

Vector Institute

Research Assistant focusing on networked ML inference, optimizing distributed systems for lower noise and latency.

2023

RBC Capital Markets

Developed latency optimization tools for trading infrastructure using stochastic modeling and data-driven analytics.

2019-2020

IBM Watson

Enhanced integration testing platform for the DB2 team, improving reliability and coverage of automated test pipelines.

Teaching Assistant
University of Toronto
Multiple Terms
  • CSC165: Mathematical Foundations for Computer Science
  • CSC343: Database Management Systems
  • CSC2209: Networking Systems (Graduate Network-ML)
  • Network Algorithms, Operating Systems

Education

Bachelor of Science in Computer Science

University of Toronto • Class of 2025

Specialization: ML & Systems

Role: Teaching Assistant for Network Algorithms, Operating Systems, and other courses