Brian Won

About

Hey! I'm Brian, a CS grad from UofT. I'm interested in ML inference and LLM optimization.

My focus spans inference optimization and large-scale ML infrastructure, with a strong interest in how technical performance affects AI system reliability and user experience.

Technical Skills

Languages

  • Python (OOP, multithreading)
  • C / C++
  • Java
  • JavaScript
  • SQL
  • HTML / CSS

Frameworks & Tools

  • React, Next.js
  • Express, Node.js
  • Git, Docker
  • REST APIs
  • Linux, XML

Specialties

  • Deep Learning Inference
  • Latency Optimization
  • Distributed Systems
  • Linear Programming
  • Network Security

Featured Projects

EdgeRAG

Interactive Demo Available
  • Edge-optimized RAG system for financial analysis, focusing on retrieval accuracy, latency, and response grounding.
  • Iterating on chunking heuristics and metadata-aware retrieval for financial text.
  • Comparing standalone LLM outputs vs retrieval-augmented responses for factual consistency.
  • Exploring memory usage, model size trade-offs, and inference performance at the edge.
  • Investigating real-time data integration, hybrid dense–sparse retrieval, and prompt orchestration.
Python · Vector DB · RAG · FastAPI · React
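
To illustrate the hybrid dense and sparse retrieval idea mentioned above, here is a minimal sketch of reciprocal rank fusion, one common way to merge two ranked result lists. It is not EdgeRAG's actual code: the function name, the document IDs, and the constant `k=60` are all assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default and is an
    assumption here, not a value taken from EdgeRAG.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical hits from a dense (embedding) and a sparse (keyword) retriever.
dense_hits = ["10k_risk", "q3_earnings", "10k_mdna"]
sparse_hits = ["q3_earnings", "press_release", "10k_risk"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still survives in the fused list.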

Performance Bid-Ask Order Book

Interactive Demo Available
  • Sub-500ns price-time priority matching engine in C++17 with lock-free order ingestion and cache-line aligned structs.
  • Profiled via Google Benchmark and perf to eliminate L1 cache bottlenecks on the critical match path.
  • Dynamic order book with fixed-size array + linked list for efficient maintenance and retrieval.
C++ · Prop Trading · L2 Order Book · WebSockets · Market Signals
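
The price-time priority rule and the "fixed-size array + linked list" layout described above can be sketched in a few lines. This is a toy Python illustration, not the project's C++17 engine: the class and method names are hypothetical, a `deque` stands in for the linked list at each price level, and it scans price levels linearly where a real engine would track the best bid and ask.

```python
from collections import deque


class OrderBook:
    """Toy limit order book: one FIFO queue per price tick (price-time priority)."""

    def __init__(self, num_ticks):
        # Fixed-size arrays indexed by price tick; each slot keeps resting
        # orders in arrival order as mutable [order_id, remaining_qty] pairs.
        self.num_ticks = num_ticks
        self.bids = [deque() for _ in range(num_ticks)]
        self.asks = [deque() for _ in range(num_ticks)]

    def submit(self, side, tick, qty, order_id):
        """Match an incoming limit order, rest any remainder, return the fills."""
        fills = []
        if side == "buy":
            # Best ask is the lowest tick, so sweep upward to the limit price.
            candidate_ticks = range(0, tick + 1)
            opposite, own = self.asks, self.bids
        else:
            # Best bid is the highest tick, so sweep downward to the limit price.
            candidate_ticks = range(self.num_ticks - 1, tick - 1, -1)
            opposite, own = self.bids, self.asks
        for t in candidate_ticks:
            level = opposite[t]
            while level and qty > 0:
                resting = level[0]  # oldest order at this price goes first
                traded = min(qty, resting[1])
                fills.append((resting[0], t, traded))
                qty -= traded
                if traded == resting[1]:
                    level.popleft()
                else:
                    resting[1] -= traded
            if qty == 0:
                break
        if qty > 0:
            own[tick].append([order_id, qty])
        return fills


book = OrderBook(num_ticks=100)
book.submit("sell", 50, 10, "s1")
book.submit("sell", 50, 5, "s2")
book.submit("sell", 49, 3, "s3")
fills = book.submit("buy", 50, 12, "b1")
```

The buy order fills against the better-priced `s3` first, then against `s1` ahead of `s2` because `s1` arrived earlier at the same price.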

Experience

2026

LinkedIn

Built a fine-tuning and RLHF pipeline, establishing evals and regression tests for agentic workflows.

2025

University of Toronto

Architected end-to-end ML pipeline observability using Prometheus and Grafana for a platform serving 18k+ concurrent users.

2023

RBC Capital Markets

Developed latency optimization tools for trading infrastructure using stochastic modeling and data-driven analytics.

2019–2020

IBM Watson

Enhanced integration testing platform for the DB2 team, improving reliability and coverage of automated test pipelines.

Teaching Assistant
University of Toronto
Multiple Terms
  • CSC373: Algorithm Design, Analysis & Complexity
  • CSC343: Database Management Systems
  • CSC2209: Networking Systems (Graduate Network-ML)
  • Network Topology, Operating Systems

Education

Bachelor of Science in Computer Science

University of Toronto · Class of 2025

Specialization: ML & Systems

Role: Teaching Assistant — Network Topology, Operating Systems, and more