Email: JerryChee [at]



I am a third-year Ph.D. student in Computer Science at Cornell University. I am interested in scalability and resource efficiency in machine learning, particularly through stochastic and compression methods. I am also interested in the intersection of statistics and machine learning. I am fortunate to be advised by Chris De Sa.

During the summer of 2021, I worked in IC3-AI at Microsoft on using pruning to speed up real-time audio enhancement models for Microsoft Teams.

In previous years, I was a Research Intern at the Baidu Cognitive Computing Lab in Bellevue, WA, in the spring and summer of 2019. Additionally, I worked with Panos Toulis at UChicago Booth on statistically motivated topics in stochastic gradient descent (SGD).

In my previous professional life, I worked as a data science consultant at McKinsey & Company. I graduated from the University of Chicago in 2017 with a degree in Computational and Applied Mathematics, and completed data science internships at Nielsen and Uptake.


  • Convergence diagnostics for stochastic gradient descent with constant step size, with Panos Toulis, AISTATS 2018, oral presentation.
    We focus on detecting when SGD with a constant learning rate transitions to its convergence phase. Borrowing from the theory of stopping times in stochastic approximation, we develop a simple diagnostic that uses inner products of successive gradients to detect this transition. Theoretical and empirical results suggest that the diagnostic reliably detects the phase transition, which can speed up classical procedures.
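
As a rough illustration of the idea above, here is a minimal Python sketch of a diagnostic built on inner products of successive stochastic gradients. It is not the paper's exact procedure: the running-sum rule, the zero threshold, the burn-in parameter, and the noisy quadratic objective are all assumptions made for this example, and the function names are hypothetical. The intuition is that far from the optimum successive gradients tend to point the same way (positive inner products), while near it they tend to oppose each other.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sgd_with_diagnostic(theta, sample_grad, lr, burn_in, max_iter):
    """Constant-step SGD that stops once the running sum of inner products
    of successive stochastic gradients turns negative (after a burn-in).

    Returns (theta, iterations_used, diagnostic_fired).
    The stopping rule is an illustrative assumption, not a reference method.
    """
    running_sum = 0.0
    prev_grad = None
    for n in range(1, max_iter + 1):
        grad = sample_grad(theta)
        theta = [t - lr * g for t, g in zip(theta, grad)]
        if prev_grad is not None:
            # Accumulate <g_{n-1}, g_n> for consecutive stochastic gradients.
            running_sum += dot(prev_grad, grad)
        prev_grad = grad
        if n > burn_in and running_sum < 0:
            return theta, n, True
    return theta, max_iter, False


def make_noisy_quadratic(target, noise_sd, rng):
    """Stochastic gradient of 0.5 * ||theta - y||^2 with y = target + noise.

    A toy objective chosen only to exercise the diagnostic.
    """
    def sample_grad(theta):
        return [t - (m + rng.gauss(0.0, noise_sd)) for t, m in zip(theta, target)]
    return sample_grad


if __name__ == "__main__":
    rng = random.Random(0)
    grad_fn = make_noisy_quadratic([1.0, -2.0, 0.5], noise_sd=0.1, rng=rng)
    theta, n, fired = sgd_with_diagnostic(
        [0.0, 0.0, 0.0], grad_fn, lr=0.5, burn_in=20, max_iter=200000
    )
    print("fired:", fired, "after", n, "iterations; theta =", theta)
```

Once the diagnostic fires, one plausible use (again an assumption here) is to stop, shrink the learning rate, or start averaging iterates rather than continuing at the same step size.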

In Submission

  • Pruning Neural Networks with Interpolative Decompositions.
    A principled approach to pruning by low-rank matrix approximation, using a novel application of the interpolative decomposition to approximate the activation output of a layer.


  • AISTATS, Lanzarote, Canary Islands, 2018
    Oral Presentation
    Convergence Diagnostics for SGD with constant stepsize.

  • JSM, Denver, USA, 2019
    Topic-contributed paper session
    Statistical properties of stochastic gradient descent.