Projects

TVM: Tensor IR Stack for Deep Learning Systems

Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

TVM is a tensor intermediate representation (IR) stack for deep learning systems. It is designed to close the gap between productivity-focused deep learning frameworks and performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end-to-end compilation to different backends.
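For a flavour of what the stack exposes, here is a minimal sketch of declaring, scheduling, and compiling a tensor computation. It follows the later tvm.te Python API rather than the exact 2017-era interface, so treat the names as illustrative:

```python
import numpy as np
import tvm
from tvm import te

# Declare the computation: element-wise addition of two length-n vectors.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# Schedule and compile for a CPU backend; swap the target to "cuda" for GPUs.
s = te.create_schedule(C.op)
fadd = tvm.build(s, [A, B, C], target="llvm", name="vector_add")

# Run the compiled kernel on concrete data.
dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(1024).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(1024).astype("float32"), dev)
c = tvm.nd.empty((1024,), "float32", dev)
fadd(a, b, c)
```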

MinPy: NumPy Interface Deep Learning Framework with Mixed Backend Execution

Minjie Wang, Yutian Li, Ziheng Jiang, Yihe Tang, Haoran Wang, Tianjun Xiao, Jinyang Li, Zheng Zhang [GTC 2017]

MinPy aims to provide a high-performance and flexible deep learning platform by prototyping a pure NumPy interface on top of the MXNet backend. In short, you get the following automatically with your NumPy code:

1. Operators with GPU support are run on the GPU;
2. Graceful fallback to NumPy on the CPU for missing operators;
3. Automatic gradient generation with Autograd support;
4. Seamless MXNet symbol integration.
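A minimal sketch of what this looks like in practice, assuming the minpy.numpy and minpy.core.grad interfaces as documented at the time (MinPy is no longer maintained):

```python
# Plain NumPy-style code; operators with GPU kernels run on the GPU,
# anything else falls back to NumPy on the CPU.
import minpy.numpy as np
from minpy.core import grad

def foo(x):
    return np.sum(x ** 2)

foo_grad = grad(foo)                           # reverse-mode autodiff of the NumPy code
print(foo_grad(np.array([1.0, 2.0, 3.0])))     # -> [2. 4. 6.]
```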

MXNet-Autograd: Automatic Differentiation for Imperative Programming

As the author of MXNet-Autograd, I came up with a way to unify the gradient definitions of symbolic and imperative programming in MXNet. It allows users to differentiate a graph of NDArray operations with the chain rule. This style is called define-by-run: the network is defined on the fly by running the forward computation. Users can define exotic network structures and differentiate them, and each iteration can have a completely different network structure.
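A minimal sketch of the define-by-run style, using the mxnet.autograd module as it exists in current MXNet releases (names may differ from the original prototype):

```python
import mxnet as mx
from mxnet import nd, autograd

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()                 # allocate storage for the gradient of x

with autograd.record():         # define-by-run: operations are traced as they execute
    y = (x * x).sum()           # the "network" is just ordinary imperative code

y.backward()                    # apply the chain rule through the recorded graph
print(x.grad)                   # -> [2. 4. 6.]
```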

NNVM-Fusion: Automatic Kernel Fusion and Runtime Compilation for Computational Graph

NNVM-Fusion is a module that implements automatic GPU kernel fusion and runtime compilation on top of NNVM. It can easily be used as an NNVM plugin in different deep learning systems to gain a performance boost.

As the author of this project, I explored how to achieve automatic kernel fusion in a deep learning framework, a cutting-edge technique at the time.
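The sketch below is not NNVM-Fusion's actual code path; it only illustrates, in plain Python, what fusing a chain of element-wise operators into a single kernel means:

```python
import numpy as np

# Unfused execution: three separate element-wise kernels, each materialising
# a full intermediate array between launches.
def unfused(x, w, b):
    t1 = x * w
    t2 = t1 + b
    return np.maximum(t2, 0.0)

# What a fusion pass conceptually generates instead: a single kernel that
# reads each element once and applies the whole operator chain before writing out.
def fused(x, w, b):
    out = np.empty_like(x)
    for i in range(x.size):
        out[i] = max(x[i] * w + b, 0.0)
    return out

x = np.random.rand(1 << 10).astype("float32")
assert np.allclose(unfused(x, 2.0, 0.5), fused(x, 2.0, 0.5))
```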

Accelerate: DSL of Array Computations for High-Performance Computing in Haskell

Accelerate defines an embedded domain-specific language (DSL) of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are compiled online and executed on a range of architectures.

As one of eight selected students in Summer of Haskell 2016, I implemented some of the collective operations in Accelerate-LLVM. You can find more details about my work in the proposal below.