Cheng Li

MLModelScope: Evaluate and Introspect Cognitive Pipelines

The current landscape of cognitive pipelines exercises many Machine Learning (ML) and Deep Learning (DL) building blocks. These ML and DL building blocks leverage non-uniform frameworks, models, and system stacks. Currently, there is no end-to-end …

Accelerating Reduction and Scan Using Tensor Core Units

Driven by deep learning, there has been a surge of specialized processors for matrix multiplication, referred to as Tensor Core Units (TCUs). These TCUs are capable of performing matrix multiplications on small matrices (usually 4 × 4 or 16 × 16) to …

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

Deep neural networks (DNNs) have become core computation components within low-latency Function as a Service (FaaS) prediction pipelines. Cloud computing, as the de-facto backbone of modern computing infrastructure, has to be able to handle …


MLModelScope: Evaluate and Introspect Cognitive Pipelines

Accelerating Reduction and Scan Using Tensor Core Units

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects (Best Paper Award)

Accelerating Reduction Using Tensor Core Units

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing

RAI: A Scalable Project Submission System for Parallel Programming Courses

KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism

DjiNN and Tonic: DNN as a Service and Its Implications for Future Warehouse Scale Computers

Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers