about
Hi! I’m an associate professor in the Department of Electrical and Computer Engineering at UBC.
Before joining UBC, I was a graduate student in the Department of Electrical Engineering and Computer Science at MIT, working with Prof. Srini Devadas (PhD) and Prof. Arvind (MEng).
I also spent a number of years in industry, working on Bluespec and high-speed network processors.
research
My recent research has spanned:
- Attacks on DNN accelerators. We developed new techniques for attacking sparse DNN accelerators that allow attackers to recover secret DNN model structures from very limited side-channel information, even in the presence of countermeasures like encryption.
- Efficient DNN inference and training. We developed an algorithm and a hardware architecture that trains modern networks to iso-accuracy using 4× less time and energy. We also developed a technique to fuse adjacent convolutional layers, improving performance by up to 6×. We showed how to use resistive memories with standard DNN accelerators to improve energy efficiency by up to 6×. Finally, we developed scheduling algorithms for running DNNs and other tensor workloads on PE array accelerators; the resulting scheduler produces energy-efficient schedules in far less time than prior tools.
- Accelerator programmability. We demonstrated an efficient implementation of strong memory semantics in GPUs, and developed an efficient GPU hardware transactional memory.
- Memory hierarchy optimization. We used locality-sensitive hashing to significantly improve cache compression ratios, demonstrated complementary compressibility within and across cachelines, and showed that cache approximation and compression are best decoupled. We also designed an abstraction and toolchain that factors data structures to reduce cache misses and improve performance. Our machine-learning-based prefetcher won second place in the 2021 ML-Based Data Prefetching Competition.
- 3D graphics acceleration. We developed an efficient, scalable approach to split-frame rendering that takes advantage of mathematical properties of image composition.
- Accelerators for 5G. We designed an efficient FPGA maximum-a-posteriori (MAP) equalizer to support Faster-than-Nyquist signalling.
- Program analysis and optimization. We combined static and dynamic analysis to build fast, accurate slicers for tracking down bugs in Android apps and in standalone Java programs.
Before joining UBC, I worked on:
- Scalable cache coherence. We built the Execution Migration Machine (EM²), a 110-core chip implemented in a 45 nm ASIC flow that provides unified shared memory by migrating execution contexts to the data they need to access. We also developed ways to support dependable and resilient cache coherence.
- Network-on-Chip interconnects. We developed the scalable NoC simulator HORNET, and demonstrated ways to randomize oblivious routing, optimize NoCs for bursts, support in-order delivery across multiple paths and across VCs, and efficiently schedule routing and allocate VCs.
- Biochemical reaction networks. We built SSC, a domain-specific language and compiler for modeling biochemical reaction networks. We used it to study immunological interactions between CD4/CD8 and MHC, and to demonstrate spatial coordination in cell-membrane signalling. SSC has also enabled the nanomedicine online game NanoDoc.
- Protein and nucleic acid structure modeling. We developed Spanner, a tool that threads a protein sequence onto an existing 3D template by filling indels with fragments from a database of naturally occurring structures. We also designed AmyloidMutants to model the mutational landscape of disease-causing protein amyloids, and developed ways to design artificial RNA sequences optimized for a given structure.
- High-level hardware synthesis. I worked on Bluespec, a high-level language for digital hardware design. Unlike other HDLs, Bluespec has strong high-level semantics and a strict type system. This allows Bluespec to support advanced features such as automatic verification of the safety of clock-domain crossings and synthesis of verification assertions into hardware, and more generally enables faster design with high quality of results (QoR).
teaching
Courses I teach / have taught at UBC:
- CPEN 212 (Computer Systems II)
- CPEN 211 (Computer Systems I)
- CPEN 291 (Computer Engineering Design Studio I)
- CPEN 311 (Digital Systems Design)
- CPEN 411 (Computer Architecture)
- CPEN 511 / EECE 527 (Advanced Computer Architecture)
students
Current graduate students:
- Dingqing Yang (PhD)
- MohammadHossein Olyaiy (PhD)
- Avilash Mukherjee (PhD), co-advised with Prof. Shekhar
- Khaled Ahmed (PhD), co-advised with Prof. Rubin
- Wenyi Gong (MASc) / (→ Apple)
- John Deppe (MASc), co-advised with Prof. Lemieux
Former students:
- Christopher Ng, MASc 2022 (→ d-Matrix)
- Muchen He, MASc 2022, co-advised with Prof. Nair (→ Solidigm)
- Amin Ghasemazar, PhD 2021, co-advised with Prof. Nair (→ Novarc)
- Xiaowei Ren, PhD 2020 / PDF 2021 (→ NVIDIA)
- Mohamed Omran Matar, MASc 2020 (→ AMD / Andes)
- Maximilian Golub, MASc 2018, co-advised with Prof. Lemieux (→ Mercedes-Benz Research / Microsoft)
- Mohammad Ewais, MASc 2018 (→ PhD@UofT)
- Peter Deutsch, USRA 2019/2020, co-advised with Prof. Nair (→ PhD@MIT)
- Xianda Sun, USRA 2021
- Zefan Sramek, USRA 2017/2018 (→ PhD@東大/UTōkyō)
- Ellis Su, USRA 2016/2017 (→ Apple)
service
On Technical Program Committees (* = External Review Committee):
- ISCA 2023*, 2022*
- MICRO 2022, 2021, 2020, 2019, 2017*
- HPCA 2024, 2021
- WAX 2020
- OSDA 2024, 2023, 2020, 2019
- ICS 2024, 2019*, 2018, 2017, 2016, 2015
- ASAP 2017
- NAS 2017
- ISPASS 2016
- CCECE 2016
- ICCD 2015
- IISWC 2015
Conference organization:
publications
- K Ahmed, Y Wang, M Lis, J Rubin (2023). ViaLin: Path-Aware Dynamic Taint Analysis for Android. In FSE 2023. 〈 paper | code 〉
- M Olyaiy*, C Ng*, A Fedorova, M Lis (2023). Sunstone: A Scalable and Versatile Scheduler for Mapping Tensor Algebra on Spatial Accelerators. In ISPASS 2023. (*equal contribution). 〈 paper | code 〉
- D Yang, P Nair, M Lis (2023). HuffDuff: Stealing Pruned DNNs from Sparse Accelerators. In ASPLOS 2023. 〈 paper 〉
- K Ahmed, M Lis, J Rubin (2021). Slicer4J: A Dynamic Slicer for Java. In ESEC/FSE 2021. 〈 paper | code 〉
- A Asgari, A Gunter, M Saeidi, M Lis, P Nair (2021). MLMLP: A Case for Multi-Page Multi-Layer Perceptron Prefetcher. In MLArchSys 2021. Second place in the 2021 ML-Based Data Prefetching Competition. 〈 paper 〉
- M Olyaiy, C Ng, M Lis (2021). Accelerating DNN Inference with Predictive Layer Fusion. In ICS 2021. 〈 paper | ML code | sim code 〉
- K Ahmed, M Lis, J Rubin (2021). MANDOLINE: Dynamic Slicing of Android Applications with Trace-Based Alias Analysis. In ICST 2021. Distinguished paper award. 〈 paper | code 〉
- X Ren, M Lis (2021). CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition. In HPCA 2021. 〈 paper | code 〉
- A Mukherjee, K Saurav, P Nair, S Shekhar, M Lis (2021). A Case for Emerging Memories in DNN Accelerators. In DATE 2021. 〈 paper 〉
- D Yang, A Ghasemazar*, X Ren*, M Golub, G Lemieux, M Lis (2020). Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training. In MICRO 2020. (*equal contribution). 〈 paper | ML code | sim code 〉
- MO Matar, M Jana, J Mitra, L Lampe, M Lis (2020). A Turbo Maximum-a-Posteriori Equalizer for Faster-than-Nyquist Applications. In FCCM 2020. 〈 paper 〉
- A Ghasemazar, P Nair, M Lis (2020). Thesaurus: Efficient Cache Compression via Dynamic Clustering. In ASPLOS 2020. 〈 paper | code 〉
- A Ghasemazar, M Ewais, M Lis (2020). Decoupling Approximation and Cache Compression. In WAX 2020. 〈 paper 〉
- A Ghasemazar, M Ewais, P Nair, M Lis (2020). 2DCC: Cache Compression in Two Dimensions. In DATE 2020. 〈 paper | code 〉
- L Ye, M Lis, A Fedorova (2019). A unifying abstraction for data structure splicing. In MEMSYS 2019. 〈 paper 〉
- A Ghasemazar, M Lis (2019). 2DCC: Cache Compression in Two Dimensions. Poster in ICCD 2019.
- M Golub, G Lemieux, M Lis (2019). Full Deep Neural Network training on a pruned weight budget. In SysML 2019 (a.k.a. MLSys 2019). 〈 paper | code 〉
- X Ren, M Lis (2018). High-performance GPU Transactional Memory via Eager Conflict Detection. In HPCA 2018. 〈 paper | code 〉
- A Ghasemazar, M Lis (2017). Gaussian Mixture Error Estimation for Approximate Circuits. In DATE 2017. 〈 paper | code 〉
- X Ren, M Lis (2017). Efficient Sequential Consistency in GPUs via Relativistic Cache Coherence. In HPCA 2017. 〈 paper | code 〉
- KS Shim, M Lis, O Khan, S Devadas (2015). The Execution Migration Machine: Directoryless Shared-Memory Architecture. IEEE Computer 48:50–59.
- KS Shim*, M Lis*, MH Cho, I Lebedev, S Devadas (2013). Design Tradeoffs for Simplicity and Efficient Verification in the Execution Migration Machine. In ICCD 2013. (*equal contribution)
- M Kinsy, MH Cho, KS Shim, M Lis, GE Suh, S Devadas (2013). Optimal and Heuristic Application-Aware Oblivious Routing. IEEE Transactions on Computers 62:59–73.
- KS Shim, M Lis, O Khan, S Devadas (2012). Thread Migration Prediction for Distributed Shared Caches. IEEE Computer Architecture Letters, RapidPosts September 2012.
- P Ren, M Lis, MH Cho, KS Shim, CW Fletcher, O Khan, N Zheng, S Devadas (2012). HORNET: A Cycle-Level Multicore Simulator. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 31:890–903. 〈 code 〉
- KS Shim, M Lis, O Khan, S Devadas (2012). Judicious Thread Migration When Accessing Distributed Shared Caches. In CAOS 2012.
- A Levin*, M Lis*, Y Ponty, CW O’Donnell, S Devadas, B Berger, J Waldispühl (2012). A global sampling approach to designing and reengineering RNA secondary structures. Nucleic Acids Research 40:10041–10052. (*equal contribution)
- M Lis, KS Shim, MH Cho, O Khan, S Devadas (2011). Directoryless Shared Memory Coherence Using Execution Migration. In PDCS 2011. Best paper award.
- M Lis, KS Shim, MH Cho, S Devadas (2011). Memory coherence in the age of multicores. In ICCD 2011.
- O Khan, H Hoffmann, M Lis, F Hijaz, A Agarwal, S Devadas (2011). ARCc: A Case for an Architecturally Redundant Cache-coherence Architecture for Large Multicores. In ICCD 2011.
- M Lis, KS Shim, MH Cho, C Fletcher, M Kinsy, I Lebedev, O Khan, S Devadas (2011). Brief Announcement: Distributed Shared Memory based on Computation Migration. In SPAA 2011.
- MH Cho, KS Shim, M Lis, O Khan, S Devadas (2011). Deadlock-Free Fine-Grained Thread Migration. In NOCS 2011. Best Paper Award.
- M Lis, KS Shim, O Khan, S Devadas (2011). Shared Memory via Execution Migration. In the ASPLOS 2011 Ideas and Perspectives Session.
- O Khan, M Lis, Y Sinangil, S Devadas (2011). DCC: A Dependable Cache Coherence Multicore Architecture. IEEE Computer Architecture Letters 10:12–15.
- M Lis, P Ren, MH Cho, KS Shim, CW Fletcher, O Khan, S Devadas (2011). Scalable, accurate multicore simulation in the 1000-core era. In ISPASS 2011. 〈 code 〉
- KS Shim, MH Cho, M Lis, O Khan, S Devadas (2011). System-level Optimizations for Memory Access in the Execution Migration Machine (EM²). In CAOS 2011.
- M Lis, T Kim, J Sarmiento, D Kuroda, H Dinh, AR Kinjo, S Devadas, H Nakamura, DM Standley (2011). Bridging the gap between single-template and fragment based protein structure modeling using Spanner. Immunome Research 7:1–8.
- CW O’Donnell, J Waldispühl, M Lis, R Halfmann, S Devadas, S Lindquist, B Berger (2011). A method for probing the mutational landscape of amyloid structure. Bioinformatics 27:i34–i42.
- O Khan, M Lis, S Devadas (2010). Instruction-Level Execution Migration. CSAIL Technical Report TR-2010-019.
- M Lis, MH Cho, KS Shim, S Devadas (2010). Path-Diverse Inorder Routing. In ICGCS 2010.
- M Lis, KS Shim, MH Cho, P Ren, O Khan, S Devadas (2010). DARSIM: a parallel cycle-level NoC simulator. In MoBS 2010.
- MN Artyomov, M Lis, S Devadas, MM Davis, AK Chakraborty (2010). CD4 and CD8 binding to MHC molecules primarily acts to enhance Lck delivery. Proceedings of the National Academy of Sciences 107:16916–16921.
- MN Artyomov, M Lis, AK Chakraborty (2009). Spatial coordination in membrane proximal signaling in T-cells. Bulletin of the American Physical Society 54.
- MH Cho, M Lis, KS Shim, M Kinsy, S Devadas (2009). Path-Based, Randomized, Oblivious Routing. In NoCArc 2009.
- MH Cho, M Lis, M Kinsy, KS Shim, T Wen, S Devadas (2009). Oblivious Routing in On-Chip Bandwidth-Adaptive Networks. In PACT 2009.
- M Lis, MH Cho, KS Shim, S Devadas (2009). Guaranteed in-order packet delivery using Exclusive Dynamic Virtual Channel Allocation. CSAIL Technical Report TR-2009-036.
- KS Shim, MH Cho, M Kinsy, T Wen, M Lis, GE Suh, S Devadas (2009). Static Virtual Channel Allocation in Oblivious Routing. In NOCS 2009.
- M Lis, MN Artyomov, S Devadas, AK Chakraborty (2009). Efficient stochastic simulation of reaction–diffusion processes via direct compilation. Bioinformatics 25:2289–2291.
- DM Standley, M Lis, AR Kinjo, H Nakamura (2009). Protein Function Annotation from Sequences and Structures with Tools at PDBj. In AsCA 2009.
- DM Standley, AR Kinjo, M Lis, M van der Giezen, H Nakamura (2008). Structure-based functional annotation of protein sequences guided by comparative models. In Optimization and Systems Biology 2008.
- M Pellauer, M Lis, D Baltus, RS Nikhil (2005). Synthesis of synchronous assertions with guarded atomic actions. In MEMOCODE 2005.
- W Ecker, V Esen, T Steininger, M Lis (2005). A Case Study in Rule-Based Modeling. In IP/SoC 2005.