Technical lead: Strengthening the ecosystem for HiSilicon Kunpeng AArch64 CPUs and HiSilicon Ascend AI/ML processors through HPC, Arm SVE and scientific computing software enablement; performance characterization (profiling, benchmarking, analysis); and architecture evaluation, both on real hardware (ARM64 and x86-64 variants) and by architecture simulation (gem5 and internal simulators).
Determining the compute and communication demands of computer vision (sensor-based perception) and vehicle dynamics (path planning, trajectory control) methods, based on IP blocks from the HiSilicon Kunpeng and Ascend chip series, while considering requirements for reliability, cost and functional safety.
Technical lead: Evaluation of the hardware feasibility, programming effort and performance of Near-Data Processing (NDP) on memory modules for server applications (DIMM-NDP). Building on standard IP, NDP units enhance the Media Controller (MedC) on a memory module. The MedC is a discrete buffer chip positioned alongside the DRAM devices on the module and is needed for forthcoming interface standards such as JEDEC NVDIMM-P and Gen-Z (now CXL). DIMM-NDP employs unmodified standard DRAM chips and exploits unused rank-level bandwidth on the DIMM, so it benefits from the economies of scale of manufacturing standard DRAM such as DDR4/DDR5. The memory module appears as a normal Load-Reduced DIMM if NDP is switched off.
Simulation results show up to 6.3x better performance for bandwidth-limited applications, representing 79% of the theoretical peak of the evaluated configuration. We complement the evaluation with feasibility checks covering DIMM-like form factors that offer 32 GB to 128 GB capacity per DIMM, hardware overhead costs (below 20%), and power envelopes for standard (13 W) and custom (40 W) DIMMs.
Benchmarking and microarchitecture analysis of the HiSilicon Hi1612/1610 generation of ARM-based multicore server chipsets with elementary tests (e.g., STREAM, lmbench) and application-level benchmarks (SPEC CPU, SPEC OMP, SPECjbb2015) to determine microarchitectural improvements with respect to fairness, latency and utilization of the uncore (DDR3/DDR4 memory subsystem and interconnect) for future HiSilicon Kunpeng products. I also assessed the feasibility of integrating forthcoming interface standards (Gen-Z, JEDEC NVDIMM-P, DDR5) for use with near and far memory, as well as hybrid memory solutions (DRAM plus NVM).
L. Stanisic, R. Mijakovic, M. Gries: Performance Evaluation of the Ginkgo Sparse Linear Solver Framework on Arm, talk only, Arm HPC User Group (AHUG) workshop at ISC High Performance conference, Hamburg, Germany, May 2023. PDF available at AHUG GitHub: ISC23-AHUG_Luka-Stanisic.pdf
P. Falk, M. Gries, F. Herold, Q. Liu, M. Marchenko, T. Patterson: Performance Evaluation of the BeeGFS File System on the Arm AArch64 Architecture, whitepaper, May 2022. Available at ThinkParQ
M. Gries, P. Cabré, J. Gago: Performance Evaluation and Feasibility Study of Near-data Processing on DRAM Modules (DIMM-NDP) for Scientific Applications, Technical Report MRC-2019-04-15-R1, Munich Research Center, Huawei Technologies Duesseldorf GmbH, April 2019. PDF available at HAL archive