Chenxu Shi

Beijing

Summary

Dynamic AI Architect with a proven track record at Baidu, specializing in AI model deployment and system architecture. Successfully designed a unified microservice framework, optimizing GPU usage and enhancing resource efficiency. Adept at team collaboration and C++ programming, driving impactful solutions in cloud computing environments.

Overview

8 years of professional experience

Work History

AI Architect

PipeChina | Beijing Zhi Wang Digital Technology Co., Ltd.
Beijing
09.2024 - Current

Key Responsibilities:

• Spearhead the adaptation and deployment of AI models in PipeChina’s private cloud intelligent computing environment.

• Develop an AI model inference service framework to deliver scalable, user-friendly AI capabilities for business applications.

• Build a unified large-scale model service platform to bridge AI capabilities from the computing environment to business value.

Achievements:

• Designed the architecture and detailed plan of the MaaS platform from scratch, independently wrote the bidding plan for the MaaS platform software, and completed all work from initiating procurement to contract signing.

• Discussed the technical details of the platform implementation with the winning bidder to ensure the successful release of the platform.

• Developed a C++ inference service framework from scratch for the internal Ascend machines, implementing a configurable architecture, DVPP hardware acceleration, NPU multi-stream acceleration, and other techniques, enabling the application of 10+ visual inspection algorithms in the safety monitoring of pipelines.

Technologists

Alibaba, Intelligence Engine
Beijing
09.2023 - 08.2024

Key Responsibilities:

• Supported the construction of a high-performance resource pool for LLM training to ensure the supply of heterogeneous training computing power

• Enhanced multi-tenant scheduling capabilities and scheduling strategies, improving the effective utilization of the resource pool

• Optimized training problem detection (hardware/driver/training framework) and automatic recovery to improve the efficiency of large-scale model development

Achievements:

• Led the integration of NV-H800 GPUs into Alibaba’s LLM training system, coordinating the training framework, scheduling system, and resource platform

• Established an automatic fault detection and recovery process so that pre-training tasks could run unattended

• Developed network-topology-aware scheduling/sorting based on the training framework to ensure communication stability during large model training

Senior R&D Engineer

Baidu
Beijing
06.2017 - 09.2023

Baidu, Search Large Model Deployment Technical Support 01.2023-09.2023

Key Responsibilities:

• Cooperated with all content technology parties to upgrade and renovate the existing system, and constructed a complete production pathway covering data acquisition, sample management, training optimization, and model deployment

• Supported LLM training and AI-native system exploration for generative AI applications

Achievements:

• Supported the deployment of an LLM in key scenarios and the deployment of the Text-to-Image Generation Model for generating images via Baidu’s search box

Baidu, MEG Content Understanding Platform Architecture 11.2018-12.2022

Key Responsibilities:

• Architected engineering solutions for hundreds of deep learning models (content analysis, security, and generation)

Achievements:

• Implemented the underlying unified model microservice framework and the scheduling layer’s support for both real-time stream-based and batch-based feature computation

• Supported various services including upper-level webpage/text understanding, image understanding, and video understanding

• Saved nearly a thousand GPUs through GPU model/service optimization and retraining

• Patent: A Feature Calculation Method and System Based on Microservices and DAG (CN202010157440.3)

• Awards: GPU Cost Optimization Special Award, Baidu Thumbs (Individual+Team)

Baidu, FEED Online Recommendation Service Architecture 06.2017-10.2018

Key Responsibilities:

• Engineered 10+ vertical recommendation systems (e.g., image galleries, celebrity/news feeds)

Achievements:

• Led the development of a general framework for vertical recommendation, abstracting UMS (User Model) workflows and operators for reverse recall, forward access, filtering, sorting, and display control

• Reduced CPU usage by 10-15% through architectural optimizations

Education

MEng - Control Theory And Control Engineering

Xidian University
Xi'an
06-2017

Bachelor of Engineering - Electrical Engineering And Automation

Henan University of Science and Technology
Luoyang
06-2014

Skills

  • AI model deployment
  • Cloud computing
  • Machine learning
  • C++ programming
  • System architecture
  • Resource optimization
  • Team collaboration
  • Architectural building systems

Timeline

AI Architect

PipeChina | Beijing Zhi Wang Digital Technology Co., Ltd.
09.2024 - Current

Technologists

Alibaba, Intelligence Engine
09.2023 - 08.2024

Senior R&D Engineer

Baidu
06.2017 - 09.2023

MEng - Control Theory And Control Engineering

Xidian University

Bachelor of Engineering - Electrical Engineering And Automation

Henan University of Science and Technology