CHEAH ZHONG ZHI
My name is Zhong Zhi. I joined Kerry Logistics Network (KLN) in 2020 as a Software Engineer Intern. During my internship at KLN, I worked closely with the team and helped the company save the license fees of its existing EDI software by developing a new EDI platform from scratch.
I am a graduate of the USM School of Computer Sciences, where I majored in Intelligent Systems. I am also an AWS Certified Solutions Architect - Associate. I currently reside in Penang. Feel free to connect with me.
Email: zzcheah@live.com
LinkedIn: www.linkedin.com/in/zzcheah
Matric No: 137053
Student Email:
Supervisor: PM. Dr. Chan Huah Yong
Supervisor Email:
NS005
TensorFlow Dockers on GPU Cloud for AI Applications
TensorFlow Dockers on GPU Cloud for AI Applications, a.k.a. GPU Yard, is a distributed system that accepts workload requests from users and processes them on distributed GPU machines.
GPU Yard aims to reduce the hassle of setting up a GPU-accelerated environment for GPU-intensive tasks such as training a machine learning model. It serves as a black box that receives input from the user and returns the output to the user.
Use Case Example
Take training a machine learning model as an example: a user does not need to set up any GPU environment to train the model. The user can have the model trained simply by specifying the task and providing inputs such as hyperparameters and datasets to GPU Yard, as sketched in the request below. The output of the task is a trained model (e.g., a TensorFlow model) that can be loaded and used right away.
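A minimal sketch of what such a request could look like, assuming a hypothetical REST endpoint (gpuyard.example.com/api/jobs) and an illustrative JSON payload shape; the actual GPU Yard API may differ:

// Hypothetical sketch: submitting a training job to GPU Yard over a REST API.
// The endpoint path, image tag, and JSON fields are assumptions for illustration only.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubmitJobExample {
    public static void main(String[] args) throws Exception {
        String jobJson = """
                {
                  "task": "train-image-classifier",
                  "dockerImage": "tensorflow/tensorflow:2.0.0-gpu",
                  "hyperparameters": { "epochs": 10, "batchSize": 32, "learningRate": 0.001 },
                  "datasetUrl": "s3://example-bucket/datasets/cats-vs-dogs.zip"
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://gpuyard.example.com/api/jobs")) // assumed endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jobJson))
                .build();

        // The main server is expected to queue the job and return an id for tracking it.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Job accepted: " + response.body());
    }
}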
GPU Yard Design
GPU Yard is a distributed system made up of two main components: the main server and the remote workers.
The main server handles job/task scheduling, the queueing mechanism, and read/write operations to the database (MongoDB).
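As a rough illustration of that responsibility split, the sketch below shows how a Spring Boot main server might persist submitted jobs in MongoDB and hand the oldest queued job to a polling worker. All class names, fields, and endpoints are assumptions for illustration, not the actual GPU Yard code:

// Minimal sketch of the main server's queueing idea using Spring Boot and Spring Data MongoDB.
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.web.bind.annotation.*;

@Document("jobs")
class Job {
    @Id public String id;
    public String task;     // e.g. "train-image-classifier"
    public String status;   // QUEUED -> RUNNING -> DONE / FAILED
    public String payload;  // hyperparameters, dataset location, etc.
}

interface JobRepository extends MongoRepository<Job, String> {
    Job findFirstByStatusOrderByIdAsc(String status); // oldest queued job first
}

@RestController
@RequestMapping("/api/jobs")
class JobController {
    private final JobRepository jobs;
    JobController(JobRepository jobs) { this.jobs = jobs; }

    // Users submit jobs here; each job is persisted as QUEUED.
    @PostMapping
    Job submit(@RequestBody Job job) {
        job.status = "QUEUED";
        return jobs.save(job);
    }

    // Remote workers poll this endpoint for the next queued job.
    // A production server would claim the job atomically; this sketch keeps it simple.
    @PostMapping("/next")
    Job next() {
        Job job = jobs.findFirstByStatusOrderByIdAsc("QUEUED");
        if (job != null) {
            job.status = "RUNNING";
            jobs.save(job);
        }
        return job;
    }
}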
The remote workers are the actual processing units of the system. Each worker polls jobs/tasks from the main server, processes them, and produces the output for the user's request. All jobs/tasks are processed within Docker containers to provide isolation between users' requests. Processing in containers also provides benefits such as the ability to run tasks with different library versions (e.g., TensorFlow v1.0 and TensorFlow v2.0).
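The worker side can be pictured as a simple poll-and-run loop, sketched below under the same assumptions (the endpoint path, image tag, and docker arguments are illustrative only):

// Illustrative sketch of a remote worker's main loop: poll the main server for a job,
// then run it inside a Docker container so each request stays isolated.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WorkerLoop {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        while (true) {
            // Ask the main server for the next queued job (assumed endpoint).
            HttpRequest poll = HttpRequest.newBuilder()
                    .uri(URI.create("https://gpuyard.example.com/api/jobs/next"))
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            HttpResponse<String> job = http.send(poll, HttpResponse.BodyHandlers.ofString());

            if (job.body() == null || job.body().isBlank()) {
                Thread.sleep(5_000); // nothing queued; wait before polling again
                continue;
            }

            // A real worker would parse the job JSON to pick the image, mount the dataset,
            // and pass the hyperparameters. Running in a container means different jobs can
            // use different TensorFlow images (e.g. 1.x vs 2.x) without conflicts on the host.
            Process docker = new ProcessBuilder(
                    "docker", "run", "--rm", "--gpus", "all",
                    "tensorflow/tensorflow:2.0.0-gpu",
                    "python", "/workspace/train.py")   // assumed entrypoint
                    .inheritIO()
                    .start();
            int exitCode = docker.waitFor();
            System.out.println("Job finished with exit code " + exitCode);
            // The worker would then upload the trained model and report status
            // back to the main server.
        }
    }
}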
Technologies / Frameworks / Libraries
Spring Boot, MongoDB, Docker
React, Redux, REST API, GraphQL
Cloud Computing, AWS, S3, EC2
Java, Python, JavaScript
Web Development, Material UI
More information: http://gpuyard.s3-website-us-east-1.amazonaws.com/