What You’ll Learn
- Project Overview:
- An efficient AI inference system that combines data compression with AI accelerator hardware for large-scale, pre-trained large language models (LLMs).
- Skills You’ll Learn:
- Inference for large language models
- AI storage systems
- Hardware-based AI inference
What You’ll Do
This summer internship aims to create a new, efficient AI inference system for large-scale, pre-trained LLMs that leverages data compression and AI accelerator hardware. The AI accelerator logic is implemented in a simulation environment, and the data compression logic is implemented in a device emulator that models the data transfer patterns between storage and the AI accelerator hardware. The position requires designing a highly efficient AI inference system and choosing the most effective data compression scheme for pre-trained LLMs.
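As a rough illustration of the trade-off such a system evaluates, here is a minimal PyTorch sketch of one candidate compression scheme, per-tensor int8 quantization, applied to a weight matrix before it crosses the storage-to-accelerator link. All names and the toy tensor below are hypothetical, not part of the project's actual code:

```python
# Hypothetical sketch (not project code): per-tensor int8 quantization as one
# candidate compression scheme for pre-trained LLM weights.
import torch

def compress_int8(weight: torch.Tensor):
    # One scale factor per tensor; weights map into the int8 range [-127, 127].
    scale = weight.abs().max() / 127.0
    q = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def decompress_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Reconstruct an approximate float32 tensor on the accelerator side.
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)  # stand-in for one pre-trained weight matrix
q, scale = compress_int8(w)

# Fewer bytes cross the storage-to-accelerator link, at some accuracy cost.
ratio = w.element_size() / q.element_size()
error = (w - decompress_int8(q, scale)).abs().mean().item()
print(f"compression ratio: {ratio:.1f}x, mean abs reconstruction error: {error:.5f}")
```

Choosing the most effective scheme means weighing compression ratio (bytes moved) against reconstruction error (model quality) for each pre-trained weight tensor.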
Location: Working onsite at our San Jose headquarters at least four days per week, Monday through Thursday, with the flexibility to work remotely the remainder of your time
Reports to: Director of AI/ML Software Engineering
- Design an AI inference system with data compression
- Build the AI inference system with a device emulator
- Integrate data compression algorithms into the AI inference system
- Measure the performance of the AI inference system (see the sketch after this list)
- Optimize the performance of the AI inference system
- Complete other responsibilities as assigned.
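As a minimal, hypothetical illustration of the measurement step above, throughput can be reported as tokens processed per second; the model and sizes below are placeholders, not the project's actual workload:

```python
# Hypothetical sketch (placeholder model and sizes): timing a forward pass
# to report inference throughput, the baseline for later optimization work.
import time
import torch

model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
model.eval()
batch = torch.randn(8, 128, 512)  # (batch, sequence length, hidden size)

with torch.no_grad():
    for _ in range(3):  # warm-up so one-time setup costs are excluded
        model(batch)
    iters = 10
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    elapsed = time.perf_counter() - start

tokens = 8 * 128 * iters
print(f"throughput: {tokens / elapsed:.0f} tokens/sec")
```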
What You Bring
- Pursuing a PhD in Computer Science, with 3+ years of graduate study, preferred.
- Must have at least one academic quarter/semester remaining.
- Required knowledge: NLP/LLMs and LLM model compression.
- Required skillsets: PyTorch, Python/C++.
- You’re inclusive, adapting your style to the situation and diverse global norms of our people.
- An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
- You’re collaborative, building relationships, humbly offering support and openly welcoming approaches.
- Innovative and creative, you proactively explore new ideas and adapt quickly to change.