Efficient and Autonomous AI Inference (TURING Project)
Scalable, trustworthy, and deployable AI inference for real-world systems
Overview
Modern AI models are powerful but computationally expensive.
Deploying them in real-world systems, especially industrial and edge environments, requires careful optimization of latency, memory footprint, and energy consumption.
Under the Horizon Europe TURING project, this research focuses on:
- Efficient model inference
- Autonomous AI execution
- Scalable and trustworthy deployment pipelines
- Real-world industrial constraints
The work is conducted at ICCS (Institute of Communication and Computer Systems) of NTUA.
Problem
Large AI models face several deployment challenges:
- High latency during inference
- Large memory footprint
- Energy inefficiency at the edge
- Limited hardware resources
- Trust and reliability concerns
We ask:
How can AI models adapt dynamically to resource constraints while maintaining performance and reliability?
Research Directions
This project explores:
- Efficient inference architectures
- Model compression and optimization
- Dynamic computation (e.g., conditional execution, early exits; see the sketch below)
- Scalable deployment strategies
- Edge and industrial AI systems
- Reproducible AI pipelines for real-world environments
The broader goal is to move from static AI models to adaptive and autonomous AI systems.
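To make the dynamic-computation direction above concrete, here is a minimal early-exit sketch in PyTorch. It is purely illustrative: `EarlyExitNet`, the layer sizes, and the 0.9 confidence threshold are placeholder assumptions, not components of the TURING system.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    """Toy two-block classifier with one intermediate exit head (illustrative)."""

    def __init__(self, in_dim=128, hidden=256, num_classes=10, threshold=0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.exit_head = nn.Linear(hidden, num_classes)   # early-exit classifier
        self.block2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.final_head = nn.Linear(hidden, num_classes)  # full-depth classifier
        self.threshold = threshold  # placeholder confidence threshold

    def forward(self, x):
        h = self.block1(x)
        early_logits = self.exit_head(h)
        # Confidence = max softmax probability of the early head.
        confidence = F.softmax(early_logits, dim=-1).max(dim=-1).values
        # Skip the second block when the whole batch is confident enough,
        # trading a little accuracy for lower latency.
        if bool((confidence >= self.threshold).all()):
            return early_logits
        return self.final_head(self.block2(h))

model = EarlyExitNet().eval()
with torch.no_grad():
    logits = model(torch.randn(4, 128))  # served by either head at run time
```

In early-exit designs of this kind, the exit heads are typically trained jointly with the backbone, and the threshold is tuned per deployment target to balance accuracy against latency.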
Contributions (Ongoing)
- Studying scalable inference strategies for large models
- Exploring compute-aware AI design
- Investigating deployment-aware model optimization (see the quantization sketch below)
- Aligning AI systems with real-world latency and resource constraints
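As one example of what deployment-aware optimization can look like in practice, the sketch below applies PyTorch's post-training dynamic quantization to a placeholder network. The model, shapes, and layer choices are illustrative assumptions rather than the project's actual workloads.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for a trained model.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).eval()

# Post-training dynamic quantization: Linear weights are stored as int8
# and dequantized on the fly, cutting memory footprint and often
# improving CPU inference latency.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 128))
print(out.shape)  # torch.Size([1, 10])
```

Dynamic quantization is attractive at the edge because it needs no retraining or calibration data, and int8 weight storage shrinks the quantized layers roughly 4x relative to float32.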
Impact
- Enables deployable AI for industrial systems
- Reduces inference latency and computational cost
- Supports trustworthy and scalable AI deployment
- Advances edge intelligence for connected environments
Vision
The long-term objective is to design AI systems that:
- Understand their execution environment
- Adapt computation dynamically
- Maintain reliability under constraints
- Bridge the gap between theory and deployable systems engineering