
AI Engineer

Company: Strattmont
Location: Riyadh, Saudi Arabia
Type: Onsite
We are looking for an AI Engineer with strong experience in modern vision and multimodal models, especially within the Nvidia ecosystem, who can also build small services around these models to prove out and validate product flows.
The ideal candidate is someone who:

  • Understands vision-language models (VLMs) and world foundation models
  • Knows how to work with cameras and camera control (PTZ, streams, RTSP, frame handling, etc.)
  • Can wrap these capabilities into a simple, robust service to demonstrate end-to-end AI flows in real use cases

This role is very hands-on and experimental, focused on prototyping, validating ideas quickly, and then hardening the most promising ones.

Key Responsibilities


  • Explore, evaluate, and integrate vision-language models (VLMs) and world foundation models for understanding real-world scenes from cameras.
  • Design and run experiments for scene understanding, spatial reasoning, and multimodal interaction (image/video + text).
  • Optimize and deploy models on the Nvidia stack (CUDA, TensorRT, DeepStream, or related Nvidia SDKs and tools).
  • Work with camera streams (e.g. PTZ, RTSP/IP cameras): frame capture, streaming, basic processing, and latency/performance tuning.
  • Prototype AI-driven flows that connect camera input → model inference → actionable output or API.

  • Build small backend services or microservices around AI models to test and demonstrate real product flows.
  • Expose models through APIs or internal tools so they can be consumed by other teams.
  • Write clean, maintainable code with basic testing, monitoring, and logging.
  • Collaborate with other engineers and product stakeholders to move from prototype → proof of concept → production-ready solution.

Must-Have Qualifications


  • Strong hands-on experience with computer vision or multimodal AI, especially:
      ◦ Vision-language models (e.g. Gemini, GPT-4V, LLaVA, Qwen-VL, or similar)
      ◦ Scene understanding, detection, or tracking from camera feeds
  • Practical experience with Nvidia software and tools, such as:
      ◦ CUDA, TensorRT, DeepStream, Nvidia SDKs/frameworks, or similar
  • Experience working with camera systems:
      ◦ RTSP or IP cameras, video streams, frame processing, camera configuration/control
  • Solid programming skills in Python
  • Ability to design and implement a small service around a model:
      ◦ REST/gRPC APIs, background workers, or simple pipelines
  • Comfortable working in a fast-paced environment with rapid prototyping and iteration.

Nice-to-Have


  • Experience with world models or world foundation models for spatial/scene reasoning, such as Nvidia Cosmos.
  • Experience deploying models on edge devices or GPU-based systems.
  • Familiarity with containers and infrastructure: Docker, Kubernetes, and a major cloud provider (AWS/GCP/Azure).

Background in robotics, autonomous systems, or real-time perception.

ThatStartupJob
Discover the best startups and their job positions, all in one place.
Copyright © 2025