OpenSquilla, an open-source AI agent framework launched on GitHub on May 6, 2026, has gained 157 stars by tackling token waste in agentic workflows through local classification and smart model routing. The framework uses an on-device LightGBM classifier, exported to ONNX for local inference, to direct tasks across four model tiers without exposing prompts to external services.
Local Classifier Enables Privacy-Preserving Model Routing
OpenSquilla's SquillaRouter component analyzes prompts using hybrid features, including length, language, code blocks, and semantic embeddings, to determine the appropriate model tier. This local classification selects the "cheapest model that can handle the turn" while maintaining privacy: prompts never leave the user's device during routing decisions.
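The routing idea can be sketched in a few lines. This is a hypothetical illustration, not OpenSquilla's actual code: the tier names, feature set, and scoring rules below are invented stand-ins, with a simple rule-based scorer substituting for the real LightGBM/ONNX classifier (and embeddings omitted entirely).

```python
import re

# Illustrative tier names, cheapest first; not OpenSquilla's real configuration.
TIERS = ["local-small", "fast", "standard", "frontier"]

def extract_features(prompt: str) -> dict:
    """Cheap, fully local features of the kind described above:
    length, a code-block signal, and a crude language hint."""
    return {
        "length": len(prompt),
        "has_code": bool(re.search(r"```", prompt)),
        "non_ascii_ratio": sum(ord(c) > 127 for c in prompt) / max(len(prompt), 1),
    }

def route(prompt: str) -> str:
    """Stand-in for the trained classifier: score task complexity from
    local features and pick the cheapest tier judged able to handle it."""
    f = extract_features(prompt)
    score = 0
    if f["length"] > 500:   # long prompts tend to need stronger models
        score += 1
    if f["has_code"]:       # code-heavy turns escalate a tier
        score += 1
    if f["length"] > 2000:  # very long context escalates again
        score += 1
    return TIERS[min(score, len(TIERS) - 1)]
```

A short query like `route("What time is it?")` stays on the cheapest tier, while a long prompt containing code blocks escalates, and the decision never leaves the device.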
The framework implements adaptive reasoning that activates reasoning tokens only when deep thinking is necessary, and scales system prompts with task complexity to avoid unnecessary costs for simple queries. Selective skill loading reduces context overhead by loading only the tools a given task requires from the 16+ bundled skills, rather than keeping every capability in context permanently.
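Selective skill loading amounts to matching a task against a skill registry and paying context cost only for the skills that match. The sketch below is a minimal illustration under invented assumptions: the skill names, token costs, and keyword triggers are hypothetical, not OpenSquilla's actual registry.

```python
# Hypothetical skill registry: name -> approximate context cost in tokens
# plus trigger keywords. Real systems often use embeddings or tool-use
# history instead of plain keywords.
SKILLS = {
    "web_search": {"tokens": 450, "keywords": ("search", "look up", "find")},
    "code_exec":  {"tokens": 600, "keywords": ("run", "execute", "debug")},
    "calendar":   {"tokens": 300, "keywords": ("schedule", "meeting")},
}

def load_skills(task: str) -> list[str]:
    """Return only the skills whose triggers appear in the task,
    instead of loading all bundled skills into context."""
    task_l = task.lower()
    return [name for name, spec in SKILLS.items()
            if any(kw in task_l for kw in spec["keywords"])]

def context_cost(names: list[str]) -> int:
    """Token overhead actually paid for this turn."""
    return sum(SKILLS[n]["tokens"] for n in names)
```

For a task like "please search for nearby cafes", only `web_search` loads, costing 450 tokens of context instead of the 1,350 all three skills would consume.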
Four-Tier Memory and Unified Gateway Support
OpenSquilla operates with a four-tier cognitive memory system spanning working, episodic, semantic, and raw levels. Hybrid search combines full-text indexing with vector embeddings via sqlite-vec, while layered security sandboxing enforces permission tiers and policy controls.
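The hybrid-search half of this design blends a lexical score with a vector-similarity score. The pure-Python sketch below shows that blending idea only; it is not OpenSquilla's implementation. In particular, the vector side that sqlite-vec would serve is faked here with an in-memory cosine similarity, and the blend weight `alpha` is an invented parameter.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query: str, text: str) -> float:
    """Crude stand-in for full-text search: fraction of query words in text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def hybrid_search(query: str, query_vec: list[float],
                  docs: list[tuple[str, list[float]]],
                  alpha: float = 0.5) -> list[str]:
    """Rank docs by a weighted blend of lexical and vector scores.
    docs: (text, embedding) pairs; sqlite-vec would serve the vector
    side in a real deployment."""
    scored = [(alpha * keyword_score(query, text)
               + (1 - alpha) * cosine(query_vec, vec), text)
              for text, vec in docs]
    return [text for _, text in sorted(scored, reverse=True)]
```

A document that matches both lexically and semantically ranks above one that matches on only one signal, which is the point of combining the two indexes.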
The framework provides a unified gateway across web UI, CLI, and chat platforms including Slack, Telegram, Discord, and Feishu. It supports 20+ LLM providers including OpenAI, Anthropic, DeepSeek, and Ollama, operating as a microkernel system combining smart routing, persistent memory, secure sandboxing, web search, and local embeddings.
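A unified gateway over many providers usually reduces to a registry of interchangeable backend adapters behind one call path. The sketch below shows that pattern in the abstract; the registry, decorator, and `complete` function are hypothetical and do not reflect OpenSquilla's actual adapter API.

```python
from typing import Callable

# Hypothetical registry mapping provider names to completion callables.
PROVIDERS: dict[str, Callable[[str], str]] = {}

def register(name: str):
    """Decorator that adds a backend adapter to the gateway."""
    def deco(fn: Callable[[str], str]) -> Callable[[str], str]:
        PROVIDERS[name] = fn
        return fn
    return deco

@register("echo")
def echo_provider(prompt: str) -> str:
    # Stub standing in for a real OpenAI/Anthropic/DeepSeek/Ollama adapter.
    return f"echo: {prompt}"

def complete(provider: str, prompt: str) -> str:
    """Single entry point: callers (web UI, CLI, chat bots) go through
    here regardless of which backend serves the request."""
    try:
        return PROVIDERS[provider](prompt)
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
```

Adding a twenty-first provider then means registering one more adapter, with no changes to the web UI, CLI, or chat-platform front ends.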
Token Efficiency Addresses Cost and Privacy Concerns
By routing tasks locally and loading skills selectively, OpenSquilla addresses several problems in agentic workflows at once: token waste from defaulting to frontier models for simple tasks, unnecessary context overhead, and privacy risks from external routing services. The Python-based framework is available under an open-source license.
Key Takeaways
- OpenSquilla uses a local LightGBM classifier to route AI tasks across four model tiers without exposing prompts externally
- The framework gained 157 GitHub stars after launching on May 6, 2026
- Adaptive reasoning activates reasoning tokens only when necessary, reducing costs for simple queries
- Selective skill loading reduces context overhead by loading only required tools from 16+ bundled capabilities
- Supports 20+ LLM providers with hybrid memory combining full-text indexing and vector embeddings