ToolNeuron is the most advanced offline-first AI assistant for Android, featuring complete on-device processing with enterprise-grade encryption, intelligent document understanding through RAG (Retrieval-Augmented Generation), text-to-speech, an extensible plugin system, and sophisticated memory management. Your data never leaves your device. No cloud dependencies. No subscriptions. True digital sovereignty.
Download APK · Join Discord · Report Issue
Complete Privacy: Hardware-backed AES-256-GCM encryption. Zero telemetry. All processing happens on your device.
Sophisticated RAG System: Inject and query documents (PDF, Word, Excel, EPUB) with semantic search and encrypted knowledge bases.
Secure Memory Vault: Crash-recoverable encrypted storage with Write-Ahead Logging, LZ4 compression, and content deduplication.
Offline-First: Works completely offline after model downloads. No internet required for AI inference.
On-Device TTS: Text-to-speech with 10 voices, 5 languages, adjustable speed and quality — all processed locally.
Plugin System: Built-in web search, calculator, and developer utilities — extensible with custom plugins.
Advanced Features: Function calling, multi-modal generation, customizable inference parameters, and concurrent model downloads.
- Features Overview
- Text Generation
- Image Generation
- Text-to-Speech (TTS)
- Plugin System
- RAG System (Document Intelligence)
- Memory Vault (Secure Storage)
- Document Processing
- Model Management
- Privacy & Security
- Installation
- Quick Start
- Technical Details
- Use Cases
- Building from Source
- Roadmap
- FAQ
| Feature | Description |
|---|---|
| Text Generation | Run any GGUF model locally (Llama, Mistral, Gemma, Phi, etc.) with streaming output |
| Image Generation | Stable Diffusion 1.5 with censored & uncensored variants, inpainting support |
| Text-to-Speech | On-device TTS with 10 voices, 5 languages, adjustable speed and denoising steps |
| Plugin System | Web search, calculator, and dev utils plugins with tool calling integration |
| RAG System | Document injection with semantic search, encrypted knowledge bases, multi-source support |
| Memory Vault | Hardware-backed AES-256-GCM encryption, WAL crash recovery, LZ4 compression |
| Document Processing | Parse PDF, Word (.doc/.docx), Excel (.xls/.xlsx), EPUB, and plain text |
| Model Store | Browse and download models from HuggingFace repositories in-app |
| Function Calling | Tool/function calling with grammar-based optimization |
| Secure Storage | Content deduplication, three-tier caching, automatic defragmentation |
| No Permissions | Load models without storage permissions using Android SAF |
- Format: Any GGUF model (Llama 3, Mistral, Gemma, Phi, Qwen, etc.)
- Size Range: 500MB (1B models) to 20GB+ (70B models)
- Quantization: All GGUF quantizations supported (Q2_K, Q4_K_M, Q5_K_S, Q6_K, Q8_0, F16, etc.)
- Model Categories: General, Medical, Research, Coding, Uncensored, Business, Cybersecurity
| Device Tier | RAM | Model Size | Speed |
|---|---|---|---|
| Budget | 6GB | 1-3B Q4 | 2-4 tokens/sec |
| Mid-Range | 8GB | 7-8B Q4 | 4-8 tokens/sec |
| Flagship | 12GB+ | 8B Q6 | 8-15 tokens/sec |
Reports from users: 7-second response times for 8B Q6 models on flagship devices
- Streaming Output: Real-time token-by-token generation with `Flow<GenerationEvent>` (see the sketch after this list)
- Custom Parameters: Temperature, top-k, top-p, min-p, repeat penalty, context length
- System Prompts: Configure per-model system prompts
- Function Calling: Tool/function calling with grammar-based JSON schema enforcement
- Model Configuration: Save and manage configurations per model
- Device Optimization: Auto-detect device tier and recommend optimal parameters
- Memory Management: Memory-mapped model loading, automatic RAM optimization
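A minimal sketch of consuming such a stream. The `GenerationEvent` type here is a simplified stand-in (the real one carries more detail), and the events mirror the engine callbacks listed below:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.runBlocking

// Simplified stand-in for the engine's event type (illustrative only).
sealed interface GenerationEvent {
    data class Token(val text: String) : GenerationEvent
    data class Done(val tokensPerSec: Double) : GenerationEvent
    data class Error(val cause: Throwable) : GenerationEvent
}

fun consume(events: Flow<GenerationEvent>) = runBlocking {
    // runBlocking for demo purposes; app code would use a coroutine scope.
    events.collect { event ->
        when (event) {
            is GenerationEvent.Token -> print(event.text)   // stream tokens as they arrive
            is GenerationEvent.Done -> println("\n${event.tokensPerSec} tok/s")
            is GenerationEvent.Error -> println("generation failed: ${event.cause}")
        }
    }
}
```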
```kotlin
onToken(token: String)          // Real-time token streaming
onToolCall(toolCall: ToolCall)  // Function call detection
onDone(metrics: Metrics)        // Generation completion
onError(error: Throwable)       // Error handling
onMetrics(metrics: Metrics)     // Performance metrics
```

- Models: Censored and uncensored variants
- Engine: LocalDream integration with NPU/CPU support
- Generation Time: 30-90 seconds depending on device hardware
- Text-to-Image: Generate images from text prompts
- Inpainting: Edit specific regions with mask support
- Custom Parameters:
- Resolution: 512x512, 768x768, 1024x1024
- Steps: 10-50 inference steps
- CFG Scale: Prompt adherence control
- Seed: Reproducible generation
- Negative Prompts: Exclude unwanted elements
- Denoise Strength: Inpainting intensity
- Schedulers: DPM and other scheduler support
- Intermediate Results: View generation progress with intermediate images
- Safety Checker: Optional NSFW content filtering
- Pony Model Support: Specialized anime/cartoon models
- Backend Control: Start, stop, restart generation backend
- State Monitoring: Real-time backend and generation state tracking
On-device speech synthesis powered by Supertonic TTS. All processing happens locally — no cloud APIs, no data leaves your device.
- 10 Voices: 5 female (F1–F5) and 5 male (M1–M5)
- 5 Languages: English, Korean, Spanish, Portuguese, French
- Speed Control: 0.5x to 2.0x playback speed
- Denoising Steps: 1–8 steps (higher = better quality, slower synthesis)
- Auto-speak: Automatically read assistant responses aloud after generation
- On-demand Loading: TTS model loads automatically on first use if not preloaded
- Load on App Start: Optionally preload the TTS model at launch for instant speech
- NNAPI Acceleration: Hardware acceleration on supported devices
- Playback Controls: Play, pause, resume, stop with real-time synthesis progress
- Per-message TTS: Tap the speak button on any assistant message to hear it
All TTS preferences are persisted and configurable from the Settings screen:
- Voice, language, speed, denoising steps
- Auto-speak toggle
- NNAPI hardware acceleration toggle
- Load on app start toggle
Extensible plugin architecture that integrates with LLM tool calling. Plugins execute locally and render custom UI for results.
- Engine: DuckDuckGo search with configurable result count (5–10)
- Web Scraping: CSS selector-based content extraction from search results
- Safe Search: Optional safe search filtering
- Custom UI: Rich display of search results and scraped content
- Expressions: Supports +, -, *, /, ^, %, parentheses
- Functions: sqrt, sin, cos, tan, asin, acos, atan, log, log10, ln, abs, ceil, floor, round
- Constants: pi, e
- Unit Conversion: Length (m, km, mi, ft, in...), weight (kg, lb, oz...), time (s, min, h, day), data (b, kb, mb, gb, tb), temperature (C, F, K)
- Text Transforms: uppercase, lowercase, reverse, title_case, snake_case, camel_case, trim
- Hashing: MD5, SHA-1, SHA-256, SHA-512
- UUID Generation: Bulk generation up to 10 at once
- Text Statistics: Character, word, line, and sentence counts
- JSON: Formatting and validation
- Base64: Encoding and decoding
- Tool Calling Integration: Plugins register as tools the LLM can invoke via grammar-based JSON schema enforcement (LAZY/STRICT modes)
- Custom UI Rendering: Each plugin provides Compose UI for displaying results
- Enable/Disable: Toggle individual plugins from the Settings screen
- Execution Metrics: Tracks execution time, success/failure per plugin call
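As a hedged illustration of this architecture (the interface and names below are assumptions for the sketch, not ToolNeuron's actual plugin API), a plugin pairs a tool schema advertised to the LLM with a local execution function:

```kotlin
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.jsonPrimitive

// Hypothetical plugin contract: schema for grammar enforcement, local execution.
interface Plugin {
    val name: String
    val toolSchema: String                       // JSON schema the LLM fills in
    suspend fun execute(args: JsonObject): Result<String>
}

class CalculatorPlugin : Plugin {
    override val name = "calculator"
    override val toolSchema =
        """{"name":"calculator","parameters":{"expression":{"type":"string"}}}"""

    override suspend fun execute(args: JsonObject) = runCatching {
        // Toy evaluator: handles only "a + b" to keep the sketch short.
        val (a, b) = args.getValue("expression").jsonPrimitive.content
            .split("+").map { it.trim().toDouble() }
        (a + b).toString()
    }
}
```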
The RAG (Retrieval-Augmented Generation) system lets ToolNeuron inject external knowledge into conversations, so the AI can answer questions grounded in your documents with semantic understanding.
Create knowledge bases from plain text input:
- Paste or type text content
- System chunks and embeds automatically
- Instant semantic search capability
Parse and embed documents:
- Supported Formats: PDF, Word (.doc/.docx), Excel (.xls/.xlsx), EPUB, TXT
- Multi-Sheet Excel: Each sheet embedded separately with metadata
- Table Extraction: Word tables preserved with structure
- Automatic Chunking: Intelligent text segmentation
- Metadata Tracking: File name, MIME type, source tracking
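For illustration, here is a naive fixed-size chunker with overlap, a common RAG segmentation baseline (the app's "intelligent" chunking presumably does more):

```kotlin
// Fixed-size chunking with overlap so context spanning a boundary
// appears in both neighboring chunks.
fun chunk(text: String, size: Int = 512, overlap: Int = 64): List<String> {
    require(overlap < size)
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + size, text.length)
        chunks += text.substring(start, end)
        if (end == text.length) break
        start = end - overlap        // step back to preserve cross-boundary context
    }
    return chunks
}
```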
Convert conversations into queryable knowledge:
- Export chat history as RAG
- Enable AI to reference past conversations
- Preserve conversation context across sessions
Import pre-built RAG files:
- `.neuronpacket` format
- Encrypted RAG sharing
- Version control and metadata
Enterprise-grade encrypted knowledge bases:
- Admin Password Protection: Master password for RAG access
- Read-Only Users: Grant limited access without admin privileges
- Hardware-Backed Encryption: AES-256-GCM with Android KeyStore
- User Management: Add/remove read-only users
- Access Control: Fine-grained permission system
Query System:
- Semantic Search: Embedding-based similarity search (cosine similarity)
- Top-K Results: Return most relevant chunks
- Context Injection: Automatically augment prompts with relevant knowledge
- Multi-RAG Support: Query across multiple loaded RAGs simultaneously
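The retrieval step itself reduces to cosine similarity plus a top-k selection. A self-contained sketch:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (sqrt(na) * sqrt(nb))
}

// Return the k chunks most similar to the query embedding.
fun topK(query: FloatArray, chunks: List<Pair<String, FloatArray>>, k: Int = 5) =
    chunks.map { (text, emb) -> text to cosine(query, emb) }
        .sortedByDescending { it.second }
        .take(k)
```

Since the embedding engine can L2-normalize vectors, the cosine here can be replaced by a plain dot product in practice.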
RAG Management:
- Enable/Disable: Control which RAGs are active for queries
- Lazy Loading: Load RAGs into memory on demand
- Status Tracking: INSTALLED, LOADED, LOADING, ERROR states
- Metadata: Domain, language, version, tags, embedding model info
- Size Management: Track RAG file size, compression ratio
- Delete/Export: Remove or share RAG files
Loading Modes:
- Embedded: RAG stored within app data (persistent)
- Transient: Temporary loading from external files
NeuronGraph Integration:
- Node-based knowledge representation
- Graph traversal for related concepts
- Serialization/deserialization support
- Model: all-MiniLM-L6-v2-Q5_K_M (768-dimensional embeddings)
- Auto-Download: Fetches embedding model from HuggingFace on first use
- Batch Processing: Efficient batch embedding generation
- Normalization: Optional L2 normalization for cosine similarity
- RAG Overlay: Transparent overlay shows retrieved context during chat
- RAG Data Explorer: Browse all chunks, edit metadata, view embeddings
- RAG Statistics: Size, chunk count, embedding coverage
- Search & Filter: Full-text search within RAGs
- Category & Tag Management: Organize RAG content
The Memory Vault is ToolNeuron's sophisticated encrypted storage system, providing crash-recoverable, compressed, deduplicated storage with enterprise-grade security.
Hardware-Backed Encryption:
- Algorithm: AES-256-GCM with 96-bit IV
- Key Storage: Android KeyStore (hardware-backed on supported devices)
- Key Migration: Automatic re-encryption on key rotation
- Auth-Tagged: GCM mode provides authentication and integrity
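A sketch of the standard Android pattern behind this (key alias and function shapes are illustrative, not the vault's actual code):

```kotlin
import android.security.keystore.KeyGenParameterSpec
import android.security.keystore.KeyProperties
import java.security.KeyStore
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

// Hardware-backed AES-256 key that never leaves the KeyStore.
fun getOrCreateKey(alias: String = "vault_key"): SecretKey {
    val ks = KeyStore.getInstance("AndroidKeyStore").apply { load(null) }
    (ks.getKey(alias, null) as? SecretKey)?.let { return it }
    val gen = KeyGenerator.getInstance(KeyProperties.KEY_ALGORITHM_AES, "AndroidKeyStore")
    gen.init(
        KeyGenParameterSpec.Builder(
            alias, KeyProperties.PURPOSE_ENCRYPT or KeyProperties.PURPOSE_DECRYPT)
            .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
            .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
            .setKeySize(256)
            .build()
    )
    return gen.generateKey()
}

fun encrypt(plain: ByteArray): Pair<ByteArray, ByteArray> {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, getOrCreateKey())
    return cipher.iv to cipher.doFinal(plain)   // 96-bit IV + auth-tagged ciphertext
}

fun decrypt(iv: ByteArray, sealed: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, getOrCreateKey(), GCMParameterSpec(128, iv))
    return cipher.doFinal(sealed)
}
```

Because GCM authenticates as well as encrypts, a tampered block fails decryption outright instead of silently returning corrupted plaintext.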
Write-Ahead Logging (WAL):
- Crash Recovery: Automatic recovery from crashes/power loss
- Transaction Safety: ACID-compliant operations
- Checkpoint System: Periodic index checkpointing
- Rollback Support: Restore from checkpoints on corruption
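Conceptually, WAL means persisting the intent before applying it, so a crash between the two steps can be replayed on restart. A deliberately simplified model (not the vault's actual on-disk format):

```kotlin
import java.io.File
import java.io.FileOutputStream

// Simplified write-ahead pattern: log first, apply second, replay on restart.
class WriteAheadLog(private val logFile: File) {
    fun append(record: String) {
        FileOutputStream(logFile, true).use { out ->
            out.write((record + "\n").toByteArray())
            out.fd.sync()                    // force the record to stable storage
        }
    }
    fun recover(apply: (String) -> Unit) {
        if (logFile.exists()) logFile.readLines().forEach(apply)
    }
    fun checkpoint() {
        logFile.writeText("")                // safe to truncate once state is flushed
    }
}
```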
LZ4 Compression:
- Fast Compression: Real-time compression/decompression
- Ratio Tracking: Monitor compression efficiency
- Block-Level: Compress individual blocks for efficient I/O
- Configurable: Adjust compression level
Content Deduplication:
- SHA-256 Hashing: Identify duplicate content
- Reference Counting: Track shared content usage
- Automatic Cleanup: Remove unreferenced blocks
- Storage Efficiency: Reduce redundant encrypted data
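An in-memory model of content-addressed deduplication with reference counting (illustrative; the vault stores encrypted blocks on disk):

```kotlin
import java.security.MessageDigest

// Blocks are keyed by the SHA-256 of their content, so identical data
// is stored once and shared via reference counts.
class DedupStore {
    private val blocks = mutableMapOf<String, ByteArray>()
    private val refs = mutableMapOf<String, Int>()

    fun put(data: ByteArray): String {
        val key = MessageDigest.getInstance("SHA-256").digest(data)
            .joinToString("") { "%02x".format(it) }
        blocks.putIfAbsent(key, data)        // store the content only once
        refs.merge(key, 1, Int::plus)        // duplicates just bump the count
        return key
    }

    fun release(key: String) {
        if (refs.merge(key, -1, Int::plus) == 0) {  // last reference gone
            refs.remove(key); blocks.remove(key)     // automatic cleanup
        }
    }
}
```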
- Full-text indexed conversation messages
- Tokenization for search
- Timestamp tracking
- Category and tag support
- Binary file storage with MIME type tracking
- Image, document, and arbitrary file support
- Metadata preservation
- Content deduplication
- 768-dimensional vector storage
- Semantic search with cosine similarity
- Batch embedding support
- Normalization options
- JSON-serialized custom structures
- Schema-flexible storage
- Queryable metadata
Three-Tier Architecture:
- L1 Hot Cache: In-memory cache for frequently accessed items (< 1MB)
- L2 Memory-Mapped: Memory-mapped file access for warm data (< 5MB)
- L3 On-Demand: Disk-based access for cold data
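A skeletal version of that lookup path (tier implementations stubbed as lambdas; the size threshold mirrors the limits above):

```kotlin
// Skeletal three-tier read path; l2/l3 lookups are injected stubs.
class TieredCache(
    private val l2: (String) -> ByteArray?,   // e.g. memory-mapped warm tier
    private val l3: (String) -> ByteArray?    // e.g. decrypt-and-read cold tier
) {
    private val maxHot = 64
    private val l1 = object : LinkedHashMap<String, ByteArray>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, ByteArray>?) =
            size > maxHot                     // LRU eviction from the hot tier
    }

    fun get(key: String): ByteArray? =
        l1[key]                               // L1: in-memory hot cache
            ?: (l2(key) ?: l3(key))?.also { bytes ->
                if (bytes.size < 1 shl 20) l1[key] = bytes  // promote small items (< 1MB)
            }
}
```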
Cache Metrics:
- Hit/miss rates
- Eviction tracking
- Memory usage monitoring
- Performance optimization
Search Capabilities:
- Full-Text Search: Tokenized text search across messages
- Semantic Search: Embedding-based similarity search
- Category Filter: Filter by predefined categories
- Tag Filter: Multi-tag filtering support
- Time Range: Search within date/time ranges
- Content Type Filter: Filter by data type
Maintenance:
- Defragmentation: Reclaim wasted space from deleted items
- Index Rebuilding: Reconstruct search indices
- Validation: Integrity checking and corruption detection
- Backup: Export vault with compression
- Restore: Import from backup files
Monitor storage health:
- Total Items: Count by type (messages, files, embeddings, custom)
- Size Metrics: Compressed vs uncompressed sizes
- Compression Ratio: Efficiency tracking
- Wasted Space: Identify fragmentation
- Time Range: Earliest and latest item timestamps
- Content Type Breakdown: Distribution across data types
- Vault Dashboard: Overview of all vault contents
- Statistics Screen: Detailed metrics and graphs
- Data Explorer: Browse, search, filter all items
- Metadata Editor: Edit categories, tags, search text
- User Management: Manage vault access credentials (admin, read-only)
- Logger Screen: Debug logs with operation timing and encryption metrics
Comprehensive document parsing with format detection and content extraction.
- Engine: PDFBox-Android
- Capabilities: Text extraction, metadata parsing
- Streaming: Efficient I/O for large files
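A typical PDFBox-Android extraction flow looks like the sketch below (package names per the com.tom-roush fork; treat this as an illustration, not ToolNeuron's actual parser):

```kotlin
import android.content.Context
import com.tom_roush.pdfbox.android.PDFBoxResourceLoader
import com.tom_roush.pdfbox.pdmodel.PDDocument
import com.tom_roush.pdfbox.text.PDFTextStripper
import java.io.InputStream

fun extractPdfText(context: Context, input: InputStream): String {
    PDFBoxResourceLoader.init(context)        // load bundled fonts/resources once
    PDDocument.load(input).use { doc ->
        return PDFTextStripper().getText(doc) // plain-text extraction
    }
}
```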
- Engine: EpubLib
- Capabilities: E-book text extraction, chapter navigation
- HTML Cleanup: Automatic tag removal for plain text
- Formats: .docx (Office Open XML), .doc (binary)
- Engine: Apache POI
- Capabilities:
- Paragraph extraction
- Table parsing with structure preservation
- Formatting metadata
- Formats: .xlsx (Office Open XML), .xls (binary)
- Engine: Apache POI
- Capabilities:
- Multi-sheet support with sheet names
- Cell type detection (string, numeric, boolean, formula)
- Formula evaluation
- Comprehensive cell formatting
- Encodings: UTF-8, UTF-16, ASCII
- Line Ending Support: Unix, Windows, Mac
- MIME Type Detection: Automatic format detection with fallback to file extension
- Error Handling: Informative error messages on parse failures
- Progress Tracking: Real-time parsing progress for large documents
- Metadata Extraction: Title, author, creation date, modification date
- Logging: Comprehensive debug logging for troubleshooting
In-App HuggingFace Integration:
- Browse HuggingFace model repositories
- Search with filters (model type, size, tags)
- Add custom repositories by username/org
- View model metadata (size, quantization, tags, downloads)
Download Management:
- Concurrent Downloads: Download multiple models simultaneously
- Progress Tracking: Real-time progress notifications
- WorkManager Integration: Robust background task management
- Resume Capability: Resume interrupted downloads
- Foreground Service: Persistent downloads (Android 14+ compliant)
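The WorkManager pattern for a resilient background download looks roughly like this (`DownloadWorker` is a hypothetical worker for the sketch; the real downloader also posts progress notifications and runs as a foreground service):

```kotlin
import android.content.Context
import androidx.work.Constraints
import androidx.work.CoroutineWorker
import androidx.work.ExistingWorkPolicy
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.WorkerParameters
import androidx.work.workDataOf

class DownloadWorker(ctx: Context, params: WorkerParameters) : CoroutineWorker(ctx, params) {
    override suspend fun doWork(): Result {
        val url = inputData.getString("url") ?: return Result.failure()
        // ... stream url to disk, using HTTP range requests to support resume ...
        return Result.success()
    }
}

fun enqueueModelDownload(context: Context, url: String, fileName: String) {
    val request = OneTimeWorkRequestBuilder<DownloadWorker>()
        .setConstraints(
            Constraints.Builder().setRequiredNetworkType(NetworkType.CONNECTED).build())
        .setInputData(workDataOf("url" to url, "file" to fileName))
        .build()
    // Unique work per file allows several models to download concurrently
    // without ever duplicating a single download.
    WorkManager.getInstance(context)
        .enqueueUniqueWork(fileName, ExistingWorkPolicy.KEEP, request)
}
```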
Model Categories:
- General: General-purpose conversational models
- Medical: Healthcare and medical domain models
- Research: Academic and research-focused models
- Coding: Programming and code generation models
- Uncensored: Unfiltered and uncensored models
- Business: Professional and business domain models
- Cybersecurity: Security and penetration testing models
Per-Model Settings:
Loading Parameters:
- Thread count (auto-detect or manual)
- Context size (512 to 32768+ tokens)
- Quantization options
- GPU layers (if supported)
Inference Parameters:
- Temperature (0.0 - 2.0)
- Top-k sampling
- Top-p (nucleus) sampling
- Min-p sampling
- Repeat penalty
- Frequency/presence penalty
- System prompt
- Seed (for reproducibility)
Configuration Storage:
- Database-backed persistence
- JSON serialization
- Import/export configurations
- Default configurations per model category
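A hypothetical shape for such a configuration (field names are illustrative but follow the parameters listed above), serialized with kotlinx.serialization for import/export:

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

@Serializable
data class ModelConfig(
    val modelId: String,
    val threads: Int? = null,        // null = auto-detect
    val contextSize: Int = 4096,
    val temperature: Float = 0.7f,
    val topK: Int = 40,
    val topP: Float = 0.9f,
    val repeatPenalty: Float = 1.1f,
    val systemPrompt: String = "",
    val seed: Long? = null           // set for reproducibility
)

fun export(config: ModelConfig): String =
    Json.encodeToString(ModelConfig.serializer(), config)

fun import(json: String): ModelConfig =
    Json.decodeFromString(ModelConfig.serializer(), json)
```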
- Grid/list view of installed models
- Search and filter
- Model details (size, format, loaded status)
- Quick load/unload
- Delete models
ZERO DATA COLLECTION. No telemetry, analytics, crash reporting, or tracking of any kind.
- All conversations and chat history
- Generated images and files
- Speech synthesis and TTS audio
- Plugin execution results
- Model configurations
- User preferences
- RAG knowledge bases
- Memory vault contents
- Document parsing results
Storage Encryption:
- Algorithm: AES-256-GCM
- IV: 96-bit unique per-block
- Key Derivation: Android KeyStore hardware-backed
- Authentication: GCM auth tag for integrity verification
Vault Security:
- Write-Ahead Logging for crash recovery
- Content deduplication prevents re-encryption overhead
- Secure key migration on rotation
- Automatic encryption of all stored data
RAG Security:
- Optional encryption for RAG packets
- Admin password protection
- Read-only user access control
- Hardware-backed key storage
- Storage Access Framework (SAF): Load models via file picker
- Scoped Storage: Modern Android storage compliance
- No Broad Access: App cannot access arbitrary files
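The SAF flow in outline (launch plumbing, e.g. an `ActivityResultLauncher`, omitted for brevity):

```kotlin
import android.content.Context
import android.content.Intent
import android.net.Uri

// The system file picker returns a content:// URI scoped to just the
// file the user chose — no broad storage permission involved.
fun buildModelPickerIntent(): Intent =
    Intent(Intent.ACTION_OPEN_DOCUMENT).apply {
        addCategory(Intent.CATEGORY_OPENABLE)
        type = "application/octet-stream"   // GGUF has no registered MIME type
    }

// detachFd() yields a raw descriptor that can be handed to native code.
fun openForNativeLoader(context: Context, uri: Uri): Int? =
    context.contentResolver.openFileDescriptor(uri, "r")?.detachFd()
```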
Fully open source. Audit the code or review community security assessments.
Minimum (Text Only):
- Android 12+ (API 31)
- 6GB RAM
- 4GB free storage
- ARM64 or x86_64 processor
Recommended (Text + Image + TTS + RAG):
- Android 13+
- 8GB RAM (12GB preferred)
- 10GB free storage
- Snapdragon 8 Gen 1 or equivalent
- Hardware-backed encryption support
Google Play Store (Recommended): Get it on Play Store
Direct APK: Download from GitHub Releases
- Open ToolNeuron
- Navigate to Model Store (drawer menu)
- Add HuggingFace repository:
  - Example: `QuantFactory/Meta-Llama-3-8B-GGUF`
  - Example: `bartowski/Phi-3.5-mini-instruct-GGUF`
- Browse models and tap to download
- Return to home screen and select your model
- Visit Hugging Face GGUF Models
- Download a model file (e.g., `Llama-3-8B-Q4_K_M.gguf`)
- Open ToolNeuron
- Use model picker to load from file
- Grant file access via Android file picker
Recommended Models:
| Use Case | Model | Size | Description |
|---|---|---|---|
| Budget/Testing | TinyLlama-1.1B-Q4_K_M | 669MB | Fast, low resource |
| Balanced | Llama-3-8B-Q4_K_M | 4.5GB | Best quality/performance |
| Maximum Quality | Mistral-7B-Q6_K | 6GB | Highest quality 7B |
| Coding | DeepSeek-Coder-6.7B-Q4 | 4GB | Code generation |
| Medical | Bio-Medical-Llama-3-8B | 4.5GB | Healthcare domain |
- Launch ToolNeuron
- Select or import GGUF model (wait for loading progress)
- Model loads automatically (status bar shows progress)
- Start typing your prompt
- AI streams response in real-time
Pro Tips:
- Adjust temperature in model config (0.7 = balanced, 0.3 = focused, 1.0 = creative)
- Increase context size for longer conversations (but uses more RAM)
- Use system prompts to set AI behavior
- Download Stable Diffusion 1.5 model from HuggingFace:
- Search for "stable-diffusion-v1-5"
- Download the `.safetensors` or `.ckpt` file
- Import into ToolNeuron via model picker
- Switch to Image generation mode (toggle in chat screen)
- Enter your prompt (e.g., "a serene mountain landscape at sunset, 4k, photorealistic")
- Optional: Add negative prompt (e.g., "blurry, low quality, distorted")
- Tap generate and wait 30-90 seconds
- Image appears in chat with save option
- Navigate to RAG menu (drawer)
- Tap "Create New RAG"
- Select "From File"
- Choose document type (PDF, Word, Excel, EPUB, TXT)
- Pick file via file picker
- Set RAG name and metadata (optional: enable encryption)
- Wait for document parsing and embedding
- RAG appears in RAG list
- Tap "Create New RAG" → "From Text"
- Paste or type your content
- Set metadata (name, category, tags)
- Tap "Create"
- System chunks and embeds automatically
- Enable desired RAGs in RAG management screen
- Return to chat
- RAG overlay button appears (tap to view retrieved context)
- AI automatically uses relevant RAG content in responses
- Retrieved chunks show in overlay for transparency
Core Stack:
- Language: Kotlin (Android), C++ (inference engines via JNI)
- UI Framework: Jetpack Compose (declarative UI)
- Text Inference: llama.cpp (GGUF engine)
- Image Inference: LocalDream (Stable Diffusion 1.5)
- Database: Room (SQLite) with AES-256-GCM encryption
- Async: Kotlin Coroutines + Flow
- Dependency Injection: Dagger Hilt
- Navigation: Jetpack Navigation Compose
- Serialization: Kotlinx Serialization + Gson
Custom Modules:
- `memory-vault`: Encrypted storage with WAL and compression
- `neuron-packet`: RAG packet format with encryption and access control
- `ai_gguf-release.aar`: Native GGUF inference library
- `ai_sd-release.aar`: Native Stable Diffusion library
- `ai_supertonic_tts`: On-device TTS with ONNX Runtime
GGUF Engine (GGUFEngine.kt):
- Native JNI bindings to llama.cpp
- Loading: File path or file descriptor (SAF/content:// URIs)
- Streaming: Token-by-token generation with `Flow<GenerationEvent>`
- Callbacks: `onToken`, `onToolCall`, `onDone`, `onError`, `onMetrics`
- Device Detection: Auto-detect device tier (LOW_END, MID_RANGE, HIGH_END)
- Optimization: Automatic thread/context recommendations
Diffusion Engine (DiffusionEngine.kt):
- Integration with StableDiffusionManager (LocalDream)
- NPU/CPU backend support
- Text embedding size configuration
- Pony model support
- Intermediate result streaming
- Safety checker toggle
Embedding Engine (EmbeddingEngine.kt):
- Model: all-MiniLM-L6-v2-Q5_K_M
- Dimensions: 768
- Operations: Single/batch embedding, normalization
- Auto-download on first use
TTS Engine (TTSManager.kt):
- Supertonic TTS with ONNX Runtime Android
- 10 voices (F1–F5, M1–M5), 5 languages
- Configurable denoising steps (1–8) and speed (0.5x–2.0x)
- Optional NNAPI hardware acceleration
- StateFlow-based reactive state management
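An illustrative shape for that reactive state (state names and holder class are assumptions for the sketch, not TTSManager's confirmed API):

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow

enum class TtsState { IDLE, LOADING, SYNTHESIZING, PLAYING, PAUSED }

// UI layers collect `state` and re-render playback controls reactively.
class TtsStateHolder {
    private val _state = MutableStateFlow(TtsState.IDLE)
    val state: StateFlow<TtsState> = _state.asStateFlow()

    fun onSynthesisStarted() { _state.value = TtsState.SYNTHESIZING }
    fun onPlaybackStarted() { _state.value = TtsState.PLAYING }
    fun onStopped() { _state.value = TtsState.IDLE }
}
```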
Plugin Engine (PluginManager.kt):
- Plugin registration with tool schemas for LLM integration
- Grammar modes: LAZY (flexible) and STRICT (enforced JSON schema)
- Execution metrics tracking (timing, success/failure)
- Compose UI rendering per plugin result
- Enable/disable per plugin at runtime
Memory Vault:
- Block-based storage with headers
- Three-tier caching (L1 hot, L2 memory-mapped, L3 on-demand)
- Write-Ahead Logging for crash recovery
- LZ4 compression with ratio tracking
- SHA-256 content deduplication
- Full-text and semantic search indices
Database Schema:
- Models table (GGUF/SD metadata)
- ModelConfig table (loading + inference params)
- InstalledRAGs table (RAG metadata with status)
- Chat/Message tables (conversation history)
- DataStore (preferences)
Text Generation (8B Q4_K_M on flagship device):
- Model Load Time: 5-15 seconds
- First Token: 1-3 seconds
- Generation Speed: 8-15 tokens/sec
- Context Processing: 500+ tokens/sec
Image Generation (SD 1.5):
- Mid-Range (SD 8 Gen 1): 60-90 seconds
- Flagship (SD 8 Gen 3): 30-50 seconds
- Resolution: 512x512 (fastest), 1024x1024 (slower)
RAG Query:
- Single RAG (1000 chunks): < 100ms
- Multiple RAGs (5000+ chunks): 100-500ms
- Embedding generation: 50-200ms per chunk
Memory Usage:
- Idle: 200-500MB
- 8B Model Loaded: 5-6GB
- With RAGs: +100-500MB
- Vault Cache: 3-5MB
- Medical Professionals: HIPAA-compliant patient data handling
- Legal Professionals: Confidential document analysis
- Journalists: Protecting source anonymity
- Therapists: Private session notes and analysis
- Researchers: Sensitive data processing
- Anyone Valuing Digital Sovereignty
- Air travel (no WiFi required)
- Remote locations (rural, wilderness)
- Areas with unreliable internet
- Avoiding mobile data costs
- Military/government secure environments
- Knowledge Base Creation: Convert company docs, manuals, research papers into queryable RAGs
- Study Aid: Embed textbooks and lecture notes for AI-assisted learning
- Research: Query across multiple PDFs and papers simultaneously
- Legal Document Review: Search case law and contracts with semantic understanding
- Medical Reference: Embed medical literature for clinical decision support
- Writing and brainstorming assistance
- Code generation and debugging
- Image generation for content creation
- Learning and research
- Creative storytelling with AI
- Android Studio Ladybug (2024.2.1) or newer
- JDK 17
- Android SDK 36+
- Android NDK 26.x
- Git
```bash
# Clone repository
git clone https://github.com/Siddhesh2377/ToolNeuron.git
cd ToolNeuron

# Open in Android Studio
# File → Open → Select ToolNeuron folder
# Sync Gradle dependencies (Android Studio will prompt)

# Create local.properties (optional, for signing)
# Add: ALIAS="your_keystore_alias"

# Build release APK
./gradlew assembleRelease

# Install on connected device
./gradlew installRelease

# Or build debug version
./gradlew assembleDebug
./gradlew installDebug
```

- Release APK: `app/build/outputs/apk/release/app-release.apk`
- Debug APK: `app/build/outputs/apk/debug/app-debug.apk`
- NDK Issues: Ensure NDK 26.x is installed via SDK Manager
- JNI Build Failures: Check `local.properties` has the correct NDK path
- Memory Issues: Increase Gradle heap size in `gradle.properties`: `org.gradle.jvmargs=-Xmx4096m`
- ✅ Text generation with any GGUF model
- ✅ Image generation with SD 1.5
- ✅ HuggingFace repository integration
- ✅ Encrypted Memory Vault with WAL
- ✅ RAG system with document injection
- ✅ Secure RAG creation with encryption
- ✅ Document processing (PDF, Word, Excel, EPUB)
- ✅ Model configuration editor
- ✅ Concurrent model downloads
- ✅ Function calling support
- ✅ Inpainting support
- ✅ Text-to-Speech (10 voices, 5 languages, NNAPI, auto-speak)
- ✅ Plugin system with UI (web search, calculator, dev utils)
- ✅ Settings screen with persistent preferences
- ✅ On-demand TTS model loading
- 🚧 Speech-to-Text (STT) support
- 📋 Multi-modal support (vision models like LLaVA, BakLLaVA)
- 📋 Code execution plugin with sandboxing
- 📋 Advanced memory clustering and insights
- 📋 Conversation summarization
- 📋 Thread-based conversation organization
- Additional model formats (ONNX, TFLite, CoreML)
- Desktop companion app (Windows, macOS, Linux)
- Cloud sync with end-to-end encryption (optional)
- Plugin marketplace
- Advanced RAG features (graph-based reasoning)
| Feature | ToolNeuron | Cloud AI Apps | Other Local AI Apps |
|---|---|---|---|
| Text Generation | Any GGUF model | Cloud only | Limited models |
| Image Generation | SD 1.5 offline | Cloud only | Rare |
| Text-to-Speech | On-device, 10 voices | Cloud-based | Rare |
| Plugin System | Web search, calc, dev utils | Cloud-based | None |
| RAG System | Full offline RAG | Cloud-based | Basic or none |
| Document Processing | PDF/Word/Excel/EPUB | Cloud upload | Limited |
| Privacy | Complete offline | Server logging | Varies |
| Encryption | AES-256-GCM + WAL | N/A | Rare |
| Cost | Free (one-time) | $20-50+/month | Varies |
| Internet Required | No (after models) | Yes | Varies |
| Open Source | Apache 2.0 | Proprietary | Varies |
| Storage Permissions | Not needed (SAF) | N/A | Usually needed |
| Function Calling | Yes | Yes | Rare |
| Model Store | In-app HF browser | N/A | Manual download |
"The only LLM frontend capable of running 8B Q6 models on my hardware with lightspeed loading. I'm in military healthcare and privacy is critical. ToolNeuron is the only app that meets my requirements." — Senior Healthcare Professional, Netherlands
"I use ToolNeuron for legal document analysis. The RAG system with encrypted storage gives me confidence that client data stays confidential. No other app comes close." — Attorney, United States
"As a journalist, I can't risk my sources being exposed through cloud AI services. ToolNeuron's offline-first approach is exactly what I needed." — Investigative Journalist, Germany
Q: Does this really work completely offline? A: Yes. After downloading models and the embedding model, all AI processing (text generation, image generation, RAG queries, document parsing) happens entirely on your device with zero internet dependency.
Q: How much storage do I need? A: Minimum 4GB for a single 7B model. Recommended 10GB for multiple models, SD 1.5, and RAGs. Large setups with many models can use 20GB+.
Q: Will this drain my battery? A: Local AI is computationally intensive. During active generation, battery drain is significant. Keep your device charged during extended use. Idle usage is minimal.
Q: Is my data actually private? A: Yes. Nothing leaves your device, and all processing is local. The code is open source, so you can verify this yourself or review community audits.
Q: Can I use custom models? A: Yes. Any GGUF text model works. For image generation, Stable Diffusion 1.5 checkpoints are supported (.safetensors or .ckpt).
Q: Why don't you need storage permissions? A: Android's Storage Access Framework (SAF) allows file picker access without broad storage permissions. Users explicitly select files, granting app access only to chosen files.
Q: Why is image generation slow? A: Stable Diffusion 1.5 requires 20-50 inference steps, each computationally expensive, and mobile hardware is slower than desktop GPUs. 30-90 seconds is normal for on-device generation.
Q: Can I run 13B or 70B models? A: Depends on device RAM. 13B Q4 needs ~10GB RAM (requires 12GB device). 70B models are impractical on current mobile hardware (need 40GB+ RAM).
Q: What quantization should I use? A: For 8B models:
- Q4_K_M: Best balance (4.5GB, good quality)
- Q5_K_S: Higher quality (5GB)
- Q6_K: Maximum quality (6GB, slower)
- Q2_K: Ultra-compressed (2.5GB, lower quality)
Q: Does RAG work without internet? A: Yes. RAG embedding and querying are completely offline after initial embedding model download (~100MB).
Q: How secure is the encryption? A: AES-256-GCM with hardware-backed keys (Android KeyStore) is military-grade encryption. On supported devices, keys are stored in Trusted Execution Environment (TEE) or Secure Element (SE), making extraction extremely difficult.
Q: Can I share encrypted RAGs?
A: Yes. Export the RAG as a `.neuronpacket` file with encryption enabled. Share the file and password separately. Recipients can import and decrypt with the password.
Q: How does Text-to-Speech work? A: TTS uses the Supertonic TTS engine running entirely on-device via ONNX Runtime. Download the TTS model from the Model Store or directly from the Settings screen under the TTS section, then tap the speak button on any assistant message. You can also enable auto-speak in Settings to have responses read aloud automatically.
Q: Can I use TTS offline? A: Yes. After downloading the TTS model (~100MB), all speech synthesis happens locally with no internet required.
Q: What do "denoising steps" control in TTS? A: Higher steps produce clearer, more natural speech but take longer to synthesize. The default of 2 steps provides a good balance. Increase to 4–8 for maximum quality.
Q: App crashes on model load? A: Likely out of memory. Try:
- Close other apps
- Use smaller model (Q4 instead of Q6, or 1B/3B instead of 7B/8B)
- Reduce context size in model config
- Restart device to free RAM
Q: Image generation fails or crashes? A: SD 1.5 requires significant RAM. Ensure:
- Device has 8GB+ RAM
- No other heavy apps running
- Try lower resolution (512x512)
- Restart app and try again
Q: Models download but won't load? A: Check:
- File is complete (compare size to HuggingFace listing)
- File is valid GGUF format
- Model isn't corrupted (re-download if suspicious)
- Sufficient RAM available
Q: RAG queries return no results? A: Verify:
- RAG is enabled in RAG management screen
- RAG loaded successfully (check status)
- Embedding model downloaded
- Query is semantically related to RAG content
Contributions are welcome! Focus areas:
- Bug fixes and stability improvements
- Performance optimizations (inference speed, memory usage)
- Device compatibility testing (especially mid-range devices)
- Documentation improvements and translations
- UI/UX enhancements (accessibility, dark theme refinements)
- Speech-to-Text (STT) integration
- Multi-modal model support
- Additional plugins and plugin marketplace
- Additional document format support
- Advanced RAG features
- Fork the repository
- Create feature branch:
git checkout -b feature/your-feature-name - Make changes with clear, focused commits
- Test on real devices (emulators don't reflect real performance)
- Write clear commit messages explaining "why" not just "what"
- Submit Pull Request with description of changes and testing done
- Follow Kotlin coding conventions
- Use meaningful variable/function names
- Comment complex logic
- Prefer immutability where possible
- Use Jetpack Compose best practices for UI
- Test on multiple devices (low-end, mid-range, flagship)
- Verify memory usage doesn't regress
- Test offline functionality
- Check encryption/decryption operations
- Validate UI on different screen sizes
Apache License 2.0
See LICENSE for full details.
- ✅ Commercial use permitted
- ✅ Modification permitted
- ✅ Distribution permitted
- ✅ Patent use permitted
- ✅ Private use permitted
- ⚠️ Trademark use not permitted
- ⚠️ Liability and warranty disclaimer
You can use ToolNeuron in commercial products, modify it, and distribute it, all without restriction. Attribution is appreciated but not required.
ToolNeuron stands on the shoulders of giants:
- llama.cpp - Efficient LLM inference by Georgi Gerganov
- LocalDream - Stable Diffusion on Android
- Jetpack Compose - Modern Android UI framework
- ONNX Runtime - TTS model inference
- Jsoup - HTML parsing for web search plugin
- Apache POI - Microsoft document parsing
- PDFBox-Android - PDF processing
- EpubLib - EPUB support
- Room - Database abstraction
- Dagger Hilt - Dependency injection
- OkHttp - HTTP client
- Retrofit - Type-safe HTTP client
- Privacy-first movement
- Open source community
- User feedback and feature requests
- Discord Community: Join Server - Active community for questions, tips, and discussions
- GitHub Issues: Report Bug/Request Feature
- Email: siddheshsonar2377@gmail.com
- ⭐ Star this repository if you find it useful
- 🐛 Report bugs to help improve stability
- 💡 Suggest features to guide development
- 📖 Improve documentation for better onboarding
- 🔧 Contribute code to add features
- 💬 Help others in Discord community
If you discover a security vulnerability, please:
- Do NOT open a public GitHub issue
- Email siddheshsonar2377@gmail.com with details
- Include steps to reproduce
- Allow reasonable time for fix before public disclosure
We take security seriously and will respond promptly.
Built with ❤️ by Siddhesh Sonar
Privacy-first AI for everyone. Own your data. Own your AI.
⭐ Star this repository · 🐛 Report Bug · 💡 Request Feature · 💬 Join Discord
Installation · Quick Start · Features · RAG System · Memory Vault · FAQ
License: Apache 2.0 · Version: 1.2.0 · Platform: Android 12+