ToolNeuron is the most advanced offline-first AI assistant for Android, featuring complete on-device processing with enterprise-grade encryption, intelligent document understanding through RAG (Retrieval-Augmented Generation), text-to-speech, an extensible plugin system, and sophisticated memory management. Your data never leaves your device. No cloud dependencies. No subscriptions. True digital sovereignty.
Download APK · Join Discord · Report Issue
Complete Privacy: Hardware-backed AES-256-GCM encryption. Zero telemetry. All processing happens on your device.
Sophisticated RAG System: Inject and query documents (PDF, Word, Excel, EPUB) with semantic search and encrypted knowledge bases.
Secure Memory Vault: Crash-recoverable encrypted storage with Write-Ahead Logging, LZ4 compression, and content deduplication.
Offline-First: Works completely offline after model downloads. No internet required for AI inference.
On-Device TTS: Text-to-speech with 10 voices, 5 languages, adjustable speed and quality — all processed locally.
Plugin System: Built-in web search, calculator, and developer utilities — extensible with custom plugins.
Advanced Features: Function calling, multi-modal generation, customizable inference parameters, and concurrent model downloads.
- Features Overview
- Text Generation
- Image Generation
- Text-to-Speech (TTS)
- Plugin System
- RAG System (Document Intelligence)
- Memory Vault (Secure Storage)
- Document Processing
- Model Management
- Privacy & Security
- Installation
- Quick Start
- Technical Details
- Use Cases
- Building from Source
- Roadmap
- FAQ
| Feature | Description |
|---|---|
| Text Generation | Run any GGUF model locally (Llama, Mistral, Gemma, Phi, etc.) with streaming output |
| Image Generation | Stable Diffusion 1.5 with censored & uncensored variants, inpainting support |
| Text-to-Speech | On-device TTS with 10 voices, 5 languages, adjustable speed and denoising steps |
| Plugin System | Web search, calculator, and dev utils plugins with tool calling integration |
| RAG System | Document injection with semantic search, encrypted knowledge bases, multi-source support |
| Memory Vault | Hardware-backed AES-256-GCM encryption, WAL crash recovery, LZ4 compression |
| Document Processing | Parse PDF, Word (.doc/.docx), Excel (.xls/.xlsx), EPUB, and plain text |
| Model Store | Browse and download models from HuggingFace repositories in-app |
| Function Calling | Tool/function calling with grammar-based optimization |
| Secure Storage | Content deduplication, three-tier caching, automatic defragmentation |
| No Permissions | Load models without storage permissions using Android SAF |
- Format: Any GGUF model (Llama 3, Mistral, Gemma, Phi, Qwen, etc.)
- Size Range: 500MB (1B models) to 20GB+ (70B models)
- Quantization: All GGUF quantizations supported (Q2_K, Q4_K_M, Q5_K_S, Q6_K, Q8_0, F16, etc.)
- Model Categories: General, Medical, Research, Coding, Uncensored, Business, Cybersecurity
| Device Tier | RAM | Model Size | Speed |
|---|---|---|---|
| Budget | 6GB | 1-3B Q4 | 2-4 tokens/sec |
| Mid-Range | 8GB | 7-8B Q4 | 4-8 tokens/sec |
| Flagship | 12GB+ | 8B Q6 | 8-15 tokens/sec |
Reports from users: 7-second response times for 8B Q6 models on flagship devices
- Streaming Output: Real-time token-by-token generation with `Flow<GenerationEvent>` (see the sketch after this list)
- Custom Parameters: Temperature, top-k, top-p, min-p, repeat penalty, context length
- System Prompts: Configure per-model system prompts
- Function Calling: Tool/function calling with grammar-based JSON schema enforcement
- Model Configuration: Save and manage configurations per model
- Device Optimization: Auto-detect device tier and recommend optimal parameters
- Memory Management: Memory-mapped model loading, automatic RAM optimization
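A minimal sketch of consuming such a stream. The `GenerationEvent` type here is a simplified stand-in (the real one carries more detail), and the events mirror the engine callbacks listed below:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.runBlocking

// Simplified stand-in for the engine's event type (illustrative only).
sealed interface GenerationEvent {
    data class Token(val text: String) : GenerationEvent
    data class Done(val tokensPerSec: Double) : GenerationEvent
    data class Error(val cause: Throwable) : GenerationEvent
}

fun consume(events: Flow<GenerationEvent>) = runBlocking {
    // runBlocking for demo purposes; app code would use a coroutine scope.
    events.collect { event ->
        when (event) {
            is GenerationEvent.Token -> print(event.text)   // stream tokens as they arrive
            is GenerationEvent.Done -> println("\n${event.tokensPerSec} tok/s")
            is GenerationEvent.Error -> println("generation failed: ${event.cause}")
        }
    }
}
```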
```kotlin
onToken(token: String)          // Real-time token streaming
onToolCall(toolCall: ToolCall)  // Function call detection
onDone(metrics: Metrics)        // Generation completion
onError(error: Throwable)       // Error handling
onMetrics(metrics: Metrics)     // Performance metrics
```

- Models: Censored and uncensored variants
- Engine: LocalDream integration with NPU/CPU support
- Generation Time: 30-90 seconds depending on device hardware
- Text-to-Image: Generate images from text prompts
- Inpainting: Edit specific regions with mask support
- Custom Parameters:
- Resolution: 512x512, 768x768, 1024x1024
- Steps: 10-50 inference steps
- CFG Scale: Prompt adherence control
- Seed: Reproducible generation
- Negative Prompts: Exclude unwanted elements
- Denoise Strength: Inpainting intensity
- Schedulers: DPM and other scheduler support
- Intermediate Results: View generation progress with intermediate images
- Safety Checker: Optional NSFW content filtering
- Pony Model Support: Specialized anime/cartoon models
- Backend Control: Start, stop, restart generation backend
- State Monitoring: Real-time backend and generation state tracking
On-device speech synthesis powered by Supertonic TTS. All processing happens locally — no cloud APIs, no data leaves your device.
- 10 Voices: 5 female (F1–F5) and 5 male (M1–M5)
- 5 Languages: English, Korean, Spanish, Portuguese, French
- Speed Control: 0.5x to 2.0x playback speed
- Denoising Steps: 1–8 steps (higher = better quality, slower synthesis)
- Auto-speak: Automatically read assistant responses aloud after generation
- On-demand Loading: TTS model loads automatically on first use if not preloaded
- Load on App Start: Optionally preload the TTS model at launch for instant speech
- NNAPI Acceleration: Hardware acceleration on supported devices
- Playback Controls: Play, pause, resume, stop with real-time synthesis progress
- Per-message TTS: Tap the speak button on any assistant message to hear it
All TTS preferences are persisted and configurable from the Settings screen:
- Voice, language, speed, denoising steps
- Auto-speak toggle
- NNAPI hardware acceleration toggle
- Load on app start toggle
Extensible plugin architecture that integrates with LLM tool calling. Plugins execute locally and render custom UI for results.
- Engine: DuckDuckGo search with configurable result count (5–10)
- Web Scraping: CSS selector-based content extraction from search results
- Safe Search: Optional safe search filtering
- Custom UI: Rich display of search results and scraped content
- Expressions: Supports +, -, *, /, ^, %, parentheses
- Functions: sqrt, sin, cos, tan, asin, acos, atan, log, log10, ln, abs, ceil, floor, round
- Constants: pi, e
- Unit Conversion: Length (m, km, mi, ft, in...), weight (kg, lb, oz...), time (s, min, h, day), data (b, kb, mb, gb, tb), temperature (C, F, K)
- Text Transforms: uppercase, lowercase, reverse, title_case, snake_case, camel_case, trim
- Hashing: MD5, SHA-1, SHA-256, SHA-512
- UUID Generation: Bulk generation up to 10 at once
- Text Statistics: Character, word, line, and sentence counts
- JSON: Formatting and validation
- Base64: Encoding and decoding
- Tool Calling Integration: Plugins register as tools the LLM can invoke via grammar-based JSON schema enforcement (LAZY/STRICT modes)
- Custom UI Rendering: Each plugin provides Compose UI for displaying results
- Enable/Disable: Toggle individual plugins from the Settings screen
- Execution Metrics: Tracks execution time, success/failure per plugin call
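As a hedged illustration of this architecture (the interface and names below are assumptions for the sketch, not ToolNeuron's actual plugin API), a plugin pairs a tool schema advertised to the LLM with a local execution function:

```kotlin
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.jsonPrimitive

// Hypothetical plugin contract: schema for grammar enforcement, local execution.
interface Plugin {
    val name: String
    val toolSchema: String                       // JSON schema the LLM fills in
    suspend fun execute(args: JsonObject): Result<String>
}

class CalculatorPlugin : Plugin {
    override val name = "calculator"
    override val toolSchema =
        """{"name":"calculator","parameters":{"expression":{"type":"string"}}}"""

    override suspend fun execute(args: JsonObject) = runCatching {
        // Toy evaluator: handles only "a + b" to keep the sketch short.
        val (a, b) = args.getValue("expression").jsonPrimitive.content
            .split("+").map { it.trim().toDouble() }
        (a + b).toString()
    }
}
```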
The RAG (Retrieval-Augmented Generation) system lets ToolNeuron inject external knowledge into conversations, so the AI can answer questions grounded in your documents with semantic understanding.
Create knowledge bases from plain text input:
- Paste or type text content
- System chunks and embeds automatically
- Instant semantic search capability
Parse and embed documents:
- Supported Formats: PDF, Word (.doc/.docx), Excel (.xls/.xlsx), EPUB, TXT
- Multi-Sheet Excel: Each sheet embedded separately with metadata
- Table Extraction: Word tables preserved with structure
- Automatic Chunking: Intelligent text segmentation
- Metadata Tracking: File name, MIME type, source tracking
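For illustration, here is a naive fixed-size chunker with overlap, a common RAG segmentation baseline (the app's "intelligent" chunking presumably does more):

```kotlin
// Fixed-size chunking with overlap so context spanning a boundary
// appears in both neighboring chunks.
fun chunk(text: String, size: Int = 512, overlap: Int = 64): List<String> {
    require(overlap < size)
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + size, text.length)
        chunks += text.substring(start, end)
        if (end == text.length) break
        start = end - overlap        // step back to preserve cross-boundary context
    }
    return chunks
}
```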
Convert conversations into queryable knowledge:
- Export chat history as RAG
- Enable AI to reference past conversations
- Preserve conversation context across sessions
Import pre-built RAG files:
- `.neuronpacket` format
- Encrypted RAG sharing
- Version control and metadata
Enterprise-grade encrypted knowledge bases:
- Admin Password Protection: Master password for RAG access
- Read-Only Users: Grant limited access without admin privileges
- Hardware-Backed Encryption: AES-256-GCM with Android KeyStore
- User Management: Add/remove read-only users
- Access Control: Fine-grained permission system
Query System:
- Semantic Search: Embedding-based similarity search (cosine similarity)
- Top-K Results: Return most relevant chunks
- Context Injection: Automatically augment prompts with relevant knowledge
- Multi-RAG Support: Query across multiple loaded RAGs simultaneously
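The retrieval step itself reduces to cosine similarity plus a top-k selection. A self-contained sketch:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (sqrt(na) * sqrt(nb))
}

// Return the k chunks most similar to the query embedding.
fun topK(query: FloatArray, chunks: List<Pair<String, FloatArray>>, k: Int = 5) =
    chunks.map { (text, emb) -> text to cosine(query, emb) }
        .sortedByDescending { it.second }
        .take(k)
```

Since the embedding engine can L2-normalize vectors, the cosine here can be replaced by a plain dot product in practice.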
RAG Management:
- Enable/Disable: Control which RAGs are active for queries
- Lazy Loading: Load RAGs into memory on demand
- Status Tracking: INSTALLED, LOADED, LOADING, ERROR states
- Metadata: Domain, language, version, tags, embedding model info
- Size Management: Track RAG file size, compression ratio
- Delete/Export: Remove or share RAG files
Loading Modes:
- Embedded: RAG stored within app data (persistent)
- Transient: Temporary loading from external files
NeuronGraph Integration:
- Node-based knowledge representation
- Graph traversal for related concepts
- Serialization/deserialization support
- Model: all-MiniLM-L6-v2-Q5_K_M (768-dimensional embeddings)
- Auto-Download: Fetches embedding model from HuggingFace on first use
- Batch Processing: Efficient batch embedding generation
- Normalization: Optional L2 normalization for cosine similarity
- RAG Overlay: Transparent overlay shows retrieved context during chat
- RAG Data Explorer: Browse all chunks, edit metadata, view embeddings
- RAG Statistics: Size, chunk count, embedding coverage
- Search & Filter: Full-text search within RAGs
- Category & Tag Management: Organize RAG content
The Memory Vault is ToolNeuron's sophisticated encrypted storage system, providing crash-recoverable, compressed, deduplicated storage with enterprise-grade security.
Hardware-Backed Encryption:
- Algorithm: AES-256-GCM with 96-bit IV
- Key Storage: Android KeyStore (hardware-backed on supported devices)
- Key Migration: Automatic re-encryption on key rotation
- Auth-Tagged: GCM mode provides authentication and integrity
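A sketch of the standard Android pattern behind this (key alias and function shapes are illustrative, not the vault's actual code):

```kotlin
import android.security.keystore.KeyGenParameterSpec
import android.security.keystore.KeyProperties
import java.security.KeyStore
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

// Hardware-backed AES-256 key that never leaves the KeyStore.
fun getOrCreateKey(alias: String = "vault_key"): SecretKey {
    val ks = KeyStore.getInstance("AndroidKeyStore").apply { load(null) }
    (ks.getKey(alias, null) as? SecretKey)?.let { return it }
    val gen = KeyGenerator.getInstance(KeyProperties.KEY_ALGORITHM_AES, "AndroidKeyStore")
    gen.init(
        KeyGenParameterSpec.Builder(
            alias, KeyProperties.PURPOSE_ENCRYPT or KeyProperties.PURPOSE_DECRYPT)
            .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
            .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
            .setKeySize(256)
            .build()
    )
    return gen.generateKey()
}

fun encrypt(plain: ByteArray): Pair<ByteArray, ByteArray> {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, getOrCreateKey())
    return cipher.iv to cipher.doFinal(plain)   // 96-bit IV + auth-tagged ciphertext
}

fun decrypt(iv: ByteArray, sealed: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, getOrCreateKey(), GCMParameterSpec(128, iv))
    return cipher.doFinal(sealed)
}
```

Because GCM authenticates as well as encrypts, a tampered block fails decryption outright instead of silently returning corrupted plaintext.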
Write-Ahead Logging (WAL):
- Crash Recovery: Automatic recovery from crashes/power loss
- Transaction Safety: ACID-compliant operations
- Checkpoint System: Periodic index checkpointing
- Rollback Support: Restore from checkpoints on corruption
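Conceptually, WAL means persisting the intent before applying it, so a crash between the two steps can be replayed on restart. A deliberately simplified model (not the vault's actual on-disk format):

```kotlin
import java.io.File
import java.io.FileOutputStream

// Simplified write-ahead pattern: log first, apply second, replay on restart.
class WriteAheadLog(private val logFile: File) {
    fun append(record: String) {
        FileOutputStream(logFile, true).use { out ->
            out.write((record + "\n").toByteArray())
            out.fd.sync()                    // force the record to stable storage
        }
    }
    fun recover(apply: (String) -> Unit) {
        if (logFile.exists()) logFile.readLines().forEach(apply)
    }
    fun checkpoint() {
        logFile.writeText("")                // safe to truncate once state is flushed
    }
}
```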
LZ4 Compression:
- Fast Compression: Real-time compression/decompression
- Ratio Tracking: Monitor compression efficiency
- Block-Level: Compress individual blocks for efficient I/O
- Configurable: Adjust compression level
Content Deduplication:
- SHA-256 Hashing: Identify duplicate content
- Reference Counting: Track shared content usage
- Automatic Cleanup: Remove unreferenced blocks
- Storage Efficiency: Reduce redundant encrypted data
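An in-memory model of content-addressed deduplication with reference counting (illustrative; the vault stores encrypted blocks on disk):

```kotlin
import java.security.MessageDigest

// Blocks are keyed by the SHA-256 of their content, so identical data
// is stored once and shared via reference counts.
class DedupStore {
    private val blocks = mutableMapOf<String, ByteArray>()
    private val refs = mutableMapOf<String, Int>()

    fun put(data: ByteArray): String {
        val key = MessageDigest.getInstance("SHA-256").digest(data)
            .joinToString("") { "%02x".format(it) }
        blocks.putIfAbsent(key, data)        // store the content only once
        refs.merge(key, 1, Int::plus)        // duplicates just bump the count
        return key
    }

    fun release(key: String) {
        if (refs.merge(key, -1, Int::plus) == 0) {  // last reference gone
            refs.remove(key); blocks.remove(key)     // automatic cleanup
        }
    }
}
```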
- Full-text indexed conversation messages
- Tokenization for search
- Timestamp tracking
- Category and tag support
- Binary file storage with MIME type tracking
- Image, document, and arbitrary file support
- Metadata preservation
- Content deduplication
- 768-dimensional vector storage
- Semantic search with cosine similarity
- Batch embedding support
- Normalization options
- JSON-serialized custom structures
- Schema-flexible storage
- Queryable metadata
Three-Tier Architecture:
- L1 Hot Cache: In-memory cache for frequently accessed items (< 1MB)
- L2 Memory-Mapped: Memory-mapped file access for warm data (< 5MB)
- L3 On-Demand: Disk-based access for cold data
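A skeletal version of that lookup path (tier implementations stubbed as lambdas; the size threshold mirrors the limits above):

```kotlin
// Skeletal three-tier read path; l2/l3 lookups are injected stubs.
class TieredCache(
    private val l2: (String) -> ByteArray?,   // e.g. memory-mapped warm tier
    private val l3: (String) -> ByteArray?    // e.g. decrypt-and-read cold tier
) {
    private val maxHot = 64
    private val l1 = object : LinkedHashMap<String, ByteArray>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, ByteArray>?) =
            size > maxHot                     // LRU eviction from the hot tier
    }

    fun get(key: String): ByteArray? =
        l1[key]                               // L1: in-memory hot cache
            ?: (l2(key) ?: l3(key))?.also { bytes ->
                if (bytes.size < 1 shl 20) l1[key] = bytes  // promote small items (< 1MB)
            }
}
```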
Cache Metrics:
- Hit/miss rates
- Eviction tracking
- Memory usage monitoring
- Performance optimization
Search Capabilities:
- Full-Text Search: Tokenized text search across messages
- Semantic Search: Embedding-based similarity search
- Category Filter: Filter by predefined categories
- Tag Filter: Multi-tag filtering support
- Time Range: Search within date/time ranges
- Content Type Filter: Filter by data type
Maintenance:
- Defragmentation: Reclaim wasted space from deleted items
- Index Rebuilding: Reconstruct search indices
- Validation: Integrity checking and corruption detection
- Backup: Export vault with compression
- Restore: Import from backup files
Monitor storage health:
- Total Items: Count by type (messages, files, embeddings, custom)
- Size Metrics: Compressed vs uncompressed sizes
- Compression Ratio: Efficiency tracking
- Wasted Space: Identify fragmentation
- Time Range: Earliest and latest item timestamps
- Content Type Breakdown: Distribution across data types
- Vault Dashboard: Overview of all vault contents
- Statistics Screen: Detailed metrics and graphs
- Data Explorer: Browse, search, filter all items
- Metadata Editor: Edit categories, tags, search text
- User Management: Manage vault access credentials (admin, read-only)
- Logger Screen: Debug logs with operation timing and encryption metrics
Comprehensive document parsing with format detection and content extraction.
- Engine: PDFBox-Android
- Capabilities: Text extraction, metadata parsing
- Streaming: Efficient I/O for large files
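A typical PDFBox-Android extraction flow looks like the sketch below (package names per the com.tom-roush fork; treat this as an illustration, not ToolNeuron's actual parser):

```kotlin
import android.content.Context
import com.tom_roush.pdfbox.android.PDFBoxResourceLoader
import com.tom_roush.pdfbox.pdmodel.PDDocument
import com.tom_roush.pdfbox.text.PDFTextStripper
import java.io.InputStream

fun extractPdfText(context: Context, input: InputStream): String {
    PDFBoxResourceLoader.init(context)        // load bundled fonts/resources once
    PDDocument.load(input).use { doc ->
        return PDFTextStripper().getText(doc) // plain-text extraction
    }
}
```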
- Engine: EpubLib
- Capabilities: E-book text extraction, chapter navigation
- HTML Cleanup: Automatic tag removal for plain text
- Formats: .docx (Office Open XML), .doc (binary)
- Engine: Apache POI
- Capabilities:
- Paragraph extraction
- Table parsing with structure preservation
- Formatting metadata
- Formats: .xlsx (Office Open XML), .xls (binary)
- Engine: Apache POI
- Capabilities:
- Multi-sheet support with sheet names
- Cell type detection (string, numeric, boolean, formula)
- Formula evaluation
- Comprehensive cell formatting
- Encodings: UTF-8, UTF-16, ASCII
- Line Ending Support: Unix, Windows, Mac
- MIME Type Detection: Automatic format detection with fallback to file extension
- Error Handling: Informative error messages on parse failures
- Progress Tracking: Real-time parsing progress for large documents
- Metadata Extraction: Title, author, creation date, modification date
- Logging: Comprehensive debug logging for troubleshooting
In-App HuggingFace Integration:
- Browse HuggingFace model repositories
- Search with filters (model type, size, tags)
- Add custom repositories by username/org
- View model metadata (size, quantization, tags, downloads)
Download Management:
- Concurrent Downloads: Download multiple models simultaneously
- Progress Tracking: Real-time progress notifications
- WorkManager Integration: Robust background task management
- Resume Capability: Resume interrupted downloads
- Foreground Service: Persistent downloads (Android 14+ compliant)
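The WorkManager pattern for a resilient background download looks roughly like this (`DownloadWorker` is a hypothetical worker for the sketch; the real downloader also posts progress notifications and runs as a foreground service):

```kotlin
import android.content.Context
import androidx.work.Constraints
import androidx.work.CoroutineWorker
import androidx.work.ExistingWorkPolicy
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.WorkerParameters
import androidx.work.workDataOf

class DownloadWorker(ctx: Context, params: WorkerParameters) : CoroutineWorker(ctx, params) {
    override suspend fun doWork(): Result {
        val url = inputData.getString("url") ?: return Result.failure()
        // ... stream url to disk, using HTTP range requests to support resume ...
        return Result.success()
    }
}

fun enqueueModelDownload(context: Context, url: String, fileName: String) {
    val request = OneTimeWorkRequestBuilder<DownloadWorker>()
        .setConstraints(
            Constraints.Builder().setRequiredNetworkType(NetworkType.CONNECTED).build())
        .setInputData(workDataOf("url" to url, "file" to fileName))
        .build()
    // Unique work per file allows several models to download concurrently
    // without ever duplicating a single download.
    WorkManager.getInstance(context)
        .enqueueUniqueWork(fileName, ExistingWorkPolicy.KEEP, request)
}
```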
Model Categories:
- General: General-purpose conversational models
- Medical: Healthcare and medical domain models
- Research: Academic and research-focused models
- Coding: Programming and code generation models
- Uncensored: Unfiltered and uncensored models
- Business: Professional and business domain models
- Cybersecurity: Security and penetration testing models
Per-Model Settings:
Loading Parameters:
- Thread count (auto-detect or manual)
- Context size (512 to 32768+ tokens)
- Quantization options
- GPU layers (if supported)
Inference Parameters:
- Temperature (0.0 - 2.0)
- Top-k sampling
- Top-p (nucleus) sampling
- Min-p sampling
- Repeat penalty
- Frequency/presence penalty
- System prompt
- Seed (for reproducibility)
Configuration Storage:
- Database-backed persistence
- JSON serialization
- Import/export configurations
- Default configurations per model category
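A hypothetical shape for such a configuration (field names are illustrative but follow the parameters listed above), serialized with kotlinx.serialization for import/export:

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

@Serializable
data class ModelConfig(
    val modelId: String,
    val threads: Int? = null,        // null = auto-detect
    val contextSize: Int = 4096,
    val temperature: Float = 0.7f,
    val topK: Int = 40,
    val topP: Float = 0.9f,
    val repeatPenalty: Float = 1.1f,
    val systemPrompt: String = "",
    val seed: Long? = null           // set for reproducibility
)

fun export(config: ModelConfig): String =
    Json.encodeToString(ModelConfig.serializer(), config)

fun import(json: String): ModelConfig =
    Json.decodeFromString(ModelConfig.serializer(), json)
```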
- Grid/list view of installed models
- Search and filter
- Model details (size, format, loaded status)
- Quick load/unload
- Delete models
ZERO DATA COLLECTION. No telemetry, analytics, crash reporting, or tracking of any kind.
- All conversations and chat history
- Generated images and files
- Speech synthesis and TTS audio
- Plugin execution results
- Model configurations
- User preferences
- RAG knowledge bases
- Memory vault contents
- Document parsing results
Storage Encryption:
- Algorithm: AES-256-GCM
- IV: 96-bit unique per-block
- Key Derivation: Android KeyStore hardware-backed
- Authentication: GCM auth tag for integrity verification
Vault Security:
- Write-Ahead Logging for crash recovery
- Content deduplication prevents re-encryption overhead
- Secure key migration on rotation
- Automatic encryption of all stored data
RAG Security:
- Optional encryption for RAG packets
- Admin password protection
- Read-only user access control
- Hardware-backed key storage
- Storage Access Framework (SAF): Load models via file picker
- Scoped Storage: Modern Android storage compliance
- No Broad Access: App cannot access arbitrary files
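The SAF flow in outline (launch plumbing, e.g. an `ActivityResultLauncher`, omitted for brevity):

```kotlin
import android.content.Context
import android.content.Intent
import android.net.Uri

// The system file picker returns a content:// URI scoped to just the
// file the user chose — no broad storage permission involved.
fun buildModelPickerIntent(): Intent =
    Intent(Intent.ACTION_OPEN_DOCUMENT).apply {
        addCategory(Intent.CATEGORY_OPENABLE)
        type = "application/octet-stream"   // GGUF has no registered MIME type
    }

// detachFd() yields a raw descriptor that can be handed to native code.
fun openForNativeLoader(context: Context, uri: Uri): Int? =
    context.contentResolver.openFileDescriptor(uri, "r")?.detachFd()
```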
Fully open source. Audit the code or review community security assessments.
Minimum (Text Only):
- Android 12+ (API 31)
- 6GB RAM
- 4GB free storage
- ARM64 or x86_64 processor
Recommended (Text + Image + TTS + RAG):
- Android 13+
- 8GB RAM (12GB preferred)
- 10GB free storage
- Snapdragon 8 Gen 1 or equivalent
- Hardware-backed encryption support
Google Play Store (Recommended): Get it on Play Store
Direct APK: Download from GitHub Releases
- Open ToolNeuron
- Navigate to Model Store (drawer menu)
- Add HuggingFace repository:
  - Example: `QuantFactory/Meta-Llama-3-8B-GGUF`
  - Example: `bartowski/Phi-3.5-mini-instruct-GGUF`
- Browse models and tap to download
- Return to home screen and select your model
- Visit Hugging Face GGUF Models
- Download a model file (e.g., `Llama-3-8B-Q4_K_M.gguf`)
- Open ToolNeuron
- Use model picker to load from file
- Grant file access via Android file picker
Recommended Models:
| Use Case | Model | Size | Description |
|---|---|---|---|
| Budget/Testing | TinyLlama-1.1B-Q4_K_M | 669MB | Fast, low resource |
| Balanced | Llama-3-8B-Q4_K_M | 4.5GB | Best quality/performance |
| Maximum Quality | Mistral-7B-Q6_K | 6GB | Highest quality 7B |
| Coding | DeepSeek-Coder-6.7B-Q4 | 4GB | Code generation |
| Medical | Bio-Medical-Llama-3-8B | 4.5GB | Healthcare domain |
- Launch ToolNeuron
- Select or import GGUF model (wait for loading progress)
- Model loads automatically (status bar shows progress)
- Start typing your prompt
- AI streams response in real-time
Pro Tips:
- Adjust temperature in model config (0.7 = balanced, 0.3 = focused, 1.0 = creative)
- Increase context size for longer conversations (but uses more RAM)
- Use system prompts to set AI behavior
- Download Stable Diffusion 1.5 model from HuggingFace:
- Search for "stable-diffusion-v1-5"
- Download the `.safetensors` or `.ckpt` file
- Import into ToolNeuron via model picker
- Switch to Image generation mode (toggle in chat screen)
- Enter your prompt (e.g., "a serene mountain landscape at sunset, 4k, photorealistic")
- Optional: Add negative prompt (e.g., "blurry, low quality, distorted")
- Tap generate and wait 30-90 seconds
- Image appears in chat with save option
- Navigate to RAG menu (drawer)
- Tap "Create New RAG"
- Select "From File"
- Choose document type (PDF, Word, Excel, EPUB, TXT)
- Pick file via file picker
- Set RAG name and metadata (optional: enable encryption)
- Wait for document parsing and embedding
- RAG appears in RAG list
- Tap "Create New RAG" → "From Text"
- Paste or type your content
- Set metadata (name, category, tags)
- Tap "Create"
- System chunks and embeds automatically
- Enable desired RAGs in RAG management screen
- Return to chat
- RAG overlay button appears (tap to view retrieved context)
- AI automatically uses relevant RAG content in responses
- Retrieved chunks show in overlay for transparency
Core Stack:
- Language: Kotlin (Android), C++ (inference engines via JNI)
- UI Framework: Jetpack Compose (declarative UI)
- Text Inference: llama.cpp (GGUF engine)
- Image Inference: LocalDream (Stable Diffusion 1.5)
- Database: Room (SQLite) with AES-256-GCM encryption
- Async: Kotlin Coroutines + Flow
- Dependency Injection: Dagger Hilt
- Navigation: Jetpack Navigation Compose
- Serialization: Kotlinx Serialization + Gson
Custom Modules:
- `memory-vault`: Encrypted storage with WAL and compression
- `neuron-packet`: RAG packet format with encryption and access control
- `ai_gguf-release.aar`: Native GGUF inference library
- `ai_sd-release.aar`: Native Stable Diffusion library
- `ai_supertonic_tts`: On-device TTS with ONNX Runtime
GGUF Engine (GGUFEngine.kt):
- Native JNI bindings to llama.cpp
- Loading: File path or file descriptor (SAF/content:// URIs)
- Streaming: Token-by-token generation with `Flow<GenerationEvent>`
- Callbacks: `onToken`, `onToolCall`, `onDone`, `onError`, `onMetrics`
- Device Detection: Auto-detect device tier (LOW_END, MID_RANGE, HIGH_END)
- Optimization: Automatic thread/context recommendations
Diffusion Engine (DiffusionEngine.kt):
- Integration with StableDiffusionManager (LocalDream)
- NPU/CPU backend support
- Text embedding size configuration
- Pony model support
- Intermediate result streaming
- Safety checker toggle
Embedding Engine (EmbeddingEngine.kt):
- Model: all-MiniLM-L6-v2-Q5_K_M
- Dimensions: 768
- Operations: Single/batch embedding, normalization
- Auto-download on first use
TTS Engine (TTSManager.kt):
- Supertonic TTS with ONNX Runtime Android
- 10 voices (F1–F5, M1–M5), 5 languages
- Configurable denoising steps (1–8) and speed (0.5x–2.0x)
- Optional NNAPI hardware acceleration
- StateFlow-based reactive state management
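An illustrative shape for that reactive state (state names and holder class are assumptions for the sketch, not TTSManager's confirmed API):

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow

enum class TtsState { IDLE, LOADING, SYNTHESIZING, PLAYING, PAUSED }

// UI layers collect `state` and re-render playback controls reactively.
class TtsStateHolder {
    private val _state = MutableStateFlow(TtsState.IDLE)
    val state: StateFlow<TtsState> = _state.asStateFlow()

    fun onSynthesisStarted() { _state.value = TtsState.SYNTHESIZING }
    fun onPlaybackStarted() { _state.value = TtsState.PLAYING }
    fun onStopped() { _state.value = TtsState.IDLE }
}
```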
Plugin Engine (PluginManager.kt):
- Plugin registration with tool schemas for LLM integration
- Grammar modes: LAZY (flexible) and STRICT (enforced JSON schema)
- Execution metrics tracking (timing, success/failure)
- Compose UI rendering per plugin result
- Enable/disable per plugin at runtime
Memory Vault:
- Block-based storage with headers
- Three-tier caching (L1 hot, L2 memory-mapped, L3 on-demand)
- Write-Ahead Logging for crash recovery
- LZ4 compression with ratio tracking
- SHA-256 content deduplication
- Full-text and semantic search indices
Database Schema:
- Models table (GGUF/SD metadata)
- ModelConfig table (loading + inference params)
- InstalledRAGs table (RAG metadata with status)
- Chat/Message tables (conversation history)
- DataStore (preferences)
Text Generation (8B Q4_K_M on flagship device):
- Model Load Time: 5-15 seconds
- First Token: 1-3 seconds
- Generation Speed: 8-15 tokens/sec
- Context Processing: 500+ tokens/sec
Image Generation (SD 1.5):
- Mid-Range (SD 8 Gen 1): 60-90 seconds
- Flagship (SD 8 Gen 3): 30-50 seconds
- Resolution: 512x512 (fastest), 1024x1024 (slower)
RAG Query:
- Single RAG (1000 chunks): < 100ms
- Multiple RAGs (5000+ chunks): 100-500ms
- Embedding generation: 50-200ms per chunk
Memory Usage:
- Idle: 200-500MB
- 8B Model Loaded: 5-6GB
- With RAGs: +100-500MB
- Vault Cache: 3-5MB
- Medical Professionals: HIPAA-compliant patient data handling
- Legal Professionals: Confidential document analysis
- Journalists: Protecting source anonymity
- Therapists: Private session notes and analysis
- Researchers: Sensitive data processing
- Anyone Valuing Digital Sovereignty
- Air travel (no WiFi required)
- Remote locations (rural, wilderness)
- Areas with unreliable internet
- Avoiding mobile data costs
- Military/government secure environments
- Knowledge Base Creation: Convert company docs, manuals, research papers into queryable RAGs
- Study Aid: Embed textbooks and lecture notes for AI-assisted learning
- Research: Query across multiple PDFs and papers simultaneously
- Legal Document Review: Search case law and contracts with semantic understanding
- Medical Reference: Embed medical literature for clinical decision support
- Writing and brainstorming assistance
- Code generation and debugging
- Image generation for content creation
- Learning and research
- Creative storytelling with AI
- Android Studio Ladybug (2024.2.1) or newer
- JDK 17
- Android SDK 36+
- Android NDK 26.x
- Git
```bash
# Clone repository
git clone https://github.com/Siddhesh2377/ToolNeuron.git
cd ToolNeuron

# Open in Android Studio
# File → Open → Select ToolNeuron folder
# Sync Gradle dependencies (Android Studio will prompt)

# Create local.properties (optional, for signing)
# Add: ALIAS="your_keystore_alias"

# Build release APK
./gradlew assembleRelease

# Install on connected device
./gradlew installRelease

# Or build debug version
./gradlew assembleDebug
./gradlew installDebug
```

- Release APK: `app/build/outputs/apk/release/app-release.apk`
- Debug APK: `app/build/outputs/apk/debug/app-debug.apk`
- NDK Issues: Ensure NDK 26.x is installed via SDK Manager
- JNI Build Failures: Check `local.properties` has the correct NDK path
- Memory Issues: Increase Gradle heap size in `gradle.properties`: `org.gradle.jvmargs=-Xmx4096m`
- ✅ Text generation with any GGUF model
- ✅ Image generation with SD 1.5
- ✅ HuggingFace repository integration
- ✅ Encrypted Memory Vault with WAL
- ✅ RAG system with document injection
- ✅ Secure RAG creation with encryption
- ✅ Document processing (PDF, Word, Excel, EPUB)
- ✅ Model configuration editor
- ✅ Concurrent model downloads
- ✅ Function calling support
- ✅ Inpainting support
- ✅ Text-to-Speech (10 voices, 5 languages, NNAPI, auto-speak)
- ✅ Plugin system with UI (web search, calculator, dev utils)
- ✅ Settings screen with persistent preferences
- ✅ On-demand TTS model loading
- 🚧 Speech-to-Text (STT) support
- 📋 Multi-modal support (vision models like LLaVA, BakLLaVA)
- 📋 Code execution plugin with sandboxing
- 📋 Advanced memory clustering and insights
- 📋 Conversation summarization
- 📋 Thread-based conversation organization
- Additional model formats (ONNX, TFLite, CoreML)
- Desktop companion app (Windows, macOS, Linux)
- Cloud sync with end-to-end encryption (optional)
- Plugin marketplace
- Advanced RAG features (graph-based reasoning)
| Feature | ToolNeuron | Cloud AI Apps | Other Local AI Apps |
|---|---|---|---|
| Text Generation | Any GGUF model | Cloud only | Limited models |
| Image Generation | SD 1.5 offline | Cloud only | Rare |
| Text-to-Speech | On-device, 10 voices | Cloud-based | Rare |
| Plugin System | Web search, calc, dev utils | Cloud-based | None |
| RAG System | Full offline RAG | Cloud-based | Basic or none |
| Document Processing | PDF/Word/Excel/EPUB | Cloud upload | Limited |
| Privacy | Complete offline | Server logging | Varies |
| Encryption | AES-256-GCM + WAL | N/A | Rare |
| Cost | Free (one-time) | $20-50+/month | Varies |
| Internet Required | No (after models) | Yes | Varies |
| Open Source | Apache 2.0 | Proprietary | Varies |
| Storage Permissions | Not needed (SAF) | N/A | Usually needed |
| Function Calling | Yes | Yes | Rare |
| Model Store | In-app HF browser | N/A | Manual download |
"The only LLM frontend capable of running 8B Q6 models on my hardware with lightspeed loading. I'm in military healthcare and privacy is critical. ToolNeuron is the only app that meets my requirements." — Senior Healthcare Professional, Netherlands
"I use ToolNeuron for legal document analysis. The RAG system with encrypted storage gives me confidence that client data stays confidential. No other app comes close." — Attorney, United States
"As a journalist, I can't risk my sources being exposed through cloud AI services. ToolNeuron's offline-first approach is exactly what I needed." — Investigative Journalist, Germany
Q: Does this really work completely offline? A: Yes. After downloading models and the embedding model, all AI processing (text generation, image generation, RAG queries, document parsing) happens entirely on your device with zero internet dependency.
Q: How much storage do I need? A: Minimum 4GB for a single 7B model. Recommended 10GB for multiple models, SD 1.5, and RAGs. Large setups with many models can use 20GB+.
Q: Will this drain my battery? A: Local AI is computationally intensive. During active generation, battery drain is significant. Keep your device charged during extended use. Idle usage is minimal.
Q: Is my data actually private? A: Yes. Nothing leaves your device, and all processing is local. The code is open source, so you can verify this yourself or review community audits.
Q: Can I use custom models? A: Yes. Any GGUF text model works. For image generation, Stable Diffusion 1.5 checkpoints are supported (.safetensors or .ckpt).
Q: Why don't you need storage permissions? A: Android's Storage Access Framework (SAF) allows file picker access without broad storage permissions. Users explicitly select files, granting app access only to chosen files.
Q: Why is image generation slow? A: Stable Diffusion 1.5 requires 20-50 inference steps, each computationally expensive, and mobile hardware is slower than desktop GPUs. 30-90 seconds is normal for on-device generation.
Q: Can I run 13B or 70B models? A: Depends on device RAM. 13B Q4 needs ~10GB RAM (requires 12GB device). 70B models are impractical on current mobile hardware (need 40GB+ RAM).
Q: What quantization should I use? A: For 8B models:
- Q4_K_M: Best balance (4.5GB, good quality)
- Q5_K_S: Higher quality (5GB)
- Q6_K: Maximum quality (6GB, slower)
- Q2_K: Ultra-compressed (2.5GB, lower quality)
Q: Does RAG work without internet? A: Yes. RAG embedding and querying are completely offline after initial embedding model download (~100MB).
Q: How secure is the encryption? A: AES-256-GCM with hardware-backed keys (Android KeyStore) is military-grade encryption. On supported devices, keys are stored in Trusted Execution Environment (TEE) or Secure Element (SE), making extraction extremely difficult.
Q: Can I share encrypted RAGs?
A: Yes. Export the RAG as a `.neuronpacket` file with encryption enabled. Share the file and password separately. Recipients can import and decrypt with the password.
Q: How does Text-to-Speech work? A: TTS uses the Supertonic TTS engine running entirely on-device via ONNX Runtime. Download the TTS model from the Model Store or directly from the Settings screen under the TTS section, then tap the speak button on any assistant message. You can also enable auto-speak in Settings to have responses read aloud automatically.
Q: Can I use TTS offline? A: Yes. After downloading the TTS model (~100MB), all speech synthesis happens locally with no internet required.
Q: What do "denoising steps" control in TTS? A: Higher steps produce clearer, more natural speech but take longer to synthesize. The default of 2 steps provides a good balance. Increase to 4–8 for maximum quality.
Q: App crashes on model load? A: Likely out of memory. Try:
- Close other apps
- Use smaller model (Q4 instead of Q6, or 1B/3B instead of 7B/8B)
- Reduce context size in model config
- Restart device to free RAM
Q: Image generation fails or crashes? A: SD 1.5 requires significant RAM. Ensure:
- Device has 8GB+ RAM
- No other heavy apps running
- Try lower resolution (512x512)
- Restart app and try again
Q: Models download but won't load? A: Check:
- File is complete (compare size to HuggingFace listing)
- File is valid GGUF format
- Model isn't corrupted (re-download if suspicious)
- Sufficient RAM available
Q: RAG queries return no results? A: Verify:
- RAG is enabled in RAG management screen
- RAG loaded successfully (check status)
- Embedding model downloaded
- Query is semantically related to RAG content
Contributions are welcome! Focus areas:
- Bug fixes and stability improvements
- Performance optimizations (inference speed, memory usage)
- Device compatibility testing (especially mid-range devices)
- Documentation improvements and translations
- UI/UX enhancements (accessibility, dark theme refinements)
- Speech-to-Text (STT) integration
- Multi-modal model support
- Additional plugins and plugin marketplace
- Additional document format support
- Advanced RAG features
- Fork the repository
- Create feature branch:
git checkout -b feature/your-feature-name - Make changes with clear, focused commits
- Test on real devices (emulators don't reflect real performance)
- Write clear commit messages explaining "why" not just "what"
- Submit Pull Request with description of changes and testing done
- Follow Kotlin coding conventions
- Use meaningful variable/function names
- Comment complex logic
- Prefer immutability where possible
- Use Jetpack Compose best practices for UI
- Test on multiple devices (low-end, mid-range, flagship)
- Verify memory usage doesn't regress
- Test offline functionality
- Check encryption/decryption operations
- Validate UI on different screen sizes
Apache License 2.0
See LICENSE for full details.
- ✅ Commercial use permitted
- ✅ Modification permitted
- ✅ Distribution permitted
- ✅ Patent use permitted
- ✅ Private use permitted
- ⚠️ Trademark use not permitted
- ⚠️ Liability and warranty disclaimer
You can use ToolNeuron in commercial products, modify it, and distribute it, all without restriction. Attribution is appreciated but not required.
ToolNeuron stands on the shoulders of giants:
- llama.cpp - Efficient LLM inference by Georgi Gerganov
- LocalDream - Stable Diffusion on Android
- Jetpack Compose - Modern Android UI framework
- ONNX Runtime - TTS model inference
- Jsoup - HTML parsing for web search plugin
- Apache POI - Microsoft document parsing
- PDFBox-Android - PDF processing
- EpubLib - EPUB support
- Room - Database abstraction
- Dagger Hilt - Dependency injection
- OkHttp - HTTP client
- Retrofit - Type-safe HTTP client
- Privacy-first movement
- Open source community
- User feedback and feature requests
- Discord Community: Join Server - Active community for questions, tips, and discussions
- GitHub Issues: Report Bug/Request Feature
- Email: siddheshsonar2377@gmail.com
- ⭐ Star this repository if you find it useful
- 🐛 Report bugs to help improve stability
- 💡 Suggest features to guide development
- 📖 Improve documentation for better onboarding
- 🔧 Contribute code to add features
- 💬 Help others in Discord community
If you discover a security vulnerability, please:
- Do NOT open a public GitHub issue
- Email siddheshsonar2377@gmail.com with details
- Include steps to reproduce
- Allow reasonable time for fix before public disclosure
We take security seriously and will respond promptly.
Built with ❤️ by Siddhesh Sonar
Privacy-first AI for everyone. Own your data. Own your AI.
⭐ Star this repository · 🐛 Report Bug · 💡 Request Feature · 💬 Join Discord
Installation · Quick Start · Features · RAG System · Memory Vault · FAQ
License: Apache 2.0 · Version: 1.2.0 · Platform: Android 12+