SWIRL is an open-source unified search platform that brings LLM-powered features, such as retrieval-augmented generation (RAG), to your company’s existing data stores without copying data to the cloud. It’s designed for IT teams, knowledge workers, and developers who need fast, secure, context-aware search across siloed systems like SharePoint, Google Drive, and Jira. Because it queries each source in place, SWIRL needs no vector database or complex ETL pipeline, and it targets the slow, fragmented internal search that can cost employees hours every week.
Built with Python and Django, SWIRL uses federated search and real-time query transformation to unify results from 100+ connectors. It supports synchronous and asynchronous search federation via REST APIs, stores results in SQLite3 or PostgreSQL, and leverages spaCy and NLTK for semantic relevance ranking. Deployable in minutes via Docker, it integrates with OpenAI for RAG and respects existing enterprise permissions.
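The federation model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SWIRL’s code: SWIRL dispatches asynchronous searches through Celery workers, whereas this sketch uses asyncio to fan one query out to several providers at once, rewrite it per provider, and collect whatever comes back. Provider, transform_query, federate, and the fake providers are hypothetical names, and only the single-term "NOT term" to "-term" rewrite is implemented.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical provider call: takes a query string, returns a list of result dicts.
SearchFn = Callable[[str], Awaitable[list[dict]]]


@dataclass
class Provider:
    name: str
    search: SearchFn
    supports_not: bool = True  # if False, "NOT term" is rewritten as "-term"


def transform_query(query: str, provider: Provider) -> str:
    """Adapt the query to a provider's syntax (only the NOT rule is shown)."""
    if provider.supports_not:
        return query
    return re.sub(r"\bNOT\s+(\S+)", r"-\1", query)


async def federate(query: str, providers: list[Provider], timeout: float = 5.0) -> dict[str, list[dict]]:
    """Query all providers concurrently and group the results by provider name."""

    async def run(p: Provider) -> list[dict]:
        return await asyncio.wait_for(p.search(transform_query(query, p)), timeout)

    outcomes = await asyncio.gather(*(run(p) for p in providers), return_exceptions=True)
    # A failing or timed-out provider contributes no results instead of failing the search.
    return {
        p.name: [] if isinstance(outcome, BaseException) else outcome
        for p, outcome in zip(providers, outcomes)
    }


# Fake providers standing in for real connectors.
async def fake_docs(q: str) -> list[dict]:
    return [{"title": f"doc matching {q}"}]


async def fake_tickets(q: str) -> list[dict]:
    return [{"title": f"ticket matching {q}"}]


async def offline(q: str) -> list[dict]:
    raise ConnectionError("provider unavailable")


providers = [
    Provider("docs", fake_docs),
    Provider("tickets", fake_tickets, supports_not=False),
    Provider("offline", offline),
]
print(asyncio.run(federate("swirl NOT beta", providers)))
```

Running it, the tickets provider receives "swirl -beta", the docs provider receives the query unchanged, and the offline provider contributes an empty list, so one broken source does not sink the whole search. The per-call timeout serves the same purpose for slow sources.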
What You Get
- Federated Search Across 100+ Connectors - Search SharePoint, Microsoft 365, Confluence, GitHub, Jira, arXiv, Google News, and more, all without moving data. Each source is configured as a SearchProvider.
- No Vector Database Required - Uses semantic relevance ranking via spaCy and NLTK to re-rank results without storing embeddings or managing vector databases like Pinecone or Chroma.
- Real-Time Query Transformation - Automatically adapts query syntax to each provider (e.g., converts ‘NOT term’ to ‘-term’) and handles operators like AND, OR, and + across search APIs with differing syntax.
- Result Re-Ranking with Cosine Similarity - Uses spaCy’s large English model (en_core_web_lg) and NLTK to score and reorder results by semantic relevance, not just keyword matching.
- Duplicate Detection via Cosine Similarity - Identifies and removes duplicate results using configurable cosine similarity thresholds to avoid redundancy in search outputs.
- Built-in Query and Result Pipelining - Supports custom Processor stages to transform queries and results in real time, enabling dynamic filtering, enrichment, and formatting before display.
- Search Subscription & Real-Time Monitoring - Subscribe to searches to receive continuous updates when new results appear, ideal for monitoring tickets, documents, or news feeds.
- Result Mixers for Custom Ordering - Control result order by relevancy, date, round-robin, or stacked mixing, with new-items-only variants for subscribe mode.
- Spell Correction with TextBlob - Automatically corrects likely misspellings in queries, improving search accuracy without requiring user input.
- Search Expiration Service - Automatically deletes old search results to manage storage usage, configurable via admin settings for compliance and cost control.
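The re-ranking, duplicate detection, and mixing features above all rest on comparing texts as vectors and then ordering the survivors. The sketch below shows the idea with the standard library only. It swaps spaCy’s vectors for plain bag-of-words term counts, which is a simplification, and the function names and the 0.95 duplicate threshold are illustrative assumptions rather than SWIRL’s processors or defaults.

```python
import math
import re
from collections import Counter
from itertools import zip_longest


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for spaCy document vectors)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def text_of(result: dict) -> str:
    return f"{result['title']} {result['body']}"


def dedupe(results: list[dict], threshold: float = 0.95) -> list[dict]:
    """Keep a result only if it is less than `threshold` similar to every kept result."""
    kept, kept_vecs = [], []
    for r in results:
        vec = vectorize(text_of(r))
        if all(cosine(vec, seen) < threshold for seen in kept_vecs):
            kept.append(r)
            kept_vecs.append(vec)
    return kept


def rerank(results: list[dict], query: str) -> list[dict]:
    """Relevancy-style mixing: order results by similarity to the query, highest first."""
    q = vectorize(query)
    return sorted(results, key=lambda r: cosine(q, vectorize(text_of(r))), reverse=True)


def round_robin(*provider_lists: list[dict]) -> list[dict]:
    """Round-robin mixing: each provider's 1st result, then each one's 2nd, and so on."""
    return [r for group in zip_longest(*provider_lists) for r in group if r is not None]


sharepoint = [
    {"title": "Expense policy", "body": "How to file expenses"},
    {"title": "Travel policy", "body": "Booking rules for travel"},
]
confluence = [
    {"title": "Expense policy", "body": "How to file expenses"},  # same page, second source
    {"title": "Onboarding guide", "body": "First week checklist"},
]

pooled = dedupe(sharepoint + confluence)
print([r["title"] for r in rerank(pooled, "expense policy")])
# → ['Expense policy', 'Travel policy', 'Onboarding guide']
print([r["title"] for r in round_robin(sharepoint, confluence)])
# → ['Expense policy', 'Expense policy', 'Travel policy', 'Onboarding guide']
```

Bag-of-words vectors only match on shared words; model-based vectors such as spaCy’s can also score related words as similar, which is why a model-backed approach tends to rank paraphrases better.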
Common Use Cases
- Knowledge Base Search in Enterprise Teams - A corporate knowledge manager connects SWIRL to SharePoint and Confluence to let employees instantly find internal docs with source links, reducing time spent hunting for information.
- Customer Support Automation - A support team uses SWIRL to search across help docs and ticket systems, enabling agents to draft accurate responses using internal content without leaving their workflow.
- Developer Productivity Assistant - Engineers query GitHub repositories and Jira tickets to find code examples, bug fixes, and documentation—accelerating onboarding and troubleshooting without switching tools.
- Unified Search for Compliance Teams - Legal and compliance teams use SWIRL to search across email archives, file shares, and CRM systems while preserving access controls and avoiding data duplication.
Under The Hood
Architecture
- Django-based monolithic backend with tightly coupled views, models, and serializers, lacking a clear service layer despite existing abstractions
- Dependency injection is absent, with components like Authenticator and MyConnector directly instantiated in views, violating inversion of control
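- Although sources are configured as SearchProvider records and processors can be chained, new connectors, result processors, and query transformers must be hardcoded into the codebase; there is no plugin system or external extension interface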
- Redis and Celery are used for async tasks but lack domain-specific handlers, resulting in mixed concerns
- Frontend and backend are deployed separately with no formal API contract or type safety, relying on undocumented JSON payloads
- Configuration is fragmented across environment variables, JSON files, and hardcoded paths, with no centralized validation or schema enforcement
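To make the inversion-of-control point concrete, the sketch below shows one common alternative to instantiating connectors inside views: connector classes register themselves under a name, and a single composition root builds them from configuration and passes them into the service that uses them. All names (register, CONNECTORS, EchoConnector, SearchService, build_service) are hypothetical and not part of SWIRL.

```python
from typing import Callable, Protocol


class Connector(Protocol):
    def search(self, query: str) -> list[dict]: ...


# Hypothetical registry: connector name (as stored in configuration) -> factory.
CONNECTORS: dict[str, Callable[[dict], Connector]] = {}


def register(name: str):
    """Class decorator that adds a connector class to the registry."""
    def wrap(cls):
        CONNECTORS[name] = cls
        return cls
    return wrap


@register("echo")
class EchoConnector:
    def __init__(self, config: dict):
        self.prefix = config.get("prefix", "")

    def search(self, query: str) -> list[dict]:
        return [{"title": self.prefix + query}]


class SearchService:
    """Receives its connectors from the caller instead of constructing them itself."""

    def __init__(self, connectors: list[Connector]):
        self.connectors = connectors

    def run(self, query: str) -> list[dict]:
        return [hit for c in self.connectors for hit in c.search(query)]


def build_service(provider_configs: list[dict]) -> SearchService:
    """Composition root: the one place that turns configuration into objects."""
    return SearchService([CONNECTORS[cfg["connector"]](cfg) for cfg in provider_configs])


service = build_service([{"connector": "echo", "prefix": "hit: "}])
print(service.run("swirl"))  # → [{'title': 'hit: swirl'}]
```

Because SearchService receives its connectors, a test can hand it a stub object with a search method instead of patching imports inside a view.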
Tech Stack
- Python 3.13 backend powered by Django and Django REST Framework, served via Daphne, with OpenSearch and Elasticsearch integration
- Celery with Redis as message broker for asynchronous search processing, orchestrated through docker-compose and environment-driven routing
- Next.js frontend with server components enabling SSR and dynamic result rendering, integrated via static asset injection from a monorepo
- Docker-based deployment using multi-stage builds, pre-bundled NLP models, and automated configuration templating
- Comprehensive testing infrastructure with pytest, Django fixtures, and mocking for external services
- Infrastructure-as-code approach with Dockerfiles and docker-compose defining a full-stack platform including Redis, PostgreSQL, and LLM integrations
Code Quality
- Extensive test suite with pytest fixtures and Django TestCase covering authentication, API integrations, and query logic using realistic mock data
- Clear separation of test concerns through dedicated JSON payload files, enabling isolated and repeatable validation of search transformations
- Robust error handling with custom exception classes and comprehensive try/except blocks across authentication and HTTP layers
- Consistent naming conventions aligned with Django and Python standards, enhancing readability and maintainability
- Structured JSON mappings for provider configuration reduce runtime parsing failures, although, as noted under Architecture, there is no centralized schema validation
- Extensive use of mocking and environment isolation to test external integrations without live dependencies
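The mocking and JSON-payload approach described above can be illustrated with a pytest-style test that uses only unittest.mock: a canned JSON payload stands in for a provider response, and a Mock replaces the HTTP session so no network call is made. fetch_results and the endpoint URL are hypothetical, and the session argument is assumed to behave like a requests.Session.

```python
import json
from unittest import mock

# Canned provider response; a real suite would load this from a JSON fixture file.
PAYLOAD = json.loads(
    '{"items": [{"name": "Expense policy", "url": "https://example.com/1"},'
    ' {"name": "Travel policy", "url": "https://example.com/2"}]}'
)


def fetch_results(session, endpoint: str, query: str) -> list[dict]:
    """Hypothetical connector step: call a provider API and normalize its JSON."""
    response = session.get(endpoint, params={"q": query}, timeout=10)
    response.raise_for_status()
    return [{"title": i["name"], "url": i["url"]} for i in response.json()["items"]]


def test_fetch_results_maps_payload_without_network():
    # mock.Mock stands in for the HTTP session, so no request leaves the process.
    session = mock.Mock()
    session.get.return_value.json.return_value = PAYLOAD
    results = fetch_results(session, "https://provider.invalid/search", "policy")
    session.get.assert_called_once_with(
        "https://provider.invalid/search", params={"q": "policy"}, timeout=10
    )
    assert [r["title"] for r in results] == ["Expense policy", "Travel policy"]


def test_fetch_results_propagates_http_errors():
    # Simulate a provider error surfacing from raise_for_status().
    session = mock.Mock()
    session.get.return_value.raise_for_status.side_effect = RuntimeError("HTTP 500")
    try:
        fetch_results(session, "https://provider.invalid/search", "policy")
    except RuntimeError as exc:
        assert "500" in str(exc)
    else:
        raise AssertionError("expected RuntimeError")
```

Running pytest on this file collects both test functions; they can also be called directly, since neither depends on pytest fixtures.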
What Makes It Unique
- Integrates conversational search directly into Django ORM, translating natural language to SQL without external NLP pipelines
- Implements real-time relevance feedback in Next.js server components, enabling fluid query refinement during user interaction
- Features a context-aware query rewriting engine that adapts search intent using session history and document metadata
- Embeds semantic document clustering within the search index for automatic result grouping without post-processing
- Introduces a lightweight, schema-less log parser that auto-detects and structures application logs into searchable semantic events
- Combines chat-based query correction with result validation to create a closed-loop system that improves accuracy over time