Social Analyzer is a multi-platform OSINT tool that helps investigators, security researchers, and law enforcement trace digital identities across 1000+ social media sites. It addresses fragmented profile discovery by automating username searches, extracting metadata, and combining several detection techniques to flag likely matching profiles while keeping false positives low.
Built with Node.js and Python, it offers a web interface, CLI, and API integrations. It leverages Selenium, Tesseract OCR, Google/DuckDuckGo APIs, and Qeeqbox’s OSINT metadata engine to extract patterns, screenshots, and contextual data. Deployable on Linux, Windows, macOS, and via Docker, it supports both local analysis and integration into broader security workflows.
What You Get
- Multi-Platform Profile Detection - Searches 1000+ social media sites and other websites, including Facebook, Gmail, and Google, using username permutations and combinations to find matching profiles.
- Detection Rating System (0-100) - Rates each candidate profile from 0 to 100 and reports it as No, Maybe, or Yes, reducing false positives by combining multiple detection techniques such as OCR, pattern matching, and metadata analysis.
- Metadata & Pattern Extraction - Extracts structured metadata from profiles using Qeeqbox OSINT engine, including embedded patterns, links, and contextual data.
- Visualized Profile Graphs - Generates force-directed graphs to map relationships between extracted metadata and detected profiles.
- Screenshot & Description Capture - Automatically captures profile screenshots, page titles, and website descriptions for forensic documentation.
- Custom Search Queries - Supports Google and DuckDuckGo API-based searches with custom queries to supplement social media findings.
- Filtering by Confidence & Type - Filters results by detection rating (good/maybe/bad), profile type (e.g., adult or music), country, or Alexa top rankings.
- JSON Output & Logging - Exports full analysis results as structured JSON and logs detailed output to file or terminal with prettified formatting.
- Multi-User & Bulk Search - Searches multiple usernames at once (e.g., “johndoe,janedoe”) and correlates profiles across platforms for identity linking.
- OCR & Text Extraction - Uses Tesseract.js to extract text from profile screenshots for non-textual content analysis.
- Proxy, Timeout & User-Agent Customization - Configurable HTTP headers, proxies, timeouts, and implicit waits to bypass restrictions and avoid detection.
- Web App & CLI Access - Provides both a browser-based GUI (http://0.0.0.0:9005/app.html) and command-line interface for automation and scripting.
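The rating and filtering features listed above can be illustrated with a small sketch. The thresholds (40 and 80), the function names `rate_to_status` and `filter_results`, and the mapping of Yes/Maybe/No onto good/maybe/bad are assumptions made for illustration; this overview only states that ratings run from 0 to 100 and that results can be filtered as good, maybe, or bad.

```python
from typing import Iterable

# Hypothetical cut-offs; the tool's real thresholds are not given in this overview.
MAYBE_MIN = 40
GOOD_MIN = 80


def rate_to_status(rate: float) -> str:
    """Map a 0-100 detection rating to a coarse status label."""
    if not 0 <= rate <= 100:
        raise ValueError(f"rate must be within 0-100, got {rate}")
    if rate >= GOOD_MIN:
        return "good"
    if rate >= MAYBE_MIN:
        return "maybe"
    return "bad"


def filter_results(results: Iterable[dict], wanted: set[str]) -> list[dict]:
    """Keep only results whose status label is in `wanted`."""
    return [r for r in results if rate_to_status(r["rate"]) in wanted]


# Illustrative data, not real tool output.
sample = [
    {"link": "https://example.com/johndoe", "rate": 95.0},
    {"link": "https://example.org/u/johndoe", "rate": 55.0},
    {"link": "https://example.net/johndoe", "rate": 10.0},
]
good_only = filter_results(sample, {"good"})
```

Keeping the numeric rating alongside the label lets an analyst re-bucket results later if different cut-offs turn out to suit a given investigation better.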
Common Use Cases
- Investigating cyberbullying cases - A school counselor uses Social Analyzer to trace anonymous bullies by searching for their usernames across platforms to identify real-world profiles and associated content.
- Law enforcement OSINT operations - Police in resource-limited regions use the tool to correlate suspect usernames with social media accounts, leveraging its detection ratings to prioritize high-confidence leads.
- Security researchers tracking threat actors - A red team analyst uses the tool to map a target’s digital footprint by searching for username variants and extracting metadata from detected profiles.
- Journalists verifying anonymous sources - A reporter uses Social Analyzer to confirm whether a whistleblower’s claimed social profiles are legitimate by cross-referencing usernames and extracting metadata.
Under The Hood
Architecture
- Monolithic structure with tightly coupled CLI, API, and scanning logic in core files, lacking clear separation of concerns
- No dependency injection or service abstraction; modules initialize with side effects and direct imports
- Dual-language codebase (Node.js and Python) with parallel, non-integrated implementations causing duplication and maintenance friction
- Entry points conflate server bootstrapping with argument parsing, violating single-responsibility principles
- Web scrapers and API clients lack interfaces or abstraction layers, making extensions fragile and testing difficult
- Docker and Selenium integrations are hardcoded, with no environment-aware configuration or orchestration abstraction
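To make the abstraction points above concrete, here is a minimal sketch of what a scanner interface could look like. None of these names (`ProfileScanner`, `Detection`, `FakeScanner`, `run_scan`) come from the project; they are hypothetical and only illustrate how scanning logic might depend on an interface instead of on Selenium or HTTP details directly.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Detection:
    site: str
    username: str
    found: bool


class ProfileScanner(Protocol):
    """Hypothetical interface that any scanner backend would satisfy."""

    def scan(self, site: str, username: str) -> Detection: ...


class FakeScanner:
    """Test double: 'finds' a profile only for usernames it was told about."""

    def __init__(self, known: set[str]) -> None:
        self.known = known

    def scan(self, site: str, username: str) -> Detection:
        return Detection(site, username, username in self.known)


def run_scan(scanner: ProfileScanner, sites: list[str], username: str) -> list[Detection]:
    """Scanning logic that depends only on the interface, not on a concrete backend."""
    return [scanner.scan(site, username) for site in sites]


hits = run_scan(FakeScanner({"johndoe"}), ["example.com", "example.org"], "johndoe")
```

With a seam like this, a browser-driven scanner and a plain HTTP scanner could be swapped without touching `run_scan`, and tests could run against `FakeScanner` without network access.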
Tech Stack
- Node.js with Express serving as the primary backend, configured for Dockerized deployment with version-pinned dependencies
- Selenium-based browser automation using Firefox ESR in containerized environments for headless social media scraping
- JavaScript ecosystem includes cheerio, tesseract.js, and wink-tokenizer for parsing, OCR, and text analysis
- Python backend leverages BeautifulSoup4, lxml, and langdetect for structural and linguistic analysis
- Docker Compose orchestrates Selenium Hub and Firefox nodes to enable scalable browser automation
- Linting and formatting tooling (ESLint, Prettier) enforce basic JavaScript code quality standards
Code Quality
- Minimal test coverage, relying on shell script output checks rather than unit or integration tests
- Poor code organization with intertwined data loading, analysis, and CLI logic
- Generic error handling without custom exceptions or recovery strategies
- Inconsistent naming conventions across modules and files
- No type checking, no static analysis beyond basic linting, and no comprehensive linting rules
- No dependency validation or version pinning in Python components, increasing instability risk
What Makes It Unique
- The same CLI and API interface, with identical logic, in both the Node.js and Python implementations, so either environment can be used interchangeably
- Built-in WAF and CAPTCHA detection using heuristic pattern matching on HTML metadata, avoiding external service dependencies
- Dynamic website classification engine that categorizes platforms by type and region using structured configuration files
- Concurrent profile detection with adaptive request headers and timeouts to bypass anti-bot systems
- Metadata extraction layer that infers platform legitimacy through technical fingerprints and language patterns
- Extensible scan framework with pluggable site definitions and modular scan modes enabling community-driven expansion
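As a rough illustration of two of the ideas above, heuristic WAF/CAPTCHA detection and configuration-driven site definitions, consider the following sketch. The patterns, the `SITES` structure, and the function names are all hypothetical and are not taken from the project; real block pages vary widely, so a heuristic like this can produce both false alarms and misses.

```python
import re

# Hypothetical markers; real block pages vary and these are not taken from the project.
BLOCK_PATTERNS = {
    "captcha": re.compile(r"captcha|are you (a )?human", re.IGNORECASE),
    "waf": re.compile(r"access denied|request blocked|attention required", re.IGNORECASE),
}

# Hypothetical site definitions: a URL template plus a regex that signals a real profile.
SITES = [
    {
        "name": "example-social",
        "url": "https://example.com/{username}",
        "type": "social",
        "found": r"<title>[^<]*{username}[^<]*</title>",
    },
]


def detect_block(html: str) -> list[str]:
    """Return the names of block-page heuristics that match the HTML."""
    return [name for name, pattern in BLOCK_PATTERNS.items() if pattern.search(html)]


def profile_found(site: dict, username: str, html: str) -> bool:
    """Apply the site's 'found' pattern unless the page looks like a block page."""
    if detect_block(html):
        return False
    pattern = site["found"].format(username=re.escape(username))
    return re.search(pattern, html, re.IGNORECASE) is not None
```

Because each site is described by data instead of code, adding a new platform in this sketch means appending one entry to `SITES`, which is the kind of extension path a pluggable site-definition design aims for.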