BotCity Framework
Computer-vision powered Python framework for building RPA bots that click, type, and read any desktop, web, or terminal UI without needing an API.
BotCity Framework Core is the Python engine behind BotCity’s open-source robotic process automation (RPA) toolkit, giving developers a single API to see, click, type, and read any user interface — desktop, web, or terminal — the same way a human would. Rather than depending on fragile OS-level accessibility APIs, its DesktopBot class locates on-screen elements by matching reference images against live screenshots using an OpenCV template-matching engine adapted from the pyscreeze project, then drives the mouse and keyboard through pynput. On Windows, an optional pywinauto integration lets bots connect directly to native application windows and controls when image matching alone isn’t precise enough.
The framework ships as botcity-framework-core on PyPI and is one piece of a broader automation platform. A sibling botcity-framework-web-python package handles Selenium-driven browser automation with the same API shape, and botcity-maestro-sdk-python connects bots to BotCity Maestro, a hosted orchestrator for task queues, runtime environments, alerts, and execution logs. A companion desktop tool, BotCity Studio, lets developers select UI elements visually and generate the corresponding Python code. DesktopBot also includes camelCase method aliases (clickOn, findUntil, moveAndClick) for parity with BotCity’s Java framework, so teams can mix languages across the same bot fleet.
Because it works purely off pixels and window handles, the framework can automate legacy Windows applications, terminal sessions, and any GUI that predates or lacks a usable API — a common gap in enterprises still running decades-old line-of-business software. BotCity itself is backed by Y Combinator (W23), positioning the open-source framework as the developer on-ramp to its commercial Maestro orchestration and deployment platform rather than as a standalone product with paid tiers of its own.
What You Get
- A DesktopBot base class you subclass and implement action() on to define your automation logic
- Computer-vision element location via find/find_until, matching saved reference images against live screenshots with configurable confidence thresholds
- Full mouse and keyboard control (click, double-click, triple-click, drag, type, key combinations) through pynput, with retina-display coordinate correction on macOS
- Optional pywinauto integration to connect to and drive native Windows application windows and controls directly by selector, bypassing image matching entirely
- Clipboard access via pyperclip and screenshot/screen-cut utilities for capturing and saving regions of the screen
- Java-parity camelCase method aliases (clickOn, findUntil, doubleClick, moveAndClick) so bots written against BotCity’s Java framework port over with minimal changes
- Optional integration with the BotCity Maestro SDK for task queue, alerting, and execution-log connectivity when botcity-maestro is installed
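The optional pywinauto and Maestro integrations in the list above rely on packages that may not be installed. One common way to handle this in Python is a guarded import, sketched below. This is an illustration of the general pattern, not the framework's own code: optional_import and OptionalFeatures are hypothetical names, and the default module paths are assumptions.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here and the caller can degrade gracefully.
        return None


class OptionalFeatures:
    """Hypothetical helper that records which optional backends are usable."""

    def __init__(self, window_module="pywinauto", maestro_module="botcity.maestro"):
        # Both default module names are assumptions made for this sketch.
        self.window_backend = optional_import(window_module)
        self.maestro_backend = optional_import(maestro_module)

    @property
    def can_drive_native_windows(self):
        return self.window_backend is not None

    @property
    def can_report_to_orchestrator(self):
        return self.maestro_backend is not None


if __name__ == "__main__":
    features = OptionalFeatures()
    print("native window automation:", features.can_drive_native_windows)
    print("orchestrator reporting:", features.can_report_to_orchestrator)
```

With this shape, a bot can check a flag before using a window selector and fall back to image matching when the backend is missing.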
Common Use Cases
- Automating legacy Windows desktop applications that have no exposed API
- Data entry and form-filling across internal line-of-business tools
- Scraping or transcribing information displayed in terminal or console-based systems
- Cross-checking or migrating data between two UIs that don’t share a common backend
- Building RPA proof-of-concepts quickly with BotCity Studio’s point-and-click code generation
Under The Hood
Architecture - DesktopBot (subclassing a shared BaseBot/State pair imported from the companion botcity-framework-base package) is the composition root: it wires together a computer-vision search module (cv2find), an input-control layer built on pynput’s mouse and keyboard controllers, and an optional pywinauto-backed window-automation layer guarded behind an ImportError so the framework degrades gracefully on non-Windows platforms. Developers subclass DesktopBot and implement an action() method; at runtime, calls like find_until capture a screenshot, hand it to the OpenCV matcher alongside a saved reference image, and return element coordinates that subsequent mouse/keyboard calls act on, with a State object tracking the last match and a registered image map. A conditional import of BotMaestroSDK lets the same class report status back to BotCity’s hosted orchestrator when that companion package is present, without making it a hard dependency of the core framework.
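The capture, match, and retry flow described above can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the framework's implementation: it scores candidates with a sum of absolute differences on grayscale grids instead of OpenCV template matching, and match_score, find, find_until, and State here are hypothetical stand-ins whose signatures are invented for the example.

```python
import time


def match_score(screen, template, top, left):
    """Similarity in [0, 1] between the template and the screen patch at (top, left)."""
    total_diff = 0
    for r, row in enumerate(template):
        for c, pixel in enumerate(row):
            total_diff += abs(screen[top + r][left + c] - pixel)
    max_diff = 255 * len(template) * len(template[0])
    return 1.0 - total_diff / max_diff


def find(screen, template, matching=0.9):
    """Return (x, y, score) for the best match at or above `matching`, else None."""
    t_h, t_w = len(template), len(template[0])
    best = None
    for top in range(len(screen) - t_h + 1):
        for left in range(len(screen[0]) - t_w + 1):
            score = match_score(screen, template, top, left)
            if score >= matching and (best is None or score > best[2]):
                best = (left, top, score)
    return best


class State:
    """Remembers the most recent match so later actions can target it."""

    def __init__(self):
        self.element = None


def find_until(capture, template, state, matching=0.9, waiting_time=1000, poll_ms=50):
    """Poll capture() until the template is found or waiting_time (ms) elapses."""
    deadline = time.monotonic() + waiting_time / 1000
    while True:
        state.element = find(capture(), template, matching)
        if state.element is not None:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000)


if __name__ == "__main__":
    # A 4x5 grayscale "screenshot" with a bright 2x2 button at x=2, y=1.
    screen = [[0] * 5 for _ in range(4)]
    for y in (1, 2):
        for x in (2, 3):
            screen[y][x] = 255
    button = [[255, 255], [255, 255]]

    state = State()
    if find_until(lambda: screen, button, state, waiting_time=200):
        print("found at", state.element[:2], "score", state.element[2])
```

The matching threshold plays the role of the configurable confidence: lowering it tolerates small rendering differences at the cost of more false positives.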
Tech Stack - Built for Python 3.7+ and packaged with setuptools plus versioneer for git-tag-based version management, the framework leans on well-established libraries rather than novel infrastructure: OpenCV (opencv-python) and NumPy for image template matching, Pillow for screenshot capture, pynput for cross-platform mouse/keyboard control, pywinauto for native Windows window automation, psutil for process management, and pyperclip for clipboard access. Continuous integration runs a GitHub Actions matrix across multiple Python versions on Ubuntu with Xvfb for headless GUI testing, alongside separate flake8 linting and PyPI publishing workflows. The package is distributed as botcity-framework-core on PyPI and depends on a sibling botcity-framework-base package shared with the project’s web-automation counterpart.
Code Quality - Public methods are consistently documented with detailed, typed docstrings, and naming is disciplined snake_case throughout the core implementation, with a large parallel layer of camelCase aliases added purely for parity with BotCity’s Java framework. Test coverage is minimal: the only test present is a smoke test confirming the package imports successfully, with no unit tests exercising image matching, input control, or window automation behavior. Error handling favors quiet degradation over strict failures in several places — catching narrow exceptions and returning None, or printing a warning instead of raising for parameters that aren’t fully implemented yet — which keeps bots from crashing but can mask misconfiguration during development.
What Makes It Unique - The framework itself is a fairly conventional computer-vision-based desktop RPA library; its image-matching core is explicitly adapted from an existing open-source project rather than a novel technique. Its real differentiation is ecosystem positioning: a shared API surface across Python and Java implementations, a companion visual development tool (BotCity Studio) that generates this exact API from point-and-click UI selection, and tight integration with a hosted orchestration platform (Maestro) for queueing, monitoring, and deploying bots at scale. As a Y Combinator-backed (W23) company, BotCity’s open-source framework functions primarily as the developer entry point into that broader commercial platform, rather than as a standalone technical innovation in automation.
Self-Hosting
Licensing Model - botcity-framework-core-python is released under the Apache License 2.0, a permissive open-source license with no license key, telemetry gate, or usage restriction built into the framework code itself; no ee/, enterprise/, or pro/ directories, license-check functions, or feature flags were found anywhere in the repository.
Self-Hosting Restrictions - None found. The framework runs entirely on the machine executing the bot script (installed via pip install botcity-framework-core) with no server component, phone-home requirement, or network dependency for its core desktop-automation functionality.
Enterprise Features - The core framework does not gate any of its automation capabilities (image matching, input control, window automation) behind a paid tier; every DesktopBot method is available to any user of the open-source package.
Cloud vs Self-Hosted - BotCity offers a separate commercial product, BotCity Maestro, described in the README as an “All in One Platform” for deploying bots, managing a task queue, monitoring runtime environments, and generating alerts/reports. This orchestration layer is a distinct hosted (or customer-deployed) service accessed through the optional botcity-maestro SDK package — it is not part of, nor required by, this framework repository, and this repo’s README does not disclose Maestro’s specific pricing tiers.
License Key Required - Not for the framework covered by this record. A free community account is mentioned for BotCity’s broader platform tooling (e.g. BotCity Studio downloads), but no license key is required to install or use botcity-framework-core itself.