Banana Slides is an open-source, AI-powered PPT generator for users who want professional, design-rich presentations without manual layout work. It uses the nano banana pro model to generate high-fidelity slides from text prompts, uploaded documents, or reference images, addressing the limited flexibility and visual quality of template-bound AI tools. It pairs a TypeScript frontend with a Python backend, deploys via Docker or cloud platforms, integrates with Gemini, OpenAI, and other LLMs through AIHubMix, and supports advanced document parsing and editable PPTX export.
The application combines a modern frontend with a backend powered by Python and uv, using environment-configurable AI providers to generate slides, extract content from PDFs/DOCX/MD files, and render editable PPTX outputs with accurate typography and positioning. It supports dark mode, internationalization, and multi-architecture Docker builds for flexible deployment.
What You Get
- Vibe-based Natural Language Editing - Users can modify slides by typing natural language commands like “change this chart to a pie chart” or “make the title bolder”, and the AI re-generates the slide with precise visual adjustments.
- Editable PPTX Export (Beta) - Exports slides as fully editable PowerPoint files with preserved text styles (font size, color, bold), accurate positioning, and clean backgrounds, enabling post-generation edits in PowerPoint.
- Multi-Format Document Parsing - Automatically extracts text, images, and tables from uploaded PDF, DOCX, MD, and TXT files to use as content sources for slide generation.
- Style Reference via Image Upload - Users can upload any image as a visual style reference to guide the AI in matching fonts, colors, and layout aesthetics of the target design.
- No-Template Design Freedom - Unlike traditional AI PPT tools, Banana Slides doesn’t force pre-built templates; it generates unique layouts from scratch based on content and reference images.
- Multi-LLM Backend Support - Supports Gemini, OpenAI, Vertex AI, DeepSeek, Doubao, Qwen, and others via configurable API endpoints, allowing users to choose cost-effective or high-quality models.
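The multi-format parsing feature listed above can be pictured as a dispatch on file extension. The sketch below is illustrative only and is not taken from the Banana Slides codebase: `extract_text`, `PARSERS`, and the two reader functions are hypothetical names, and only the plain-text formats (MD and TXT) are handled with the standard library. PDF and DOCX would need dedicated parsing libraries.

```python
from pathlib import Path
from typing import Callable, Dict


def _read_plain(path: Path) -> str:
    """Return the file's text unchanged (used for .txt)."""
    return path.read_text(encoding="utf-8")


def _read_markdown(path: Path) -> str:
    """Return Markdown text with leading '#' heading markers stripped."""
    cleaned = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("#"):
            line = line.lstrip("#").strip()
        cleaned.append(line)
    return "\n".join(cleaned)


# Hypothetical registry mapping a file suffix to its parser function.
PARSERS: Dict[str, Callable[[Path], str]] = {
    ".txt": _read_plain,
    ".md": _read_markdown,
}


def extract_text(path: Path) -> str:
    """Pick a parser by file suffix; raise ValueError for unsupported types."""
    parser = PARSERS.get(path.suffix.lower())
    if parser is None:
        raise ValueError(f"unsupported file type: {path.suffix}")
    return parser(path)
```

Keeping the suffix-to-parser mapping in one dictionary means a new format can be supported by adding a single entry, without touching the calling code.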
Common Use Cases
- Creating investor pitch decks - A startup founder uploads a business plan PDF, generates a visually cohesive slide deck in minutes, then edits the layout with natural language to emphasize metrics before exporting to editable PPTX for final tweaks.
- Designing classroom presentations - A teacher writes a lesson outline in plain text, uploads a reference image of a textbook style, and generates a clean, illustrated slide deck for students without needing design skills.
- Producing product demos for sales teams - A product manager pastes a feature list and uploads competitor slide examples; Banana Slides generates branded, high-fidelity slides that sales reps can edit in PowerPoint before client meetings.
- Students preparing thesis defenses - A graduate student uploads their thesis draft in PDF, generates a slide summary with key findings, then uses Vibe editing to refine visuals for their defense presentation, all without touching PowerPoint manually.
Under The Hood
Architecture
- Monolithic backend structure centered on Flask with intertwined models and business logic, lacking explicit service-layer separation
- Modular frontend and backend isolation via Docker Compose, enabling independent deployment and scaling through containerized services
- Dependency injection handled implicitly via app factories and environment configs, without formal DI containers or interface abstractions
- File parsing and AI service logic encapsulated in dedicated services with clear contracts, though tightly bound to Flask request context
- Multi-provider AI integration abstracted behind a unified interface, but lacks dynamic strategy selection or pluggable provider patterns
- REST API between frontend and backend is containerized and health-checked, but lacks versioning and standardized response schemas
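As a rough illustration of the unified provider interface described above, the sketch below puts two stub providers behind one abstract class and picks one from an environment variable with a plain conditional (static selection, not a plugin system). Everything here is an assumption made for illustration: the class names, the `create_provider` function, and the `AI_PROVIDER` variable are hypothetical, and the stubs return tagged strings instead of calling any vendor SDK.

```python
import os
from abc import ABC, abstractmethod
from typing import Mapping, Optional


class TextProvider(ABC):
    """Hypothetical common interface that every provider wrapper implements."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class GeminiProvider(TextProvider):
    def generate(self, prompt: str) -> str:
        # A real wrapper would call the vendor SDK here; this stub only tags the prompt.
        return f"[gemini] {prompt}"


class OpenAIProvider(TextProvider):
    def generate(self, prompt: str) -> str:
        # Stub for illustration, same shape as the Gemini stub above.
        return f"[openai] {prompt}"


def create_provider(env: Optional[Mapping[str, str]] = None) -> TextProvider:
    """Select a provider from the (hypothetical) AI_PROVIDER variable."""
    env = os.environ if env is None else env
    name = env.get("AI_PROVIDER", "gemini").lower()
    if name == "gemini":
        return GeminiProvider()
    if name == "openai":
        return OpenAIProvider()
    raise ValueError(f"unknown provider: {name}")
```

Callers depend only on `TextProvider.generate`, so switching vendors is a configuration change rather than a code change. Passing `env` explicitly also makes the selection easy to exercise in tests.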
Tech Stack
- Python 3.10+ backend powered by Flask with SQLAlchemy and Migrate for ORM and schema management
- React and TypeScript frontend built with Docker, using npm for dependency management and multi-stage builds for optimization
- Full-stack orchestration via Docker Compose with environment-driven configuration, health checks, and persistent volumes for data
- AI capabilities integrated via multiple vendor SDKs with robust retry mechanisms for reliability
- Testing ecosystem includes pytest and Playwright, with Python dependencies managed through uv and PyPI mirror support for reproducible builds
- Production-grade infrastructure with GCP service account integration and deterministic build pipelines
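The retry mechanisms mentioned above are not detailed here, so the following is only a generic sketch of one common approach, retrying with exponential backoff. The function name `call_with_retry`, its parameters, and the choice of retryable exceptions are assumptions for illustration, not the project's actual implementation.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the listed exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that tests can record the requested delays instead of actually waiting. Exceptions outside `retry_on` propagate immediately, so programming errors are not retried.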
Code Quality
- Extensive test coverage spanning unit, integration, and E2E layers with well-isolated mocks and custom assertion utilities
- Strong error handling and graceful degradation for API failures, with comprehensive validation of edge cases
- Consistent, intent-driven naming conventions and bilingual labeling that reflect real-world user contexts
- Frontend E2E tests benefit from typed mocks and predictable state initialization, reducing flakiness
- Visual regression testing ensures UI consistency across key components like slide previews and editors
- CI/CD-ready test suites with snapshot testing and configuration fallback validation to prevent regressions
What Makes It Unique
- AI-powered image inpainting that intelligently removes obstructions while preserving background context for seamless slide editing
- Registry-based text attribute extraction system that reconstructs color, style, and LaTeX formatting from slide images
- Native Markdown editor with metadata-aware image handling and inline uploads that maintain visual fidelity
- Multi-provider AI abstraction enabling flexible integration of diverse vision models beyond a single vendor
- Semantic slide reconstruction pipeline that preserves design intent by bridging visual analysis with structural output formats
- WYSIWYG-like Markdown editing experience with imperatively controlled cursor behavior and dynamic image chips
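To make the registry-based attribute extraction idea above more concrete, here is a minimal sketch of the registry pattern. It is an assumption-heavy illustration: the names `EXTRACTORS`, `register`, and `extract_attributes` are hypothetical, and the input is a pre-analyzed text region (a plain dictionary) instead of a slide image, since the image analysis step itself is not described here.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: attribute name -> function that derives it from a text region.
EXTRACTORS: Dict[str, Callable[[Dict[str, Any]], Any]] = {}


def register(name: str):
    """Decorator that adds an extractor to the registry under the given name."""
    def decorator(func):
        EXTRACTORS[name] = func
        return func
    return decorator


@register("color")
def extract_color(region: Dict[str, Any]) -> str:
    # Convert an (r, g, b) tuple into a hex string such as "#ff0000".
    r, g, b = region["rgb"]
    return f"#{r:02x}{g:02x}{b:02x}"


@register("bold")
def extract_bold(region: Dict[str, Any]) -> bool:
    # Treat a stroke weight of 600 or more as bold (an arbitrary threshold).
    return region.get("weight", 400) >= 600


@register("is_latex")
def extract_is_latex(region: Dict[str, Any]) -> bool:
    # Crude check: text wrapped in dollar signs is treated as LaTeX.
    text = region.get("text", "")
    return len(text) >= 2 and text.startswith("$") and text.endswith("$")


def extract_attributes(region: Dict[str, Any]) -> Dict[str, Any]:
    """Run every registered extractor and collect the results by name."""
    return {name: func(region) for name, func in EXTRACTORS.items()}
```

For example, a region of `{"rgb": (255, 0, 0), "weight": 700, "text": "$E=mc^2$"}` yields a red hex color, `bold` set to true, and `is_latex` set to true. New attributes can be supported by registering another function, leaving `extract_attributes` unchanged.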