Azure DI Showcase
Building a Better Azure Document Intelligence Playground: AI-Assisted Development in Production
Context: Document Intelligence in Swiss Insurance
At our firm, we develop LLM-based document data extraction solutions for the Swiss insurance market. Azure Document Intelligence is a critical component of our stack, handling structured extraction from diverse document types: claims forms, policy documents, medical reports, and correspondence across multiple languages (German, French, Italian).
The challenge with document intelligence systems isn’t the API—Azure’s REST interface is well-designed and comprehensive. The challenge is rapid iteration during development. We need to test various models against different document types, validate extraction accuracy, debug edge cases, and demonstrate capabilities to stakeholders—often within the same day.
The Tooling Gap
For experimentation and testing, we had two options:
Option 1: Azure Document Intelligence Studio
- Microsoft’s official web-based playground
- Excellent for basic testing, but limited customization
- Lacks workflow integration and automation capabilities
- No version control or programmatic access
Option 2: Form Recognizer Toolkit
- Open-source React application: microsoft/Form-Recognizer-Toolkit
- Full-featured but architecturally complex
- Heavy TypeScript/React stack with multiple dependencies
- Customization requires navigating React components, state management, and build configurations
- Setup time measured in hours, not minutes
Neither option aligned with our needs: a lightweight, customizable testing environment that could be deployed quickly, modified easily, and integrated into development workflows.
The solution: a purpose-built alternative written in Streamlit and Python.
The AI-Assisted Development Experiment
This project became an opportunity to evaluate modern AI coding assistants in a real-world scenario. I decided to build the same application using three different tools in parallel:
- Gemini CLI (free version, 2.5 Flash model)
- OpenAI Codex CLI (GPT-4.1 via Plus subscription)
- Claude Code CLI (Claude 3.7 Sonnet—this was before the 4.5 release)
The methodology was pure “vibe-coding”: natural language prompts, no manual code edits, iterative refinement through conversation. The AI writes the specification, generates the code, debugs issues, and produces documentation.
The Race: Three AIs, One Task
Initial Prompt
All three systems received the same instruction:
"I am writing a self-contained demo to showcase all the capabilities of azure document intelligence 4.0. build me a stream lit app. I envision it like this 1) in the sidebar, there will be a dropdown allowing to choose different Azure DI models (layout, general, receipts etc.). Later, we will add auto mode (choosing model automatically with LLM). When we choose a model in the sidebar, below will appear ui controls for all the parameters of that model available in the API. on the right of the sidebar, there will be an interface similar to what is available in Azure DI studio(see img.png). It will display the anotated document (all formats supported by azure DI) and also parsing results in three forms (nicely formatted, markdown if available from response, and json with ability to collapse. Azure Document Intelligence must be used via REST API (no python SDK). Start by writing a detailed specification and put it into SPEC.md"
The Divergence
All three AI assistants followed the instruction to write SPEC.md first. However, the quality and usability of the specifications varied significantly:
Gemini 2.5 Flash:
- Generated a basic specification document
- Implementation diverged from the spec during multi-file development
- Context loss during API integration discussions required repeated clarification
- Model configuration generation was incomplete
OpenAI GPT-4.1:
- Produced a reasonable specification
- Initial code structure was clean
- Implementation struggled with async polling patterns specific to Azure DI
- Lost momentum during multi-file coordination
Claude 3.7 Sonnet:
- Generated the most comprehensive SPEC.md (383 lines)
- Consistently referenced the spec during implementation
- Maintained architectural coherence across all modules
- UI implementation matched the specification from the first iteration
The Specification Quality Gap
Claude’s specification document included details that proved critical for implementation success:
- Complete definitions for all 20+ Azure DI models with feature mappings
- API parameter configurations with validation rules and UI widget types
- ASCII art wireframes showing exact UI layout
- REST API patterns including operation polling and error handling
- Credential management strategies
The key difference wasn’t just writing a specification—it was using the specification as a consistent reference throughout development. While all three AIs generated specs, Claude maintained specification fidelity during implementation, preventing architectural drift.
After multiple attempts with Gemini and GPT-4.1 that resulted in incomplete or inconsistent implementations, I focused exclusively on Claude’s output.
The Build: 4-5 Hours to Production
Phase 1: Architecture and Core (Hours 0-2)
Claude generated the foundational structure:
azure-di-showcase/
├── app.py # Main Streamlit application
├── config.py # Models and parameters configuration
├── azure_di_client.py # REST API client with async polling
├── ui_components.py # Reusable UI components
├── document_processor.py # Document handling utilities
├── logging_config.py # Centralized logging
└── SPEC.md # Technical specification
Key achievements:
- Async REST client with proper operation polling (see the sketch after this list)
- All 20+ models configured with correct parameters
- Dynamic UI generation based on selected model
- Type hints and docstrings throughout
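For reference, here is a minimal sketch of the submit-and-poll pattern used against the Azure DI REST API. It is not the project's actual azure_di_client.py; the api-version value is a placeholder and the real client adds retries, timeouts, and logging.

```python
# Minimal sketch of the Azure DI submit-and-poll pattern (illustrative, not the
# repo's azure_di_client.py). Endpoint, key, and api-version are placeholders.
import base64
import time

import requests

API_VERSION = "2024-11-30"  # Document Intelligence 4.0 GA API version


def analyze_document(endpoint: str, key: str, model_id: str, file_bytes: bytes) -> dict:
    """Submit a document for analysis, then poll the long-running operation."""
    url = f"{endpoint}/documentintelligence/documentModels/{model_id}:analyze"
    headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}
    body = {"base64Source": base64.b64encode(file_bytes).decode()}

    # The analyze call returns 202 Accepted plus an Operation-Location header.
    response = requests.post(url, params={"api-version": API_VERSION}, headers=headers, json=body)
    response.raise_for_status()
    operation_url = response.headers["Operation-Location"]

    # Poll the operation URL until it reaches a terminal state.
    while True:
        poll = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key})
        poll.raise_for_status()
        result = poll.json()
        if result["status"] == "succeeded":
            return result["analyzeResult"]
        if result["status"] == "failed":
            raise RuntimeError(f"Analysis failed: {result.get('error')}")
        time.sleep(2)
```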
Phase 2: Features and Integration (Hours 2-4)
Implemented complete feature set:
- Three upload methods: file upload, URL input, sample documents
- Document viewer with PDF page navigation
- Multi-tab results display: Fields, Markdown, Raw JSON
- Annotated document visualization (bounding boxes; sketched after this list)
- Configurable logging (INFO/DEBUG/ERROR levels)
- Comprehensive error handling
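The annotation step renders a PDF page to an image and scales the word polygons from the analyze result onto it. A minimal sketch, assuming the standard analyzeResult schema (PDF coordinates reported in inches) and pdf2image/Pillow for rendering; the app's actual document handling may differ.

```python
# Illustrative bounding-box overlay; not the repo's document_processor.py.
from pdf2image import convert_from_bytes  # needs poppler-utils at the system level
from PIL import ImageDraw

DPI = 150


def annotate_page(pdf_bytes: bytes, analyze_result: dict, page_number: int = 1):
    """Render one PDF page and overlay word polygons from the analyze result."""
    image = convert_from_bytes(
        pdf_bytes, dpi=DPI, first_page=page_number, last_page=page_number
    )[0]
    draw = ImageDraw.Draw(image)

    page = analyze_result["pages"][page_number - 1]
    # For PDFs the coordinates are reported in inches; scale them to rendered pixels.
    scale = DPI if page.get("unit") == "inch" else 1.0

    for word in page.get("words", []):
        # "polygon" is a flat list [x1, y1, x2, y2, ...] of corner coordinates.
        xs, ys = word["polygon"][::2], word["polygon"][1::2]
        draw.polygon([(x * scale, y * scale) for x, y in zip(xs, ys)], outline="red")
    return image
```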
Phase 3: Deployment and Documentation (Hours 4-5)
Prepared for Streamlit Community Cloud:
- Multiple credential sources: Streamlit secrets, environment variables, UI input (resolution order sketched after this list)
- Created deployment guide (STREAMLIT_DEPLOYMENT.md)
- Added MIT license with attribution
- Fixed system dependencies (poppler-utils for PDF processing)
- Generated QUICKSTART.md, README.md, LOGGING.md, CODE_QUALITY.md
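Credential resolution follows a simple fallback order: Streamlit secrets first, then environment variables, then manual entry in the sidebar. A minimal sketch with illustrative variable names (AZURE_DI_ENDPOINT, AZURE_DI_KEY), not the repo's exact implementation:

```python
# Illustrative credential fallback; names and order are assumptions.
import os

import streamlit as st


def _secret(name: str) -> str:
    """Read a value from Streamlit secrets, tolerating a missing secrets.toml."""
    try:
        return st.secrets.get(name, "")
    except Exception:
        return ""


def get_credentials() -> tuple[str, str]:
    """Resolve endpoint and key: secrets, then environment, then manual input."""
    endpoint = _secret("AZURE_DI_ENDPOINT") or os.getenv("AZURE_DI_ENDPOINT", "")
    key = _secret("AZURE_DI_KEY") or os.getenv("AZURE_DI_KEY", "")

    # Fall back to manual entry in the sidebar when nothing is configured.
    if not endpoint or not key:
        endpoint = st.sidebar.text_input("Azure DI endpoint", value=endpoint)
        key = st.sidebar.text_input("Azure DI key", value=key, type="password")
    return endpoint, key
```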
Total development time: 4-5 hours, zero manual code edits.
The Result
Live Application: azure-di-showcase.streamlit.app
Source Code: github.com/vykhand/azure-di-showcase
Capabilities
Supported Models (20+):
- Core: Read OCR, Layout Analysis
- Business: Receipts, Invoices, Contracts, Business Cards, ID Documents
- Financial: Bank checks, statements, pay stubs, credit cards
- Government: Tax forms (W-2, W-4, 1040, 1098, 1099, 1095), mortgage documents
- Healthcare: US Health Insurance Cards
Technical Features:
- Async REST API integration with operation polling
- Dynamic parameter configuration based on model selection (see the sketch after this list)
- Interactive document viewer with bounding box annotations
- Multi-format output: structured fields, markdown, raw JSON
- Configurable logging with DEBUG mode for API introspection
- Multiple credential sources: Streamlit secrets, environment variables, UI input
- System dependency management for cloud deployment
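Dynamic parameter configuration means the sidebar widgets are generated from a declarative model-to-parameters mapping rather than hard-coded per model. An illustrative sketch; the actual config.py in the repo covers 20+ models and more parameter types.

```python
# Illustrative config-driven sidebar controls; the real config.py differs.
import streamlit as st

MODEL_PARAMS = {
    "prebuilt-layout": {
        "outputContentFormat": {"widget": "selectbox", "options": ["text", "markdown"]},
        "features": {"widget": "multiselect",
                     "options": ["ocrHighResolution", "languages", "barcodes", "formulas"]},
    },
    "prebuilt-invoice": {
        "features": {"widget": "multiselect", "options": ["queryFields", "ocrHighResolution"]},
    },
}


def render_model_controls(model_id: str) -> dict:
    """Render sidebar widgets for the selected model and collect their values."""
    values = {}
    for name, spec in MODEL_PARAMS.get(model_id, {}).items():
        if spec["widget"] == "selectbox":
            values[name] = st.sidebar.selectbox(name, spec["options"])
        elif spec["widget"] == "multiselect":
            values[name] = st.sidebar.multiselect(name, spec["options"])
    return values


model_id = st.sidebar.selectbox("Model", list(MODEL_PARAMS))
request_params = render_model_controls(model_id)
```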
Use Cases:
- Rapid model testing during development
- Document extraction accuracy validation
- Stakeholder demonstrations
- API behavior debugging
- Training data generation
- Integration prototyping
What Made the Difference
Specification Fidelity
The winning factor was maintaining consistency between specification and implementation. Claude generated a detailed SPEC.md and then actually followed it during code generation. This prevented architectural drift across the multi-file Python project.
Context Retention
Complex projects require maintaining awareness across multiple files and conversations. The Azure DI integration required:
- Model-specific parameter configurations
- REST API polling patterns with operation IDs
- Async operation handling
- Multi-format document processing
- Cross-file dependency management
Claude maintained this context throughout the 4-5 hour development session, while the other assistants required re-explanation of previously established patterns.
UI Implementation Quality
The most visible difference was in the initial UI implementation. Claude’s first iteration produced a functional interface that closely matched the Azure DI Studio reference, with:
- Proper sidebar layout with dynamic controls
- Document viewer with annotation support
- Three-tab results display (Fields, Markdown, JSON)
- Working file upload and URL input
The other assistants required multiple iterations to achieve similar layouts.
Autonomous Problem-Solving
During deployment preparation, Claude proactively:
- Implemented multiple credential sources (secrets, env vars, UI input)
- Created comprehensive deployment documentation
- Identified system dependencies (poppler-utils) before deployment
- Generated troubleshooting guides
These weren’t explicitly requested but emerged naturally from understanding the deployment context.
Practical Implications for Development Workflows
When AI-Assisted Development Works
This experiment demonstrates that current-generation AI coding assistants (Claude 3.7+) are production-ready for:
Specification-First Projects:
- Well-defined requirements
- Standard architectural patterns
- Clear API contracts
- Documented third-party services
Rapid Prototyping:
- Internal tooling
- Testing interfaces
- Demo applications
- Proof-of-concept implementations
Documentation-Heavy Work:
- Deployment guides
- API references
- Technical specifications
- Troubleshooting documentation
When Human Oversight Remains Critical
AI-assisted development still requires human judgment for:
- Security review (credential handling, data validation)
- Performance optimization (caching strategies, async patterns)
- Business logic validation (domain-specific rules)
- Production monitoring (observability, error tracking)
The tool writes production-ready code, but production deployment requires human verification.
Future Experiments: Claude 4.5 vs GPT-5
With the recent release of Claude 4.5 Sonnet and the upcoming GPT-5, I plan to repeat this experiment with a different task. Areas of interest:
Complexity Factors:
- Multi-service orchestration (Azure + other APIs)
- State management in distributed systems
- Real-time data processing
- Custom training workflows
Evaluation Criteria:
- Code quality and maintainability
- Error handling comprehensiveness
- Documentation accuracy
- Deployment success rate
- Debug efficiency
Initial impressions suggest Claude 4.5’s improvements focus on extended context and reasoning depth. GPT-5’s capabilities remain to be evaluated at release.
Recommendations for Azure Document Intelligence Users
If you’re working with Azure Document Intelligence and find the existing tooling insufficient:
For Development Teams:
- Fork azure-di-showcase as a starting point
- Customize for your document types and workflows
- Extend with custom models if needed
- Deploy to internal infrastructure or Streamlit Cloud
For Evaluation/Testing:
- Use the live demo directly
- Test your documents against multiple models
- Compare extraction accuracy across model versions
- Export results for further processing
For Integration:
- Reference azure_di_client.py for REST API patterns
- Adapt async polling logic for your applications
- Use config.py as a reference for model parameters
- Leverage logging patterns for debugging
The MIT license permits commercial use with attribution.
Conclusion
This experiment demonstrates that AI-assisted development has reached a maturity level suitable for production tooling when:
- Requirements are well-defined: Clear scope and success criteria
- Specification precedes implementation: Architecture documented before code generation
- Tasks match AI strengths: Standard patterns, documented APIs, multi-file projects
- Human oversight is maintained: Security review, production validation
The comparative evaluation revealed that specification fidelity and context retention are critical differentiators for complex multi-file projects. While all three AI assistants could generate code, maintaining architectural coherence across a 4-5 hour development session separated successful implementations from incomplete ones.
For teams working with Azure Document Intelligence in LLM-based extraction workflows, this project demonstrates that custom tooling can be developed rapidly without sacrificing code quality or maintainability. The key is selecting AI tools that maintain context and follow architectural specifications consistently.
Resources
Project Links:
- Live Application: https://azure-di-showcase.streamlit.app/
- GitHub Repository: https://github.com/vykhand/azure-di-showcase
- Technical Specification: SPEC.md
- Deployment Guide: STREAMLIT_DEPLOYMENT.md
Azure Documentation:
- Azure Document Intelligence: https://learn.microsoft.com/azure/ai-services/document-intelligence/
AI Tools Used:
- Claude Code (Anthropic): https://claude.ai/code
- Gemini CLI (Google): https://ai.google.dev/
- OpenAI Codex CLI (OpenAI): https://platform.openai.com/
License: MIT with attribution requirement
Author: Andrey Vykhodtsev
Tags: #AzureAI #DocumentIntelligence #LLM #MachineLearning #Python #Streamlit #AIAssistedDevelopment #Claude #InsurTech