Data Analytics – 💡Tech News & Insights

AutoDoc: The Tool I Developed That Finally Solves Power BI’s Documentation Issues

If you’ve ever worked with Power BI in an enterprise environment, you’ve faced the same frustrating challenge that has plagued data professionals for years: comprehensive documentation. You spend weeks building sophisticated reports with complex DAX measures, intricate data models, and carefully crafted visualizations, only to realize that documenting everything properly will take nearly as long as building the solution itself.

The documentation dilemma is a real and costly issue. Teams often skip it due to time constraints, resulting in knowledge silos when developers leave the organization. Stakeholders struggle to understand the report’s logic without proper documentation. Compliance requirements go unmet. New team members take months to understand existing models. Manual documentation becomes outdated the moment a model changes.

What if there were a way to generate comprehensive, professional Power BI documentation automatically, in minutes rather than hours? What if you could chat with an AI assistant about your report’s structure, ask questions about specific DAX measures, and get detailed explanations about table relationships—all based on your actual model data?

Enter AutoDoc—the AI-powered solution that finally solves Power BI’s documentation problem once and for all.

What is AutoDoc?

AutoDoc is a revolutionary documentation generator specifically designed for Power BI. It harnesses the power of artificial intelligence to create comprehensive, professional documentation automatically. Think of it as having a dedicated documentation specialist who never sleeps, never misses details, and can analyze your entire Power BI model in minutes.

AutoDoc is an open-source tool that offers complete flexibility for implementation, both in the cloud and locally, through the repository available on GitHub. The solution allows secure execution in a local environment, including with local LLM models via Ollama, or can be securely hosted on platforms such as Microsoft Azure AI Foundry or Amazon Bedrock.

The Multi-AI Advantage

What sets AutoDoc apart from other documentation tools is its integration with multiple leading AI providers, giving you the flexibility to choose the language model that best fits your needs and budget:

OpenAI GPT-4.1 models (nano and mini variants)
Azure OpenAI GPT-41 nano for enterprise environments
Anthropic Claude 3.7 Sonnet for advanced reasoning
Google Gemini 2.5 Pro for comprehensive analysis
Llama 4 for open-source flexibility

Core Capabilities

Intelligent File Processing: AutoDoc supports both .pbit (Power BI Template) and .zip files, automatically extracting and analyzing all components of your Power BI model regardless of complexity.

Comprehensive Analysis: The tool meticulously documents every aspect of your Power BI solution, including tables, columns, measures, calculated fields, data sources, relationships, and Power Query transformations.

Professional Output Formats: Generate documentation in both Excel and Word formats, ensuring compatibility with your organization’s documentation standards and workflows.

Interactive AI Chat: Perhaps the most groundbreaking feature is AutoDoc’s intelligent chat system that allows you to have conversations about your Power BI model, asking specific questions about DAX logic, table relationships, or data transformations.

Multi-Language: You can create Power BI documentation in multiple languages, including English, Portuguese, and Spanish.

How to Use AutoDoc

Using AutoDoc is remarkably straightforward, designed with busy data professionals in mind who need results quickly without a steep learning curve.

Getting Started

Step 1: Access AutoDoc. Visit https://autodoc.lawrence.eti.br/ to access the web-based version, or set up a local installation for enhanced security and control.

Step 2: Select Your AI Engine. Choose from the available AI models based on your specific requirements. Each model offers distinct strengths: GPT-4.1 for general use, Claude for complex reasoning, and Gemini for comprehensive analysis.

Step 3: Provide Your Power BI Model. You have two flexible options for getting your model into AutoDoc:

Option A: Direct Upload

Save your Power BI file as a .pbit template or export as .zip
Upload directly to the AutoDoc interface
The system automatically processes and analyzes your model

Option B: API Integration. For direct integration with Power BI Service:

Input your App ID in the sidebar
Provide your Tenant ID
Enter your Secret Value
AutoDoc connects directly to your Power BI workspace

Step 4: Review Interactive Preview. Before generating final documentation, AutoDoc provides an interactive visualization of your data model, allowing you to:

Verify the accuracy of the extracted information
Review table structures and relationships
Confirm DAX measures and calculations
Check data source connections

Step 5: Generate Documentation. Select your preferred output format (Excel or Word) and download professional documentation that includes:

Complete table inventory with column details
All DAX measures with expressions
Data source documentation
Relationship mappings
Power Query transformation logic

Step 6: Leverage AI Chat. After documentation generation, click the “💬 Chat” button to access the intelligent assistant. Ask questions like:

“Explain the logic behind the ‘Total Sales’ measure.”
“What relationships exist between the Customer and Orders tables?”
“Which columns in the Product table are calculated?”
“Show me all measures that reference the Date table.”

Token Configuration in AutoDoc

Depending on the size of your Power BI report, AutoDoc allows you to adjust the maximum number of input and output tokens to optimize processing.

What are tokens? Tokens are basic processing units of LLM models – they can be words, parts of words, or characters.

Input Tokens represent the amount of information the LLM model can process at once, including your report content and system instructions. This configuration allows you to:

Increase the value: Process more content simultaneously, reducing the number of required interactions
Decrease the value: Useful when the report is too large and exceeds model limits, forcing processing in smaller parts with more interactions.

Output Tokens: Define the maximum size of the response the model can generate. This configuration varies according to each LLM model’s capabilities and directly influences:

The length of the generated documentation
The completeness of the produced analyses
Processing time

Important: Each LLM model has specific token limitations. Refer to the documentation on this website to determine the exact limits and adjust these settings accordingly if necessary.

Free OpenAI & every-LLM API Pricing Calculator | Updated jun. 2025

How to Implement AutoDoc Locally

For organizations requiring enhanced security, compliance, or customization, AutoDoc offers complete local deployment capabilities. I created this open-source project, and you can find my GitHub repository here: https://github.com/LawrenceTeixeira/PBIAutoDoc

System Requirements

Operating System: Windows, macOS, or Linux Python Version: 3.10 or higher Network: Internet connection for AI model access API Access: Valid API keys for chosen AI providers

Installation Process

1. Repository Setup

Bash

git clone https://github.com/LawrenceTeixeira/PBIAutoDoc.git
cd AutoDoc

git clone https://github.com/LawrenceTeixeira/PBIAutoDoc.git
cd AutoDoc

2. Environment Configuration

Bash

# Create isolated Python environment
python -m venv .venv

# Activate environment
# Windows
.venv\Scripts\activate

# macOS/Linux  
source .venv/bin/activate

# Create isolated Python environment
python -m venv .venv

# Activate environment
# Windows
.venv\Scripts\activate

# macOS/Linux  
source .venv/bin/activate

3. Dependency Installation

Bash

# Install core requirements
pip install -r requirements.txt

# Install additional AI processing library
pip install --no-cache-dir chunkipy

# Install core requirements
pip install -r requirements.txt

# Install additional AI processing library
pip install --no-cache-dir chunkipy

4. Environment Variables Setup: Create a .env file in your project root:

Bash

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key

# Groq Configuration  
GROQ_API_KEY=your_groq_api_key

# Azure OpenAI Configuration
AZURE_API_KEY=your_azure_api_key
AZURE_API_BASE=https://<your-alias>.openai.azure.com
AZURE_API_VERSION=2024-02-15-preview

# Google Gemini Configuration
GEMINI_API_KEY=your_gemini_api_key

# Anthropic Claude Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key

# Groq Configuration  
GROQ_API_KEY=your_groq_api_key

# Azure OpenAI Configuration
AZURE_API_KEY=your_azure_api_key
AZURE_API_BASE=https://<your-alias>.openai.azure.com
AZURE_API_VERSION=2024-02-15-preview

# Google Gemini Configuration
GEMINI_API_KEY=your_gemini_api_key

# Anthropic Claude Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key

5. Application Launch

Bash

# Standard launch
streamlit run app.py --server.fileWatcherType none

# Alternative for specific environments
python -X utf8 -m streamlit run app.py --server.fileWatcherType none

# Standard launch
streamlit run app.py --server.fileWatcherType none

# Alternative for specific environments
python -X utf8 -m streamlit run app.py --server.fileWatcherType none

Cloud Deployment Option

For scalable cloud deployment, AutoDoc supports Fly.io hosting:

Bash

# Install Fly CLI
curl -L https://fly.io/install.sh | sh
export PATH=/home/codespace/.fly/bin

# Authentication and deployment
flyctl auth login
flyctl launch
flyctl deploy

# Install Fly CLI
curl -L https://fly.io/install.sh | sh
export PATH=/home/codespace/.fly/bin

# Authentication and deployment
flyctl auth login
flyctl launch
flyctl deploy

What Are the Benefits?

AutoDoc delivers transformative benefits that address every central pain point in Power BI documentation:

Dramatic Time Savings

What traditionally takes hours or days now happens in minutes. Data professionals report saving 15-20 hours per week on documentation tasks, allowing them to focus on analysis and insights rather than administrative work.

Unmatched Accuracy and Completeness

Human documentation inevitably misses details or becomes outdated. AutoDoc captures every table, column, measure, and relationship automatically, ensuring nothing is overlooked and documentation remains current.

Professional Consistency

Every documentation output follows the same professional format and standard, regardless of who generates it or when. This consistency is crucial for enterprise environments and compliance requirements.

Enhanced Knowledge Transfer

The AI chat feature transforms documentation from static text into an interactive knowledge base. Team members can ask specific questions and get detailed explanations, dramatically reducing onboarding time for new staff.

Compliance and Audit Support

For heavily regulated industries, AutoDoc provides the comprehensive documentation required for compliance audits, with detailed tracking of data lineage, transformations, and business logic.

Improved Collaboration

Non-technical stakeholders can better understand Power BI solutions through clear, comprehensive documentation. The chat feature allows business users to ask questions about data definitions and calculations without requiring technical expertise.

Cost Efficiency

By automating documentation processes, organizations reduce the human resources required for documentation maintenance while improving quality and coverage.

Conclusion

AutoDoc represents more than just another documentation tool—it’s a paradigm shift that finally makes comprehensive Power BI documentation practical and sustainable. By combining cutting-edge AI technology with a deep understanding of Power BI architecture, AutoDoc solves the fundamental challenges that have made documentation a persistent pain point for data teams worldwide.

The tool’s multi-AI approach ensures flexibility and future-proofing, while its interactive chat capability transforms static documentation into a dynamic knowledge resource. Whether you’re a solo analyst struggling to document complex models or an enterprise data team managing hundreds of reports, AutoDoc adapts to your needs and scales with your organization.

The choice is clear: continue struggling with manual documentation processes that consume valuable time and often go incomplete, or embrace the AI-powered solution that makes comprehensive Power BI documentation effortless and automatic.

AutoDoc doesn’t just solve Power BI’s documentation problem—it eliminates it. The question isn’t whether you can afford to implement AutoDoc; it’s whether you can afford not to.

Should you have any questions or need assistance with AutoDoc, please don’t hesitate to contact me using the provided link: https://lawrence.eti.br/contact/

That´s it for Today!

Implementing Data Governance in Microsoft Fabric: A Step-by-Step Guide

Information is arguably an organization’s most valuable asset in today’s data-driven world. However, without proper management, this asset can quickly become a liability. Microsoft Fabric, a revolutionary unified analytics platform, integrates everything from data engineering and data science to data warehousing and business intelligence into a single, SaaS-based environment. It provides powerful tools to store, process, analyze, and visualize vast data. But with great power comes great responsibility. To maintain trust, ensure security, uphold data quality, and meet ever-increasing compliance demands, implementing a robust data governance framework within Fabric isn’t just recommended—it’s essential.

Effective data governance ensures that data remains accurate, secure, consistent, and usable throughout its entire lifecycle, aligning technical capabilities with strategic business goals and stringent regulatory requirements like GDPR, HIPAA, or CCPA. Within the Fabric ecosystem, this translates to leveraging its built-in governance features and its seamless integration with Microsoft Purview, Microsoft’s comprehensive data governance and compliance suite. The goal is to effectively manage and protect sensitive information while empowering users, from data engineers and analysts to business users and compliance officers, to confidently discover, access, and derive value from data within well-defined, secure guardrails.

A well-designed governance plan in Fabric strikes a critical balance between enabling user productivity and innovation and enforcing necessary controls for compliance and risk mitigation. It’s about establishing clear policies, defining roles and responsibilities, and implementing consistent processes so that, as the adage goes, “the right people can take the right actions with the right data at the right time”. This guide provides a practical, step-by-step approach to implementing such a framework within Microsoft Fabric, leveraging its native capabilities and Purview integration to build a governed, trustworthy data estate.

The Critical Importance of Data Governance

Data governance is more than just an IT buzzword or a compliance checkbox; it is a fundamental strategic imperative for any organization looking to leverage its data assets effectively and responsibly. The need for robust governance becomes even more pronounced in the context of a powerful, unified platform like Microsoft Fabric, which brings together diverse data workloads and user personas. Implementing strong data governance practices yields numerous critical benefits:

Ensuring Data Quality and Consistency: Governance establishes standards and processes for creation, maintenance, and usage, leading to more accurate, reliable, and consistent data across the organization. This is crucial for trustworthy analytics and informed decision-making. Poor data quality can lead to flawed insights, operational inefficiencies, and loss of credibility.
Enhancing Data Security and Protection: A core function of governance is to protect sensitive data from unauthorized access, breaches, or misuse. By defining access controls, implementing sensitivity labeling (using tools like Microsoft Purview Information Protection), and enforcing security policies, organizations can safeguard confidential information, protect intellectual property, and maintain customer privacy.
Meeting Regulatory Compliance Requirements: Organizations operate under a complex web of industry regulations and data privacy laws (such as GDPR, CCPA, HIPAA, SOX, etc.). Data governance provides the framework, controls, and audit trails necessary to demonstrate compliance, avoid hefty fines, and mitigate legal risks. Features like data lineage and auditing in Fabric, often powered by Purview, are essential.
Improving Data Discoverability and Usability: A well-governed data estate makes it easier for users to find the data they need. Features like the OneLake data hub, data catalogs, business glossaries, endorsements (certifying or promoting assets), and descriptive metadata help users quickly locate relevant, trustworthy data, fostering reuse and reducing redundant data preparation efforts.
Building Trust and Confidence: When users know that data is well-managed, secure, and accurate, they have greater confidence in the insights derived from it. This trust is foundational for fostering a data-driven culture where decisions are based on reliable evidence.
Optimizing Operational Efficiency: Governance helps streamline data-related processes, reduce data duplication, clarify ownership, and improve team collaboration. This leads to increased efficiency, reduced costs for managing poor-quality or redundant data, and faster time-to-insight.
Enabling Scalability and Innovation: While governance involves controls, it also provides the necessary structure to manage data effectively as volumes and complexity grow. A solid governance foundation allows organizations to innovate confidently, knowing their data practices are sound and scalable.

Data governance transforms data from a potential risk into a reliable, strategic asset, enabling organizations to maximize their value while minimizing associated risks within the Microsoft Fabric environment.

An Overview of Microsoft Fabric

Understanding the platform itself is helpful before diving into the specifics of governance implementation. Microsoft Fabric represents a significant evolution in the analytics landscape, offering an end-to-end, unified platform delivered as a Software-as-a-Service (SaaS) solution. It aims to simplify analytics for organizations by combining disparate data tools and services into a single, cohesive environment built around a central data lake called OneLake.

Fabric integrates various data and analytics workloads, often referred to as “experiences,” which traditionally required separate, usually complex, integrations:

Data Factory: Provides data integration capabilities for ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes, enabling data movement and transformation at scale.
Synapse Data Engineering: A Spark-based large-scale data transformation and preparation platform primarily uses notebooks.
Synapse Data Science: Provides an end-to-end workflow for data scientists to build, deploy, and manage machine learning models.
Synapse Data Warehousing: Delivers a next-generation SQL engine for traditional data warehousing workloads, offering high performance over open data formats.
Synapse Real-Time Analytics: This technology enables the real-time analysis of data streaming from various sources, such as IoT devices and logs.
Power BI: The well-established business intelligence and visualization service, fully integrated for reporting and analytics.
Data Activator: A no-code experience for monitoring data and triggering actions based on detected patterns or conditions.

Shortcuts allow your organization to easily share data between users and applications without unnecessarily moving and duplicating information. When teams work independently in separate workspaces, shortcuts enable you to combine data across different business groups and domains into a virtual data product to fit a user’s specific needs.

A shortcut is a reference to data stored in other file locations. These file locations can be within the same workspace or across different workspaces, within OneLake or external to OneLake in ADLS, S3, or Dataverse, with more target locations coming soon. No matter the location, shortcuts make files and folders look like you have stored locally. For more information on how to use shortcuts, see OneLake shortcuts.

Underpinning all these experiences is OneLake, Fabric’s built-in, tenant-wide data lake. OneLake eliminates data silos by providing a single, unified storage system for all data within Fabric, regardless of which experience created or uses it. It’s built on Azure Data Lake Storage Gen2. Still, it adds shortcuts (allowing data to be referenced without moving or duplicating it) and a unified namespace, simplifying data management and access.

This unified architecture has profound implications for governance. By centralizing data storage (OneLake) and providing a familiar administrative interface (Fabric Admin Portal), Fabric facilitates the application of consistent governance policies, security controls, and monitoring across the entire analytics lifecycle. Features like sensitivity labels and lineage can often propagate automatically across different Fabric items, simplifying the task of governing a complex data estate. Understanding this integrated nature is key to effectively implementing governance within the platform.

Understanding Microsoft Purview: The Governance Foundation

While Microsoft Fabric provides the unified analytics platform, Microsoft Purview is the overarching data governance, risk, and compliance solution that integrates deeply with Fabric to manage and protect the entire data estate. Understanding Purview’s role is crucial for implementing effective governance in Fabric.

Microsoft Purview is a family of solutions designed to help organizations govern, protect, and manage data across their entire landscape, including Microsoft 365, on-premises systems, multi-cloud environments, and SaaS applications like Fabric. Its key capabilities relevant to Fabric governance include:

Unified Data Catalog: Purview automatically discovers and catalogs Fabric items (like lakehouses, warehouses, datasets, reports) alongside other data assets. It creates an up-to-date map of the data estate, enabling users to easily find and understand data through search, browsing, and business glossary terms.
Data Classification and Sensitivity Labels: Through integration with Microsoft Purview Information Protection, Purview allows organizations to define sensitivity labels (e.g., Confidential, PII) and apply them consistently across Fabric items. This classification helps identify sensitive data and drives protection policies.

End-to-End Data Lineage: Purview provides visualization of data lineage, showing how data flows and transforms from its source through various Fabric processes (e.g., Data Factory pipelines, notebooks) down to Power BI reports. This is vital for impact analysis, troubleshooting, and demonstrating compliance.
Data Loss Prevention (DLP): Purview DLP policies can be configured (currently primarily for Power BI semantic models within Fabric) to detect sensitive information based on classifications or patterns (like credit card numbers) and prevent its unauthorized sharing or exfiltration, providing alerts and policy tips.
Auditing: All user and administrative activities within Fabric are logged and made available through Microsoft Purview Audit, providing a comprehensive trail for security monitoring and compliance investigations.
Purview Hub in Fabric: This centralized page within the Fabric experience provides administrators and governance stakeholders with insights into their Fabric data estate, including sensitivity labeling coverage, endorsement status, and a gateway to the broader Purview governance portal.

Purview is the central governance plane that overlays Fabric (and other data sources), providing the tools to define policies, classify data, track lineage, enforce protection, and consistently monitor activities. The seamless integration ensures that as data moves and transforms within Fabric, the governance context (like sensitivity labels and lineage) is maintained, enabling organizations to build a truly governed and trustworthy analytics environment.

https://learn.microsoft.com/en-us/purview/data-governance-overview

Step-by-Step Process for Implementing Data Governance in Microsoft Fabric

Implementing data governance in Microsoft Fabric is a phased process that involves defining policies, configuring technical controls, assigning responsibilities, and establishing ongoing monitoring. Here’s a practical step-by-step guide:

Step 1: Define Your Governance Policies and Framework

Before configuring any tools, establish the foundation – your governance framework. This involves defining the rules, standards, and responsibilities that will guide data handling within Fabric.

Identify Stakeholders and Requirements: Assemble a cross-functional team including representatives from IT, data management, legal, compliance, and key business units. Collaboratively identify all applicable external regulations (e.g., GDPR, HIPAA, or CCPA) and internal business requirements (e.g., data quality standards, retention policies, ethical use guidelines). Understanding these requirements is crucial for tailoring your policies.
Develop Data Classification Policies: Define clear data sensitivity levels (e.g., Public, Internal, Confidential, Highly Restricted). Map these levels to Microsoft Purview Information Protection sensitivity labels. Establish clear policies detailing how data in each classification level must be handled regarding access, sharing, encryption, retention, and disposal. For example, it mandates that all data classified as “Highly Restricted” must be encrypted and access restricted to specific roles. https://learn.microsoft.com/en-us/purview/sensitivity-labels
Configure Tenant Settings via Admin Portal: Fabric administrators should configure tenant-wide governance settings in the Fabric Admin Portal. This includes defining who can create workspaces, setting default sharing behaviors, enabling auditing, configuring capacity settings, and potentially restricting specific Fabric experiences. Many settings can be delegated to domain or capacity admins, where appropriate, for more granular control. Consider licensing requirements for advanced Purview features like automated labeling or DLP. https://learn.microsoft.com/en-us/fabric/admin/about-tenant-settings
Document and Communicate: Document all governance policies, standards, and procedures. Make this documentation easily accessible to all Fabric users. Communicate the policies effectively, explaining their rationale and clarifying user responsibilities. Assign clear accountability for policy enforcement, often involving data stewards, data owners, and workspace administrators.

Step 2: Establish Roles and Access Controls (RBAC)

With policies defined, implement Role-Based Access Control (RBAC) to enforce them.

Utilize Workspace Roles: Assign users or (preferably) security groups to Fabric workspace roles (Admin, Member, Contributor, Viewer) based on the principle of least privilege. Understand the permissions associated with each role to ensure users only have the access necessary for their jobs. https://learn.microsoft.com/en-us/fabric/fundamentals/roles-workspaces
Leverage Security Groups: Manage access primarily through Microsoft Entra ID (formerly Azure AD) security groups rather than individual user assignments. This simplifies administration as team memberships change.
Assign Admin Roles: Carefully assign higher-level administrative roles: Fabric Administrator (tenant-wide), Domain Administrator (for specific business areas), and Capacity Administrator (for managing compute resources)—delegate responsibilities where appropriate to distribute the governance workload. https://learn.microsoft.com/en-us/fabric/admin/roles
Implement Granular Security: For sensitive data within datasets, lakehouses, or warehouses, implement Row-Level Security (RLS) and Object-Level Security (OLS) (e.g., column-level or table/view permissions) to restrict access at a finer grain than workspace roles allow. https://learn.microsoft.com/en-us/fabric/data-warehouse/row-level-security, https://learn.microsoft.com/en-us/fabric/security/service-admin-row-level-security and https://learn.microsoft.com/en-us/fabric/data-warehouse/column-level-security
Establish Access Review Processes: Implement procedures for requesting, approving, and periodically reviewing access permissions, especially for sensitive data or privileged roles. Maintain logs of approvals for audit purposes.

Step 3: Configure Workspaces and Domains

Organize your Fabric environment logically to support governance.

Structure Domains: Group workspaces into logical domains, typically aligned with business units or subject areas (e.g., Finance, Marketing, Product Analytics). This facilitates delegated administration and helps users discover relevant data. https://learn.microsoft.com/en-us/fabric/governance/domains
Organize Workspaces: Within domains, organize workspaces based on purpose (e.g., project, team) or environment (Development, Test, Production). Use clear naming conventions and descriptions. Assign workspaces to the appropriate domain. https://learn.microsoft.com/en-us/fabric/fundamentals/workspaces
Apply Workspace Settings: Configure settings within each workspace, such as contact lists, license modes (Pro, PPU, Fabric capacity), and connections to resources like Git for version control, aligning them with your governance policies.
Consider Lifecycle Management: Use separate workspaces and potentially Fabric deployment pipelines to manage content promotion from development through testing to production, ensuring only validated assets reach end-users. https://learn.microsoft.com/en-us/fabric/cicd/deployment-pipelines/understand-the-deployment-process?tabs=new-ui

Step 4: Implement Data Protection and Security Measures

Actively protect your data assets using built-in and integrated tools.

Apply Sensitivity Labels: Implement the data classification policy by applying Microsoft Purview Information Protection sensitivity labels to Fabric items (datasets, reports, lakehouses, etc.). Use a combination of manual labeling by users, default labeling on workspaces or items, and automated labeling based on sensitive information types detected by Purview scanners. Ensure label inheritance policies are configured appropriately. https://learn.microsoft.com/en-us/power-bi/enterprise/service-security-enable-data-sensitivity-labels
Configure Data Loss Prevention (DLP) Policies: Define and enable Microsoft Purview DLP policies specifically for Power BI (and potentially other Fabric endpoints as capabilities expand) to detect and prevent the inappropriate sharing or exfiltration of sensitive data identified by sensitivity labels. (Note: Requires specific Purview licensing.) https://learn.microsoft.com/en-us/fabric/governance/data-loss-prevention-configure
Leverage Encryption: Understand and utilize Fabric’s encryption capabilities, including encryption at rest (often managed by the platform) and potentially customer-managed keys (CMK) for enhanced control over encryption if required. https://learn.microsoft.com/en-us/fabric/security/security-scenario

Step 5: Enable Monitoring and Auditing

Visibility into data usage and governance activities is crucial.

Utilize the Monitoring Hub to monitor the status and history of Fabric activities like pipeline runs, Spark jobs, and dataflow refreshes. https://learn.microsoft.com/en-us/fabric/admin/monitoring-hub
Analyze Capacity Metrics: Use the Fabric Capacity Metrics app to monitor resource utilization, performance, and potential throttling, helping optimize costs and ensure service levels. https://learn.microsoft.com/en-us/fabric/enterprise/metrics-app
Leverage the Purview Hub: Use the integrated Purview Hub within Fabric as a central point for administrators and data owners to view governance insights, manage metadata, and track compliance status. https://learn.microsoft.com/en-us/fabric/governance/use-microsoft-purview-hub?tabs=overview
Enable and Review Audit Logs: Ensure Fabric auditing is enabled and integrated with the Microsoft Purview compliance portal. Regularly review audit logs to track user activities, access patterns, policy changes, and potential security incidents. https://learn.microsoft.com/en-us/fabric/admin/track-user-activities
Use Admin Monitoring Workspace: Fabric administrators can use the dedicated monitoring workspace to gain insights into tenant-wide activities, usage patterns, and adoption. https://learn.microsoft.com/en-us/fabric/admin/monitoring-workspace

Step 6: Foster Data Discovery, Trust, and Reuse

Governance should also empower users by making trustworthy data easily accessible.

Promote the OneLake Data Hub: Encourage users to utilize the OneLake Data Hub to discover existing data items across the tenant, filtering by domain, keywords, or endorsement status. https://learn.microsoft.com/en-us/fabric/governance/onelake-catalog-overview and https://learn.microsoft.com/en-us/fabric/governance/onelake-catalog-govern
Implement Endorsement: Establish a straightforward process for promoting and certifying high-quality, reliable data assets (datasets, dataflows, reports). Clearly define the criteria for certification and authorize specific reviewers. This builds user trust in endorsed assets. https://learn.microsoft.com/en-us/fabric/governance/endorsement-overview
Utilize Tags: Encourage using relevant tags on fabric items to improve organization and searchability beyond standard metadata. https://learn.microsoft.com/en-us/fabric/governance/tags-overview
Leverage Lineage and Impact Analysis: Use Fabric’s lineage view to understand data origins, transformations, and dependencies. Before changing upstream items, perform an impact analysis to understand potential effects on downstream reports or processes. https://learn.microsoft.com/en-us/fabric/governance/lineage
Build a Business Glossary: Integrate with Microsoft Purview Data Catalog to build and maintain a business glossary, consistently defining key terms and metrics across the organization and linking them to relevant Fabric assets. https://learn.microsoft.com/en-us/purview/legacy/how-to-create-manage-glossary-term

Step 7: Monitor, Iterate, and Optimize

Data governance is not a one-time project but an ongoing process.

Regularly Review and Audit: Periodically review governance policies, access controls, label usage, and audit logs to ensure effectiveness and identify areas for improvement. https://learn.microsoft.com/en-us/fabric/governance/governance-compliance-overview
Gather Feedback: Solicit feedback from users and stakeholders on the governance processes and tools.
Adapt and Update: Update policies and configurations based on audit findings, user feedback, changing regulations, and evolving business needs. Stay informed about new Fabric and Purview governance features.

By following these steps, organizations can establish a comprehensive and practical data governance framework within Microsoft Fabric, enabling them to harness the full power of the platform while maintaining control, security, and compliance.

Real-World Examples: Data Governance in Action

The principles and steps outlined above are not just theoretical; organizations are actively implementing robust data governance frameworks using Microsoft Fabric and Purview to overcome challenges and drive value. Let’s look at a couple of examples:

1. Microsoft’s Internal Transformation:

Microsoft itself faced significant hurdles with its vast and complex data estate. Data was siloed across various business units and managed inconsistently, making it difficult to gain a unified enterprise view. Governance was often perceived as a bottleneck, hindering the pace of digital transformation. Microsoft embarked on its data transformation journey, leveraging its tools to address this.

Their strategy involved building an enterprise data platform centered around Microsoft Fabric as the unifying analytics foundation and Microsoft Purview for governance. Fabric helped break down silos by providing a common platform (including OneLake) for data integration and analytics across diverse sources. Purview was then layered on top to enable responsible data democratization. This meant implementing controls like a shared data catalog and consistent policies, not to restrict access arbitrarily, but to enable broader, secure access to trustworthy data. A key cultural shift was viewing governance as an accelerator for transformation, facilitated by the unified data strategy and strong leadership alignment. The outcome is a more agile, regulated, and business-focused data environment that fuels faster decision-making and innovation.

2. Leading Financial Institution:

A leading bank operating in a highly regulated industry revolutionized its data governance with Microsoft Purview. While specific challenges aren’t detailed in the summary, typical banking concerns include operational efficiency, stringent compliance requirements (like GDPR), data security, and preventing sensitive data loss.

By implementing Purview, the bank achieved significant improvements. Operationally, automated data discovery and a centralized view allowed business users to find information faster and reduced manual effort in reporting. From a compliance perspective, Purview provided centralized metrics for monitoring the compliance posture and automated processes for classifying and tagging data according to regulations, strengthening overall security. Furthermore, implementing Data Loss Prevention (DLP) rules based on data sensitivity helped safeguard critical information and prevent unauthorized access or sharing. Purview acted as a unified platform, enhancing efficiency, visibility, security, and control over the bank’s data assets.

These examples illustrate how organizations, facing everyday challenges like data silos, compliance pressures, and the need for agility, are successfully using Microsoft Fabric and Purview to establish effective data governance. They highlight the importance of a unified data strategy, the role of tools in automating and centralizing controls, and the cultural shift towards viewing governance as an enabler of business value.

Conclusion

Microsoft Fabric offers a robust, unified platform for end-to-end analytics, but realizing its full potential requires a deliberate and comprehensive approach to data governance. As we’ve explored, implementing governance in Fabric is not merely about restricting access; it’s about establishing a framework that ensures data quality, security, compliance, and usability, fostering trust and enabling confident, data-driven decision-making across the organization.

The real-world examples, from Microsoft’s internal transformation to implementations in regulated industries like finance, demonstrate that these are not just theoretical concepts. Organizations are actively leveraging Fabric’s unified foundation and Purview’s comprehensive governance capabilities to overcome tangible challenges like data silos, inconsistent management, compliance burdens, and operational inefficiencies.

By integrating Fabric’s built-in features—such as the Admin Portal, domains, workspaces, RBAC, endorsement, and lineage—with the advanced capabilities of Microsoft Purview—including Information Protection sensitivity labels, Data Loss Prevention, auditing, and the unified data catalog—organizations can create a robust governance posture tailored to their specific needs.

The outlined step-by-step process provides a roadmap, but the journey requires more than technical implementation. Success hinges on several key factors, reinforced by real-world experience:

Key Recommendations for Success:

Strategic Alignment and Collaboration: As seen in Microsoft’s case, define clear governance objectives that are aligned with business goals before configuring tools. Data governance requires a cultural shift and strong leadership alignment. It’s a team effort involving IT, data, legal, compliance, and business units.
Leverage the Unified Platform (Fabric + Purview): Treat Fabric and Purview as an integrated solution. Use Fabric to unify the data estate and Purview to apply consistent governance controls across it, enabling responsible democratization and breaking down silos.
Prioritize Automation for Efficiency and Consistency: Automate governance tasks like sensitivity labeling, policy enforcement (DLP), and monitoring wherever possible. As the banking case study demonstrated, this reduces manual effort, ensures consistency, improves responsiveness, and boosts operational efficiency.
Focus on User Empowerment and Education: Balance control with usability. Provide clear documentation, training, and tools (like the OneLake Data Hub and Purview catalog) to help users understand policies, find trustworthy data, and comply with requirements – turning governance into an accelerator, not a blocker.
Implement Incrementally and Iterate: Data governance is an ongoing journey. Start with a pilot or focus on critical assets first. Monitor effectiveness, gather feedback, and continuously refine your approach based on evolving needs, regulations, and platform capabilities.

By taking a structured, collaborative, and tool-aware approach, informed by others’ successes, organizations can build a foundation of trust and control within Microsoft Fabric, transforming governance from a perceived burden into a strategic enabler that unlocks the actual value of their data.

Should you have any questions or need assistance about Microsoft Fabric or Microsoft Purview, please don’t hesitate to contact me using the provided link: https://lawrence.eti.br/contact/

That’s it for today!

Sources

Implementing Data Governance in Power BI (Lawrence ETI Blog) – https://lawrence.eti.br/2023/01/29/implementing-data-governance-in-power-bi-a-step-by-step-guide/
Microsoft Learn: Governance and compliance in Microsoft Fabric – https://learn.microsoft.com/en-us/fabric/governance/governance-compliance-overview
Microsoft Learn: Govern your Fabric data with the OneLake catalog – https://learn.microsoft.com/en-us/fabric/governance/onelake-catalog-govern
Microsoft Learn: What is Microsoft Fabric? – https://learn.microsoft.com/en-us/fabric/fundamentals/microsoft-fabric-overview
Microsoft Learn: OneLake, the OneDrive for data – https://learn.microsoft.com/en-us/fabric/onelake/onelake-overview
Microsoft Learn: Information protection in Microsoft Fabric – https://learn.microsoft.com/en-us/fabric/governance/information-protection
Microsoft Learn: What is the admin portal? – https://learn.microsoft.com/en-us/fabric/admin/admin-overview
Microsoft Learn: Workspace roles – https://learn.microsoft.com/en-us/fabric/get-started/roles-workspaces
Microsoft Learn: Admin roles – https://learn.microsoft.com/en-us/fabric/admin/roles
Microsoft Learn: Security overview – https://learn.microsoft.com/en-us/fabric/security/security-overview#data-security
Microsoft Learn: Domains – https://learn.microsoft.com/en-us/fabric/governance/domains
Microsoft Learn: Introduction to deployment pipelines – https://learn.microsoft.com/en-us/fabric/cicd/deployment-pipelines/intro-deployment-pipelines
Microsoft Learn: Learn about data loss prevention policies for Power BI – https://learn.microsoft.com/en-us/purview/dlp-powerbi-learn-about
Microsoft Learn: Encryption – https://learn.microsoft.com/en-us/fabric/security/security-encryption
Microsoft Learn: Monitoring hub – https://learn.microsoft.com/en-us/fabric/admin/monitoring-hub
Microsoft Learn: Microsoft Fabric Capacity Metrics – https://learn.microsoft.com/en-us/fabric/enterprise/metrics-app
Microsoft Learn: Use the Microsoft Purview hub – https://learn.microsoft.com/en-us/fabric/governance/use-microsoft-purview-hub
Microsoft Learn: Track user activities in Microsoft Fabric – https://learn.microsoft.com/en-us/fabric/admin/track-user-activities
Microsoft Learn: Admin monitoring – https://learn.microsoft.com/en-us/fabric/admin/admin-monitoring
Microsoft Learn: OneLake data hub – https://learn.microsoft.com/en-us/fabric/governance/onelake-catalog-overview
Microsoft Learn: Endorsement overview – https://learn.microsoft.com/en-us/fabric/governance/endorsement-overview
Microsoft Learn: Tags – https://learn.microsoft.com/en-us/fabric/governance/tags
Microsoft Learn: Lineage – https://learn.microsoft.com/en-us/fabric/governance/lineage
Microsoft Learn: Create and manage glossary terms – https://learn.microsoft.com/en-us/purview/how-to-create-edit-glossary-terms
YASH Technologies Blog: Microsoft Fabric Data Governance: What You Need Today – https://www.yash.com/blog/comprehensive-guide-to-data-governance-in-microsoft-fabric/
Microsoft Inside Track: Transforming Data Governance at Microsoft with Purview and Fabric – https://www.microsoft.com/insidetrack/blog/transforming-data-governance-at-microsoft-with-microsoft-purview-and-microsoft-fabric/
Kanerika Case Study: Achieving Data Governance Excellence with Microsoft Purview – https://kanerika.com/case-studies/revolutionizing-data-governance-for-a-leading-bank-with-microsoft-purview/

The Collapse of the Backend: AI Agents as the New Logic Layer – From CRUD to Smart Databases

According to Satya Nadella, Microsoft’s CEO vision, for decades, traditional back-end systems have served as the backbone of modern software applications, enabling the seamless processing of data and execution of business logic. Yet, a technological revolution spearheaded by AI agents is poised to disrupt this foundational paradigm. As these intelligent systems take center stage, the role of back-end logic in database management is undergoing a profound transformation—and in some cases, it’s disappearing altogether. The implications are vast, affecting not only how applications are built but also how businesses operate and innovate.

The Traditional Role of Back-End Logic

At its core, the back end of an application manages three primary tasks:

Database Operations: Performing CRUD (Create, Read, Update, Delete) operations on structured data stored in relational or non-relational databases. These operations form the lifeblood of application functionality, serving as the intermediary between users and data. Without this foundational capability, no application could fulfill its purpose effectively.
Business Logic Implementation: Enforcing rules, workflows, and validations that dictate how data is processed. Business logic ensures consistency and relevance, transforming raw data into actionable insights. This layer has traditionally required meticulous coding to capture all possible scenarios and use cases, often resulting in systems that are inflexible and difficult to scale.
API Management: Enabling secure and efficient communication between front-end interfaces and back-end systems. APIs have been the cornerstone of modular and scalable application architectures, facilitating interoperability and enabling ecosystems of interconnected applications.

While this architecture has served the industry well, it’s not without limitations. Hardcoding business logic into back-end layers creates rigidity, slows innovation, and makes scaling complex systems resource-intensive. Enter AI agents, poised to address these limitations with unprecedented efficiency.

AI Agents: Redefining Back-End Logic

AI agents represent a paradigm shift in how business applications operate. These systems, powered by advanced large language models (LLMs) and deep learning algorithms, are capable of:

Understanding natural language queries and translating them into actionable database operations, making data more accessible to a broader range of users.
Automating complex workflows by dynamically adapting to user needs and data contexts, effectively learning from interactions to become more efficient over time.
Replacing traditional static rules with flexible, AI-driven decision-making processes that adjust based on real-time data and evolving business requirements.
Providing predictive insights and recommendations, going beyond traditional logic to add strategic value and enhance decision-making capabilities.

This shift effectively collapses the back-end logic layer into an AI-powered orchestration layer, drastically reducing the need for predefined logic encoded in application code. AI agents execute and evolve, learning from interactions to improve accuracy and relevance over time.

How AI Agents Replacing Back-End Logic

Direct Database Interactions AI agents eliminate the need for static APIs by interacting directly with databases. For example, instead of relying on a back-end service to process a customer query, an AI agent can dynamically generate and execute SQL queries to retrieve and analyze data. This real-time interaction reduces latency and enhances user experience. It also enables faster iteration, as developers no longer need to write intermediary code for every new functionality.
Dynamic Business Logic Traditional business logic often requires developers to write, test, and deploy code changes for every new rule or workflow. AI agents, on the other hand, can adapt to new scenarios by learning from data patterns and user interactions, reducing the dependency on human intervention. This adaptability allows businesses to respond to market changes more rapidly, fostering a culture of agility and resilience.
Orchestrating Multi-Repository Operations AI agents can perform CRUD operations across multiple databases or repositories without discrimination. This capability simplifies data integration, enabling seamless workflows that previously required complex middleware solutions. Whether it’s synchronizing customer records across platforms or aggregating analytics from diverse sources, AI agents streamline operations, reducing costs and minimizing errors.
Natural Language Interfaces By interpreting natural language commands, AI agents remove the necessity of rigid front-end forms and predefined user inputs. This enables users to engage with databases directly, bypassing traditional back-end processing layers. Natural language interfaces democratize access, empowering non-technical users to interact with data effortlessly. Such democratization is critical in an era where data-driven decision-making is a competitive advantage.

Implications for Developers and Businesses

The rise of AI agents as the new logic layer brings both challenges and opportunities:

For Developers: the focus shifts from writing application-specific logic to architecting AI-ready data systems. Skills in data engineering, AI model training, and natural language processing become essential. Developers must now consider creating adaptive, context-aware systems that prioritize user needs over rigid workflows.
For Businesses: Reducing reliance on monolithic applications can help companies achieve greater agility. AI agents empower businesses to automate processes, enhance scalability, and provide personalized user experiences. This shift opens new avenues for innovation and operational efficiency. Businesses that adopt AI-first strategies early will likely gain a significant competitive edge.

Case in Point: Excel with Python and AI

Satya Nadella, Microsoft’s CEO, has highlighted how Excel is evolving into an agent-driven platform. With Python integration and Copilot, Excel can now interpret data, automate analysis, and generate actionable insights without predefined macros or back-end workflows. This example illustrates how even established tools are transforming under the influence of AI.

Excel no longer functions merely as a spreadsheet; it has become a dynamic analytics platform. AI agents leverage Python for advanced computation and Excel’s interface for visualization, creating a powerful combination for data-driven decision-making. This integration underscores the broader trend of tools evolving into AI-powered ecosystems, enhancing their utility and relevance in modern workflows.

Vanna AI: An Example of Agent-Driven Data Interaction

A compelling real-world example of this transformation is Vanna AI, a tool that revolutionizes data analysis by enabling conversational interactions with databases. Vanna AI integrates seamlessly with Azure SQL Database, allowing users to query and analyze data using plain language. By translating natural language questions into precise SQL commands, Vanna AI bridges the gap between complex data operations and user-friendly interaction.

What sets Vanna AI apart is its ability to contextualize database structures, such as schemas and historical queries, ensuring accurate SQL generation. This democratizes data access, empowering non-technical users to glean insights from their data without requiring specialized skills. It exemplifies how AI agents are reshaping workflows by removing traditional barriers and introducing unprecedented flexibility and efficiency.

For instance, I developed a practical application that integrates Vanna AI with the Microsoft Adventure Works database. The app enables natural language queries against the database, demonstrating how AI agents can streamline even the most complex data interactions. This serves as a real-world example of a new log layer for databases—specifically, Azure SQL Database.

By doing so, Vanna AI demonstrates the power of AI-driven tools to transform database interactions, reflecting the principles I outlined in my earlier article about Vanna. Follow below:

Interactive Data Analysis: Chat with Your Data in Azure SQL Database Using Vanna AI

Challenges in the Transition

While the collapse of traditional back-end logic is promising, it’s not without hurdles:

Data Privacy and Security: Direct database interactions by AI agents demand robust governance to prevent unauthorized access and ensure compliance. As AI systems become integral to workflows, maintaining data integrity and security will be paramount.
Bias and Reliability: AI models must be carefully trained to avoid biases and ensure accurate decision-making. Developers and businesses must invest in robust testing and continuous monitoring to mitigate risks. The potential for misuse or misinterpretation of AI-generated insights requires vigilance.
Legacy Systems: Transitioning from existing architectures to AI-first systems may require significant investment in infrastructure and training. Organizations must balance innovation with the realities of maintaining business continuity. Change management strategies will be critical in ensuring a smooth transition.

The Future of Back-End Logic

The rise of AI agents signals the dawn of a new era in software development. As these systems become increasingly sophisticated, the traditional notion of a back-end layer may fade, replaced by a fluid, intelligent logic layer capable of adapting to the ever-changing needs of businesses and users. For developers and businesses alike, the challenge now is to embrace this transformation and harness the power of AI to unlock unprecedented efficiency and innovation.

AI agents represent more than just a technological evolution; they embody a philosophical shift in how we conceptualize and interact with software. By decentralizing logic and empowering users, these systems pave the way for a future where applications are not just tools but intelligent collaborators. They are the catalysts for a world where innovation is limited only by imagination.

Conclusion

This vision of the Microsoft CEO is not yet a reality, but according to the video Satya Nadella On Evolution of SaaS, it represents a direction Microsoft is implementing in its solutions. As technology professionals, we must closely follow these changes to avoid being caught off guard in the future. In this article, a few examples were presented, still in their early stages, such as the Copilot in Excel with Python and the application I developed using the Vanna API, which has no connection to Microsoft. These are just a few examples that show we are moving toward this transformation.

As this transformation unfolds, businesses and developers must embrace the tools and paradigms that will define the agent-driven future. This includes prioritizing AI-native architectures, fostering a culture of adaptability, and ensuring that ethics and data governance remain at the forefront. The organizations that succeed will be those that see AI agents not as a replacement, but as an extension of their capabilities—collaborators that empower human ingenuity and drive innovation. Together, humans and AI can forge a new era of intelligent, seamless, and transformative solutions.

That’s it for today!

Sources

https://medium.com/@younes10sillimi/microsofts-vision-the-end-of-traditional-business-apps-and-saas-the-beginning-of-the-agent-era-2492833f5d6a
https://www.cxtoday.com/data-analytics/microsoft-ceo-ai-agents-will-transform-saas-as-we-know-it/
https://corner.buka.sh/the-death-of-saas-satya-nadellas-vision-for-an-ai-driven-future/

Integrating Azure OpenAI with Native Vector Support in Azure SQL Databases for Advanced Search Capabilities and Data Insights

Azure SQL Database has taken a significant step forward by introducing native support for vectors, unlocking advanced capabilities for applications that rely on semantic search, AI, and machine learning. By integrating vector search into Azure SQL, developers can now store, search, and analyze vector data directly alongside traditional SQL data, offering a unified solution for complex data analysis and enhanced search experiences.

Vectors in Azure SQL Database

Vectors are numerical representations of objects like text, images, or audio. They are essential for applications involving semantic search, recommendation systems, and more. These vectors are typically generated by machine learning models, capturing the semantic meaning of the data they represent.

The new vector functionality in Azure SQL Database allows you to store and manage these vectors within a familiar SQL environment. This eliminates the need for separate vector databases, streamlining your application architecture and simplifying your data management processes.

Key Benefits of Native Vector Support in Azure SQL

Unified Data Management: Store and query both traditional and vector data in a single database, reducing complexity and maintenance overhead.
Advanced Search Capabilities: Perform similarity searches alongside standard SQL queries, leveraging Azure SQL’s sophisticated query optimizer and powerful enterprise features.
Optimized Performance: Vectors are stored in a compact binary format, allowing for efficient distance calculations and optimized performance on vector-related operations.

Embeddings: The Foundation of Vector Search

At the heart of vector search are embeddings—dense vector representations of objects, generated by deep learning models. These embeddings capture the semantic similarities between related concepts, enabling tasks such as semantic search, natural language processing, and recommendation systems.

For example, word embeddings can cluster related words like “computer,” “software,” and “machine,” while distant clusters might represent words with entirely different meanings, such as “lion,” “cat,” and “dog.” These embeddings are particularly powerful in applications where context and meaning are more important than exact keyword matches.

Azure OpenAI makes it easy to generate embeddings by providing pre-trained machine learning models accessible through REST endpoints. Once generated, these embeddings can be stored directly in an Azure SQL Database, allowing you to perform vector search queries to find similar data points.

You can explore how vector embeddings work by visiting this amazing website: Transformer Explainer. It offers an excellent interactive experience to help you better understand how Generative AI operates in general.

Vector Search Use Cases

Vector search is a powerful technique used to find vectors in a dataset that are similar to a given query vector. This capability is essential in various applications, including:

Semantic Search: Rank search results based on their relevance to the user’s query.
Recommendation Systems: Suggest related items based on similarity in vector space.
Clustering: Group similar items together based on vector similarity.
Anomaly Detection: Identify outliers in data by finding vectors that differ significantly from the norm.
Classification: Classify items based on the similarity of their vectors to predefined categories.

For instance, consider a semantic search application where a user queries for “healthy breakfast options.” A vector search would compare the vector representation of the query with vectors representing product reviews, finding the most contextually relevant items—even if the exact keywords don’t match.

Key Features of Native Vector Support in Azure SQL

Azure SQL’s native vector support introduces several new functions to operate on vectors, which are stored in a binary format to optimize performance. Here are the key functions:

JSON_ARRAY_TO_VECTOR: Converts a JSON array into a vector, enabling you to store embeddings in a compact format.
ISVECTOR: Checks whether a binary value is a valid vector, ensuring data integrity.
VECTOR_TO_JSON_ARRAY: Converts a binary vector back into a human-readable JSON array, making it easier to work with the data.
VECTOR_DISTANCE: Calculates the distance between two vectors using a chosen distance metric, such as cosine or Euclidean distance.

These functions enable powerful operations for creating, storing, and querying vector data in Azure SQL Database.

Example: Vector Search in Action

Let’s walk through an example of using Azure SQL Database to store and query vector embeddings. Imagine you have a table of customer reviews, and you want to find reviews that are contextually related to a user’s search query.

Storing Embeddings as Vectors:
After generating embeddings using Azure OpenAI, you can store these vectors in a VARBINARY(8000) column in your SQL table:

SQL

   ALTER TABLE [dbo].[FineFoodReviews] ADD [VectorBinary] VARBINARY(8000);
   UPDATE [dbo].[FineFoodReviews]
   SET [VectorBinary] = JSON_ARRAY_TO_VECTOR([vector]);

This allows you to store the embeddings efficiently, ready for vector search operations.

Performing Similarity Searches:
To find reviews that are similar to a user’s query, you can convert the query into a vector and calculate the cosine distance between the query vector and the stored embeddings:

SQL

   DECLARE @e VARBINARY(8000);
   EXEC dbo.GET_EMBEDDINGS @model = '<yourmodeldeploymentname>', @text = 'healthy breakfast options', @embedding = @e OUTPUT;

   SELECT TOP(10) ProductId,
                  Summary,
                  Text,
                  VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance
   FROM dbo.FineFoodReviews
   ORDER BY Distance;

This query returns the top reviews that are contextually related to the user’s search, even if the exact words don’t match.

Hybrid Search with Filters:
You can enhance vector search by combining it with traditional keyword filters to improve relevance and performance. For example, you could filter reviews based on criteria like user identity, review score, or the presence of specific keywords, and then apply vector search to rank the results by relevance:

SQL

   -- Comprehensive query with multiple filters.
   SELECT TOP(10)
       f.Id,
       f.ProductId,
       f.UserId,
       f.Score,
       f.Summary,
       f.Text,
       VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance,
       CASE 
           WHEN LEN(f.Text) > 100 THEN 'Detailed Review'
           ELSE 'Short Review'
       END AS ReviewLength,
       CASE 
           WHEN f.Score >= 4 THEN 'High Score'
           WHEN f.Score BETWEEN 2 AND 3 THEN 'Medium Score'
           ELSE 'Low Score'
       END AS ScoreCategory
   FROM FineFoodReviews f
   WHERE
       f.UserId NOT LIKE 'Anonymous%'  -- Exclude anonymous users
       AND f.Score >= 2               -- Score threshold filter
       AND LEN(f.Text) > 50           -- Text length filter for detailed reviews
       AND (f.Text LIKE '%gluten%' OR f.Text LIKE '%dairy%') -- Keyword filter
   ORDER BY
       Distance,  -- Order by cosine distance
       f.Score DESC, -- Secondary order by review score
       ReviewLength DESC; -- Tertiary order by review length

This query combines semantic search with traditional filters, balancing relevance and computational efficiency.

Leveraging REST Services for Embedding Generation

Azure OpenAI provides REST endpoints for generating embeddings, which can be consumed directly from Azure SQL Database using the sp_invoke_external_rest_endpoint system stored procedure. This integration enables seamless interaction between your data and AI models, allowing you to build intelligent applications that combine the power of machine learning with the familiarity of SQL.

Here’s a stored procedure example that retrieves embeddings from a deployed Azure OpenAI model and stores them in the database:

SQL

CREATE PROCEDURE [dbo].[GET_EMBEDDINGS]
(
    @model VARCHAR(MAX),
    @text NVARCHAR(MAX),
    @embedding VARBINARY(8000) OUTPUT
)
AS
BEGIN
    DECLARE @retval INT, @response NVARCHAR(MAX);
    DECLARE @url VARCHAR(MAX);
    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);

    SET @url = 'https://<resourcename>.openai.azure.com/openai/deployments/' + @model + '/embeddings?api-version=2023-03-15-preview';

    EXEC dbo.sp_invoke_external_rest_endpoint 
        @url = @url,
        @method = 'POST',   
        @payload = @payload,   
        @headers = '{"Content-Type":"application/json", "api-key":"<openAIkey>"}', 
        @response = @response OUTPUT;

    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');
    SET @embedding = JSON_ARRAY_TO_VECTOR(@jsonArray);
END
GO

This stored procedure retrieves embeddings from the Azure OpenAI model and converts them into a binary format for storage in the database, making them available for similarity search and other operations.

Let’s implementing a experiment with the Native Vector Support in Azure SQL

Azure SQL Database provides a seamless way to store and manage vector data despite not having a specific vector data type. Column-store indexes, vectors, and essentially lists of numbers can be efficiently stored in a table. Each vector can be represented in a row with individual elements as columns or serialized arrays. This approach ensures efficient storage and retrieval, making Azure SQL suitable for large-scale vector data management.

I used the Global News Dataset from Kaggle in my experiment.

First, you must create the columns to save the vector information. In my case, I created two columns: title_vector For the news title and content_vector the news content. For this, create a small Python code, but you can also do that directly from SQL using a cursor. It's important to know that you don't need to pay for any Vector Databases by saving the vector information inside the Azure SQL.

Python

from litellm import embedding
import pyodbc  # or another SQL connection library
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Set up OpenAI credentials from environment variables
os.environ['AZURE_API_KEY'] =os.getenv('AZURE_API_KEY')
os.environ['AZURE_API_BASE'] = os.getenv('AZURE_API_BASE')
os.environ['AZURE_API_VERSION'] = os.getenv('AZURE_API_VERSION')

# Connect to your Azure SQL database
conn = pyodbc.connect(f'DRIVER={{ODBC Driver 17 for SQL Server}};'
                      f'SERVER={os.getenv("DB_SERVER")};'
                      f'DATABASE={os.getenv("DB_DATABASE")};'
                      f'UID={os.getenv("DB_UID")};'
                      f'PWD={os.getenv("DB_PWD")}')

def get_embeddings(text):
    # Truncate the text to 8191 characters bacause of the text-embedding-3-     small OpenAI API Embedding model limit
    truncated_text = text[:8191]

    response = embedding(
        model="azure/text-embedding-3-small",
        input=truncated_text,
        api_key=os.getenv('AZURE_API_KEY'),
        api_base=os.getenv('AZURE_API_BASE'),
        api_version=os.getenv('AZURE_API_VERSION')
        )
        
    embeddings = response['data'][0]['embedding']
    return embeddings


def update_database(article_id, title_vector, content_vector):
    cursor = conn.cursor()

    # Convert vectors to strings
    title_vector_str = str(title_vector)
    content_vector_str = str(content_vector)

    # Update the SQL query to use the string representations
    cursor.execute("""
        UPDATE newsvector
        SET title_vector = ?, content_vector = ?
        WHERE article_id = ?
    """, (title_vector_str, content_vector_str, article_id))
    conn.commit()


def embed_and_update():
    cursor = conn.cursor()
    cursor.execute("SELECT article_id, title, full_content FROM newsvector where title_vector is null and full_content is not null and title is not null order by published asc")
    
    title_vector = ""
    content_vector = ""
    
    for row in cursor.fetchall():
        article_id, title, full_content = row
        
        print(f"Embedding article {article_id} - {title}")
        
        title_vector = get_embeddings(title)
        content_vector = get_embeddings(full_content)
        
        update_database(article_id, title_vector, content_vector)

embed_and_update()

These two columns will contain something like this: [-0.02232750505208969, -0.03755787014961243, -0.0066827102564275265…]

Second, you must create a procedure in the Azure Database to transform the query into a vector embedding.

SQL

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[GET_EMBEDDINGS]
(
    @model VARCHAR(MAX),
    @text NVARCHAR(MAX),
    @embedding VARBINARY(8000) OUTPUT
)
AS
BEGIN
    DECLARE @retval INT, @response NVARCHAR(MAX);
    DECLARE @url VARCHAR(MAX);
    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);

    -- Set the @url variable with proper concatenation before the EXEC statement
    SET @url = 'https://<Your App>.openai.azure.com/openai/deployments/' + @model + '/embeddings?api-version=2024-02-15-preview';

    EXEC dbo.sp_invoke_external_rest_endpoint 
        @url = @url,
        @method = 'POST',   
        @payload = @payload,   
        @headers = '{"Content-Type":"application/json", "api-key":"<Your Azure Open AI API Key"}', 
        @response = @response OUTPUT;

    -- Use JSON_QUERY to extract the embedding array directly
    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');

    
    SET @embedding = JSON_ARRAY_TO_VECTOR(@jsonArray);
END

I also create another procedure to search directly to the dataset using the Native Vector Support in Azure SQL.

SQL

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[SearchNewsVector] 
    @inputText NVARCHAR(MAX)
AS
BEGIN
    -- Query the SimilarNewsContentArticles table using the response
    IF OBJECT_ID('dbo.result', 'U') IS NOT NULL
        DROP TABLE dbo.result;

	--Assuming you have a stored procedure to get embeddings for a given text
	DECLARE @e VARBINARY(8000);
	EXEC dbo.GET_EMBEDDINGS @model = 'text-embedding-3-small', @text = @inputText, @embedding = @e OUTPUT;

	SELECT TOP(10) 
       [article_id]
      ,[source_id]
      ,[source_name]
      ,[author]
      ,[title]
      ,[description]
      ,[url]
      ,[url_to_image]
      ,[content]
      ,[category]
      ,[full_content]
      ,[title_vector]
      ,[content_vector]
      ,[published]
      ,VECTOR_DISTANCE('cosine', @e, VectorBinary) AS cosine_distance
	into result
	FROM newsvector
	ORDER BY cosine_distance;
END

Finally, you can start querying your table using prompts instead of keywords. This is awesome!

Check out the app I developed with the Native Vector Support in Azure SQL, which is designed to assist you in crafting prompts and evaluating your performance using my newsvector dataset. To explore the app, click here.

Like always, I also created this GitHub repository with everything I did.

Azure SQL Database Native vector support subscription for the Private Preview

You can sign up for the private preview at this link.

This article, published by Davide Mauri and Pooja Kamath at Microsoft Build 2024 event, provides all the information.

Announcing EAP for Vector Support in Azure SQL Database – Azure SQL Devs’ Corner (microsoft.com)

Conclusion

The integration of Azure OpenAI with native vector support in Azure SQL Database unlocks new possibilities for applications that require advanced search capabilities and data analysis. By storing and querying vector embeddings alongside traditional SQL data, you can build powerful solutions that combine the best of both worlds—semantic understanding with the reliability and performance of Azure SQL.

This innovation simplifies application development, enhances data insights, and paves the way for the next generation of intelligent applications.

That’s it for today!

Sources

Azure SQL DB Vector Functions Private Preview | Data Exposed (youtube.com)

Announcing EAP for Vector Support in Azure SQL Database – Azure SQL Devs’ Corner (microsoft.com)