Chatting with your Enterprise data privately and securely through the use of Azure Cognitive Search and Azure Open AI

In an age where data is power, businesses are constantly looking for ways to leverage their vast enterprise data stores. One promising avenue lies in the intersection of AI and search technologies, specifically through the use of Azure Cognitive Search and Azure Open AI. These tools provide powerful ways to converse with enterprise data privately and securely.

Enterprise data can take various forms, from structured database datasets to unstructured documents, emails, and files. Some examples are data about the company’s benefits, internal policies, job descriptions, roles, and much more.

What is Azure Cognitive Search?

Azure Cognitive Search is a cloud-based service provided by Microsoft Azure that enables developers to build sophisticated search experiences into custom applications. It integrates with other Azure Cognitive Services to enable AI-driven content understanding through capabilities such as natural language processing, entity recognition, image analysis, and more.

Here are some of the key benefits of Azure Cognitive Search:

  1. Fully Managed: Azure Cognitive Search is fully managed, meaning you don’t have to worry about infrastructure setup, maintenance, or scaling. You just need to focus on the development of your application.
  2. Rich Search Experiences: It allows for the creation of rich search experiences, including auto-complete, geospatial search, filtering, and faceting.
  3. AI-Enhanced Search Capabilities: When combined with other Azure Cognitive Services, Azure Cognitive Search can provide advanced search features. For example, it can extract key phrases, detect languages, identify entities, and more. It can even index and search unstructured data, like text within documents or images.
  4. Scalability and Performance: Azure Cognitive Search can automatically scale to handle large volumes of data and high query loads. It provides fast, efficient search across large datasets.
  5. Data Integration: It can pull in data from a variety of sources, including Azure SQL Database, Azure Cosmos DB, Azure Blob Storage, and more.
  6. Security: Azure Cognitive Search supports data encryption at rest and in transit. It also integrates with Azure Active Directory for identity and access management.
  7. Developer Friendly: It provides a simple, RESTful API and integrates with popular programming languages and development frameworks. This makes it easier for developers to embed search functionality into applications.
  8. Indexing: The service provides robust indexing capabilities, allowing you to index data from a variety of sources and formats. This allows for a more comprehensive search experience for end-users.

In summary, Azure Cognitive Search can provide powerful, intelligent search capabilities for your applications, allowing users to find the information they need quickly and easily.

What is Azure Open AI?

Azure OpenAI Service is a platform that provides REST API access to OpenAI’s powerful language models, including GPT-3, GPT-4, Codex, and Embeddings. It can be used for tasks such as content generation, summarization, semantic search, and natural language-to-code translation.

The security and safety of enterprise data is a top priority for Azure OpenAI. Here are some key points on how it ensures safety:

  • The Azure OpenAI Service is fully controlled by Microsoft and does not interact with any services operated by OpenAI. Your prompts (inputs) and completions (outputs), your embeddings, and your training data are not available to other customers, OpenAI, or used to improve OpenAI models, any Microsoft or 3rd party products or services, or to automatically improve Azure OpenAI models for your use in your resource. Your fine-tuned Azure OpenAI models are available exclusively for your use.
  • The service processes different types of data including prompts and generated content, augmented data included with prompts, and training & validation data.
  • When generating completions, images, or embeddings, the service evaluates the prompt and completion data in real-time to check for harmful content types. The models are stateless, meaning no prompts or generations are stored in the model, and prompts and generations are not used to train, retrain, or improve the base models.
  • With the “on your data” feature, the service retrieves relevant data from a configured data store and augments the prompt to produce generations that are grounded with your data. The data remains stored in the data source and location you designate. No data is copied into the Azure OpenAI service.
  • Training data uploaded for fine-tuning is stored in the Azure OpenAI resource in the customer’s Azure tenant. It can be double encrypted at rest and can be deleted by the customer at any time. This data is not used to train, retrain, or improve any Microsoft or 3rd party base models.
  • Azure OpenAI includes both content filtering and abuse monitoring features to reduce the risk of harmful use of the service. To detect and mitigate abuse, Azure OpenAI stores all prompts and generated content securely for up to thirty (30) days.
  • The data store where prompts and completions are stored is logically separated by customer resources. Prompts and generated content are stored in the Azure region where the customer’s Azure OpenAI service resource is deployed, within the Azure OpenAI service boundary. Human reviewers can only access the data when it has been flagged by the abuse monitoring system.
  • Customers who meet additional Limited Access eligibility criteria and attest to specific use cases can apply to modify the Azure OpenAI content management features. Suppose Microsoft approves a customer’s request to change abuse monitoring. In that case, Microsoft does not store any prompts and completions associated with the approved Azure subscription for which abuse monitoring is configured.

In conclusion, Azure OpenAI takes numerous measures to ensure that your enterprise data is kept secure and confidential while using its service.

Revolutionize your Enterprise Data with ChatGPT: step by step how to create your own Enterprise Chat

This sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure Open AI Service to access the ChatGPT model (gpt-35-turbo), and Azure Cognitive Search for data indexing and retrieval.

The repo includes sample data so it’s ready to try end-to-end. In this sample application, we use a fictitious company called Contoso Electronics, and the experience allows its employees to ask questions about the benefits, internal policies, as well as job descriptions, and roles.

Features

  • Chat and Q&A interfaces
  • Explores various options to help users evaluate the trustworthiness of responses with citations, tracking of source content, etc.
  • Shows possible approaches for data preparation, prompt construction, and orchestration of interaction between model (ChatGPT) and retriever (Cognitive Search)
  • Settings directly in the UX to tweak the behavior and experiment with options
Chat screen

Getting Started

IMPORTANT: In order to deploy and run this example, you’ll need an Azure subscription with access enabled for the Azure OpenAI service. You can request access here. You can also visit here to get some free Azure credits to get you started.

AZURE RESOURCE COSTS by default this sample will create Azure App Service and Azure Cognitive Search resources that have a monthly cost, as well as Form Recognizer resource that has cost per document page. You can switch them to free versions of each of them if you want to avoid this cost by changing the parameters file under the infra folder (though there are some limits to consider; for example, you can have up to 1 free Cognitive Search resource per subscription, and the free Form Recognizer resource only analyzes the first 2 pages of each document.)

Prerequisites

To Run Locally

  • Azure Developer CLI
  • Python 3+
    • Important: Python and the pip package manager must be in the path in Windows for the setup scripts to work.
    • Important: Ensure you can run python --version from the console. On Ubuntu, you might need to run sudo apt install python-is-python3 to link python to python3.
  • Node.js
  • Git
  • Powershell 7+ (pwsh) – For Windows users only.
    • Important: Ensure you can run pwsh.exe from a PowerShell command. If this fails, you likely need to upgrade PowerShell.

NOTE: Your Azure Account must have Microsoft.Authorization/roleAssignments/write permissions, such as User Access Administrator or Owner.

Installation

Project Initialization

  1. Create a new folder and switch to it in the terminal
  2. Run azd auth login
  3. Run azd init -t azure-search-openai-demo
    • For the target location, the regions that currently support the models used in this sample are East US or South Central US. For an up-to-date list of regions and models, check here
    • note that this command will initialize a git repository and you do not need to clone this repository

Starting from scratch:

Execute the following command, if you don’t have any pre-existing Azure services and want to start from a fresh deployment.

  1. Run azd up – This will provision Azure resources and deploy this sample to those resources, including building the search index based on the files found in the ./data folder.
  2. After the application has been successfully deployed you will see a URL printed to the console. Click that URL to interact with the application in your browser.

For detailed information click here on my GitHub and follow a video from Microsoft talking about the example solution.

You can look at the Chat App that I’ve developed, which I will make available for you to test for a few days.

Firstly, it’s important to understand that you have the ability to replace the PDF files within the “./data” directory with your own business data.

If you wish to examine these files first to gain insights into the types of questions you can make in the chat to test, please click here.

Regrettably, the demo app had to be deactivated due to Azure expenses. If you’d like it to be reactivated, please click here to contact me. Thank you.

You’re able to query any content found within the enterprise PDF files located in the “./data” directory. The chat will respond with citations from the respective PDFs, and you have the option to click through and verify the information directly from the source PDF.

Conclusion

The vast universe of enterprise data, spanning from structured database datasets to unstructured documents, emails, and files, holds a wealth of insights that can drive an organization’s growth and success. Azure Cognitive Search and Azure OpenAI serve as powerful tools that make this data readily accessible, private, and secure. By leveraging these technologies, businesses can tap into the full potential of their internal data, from understanding the intricacies of their benefits and policies to defining roles and job descriptions more effectively. With a future powered by AI and machine learning, the conversations we can have with our data are only just beginning. This is more than just a technological shift; it’s a new era of informed decision-making, driven by data that’s within our reach. This solution provides an array of opportunities to assist businesses in leveraging their corporate data and disseminating it amongst their employees. This method simplifies comprehension, fostering organizational growth and enhancing the company culture. Should you require additional details on this topic, please do not hesitate to reach out to me.

That’s it for today!

The Future of Data Analytics: An Introduction to Microsoft Fabric

Microsoft Fabric, launched on May 24-25 of 2023 at the Microsoft Build event, is an end-to-end data and analytics platform that combines Microsoft’s OneLake data lake, Power BI, Azure Synapse, and Azure Data Factory into a unified software as a service (SaaS) platform. It’s a one-stop solution designed to serve various data professionals including data engineers, data warehousing professionals, data scientists, data analysts, and business users, enabling them to collaborate effectively within the platform to foster a healthy data culture across their organizations​​.

What are the Microsoft Fabric key features?


Data Factory – Microsoft’s Azure Data Factory is a powerful tool that combines the simplicity of Power Query with Azure Data Factory’s scale. It provides over 200 native connectors for data linkage from on-premises and cloud-based sources. Data Factory enables the scheduling and orchestration of notebooks and Spark jobs.

Data Engineering – Leveraging the extensive capabilities of Spark, data engineering in Microsoft Fabric provides premier authoring experiences and facilitates large-scale data transformations. It plays a crucial role in democratizing data through the lakehouse model. Moreover, integration with

Data Science – The data science capability in Microsoft Fabric aids in building, deploying, and operationalizing machine learning models within the Fabric framework. It interacts with Azure Machine Learning for built-in experiment tracking and model registry, empowering data scientists to enhance organizational data with predictions that business analysts can incorporate into their BI reports, thereby transitioning from descriptive to predictive insights.

Data Warehouse – The data warehousing component of Microsoft Fabric offers top-tier SQL performance and scalability. It features a full separation of computing and storage for independent scaling and native data storage in the open Delta Lake format.

Real-Time Analytics – Observational data, acquired from diverse sources like apps, IoT devices, human interactions, and more, represents the fastest-growing data category. This semi-structured, high-volume data, often in JSON or Text format with varying schemas, presents challenges for conventional data warehousing platforms. However, Microsoft Fabric’s Real-Time Analytics offers a superior solution for analyzing such data.

Power BI – Recognised as a leading Business Intelligence platform worldwide, Power BI in Microsoft Fabric enables business owners to access all Fabric data swiftly and intuitively for data-driven decision-making.

What are the Advantages of Microsoft Fabric?

Unified Platform: Microsoft Fabric provides a unified platform for different data analytics workloads such as data integration, engineering, warehousing, data science, real-time analytics, and business intelligence. This can foster a well-functioning data culture across the organization as data engineers, warehousing professionals, data scientists, data analysts, and business users can collaborate within Fabric​​.

Multi-cloud Support: Fabric is designed with a multi-cloud approach in mind, with support for data in Amazon S3 and (soon) Google Cloud Platform. This means that users are not restricted to using data only from Microsoft’s ecosystem, providing flexibility​.

Accessibility: Microsoft Fabric is currently available in public preview, and anyone can try the service without providing their credit card information. Starting from July 1, Fabric will be enabled for all Power BI tenants​.

AI Integration: The private preview of Copilot in Power BI will combine advanced generative AI with data, enabling users to simply describe the insights they need or ask a question about their data, and Copilot will analyze and pull the correct data into a report, turning data into actionable insights instantly​​.

Microsoft Fabric – Licensing and Pricing

Microsoft Fabric capacities are available for purchase in the Azure portal. These capacities provide the compute resources for all the experiences in Fabric from the Data Factory to ingest and transform to Data Engineering, Data Science, Data Warehouse, Real-Time Analytics, and all the way to Power BI for data visualization. A single capacity can power all workloads concurrently and does not need to be pre-allocated across the workloads. Moreover, a single capacity can be shared among multiple users and projects, without any limitations on the number of workspaces or creators that can utilize it.

To gain access to Microsoft Fabric, you have three options:

  1. Leverage your existing Power BI Premium subscription by turning on the Fabric preview switch. All Power BI Premium capacities can instantly power all the Fabric workloads with no additional action required. If you already have a Power BI Premium subscription, you can simply turn on the Fabric preview switch. This means you can enable Microsoft Fabric’s capabilities as part of your existing Power BI Premium subscription without having to do anything else. All the capacities you have with your Power BI Premium subscription can be used to power the full range of workloads in Microsoft Fabric. In other words, you can use your existing Power BI Premium resources to run all of the data and analytics tasks that Microsoft Fabric can handle.
  2. Start a Fabric trial if your tenant supports trials. If you’re not sure about committing to Microsoft Fabric yet, you can start a trial if your tenant (an instance of Azure Active Directory) supports it. A trial allows you to test the service before deciding to purchase. During the trial period, you can explore the full capabilities of Microsoft Fabric, such as data ingestion, data transformation, data engineering, data science, data warehouse operations, real-time analytics, and data visualization with Power BI.
  3. Purchase a Fabric pay-as-you-go capacity from the Azure portal. If you decide that Microsoft Fabric suits your needs and you don’t have a Power BI Premium subscription, you can directly purchase a Fabric capacity on a pay-as-you-go basis from the Azure portal. The pay-as-you-go model is flexible because it allows you to pay for only the compute and storage resources you use. Microsoft Fabric capacities come in different sizes, from F2 to F2048, representing 2 – 2048 Capacity Units (CU). Your bill will be determined by the amount of computing you provision (i.e., the size of the capacity you choose) and the amount of storage you use in OneLake, the data lake built into Microsoft Fabric. This model also allows you to easily scale your capacities up and down to adjust their computing power, and even pause your capacities when not in use to save on your bills​​.

Microsoft Fabric is a unified product for all your data and analytics workloads. Rather than provisioning and managing separate compute for each workload, with Fabric, your bill is determined by two variables: the amount of compute you provision and the amount of storage you use.

Follow the capacities that you can buy in the Azure portal:

Check out this video from Guy and Cube which breaks down the details on pricing and licensing.

How to activate the Microsoft Fabric Trial version?

Step 1

Login to Microsoft Power BI with your Developer Account

You will observe that asides from the OneLake icon at the top left, everything looks normal if you are familiar with Power BI Service.

Step 2

Enable Microsoft Fabric for your Tenant

Your Screen will Look like this

So far, we’ve only enabled Microsoft Fabric at the tenant level. This doesn’t give full access to Fabric resources as can be seen in the illustration below

So, Let’s upgrade the Power BI License to Microsoft Fabric Trial

For a smoother experience, You should create a new Workspace and add Microsoft Fabric Trial License as can be seen below

As you can see, while creating a new Workspace, you can now Assign Fabric Trial License to it. Upon creation, we are able to take full advantage of Microsoft Fabric

This video by Guy and Cube explains the steps for getting the Microsoft Fabric Trial.

Conclusion

Microsoft Fabric is currently in preview but already represents a significant advancement in the field of data and analytics, offering a unified platform that brings together various tools and services. It enables a smooth and collaborative experience for a variety of data professionals, fostering a data-driven culture within organizations. Let´s wait for the next steps from Microsoft.

That’s it for today!