Integrating Azure OpenAI with Native Vector Support in Azure SQL Databases for Advanced Search Capabilities and Data Insights

Azure SQL Database has taken a significant step forward by introducing native support for vectors, unlocking advanced capabilities for applications that rely on semantic search, AI, and machine learning. By integrating vector search into Azure SQL, developers can now store, search, and analyze vector data directly alongside traditional SQL data, offering a unified solution for complex data analysis and enhanced search experiences.

Vectors in Azure SQL Database

Vectors are numerical representations of objects like text, images, or audio. They are essential for applications involving semantic search, recommendation systems, and more. These vectors are typically generated by machine learning models, capturing the semantic meaning of the data they represent.

The new vector functionality in Azure SQL Database allows you to store and manage these vectors within a familiar SQL environment. This eliminates the need for separate vector databases, streamlining your application architecture and simplifying your data management processes.

Key Benefits of Native Vector Support in Azure SQL

  • Unified Data Management: Store and query both traditional and vector data in a single database, reducing complexity and maintenance overhead.
  • Advanced Search Capabilities: Perform similarity searches alongside standard SQL queries, leveraging Azure SQL’s sophisticated query optimizer and powerful enterprise features.
  • Optimized Performance: Vectors are stored in a compact binary format, allowing for efficient distance calculations and optimized performance on vector-related operations.

Embeddings: The Foundation of Vector Search

At the heart of vector search are embeddings—dense vector representations of objects, generated by deep learning models. These embeddings capture the semantic similarities between related concepts, enabling tasks such as semantic search, natural language processing, and recommendation systems.

For example, word embeddings can cluster related words like ā€œcomputer,ā€ ā€œsoftware,ā€ and ā€œmachine,ā€ while distant clusters might represent words with entirely different meanings, such as ā€œlion,ā€ ā€œcat,ā€ and ā€œdog.ā€ These embeddings are particularly powerful in applications where context and meaning are more important than exact keyword matches.

Azure OpenAI makes it easy to generate embeddings by providing pre-trained machine learning models accessible through REST endpoints. Once generated, these embeddings can be stored directly in an Azure SQL Database, allowing you to perform vector search queries to find similar data points.

You can explore how vector embeddings work by visiting this amazing website: Transformer Explainer. It offers an excellent interactive experience to help you better understand how Generative AI operates in general.

Vector Search Use Cases

Vector search is a powerful technique used to find vectors in a dataset that are similar to a given query vector. This capability is essential in various applications, including:

  • Semantic Search: Rank search results based on their relevance to the user’s query.
  • Recommendation Systems: Suggest related items based on similarity in vector space.
  • Clustering: Group similar items together based on vector similarity.
  • Anomaly Detection: Identify outliers in data by finding vectors that differ significantly from the norm.
  • Classification: Classify items based on the similarity of their vectors to predefined categories.

For instance, consider a semantic search application where a user queries for ā€œhealthy breakfast options.ā€ A vector search would compare the vector representation of the query with vectors representing product reviews, finding the most contextually relevant items—even if the exact keywords don’t match.

Key Features of Native Vector Support in Azure SQL

Azure SQL’s native vector support introduces several new functions to operate on vectors, which are stored in a binary format to optimize performance. Here are the key functions:

  • JSON_ARRAY_TO_VECTOR: Converts a JSON array into a vector, enabling you to store embeddings in a compact format.
  • ISVECTOR: Checks whether a binary value is a valid vector, ensuring data integrity.
  • VECTOR_TO_JSON_ARRAY: Converts a binary vector back into a human-readable JSON array, making it easier to work with the data.
  • VECTOR_DISTANCE: Calculates the distance between two vectors using a chosen distance metric, such as cosine or Euclidean distance.

These functions enable powerful operations for creating, storing, and querying vector data in Azure SQL Database.

Example: Vector Search in Action

Let’s walk through an example of using Azure SQL Database to store and query vector embeddings. Imagine you have a table of customer reviews, and you want to find reviews that are contextually related to a user’s search query.

  1. Storing Embeddings as Vectors:
    After generating embeddings using Azure OpenAI, you can store these vectors in a VARBINARY(8000) column in your SQL table:
SQL
   ALTER TABLE [dbo].[FineFoodReviews] ADD [VectorBinary] VARBINARY(8000);
   UPDATE [dbo].[FineFoodReviews]
   SET [VectorBinary] = JSON_ARRAY_TO_VECTOR([vector]);

This allows you to store the embeddings efficiently, ready for vector search operations.

  1. Performing Similarity Searches:
    To find reviews that are similar to a user’s query, you can convert the query into a vector and calculate the cosine distance between the query vector and the stored embeddings:
SQL
   DECLARE @e VARBINARY(8000);
   EXEC dbo.GET_EMBEDDINGS @model = '<yourmodeldeploymentname>', @text = 'healthy breakfast options', @embedding = @e OUTPUT;

   SELECT TOP(10) ProductId,
                  Summary,
                  Text,
                  VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance
   FROM dbo.FineFoodReviews
   ORDER BY Distance;

This query returns the top reviews that are contextually related to the user’s search, even if the exact words don’t match.

  1. Hybrid Search with Filters:
    You can enhance vector search by combining it with traditional keyword filters to improve relevance and performance. For example, you could filter reviews based on criteria like user identity, review score, or the presence of specific keywords, and then apply vector search to rank the results by relevance:
SQL
   -- Comprehensive query with multiple filters.
   SELECT TOP(10)
       f.Id,
       f.ProductId,
       f.UserId,
       f.Score,
       f.Summary,
       f.Text,
       VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance,
       CASE 
           WHEN LEN(f.Text) > 100 THEN 'Detailed Review'
           ELSE 'Short Review'
       END AS ReviewLength,
       CASE 
           WHEN f.Score >= 4 THEN 'High Score'
           WHEN f.Score BETWEEN 2 AND 3 THEN 'Medium Score'
           ELSE 'Low Score'
       END AS ScoreCategory
   FROM FineFoodReviews f
   WHERE
       f.UserId NOT LIKE 'Anonymous%'  -- Exclude anonymous users
       AND f.Score >= 2               -- Score threshold filter
       AND LEN(f.Text) > 50           -- Text length filter for detailed reviews
       AND (f.Text LIKE '%gluten%' OR f.Text LIKE '%dairy%') -- Keyword filter
   ORDER BY
       Distance,  -- Order by cosine distance
       f.Score DESC, -- Secondary order by review score
       ReviewLength DESC; -- Tertiary order by review length

This query combines semantic search with traditional filters, balancing relevance and computational efficiency.

Leveraging REST Services for Embedding Generation

Azure OpenAI provides REST endpoints for generating embeddings, which can be consumed directly from Azure SQL Database using the sp_invoke_external_rest_endpoint system stored procedure. This integration enables seamless interaction between your data and AI models, allowing you to build intelligent applications that combine the power of machine learning with the familiarity of SQL.

Here’s a stored procedure example that retrieves embeddings from a deployed Azure OpenAI model and stores them in the database:

SQL
CREATE PROCEDURE [dbo].[GET_EMBEDDINGS]
(
    @model VARCHAR(MAX),
    @text NVARCHAR(MAX),
    @embedding VARBINARY(8000) OUTPUT
)
AS
BEGIN
    DECLARE @retval INT, @response NVARCHAR(MAX);
    DECLARE @url VARCHAR(MAX);
    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);

    SET @url = 'https://<resourcename>.openai.azure.com/openai/deployments/' + @model + '/embeddings?api-version=2023-03-15-preview';

    EXEC dbo.sp_invoke_external_rest_endpoint 
        @url = @url,
        @method = 'POST',   
        @payload = @payload,   
        @headers = '{"Content-Type":"application/json", "api-key":"<openAIkey>"}', 
        @response = @response OUTPUT;

    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');
    SET @embedding = JSON_ARRAY_TO_VECTOR(@jsonArray);
END
GO

This stored procedure retrieves embeddings from the Azure OpenAI model and converts them into a binary format for storage in the database, making them available for similarity search and other operations.

Let’s implementing a experiment with the Native Vector Support in Azure SQL

Azure SQL Database provides a seamless way to store and manage vector data despite not having a specific vector data type. Column-store indexes, vectors, and essentially lists of numbers can be efficiently stored in a table. Each vector can be represented in a row with individual elements as columns or serialized arrays. This approach ensures efficient storage and retrieval, making Azure SQL suitable for large-scale vector data management.

I used the Global News Dataset from Kaggle in my experiment.

First, you must create the columns to save the vector information. In my case, I created two columns: title_vector For the news title and content_vector the news content. For this, create a small Python code, but you can also do that directly from SQL using a cursor. It's important to know that you don't need to pay for any Vector Databases by saving the vector information inside the Azure SQL.

Python
from litellm import embedding
import pyodbc  # or another SQL connection library
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Set up OpenAI credentials from environment variables
os.environ['AZURE_API_KEY'] =os.getenv('AZURE_API_KEY')
os.environ['AZURE_API_BASE'] = os.getenv('AZURE_API_BASE')
os.environ['AZURE_API_VERSION'] = os.getenv('AZURE_API_VERSION')

# Connect to your Azure SQL database
conn = pyodbc.connect(f'DRIVER={{ODBC Driver 17 for SQL Server}};'
                      f'SERVER={os.getenv("DB_SERVER")};'
                      f'DATABASE={os.getenv("DB_DATABASE")};'
                      f'UID={os.getenv("DB_UID")};'
                      f'PWD={os.getenv("DB_PWD")}')

def get_embeddings(text):
    # Truncate the text to 8191 characters bacause of the text-embedding-3-     small OpenAI API Embedding model limit
    truncated_text = text[:8191]

    response = embedding(
        model="azure/text-embedding-3-small",
        input=truncated_text,
        api_key=os.getenv('AZURE_API_KEY'),
        api_base=os.getenv('AZURE_API_BASE'),
        api_version=os.getenv('AZURE_API_VERSION')
        )
        
    embeddings = response['data'][0]['embedding']
    return embeddings


def update_database(article_id, title_vector, content_vector):
    cursor = conn.cursor()

    # Convert vectors to strings
    title_vector_str = str(title_vector)
    content_vector_str = str(content_vector)

    # Update the SQL query to use the string representations
    cursor.execute("""
        UPDATE newsvector
        SET title_vector = ?, content_vector = ?
        WHERE article_id = ?
    """, (title_vector_str, content_vector_str, article_id))
    conn.commit()


def embed_and_update():
    cursor = conn.cursor()
    cursor.execute("SELECT article_id, title, full_content FROM newsvector where title_vector is null and full_content is not null and title is not null order by published asc")
    
    title_vector = ""
    content_vector = ""
    
    for row in cursor.fetchall():
        article_id, title, full_content = row
        
        print(f"Embedding article {article_id} - {title}")
        
        title_vector = get_embeddings(title)
        content_vector = get_embeddings(full_content)
        
        update_database(article_id, title_vector, content_vector)

embed_and_update()

These two columns will contain something like this: [-0.02232750505208969, -0.03755787014961243, -0.0066827102564275265…]

Second, you must create a procedure in the Azure Database to transform the query into a vector embedding.

SQL
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[GET_EMBEDDINGS]
(
    @model VARCHAR(MAX),
    @text NVARCHAR(MAX),
    @embedding VARBINARY(8000) OUTPUT
)
AS
BEGIN
    DECLARE @retval INT, @response NVARCHAR(MAX);
    DECLARE @url VARCHAR(MAX);
    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);

    -- Set the @url variable with proper concatenation before the EXEC statement
    SET @url = 'https://<Your App>.openai.azure.com/openai/deployments/' + @model + '/embeddings?api-version=2024-02-15-preview';

    EXEC dbo.sp_invoke_external_rest_endpoint 
        @url = @url,
        @method = 'POST',   
        @payload = @payload,   
        @headers = '{"Content-Type":"application/json", "api-key":"<Your Azure Open AI API Key"}', 
        @response = @response OUTPUT;

    -- Use JSON_QUERY to extract the embedding array directly
    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');

    
    SET @embedding = JSON_ARRAY_TO_VECTOR(@jsonArray);
END

I also create another procedure to search directly to the dataset using the Native Vector Support in Azure SQL.

SQL
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[SearchNewsVector] 
    @inputText NVARCHAR(MAX)
AS
BEGIN
    -- Query the SimilarNewsContentArticles table using the response
    IF OBJECT_ID('dbo.result', 'U') IS NOT NULL
        DROP TABLE dbo.result;

	--Assuming you have a stored procedure to get embeddings for a given text
	DECLARE @e VARBINARY(8000);
	EXEC dbo.GET_EMBEDDINGS @model = 'text-embedding-3-small', @text = @inputText, @embedding = @e OUTPUT;

	SELECT TOP(10) 
       [article_id]
      ,[source_id]
      ,[source_name]
      ,[author]
      ,[title]
      ,[description]
      ,[url]
      ,[url_to_image]
      ,[content]
      ,[category]
      ,[full_content]
      ,[title_vector]
      ,[content_vector]
      ,[published]
      ,VECTOR_DISTANCE('cosine', @e, VectorBinary) AS cosine_distance
	into result
	FROM newsvector
	ORDER BY cosine_distance;
END

Finally, you can start querying your table using prompts instead of keywords. This is awesome!

Check out the app I developed with the Native Vector Support in Azure SQL, which is designed to assist you in crafting prompts and evaluating your performance using my newsvector dataset. To explore the app, click here.

Like always, I also created this GitHub repository with everything I did.

Azure SQL Database Native vector support subscription for the Private Preview

You can sign up for the private preview at this link.

This article, published by Davide Mauri and Pooja Kamath at Microsoft Build 2024 event, provides all the information.

Announcing EAP for Vector Support in Azure SQL Database – Azure SQL Devs’ Corner (microsoft.com)

Conclusion

The integration of Azure OpenAI with native vector support in Azure SQL Database unlocks new possibilities for applications that require advanced search capabilities and data analysis. By storing and querying vector embeddings alongside traditional SQL data, you can build powerful solutions that combine the best of both worlds—semantic understanding with the reliability and performance of Azure SQL.

This innovation simplifies application development, enhances data insights, and paves the way for the next generation of intelligent applications.

That’s it for today!

Sources

Azure SQL DB Vector Functions Private Preview | Data Exposed (youtube.com)

Announcing EAP for Vector Support in Azure SQL Database – Azure SQL Devs’ Corner (microsoft.com)

Navigating Vector Operations in Azure SQL for Better Data Insights: A Guide How to Use Generative AI to Prompt Queries in Datasets

The evolving landscape of data analytics has brought vector databases to the forefront, especially with their application in finding similarities in diverse data types such as articles, photos, and products. Azure SQL, combined with the prowess of OpenAI, offers a powerful platform for executing vector operations, simplifying the task of finding similar items and enhancing recommendation systems in applications.

What is Vector Similarity, and How do you calculate cosine similarity?

Vector similarity revolves around transforming data into numerical vectors or embeddings. These embeddings are numerical representations of various concepts converted into sequences of numbers, making it easier for computers to grasp their relationships. This method is particularly effective in comparing and finding similarities between data points, a crucial feature in applications like search engines and clustering algorithms.

Cosine similarity, a commonly used metric in vector similarity, measures the cosine of the angle between two vectors. This metric is crucial in determining the degree of similarity between the vectors, irrespective of their size. In Azure SQL, cosine similarity can be computed with a simple SQL formula involving SUM and SQRT functions applied to the vector elements, thus providing a straightforward yet powerful way to measure vector similarity.

Cosine similarity can be calculated in SQL using the following formula, given two vectors, a and b:

SELECT 
    SUM(a.value * b.value) / (  
        SQRT(SUM(a.value * a.value)) * SQRT(SUM(b.value * b.value))   
    ) AS cosine_similarity
FROM
    vectors_values

How to Querying Azure OpenAI?

Azure SQL’s integration with Azure OpenAI simplifies generating and working with embeddings. Users can obtain the vector representation of any given text by creating an Azure OpenAI resource and deploying a model like text-embedding-ada-002. This integration enables the execution of REST API calls from within the Azure SQL Database, making fetching and working with embeddings more streamlined and accessible.

SQL
-- Declare a variable to hold the response from the external REST API
declare @response nvarchar(max);

-- Declare and initialize a variable with JSON payload. 
-- The JSON object contains an 'input' key with a text value.
declare @payload nvarchar(max) = json_object('input': 'This is how to futureproof your career in an AI world');

-- Execute a stored procedure to invoke an external REST endpoint
exec sp_invoke_external_rest_endpoint
    @url = 'https://<your-app-name>.openai.azure.com/openai/deployments/embeddings/embeddings?api-version=2023-03-15-preview', -- The URL of the REST endpoint
    @credential = [https://<your-app-name>.openai.azure.com], -- Credential for accessing the REST API
    @payload = @payload, -- The JSON payload defined earlier
    @response = @response output; -- Output parameter to store the response from the API

Advantages of Implementing Vectors in Azure SQL Database

1. Ease of Access with Azure OpenAI: Azure SQL combined with Azure OpenAI offers easy access to REST services for generating embeddings using pre-trained machine learning models. This accessibility facilitates the calculation of embeddings, which is otherwise a complex task.

2. Efficient Storage with Columnstore Indexes: Azure SQL databases efficiently store vectors using column store indexes. This method is particularly beneficial since Azure SQL doesn’t have a specific vector data type. Vectors, essentially lists of numbers, can be conveniently stored in a table with one row per vector element.

3. Fast Distance Calculation: The internal optimization of column store indexes in Azure SQL, employing SIMD and AVX-512 instructions, allows for high-speed calculation of distances between vectors, which is crucial for determining similarity.

4. Integration with Azure AI Search: Azure SQL’s integration with Azure AI Search streamlines the entire process of chunking, generating, storing, and querying vectors for vector search, significantly speeding up the development of the vectorization pipeline and minimizing maintenance tasks.

5. Capability for Complex Operations: Azure SQL enables complex operations like indexing, storing, and retrieving vector embeddings from a search index, which is essential for identifying the most similar documents in a vector space.

6. Versatile Data Handling: Azure SQL’s ability to handle structured and unstructured data, along with vector data, provides more versatility compared to vector databases, which are primarily optimized for vector data storage and retrieval.

7. Ease of Querying and Relevance Determination: The integration with Azure OpenAI allows easy querying of the REST service to obtain vector representations of text, which can then be used to calculate similarity against stored vectors, identifying the most relevant data.

8. Simplified Deployment and Management: Deploying and managing an embedding model via the Azure portal is straightforward, reducing the complexity of managing vector databases.

9. Suitability for a Range of Applications: While vector databases are specialized for high-dimensional similarity searches, Azure SQL’s broader application scope makes it suitable for various types of data and applications, from financial records to customer data.

10. Support for Advanced Azure Features: Azure SQL supports advanced Azure features, such as AI and machine learning capabilities, which can be seamlessly integrated with vector similarity operations for enhanced analytics and insights.

These advantages highlight the flexibility, efficiency, and ease of use of Azure SQL databases for vector similarity operations, making them a preferable choice in scenarios where diverse data types and complex operations are involved, alongside the need for seamless integration with other Azure services.

Implementing Vectors in Azure SQL Database

Azure SQL Database provides a seamless way to store and manage vector data despite not having a specific vector data type. Column-store indexes, vectors, and essentially lists of numbers can be efficiently stored in a table. Each vector can be represented in a row with individual elements as columns or serialized arrays. This approach ensures efficient storage and retrieval, making Azure SQL suitable for large-scale vector data management.

I used the Global News Dataset from Kaggle in my experiment.

First, you must create the columns to save the vector information. In my case, I created two columns: title_vector For the news title and content_vector the news content. For this, create a small Python code, but you can also do that directly from SQL using a cursor. It's important to know that you don't need to pay for any Vector Databases by saving the vector information inside the Azure SQL.

Python
import openai
import pyodbc  # or another SQL connection library

# Set up OpenAI credentials
openai.api_type = "azure"
openai.api_key = "<YOUR AZURE OPEN AI KEY>"
openai.api_base = "https://<your-app-name>.openai.azure.com/"
openai.api_version = "2023-07-01-preview"

# Connect to your Azure SQL database
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=<SERVER>;DATABASE=<DATABASE>;'
                      'UID=<USER>;PWD=<PASSWORD>')

def get_embeddings(text):
    # Truncate the text to 8000 characters
    truncated_text = text[:8000]

    response = openai.Embedding.create(input=truncated_text, engine="embeddings")
    embeddings = response['data'][0]['embedding']
    return embeddings

def update_database(article_id, title_vector, content_vector):
    cursor = conn.cursor()

    # Convert vectors to strings
    title_vector_str = str(title_vector)
    content_vector_str = str(content_vector)

    # Update the SQL query to use the string representations
    cursor.execute("""
        UPDATE news
        SET title_vector = ?, content_vector = ?
        WHERE article_id = ?
    """, (title_vector_str, content_vector_str, article_id))
    conn.commit()

def embed_and_update():
    cursor = conn.cursor()
    cursor.execute("SELECT article_id, title, full_content FROM news where title_vector is null and full_content is not null and title is not null order by published desc")
    
    title_vector = ""
    content_vector = ""
    
    for row in cursor.fetchall():
        article_id, title, full_content = row
        
        print(f"Embedding article {article_id} - {title}")
        
        title_vector = get_embeddings(title)
        content_vector = get_embeddings(full_content)
        
        update_database(article_id, title_vector, content_vector)

embed_and_update()

These two columns will contain something like this: [-0.02232750505208969, -0.03755787014961243, -0.0066827102564275265…]

Second, you must extract these vectors line by line into a new table for each vector field. In this case, I created a procedure to do it.

SQL
-- =============================================
-- Author:      Lawrence Teixeira
-- Create Date: 11-24-2023
-- Description: This stored procedure creates vectors for news titles and contents.
-- It processes data from the 'news' table and stores the vectors in separate tables.
-- =============================================

CREATE PROCEDURE [dbo].[Create_News_Vector]
AS
BEGIN
    SET NOCOUNT ON; -- Prevents the sending of DONE_IN_PROC messages to the client.

    -- Extract and store title vectors:
    -- First, check if the 'news_titles_vector' table exists and drop it if it does.
    IF OBJECT_ID('dbo.news_titles_vector', 'U') IS NOT NULL
        DROP TABLE dbo.news_titles_vector;

    -- Using a Common Table Expression (CTE) to process title vectors.
    WITH cte AS
    (
        SELECT 
            v.article_id,    
            CAST(tv.[key] AS INT) AS vector_value_id, -- Casting 'key' as INT for vector ID.
            CAST(tv.[value] AS FLOAT) AS vector_value   -- Casting 'value' as FLOAT for vector value.
        FROM 
            dbo.news AS v 
        CROSS APPLY 
            OPENJSON(v.title_vector) tv -- Parses JSON of title_vector in the 'news' table.
    )
    -- Create 'news_titles_vector' table with processed vectors.
    SELECT
        article_id,
        vector_value_id,
        vector_value
    INTO
        dbo.news_titles_vector
    FROM
        cte;

    -- Extract and store content vectors:
    -- Check and drop 'news_contents_vector' table if it exists.
    IF OBJECT_ID('dbo.news_contents_vector', 'U') IS NOT NULL
        DROP TABLE dbo.news_contents_vector;

    -- CTE for processing content vectors.
    WITH cte AS
    (
        SELECT 
            v.article_id,    
            CAST(tv.[key] AS INT) AS vector_value_id, -- Casting 'key' as INT for vector ID.
            CAST(tv.[value] AS FLOAT) AS vector_value   -- Casting 'value' as FLOAT for vector value.
        FROM 
            dbo.news AS v 
        CROSS APPLY 
            OPENJSON(v.content_vector) tv -- Parses JSON of content_vector in the 'news' table.
    )
    -- Create 'news_contents_vector' table with processed vectors.
    SELECT
        article_id,
        vector_value_id,
        vector_value
    INTO
        dbo.news_contents_vector
    FROM
        cte;

    -- Columnstore indexes creation is advised outside the stored procedure.
    -- These indexes optimize data storage and query performance on vector tables.
    CREATE CLUSTERED COLUMNSTORE INDEX cci_news_titles_vector
    ON dbo.news_titles_vector 
	order (article_id);

    CREATE CLUSTERED COLUMNSTORE INDEX cci_news_contents_vector
    ON dbo.news_contents_vector
	order (article_id);
END

I also create a function to search directly to the dataset using the Cosine similarity formula.

SQL
-- This Azure SQL function finds news articles similar to the given content vector.
CREATE function [dbo].[SimilarNewsContentArticles](
    @vector nvarchar(max) -- Input parameter: JSON string representing a content vector.
)
returns table -- The function returns a table.
as
return with 

-- CTE for processing the input vector.
cteVector as
(
    select 
        cast([key] as int) as [vector_value_id], -- Extracts and casts the 'key' from JSON to int.
        cast([value] as float) as [vector_value] -- Extracts and casts the 'value' from JSON to float.
    from 
        openjson(@vector) -- Parses the input JSON vector.
),

-- CTE for calculating similarity scores with existing articles.
cteSimilar as
(
    select top (50)
        v2.article_id, 
        sum(v1.[vector_value] * v2.[vector_value]) as cosine_distance 
        -- Calculates cosine similarity (distance) between vectors.
    from 
        cteVector v1 -- Uses the processed input vector.
    inner join 
        dbo.news_contents_vector v2 
        on v1.vector_value_id = v2.vector_value_id -- Joins with stored article vectors.
    group by
        v2.article_id
    order by
        cosine_distance desc -- Orders by similarity score, descending.
)

-- Final selection combining article data with similarity scores.
select 
    a.*, -- Selects all columns from the news article.
    r.cosine_distance -- Includes the calculated similarity score.
from 
    cteSimilar r -- Uses the similarity scores CTE.
inner join 
    dbo.news a on r.article_id = a.article_id -- Joins with the news articles table.
GO

Finally, you can start querying your table using prompts instead of keywords. This is awesome!

Check out the app I developed, which is designed to assist you in crafting prompts and evaluating your performance using my news dataset. To explore the app, click here.

I also created this GitHub repository with everything I did.

Click on this image to open the GitHub repository.

Conclusion

While the sample in this blog is not optimized for maximum efficiency, it is an excellent starting point for understanding and implementing vector operations in Azure SQL. The process, despite its simplicity, is relatively fast. For instance, querying an eight vCore Azure SQL database can return the fifty most similar articles in just half a second, demonstrating the efficiency and utility of vector operations in Azure SQL Database for data analytics and insights. If you want to know more about this topic, don’t hesitate to get in touch with me.

That’s it for today!

Sources:

Vector Similarity Search with Azure SQL database and OpenAI – Azure SQL Devs’ Corner (microsoft.com)

Vector Similarity Search with Azure SQL database and OpenAI | by Davide Mauri | Microsoft Azure | Medium

Azure-Samples/azure-sql-db-openai: Samples on how to use Azure SQL database with Azure OpenAI (github.com)