Open WebUI and Free Chatbot AI: Empowering Corporations with Private Offline AI and LLM Capabilities

Artificial intelligence (AI) is reshaping how corporations function and interact with data in today’s digital landscape. However, with AI comes the challenge of securing corporate information and ensuring data privacy—especially when dealing with Large Language Models (LLMs). Public cloud-based AI services may expose sensitive data to third parties, making corporations wary of deploying models on external servers.

Open WebUI addresses this issue head-on by offering a self-hosted, offline, and highly extensible platform for deploying and interacting with LLMs. Built to run entirely offline, Open WebUI provides corporations with complete control over their AI models, ensuring data security, privacy, and compliance.

What is Open WebUI?

Open WebUI is a versatile, feature-rich, and user-friendly web interface for interacting with Large Language Models (LLMs). Initially launched as Ollama WebUI, Open WebUI is a community-driven, open-source platform enabling businesses, developers, and researchers to deploy, manage, and interact with AI models offline.

Open WebUI is designed to be extensible, supporting multiple LLM runners and integrating with different AI frameworks. Its clean, intuitive interface mimics popular platforms like ChatGPT, making it easy for users to communicate with AI models while maintaining full control over their data. By allowing businesses to self-host the web interface, Open WebUI ensures that no data leaves the corporate environment, which is crucial for organizations concerned with data privacy, security, and regulatory compliance.

Key Features of Open WebUI

1. Self-hosted and Offline Operation

Open WebUI is built to run in a self-hosted environment, ensuring that all data remains within your organization’s infrastructure. This feature is critical for companies handling sensitive information and those in regulated industries where external data transfers are a risk.

2. Extensibility and Model Support

Open WebUI supports various LLM runners, allowing businesses to deploy the language models that best meet their needs. This flexibility enables integration with custom models, OpenAI-compatible APIs (such as GPT models), and runners such as Ollama. Users can also seamlessly switch between different models in real time to suit diverse use cases.

3. User-Friendly Interface

Designed to be intuitive and easy to use, Open WebUI features a ChatGPT-style interface that allows users to communicate with language models via a web browser. This makes it ideal for corporate teams who may not have a deep technical background but need to interact with LLMs for business insights, automation, or customer support.

4. Docker-Based Deployment

To ensure ease of setup and management, Open WebUI runs inside a Docker container. This provides an isolated environment, making it easier to deploy and maintain while ensuring compatibility across different systems. With Docker, corporations can manage their AI models and interfaces without disrupting their existing infrastructure.

5. Role-Based Access Control (RBAC)

To maintain security, Open WebUI offers granular user permissions through RBAC. Administrators can control who has access to specific models, tools, and settings, ensuring that only authorized personnel can interact with sensitive AI models.

6. Multi-Model Support

Open WebUI allows for concurrent utilization of multiple models, enabling organizations to harness the unique capabilities of different models in parallel. This is especially useful for businesses requiring a range of AI solutions from simple chat interactions to advanced language processing tasks.

7. Markdown and LaTeX Support

For enriched interaction, Open WebUI includes full support for Markdown and LaTeX, making it easier for users to create structured documents, write reports, and interact with AI using precise formatting and mathematical notation.

8. Retrieval-Augmented Generation (RAG)

Open WebUI integrates RAG technology, which allows users to feed documents into the AI environment and interact with them through chat. This feature enhances document analysis by enabling users to ask specific questions and retrieve document-based answers.
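
To make the idea concrete, here is a minimal sketch of the retrieve-then-generate loop that RAG performs behind the scenes. This is not Open WebUI's internal implementation; it assumes a local Ollama server on its default port with an embedding model (nomic-embed-text) and a chat model (llama3) already pulled, and it uses two made-up document chunks.

Python
import requests
import numpy as np

OLLAMA = "http://localhost:11434"  # assumed local Ollama server

def embed(text, model="nomic-embed-text"):
    # Ollama's embeddings endpoint returns {"embedding": [...]}
    r = requests.post(f"{OLLAMA}/api/embeddings", json={"model": model, "prompt": text})
    r.raise_for_status()
    return np.array(r.json()["embedding"])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1. Index: embed each document chunk once (two made-up chunks for illustration)
chunks = ["Our refund policy allows returns within 30 days of purchase.",
          "Support is available Monday to Friday, 9 am to 5 pm."]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieve: embed the question and pick the most similar chunk
question = "How long do customers have to return a product?"
q_vec = embed(question)
best_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3. Generate: pass the retrieved context plus the question to the chat model
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {question}"
r = requests.post(f"{OLLAMA}/api/generate",
                  json={"model": "llama3", "prompt": prompt, "stream": False})
print(r.json()["response"])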

9. Custom Pipelines and Plugin Framework

The platform supports a highly modular plugin framework that allows businesses to create and integrate custom pipelines, tailor-made to their specific AI workflows. This enables the addition of specialized logic, ranging from AI agents to integration with third-party services, directly within the web UI.

10. Real-Time Multi-Language Support

For global organizations, Open WebUI offers multilingual support, enabling interaction with LLMs in various languages. This feature ensures that businesses can deploy AI solutions for different regions, enhancing both internal communication and customer-facing AI tools.

What Can Open WebUI Do?

Open WebUI Community

You can find good examples of models, prompts, tools, and functions at the Open WebUI Community.

Inside Open WebUI's Workspace area, administrators can configure models, prompts, tools, and more. The possibilities here are practically unlimited.

Why Corporations Should Consider Open WebUI

As businesses adopt AI to streamline operations and enhance decision-making, the need for secure, private, and controlled solutions is paramount. Open WebUI offers corporations the following distinct advantages:

1. Data Privacy and Compliance

By allowing organizations to run their AI models offline, Open WebUI ensures that no data leaves the corporate environment. This eliminates the risk of data exposure associated with cloud-based AI services. It also helps businesses stay compliant with data protection regulations such as GDPR, HIPAA, or CCPA.

2. Flexibility and Customization

Open WebUI’s extensibility makes it a highly flexible tool for enterprises. Businesses can integrate custom AI models, adapt the platform to meet unique needs, and deploy models specific to their industry or use case.

3. Cost Savings

For enterprises that require frequent AI model interactions, a self-hosted solution like Open WebUI can result in significant cost savings compared to paying for cloud-based API usage. Over time, this can reduce the operational cost of AI adoption.

4. Improved Control Over AI Systems

With Open WebUI, corporations have complete control over how their AI models are deployed, managed, and utilized. This includes controlling access, managing updates, and ensuring that AI models are used in compliance with corporate policies.

5. You Can Use Azure OpenAI

Azure OpenAI Service ensures data privacy by not sharing your data with other customers or using it to improve models without your permission. It includes integrated content filtering to protect against harmful inputs and outputs, adheres to strict regulatory standards, and provides enterprise-grade security. Additionally, it features abuse monitoring to maintain safe and responsible AI use, making it a reliable choice for businesses prioritizing safety and privacy.

Installation and Setup

Getting started with Open WebUI is straightforward. Here are the basic steps:

1. Install Docker

Docker is required to deploy Open WebUI. If Docker isn’t already installed, it can be easily set up on your system. Docker provides an isolated environment to run applications, ensuring compatibility and security.

2. Launch Open WebUI

Using Docker, you can pull the Open WebUI image and start a container. The Docker command will depend on whether you are running the language model locally or connecting to a remote server.

Bash
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

3. Create an Admin Account

Once the web UI is running, the first user to sign up will be granted administrator privileges. This account will have comprehensive control over the web interface and the language models.

4. Connect to Language Models

You can configure Open WebUI to connect with various LLMs, including OpenAI or Ollama models. This can be done via the web UI settings, where you can specify API keys or server URLs for remote model access.
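
For example, if you point Open WebUI at a local Ollama server, you can first sanity-check that the server is reachable and see which models it already exposes. The sketch below assumes Ollama is running on its default port; adjust the URL if your setup differs.

Python
import requests

# Assumed default Ollama endpoint; change host/port if your server differs
OLLAMA_URL = "http://localhost:11434"

# /api/tags lists the models currently available on the Ollama server
resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
resp.raise_for_status()

for model in resp.json().get("models", []):
    print(model["name"])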

There are many ways to implement Open WebUI; you can find all of the options in the documentation at this link.

Run AI Models Locally: Ollama Tutorial (Step-by-Step Guide + WebUI)

Open WebUI – Tutorial & Windows Install 

Free Chatbot AI: Easy Access to Open WebUI for Corporations

To make Open WebUI even more accessible, I have deployed a version called Free Chatbot AI. This platform serves as an easy-access solution for businesses and users who want to experience the power of Open WebUI without the need for complex setup or hosting infrastructure. Free Chatbot AI offers a user-friendly interface where users can interact with Large Language Models (LLMs) in real time, all while maintaining the key benefits of privacy and control.

Key Benefits of Free Chatbot AI for Corporations:
  1. Instant Access: Free Chatbot AI is pre-configured and hosted, allowing companies to quickly test and use AI models without worrying about setup or technical configurations.
  2. Data Privacy: Like the self-hosted version of Open WebUI, Free Chatbot AI ensures that sensitive information is protected. No data is sent to third-party servers, ensuring that interactions remain private and secure.
  3. Flexible Deployment: While Free Chatbot AI is an accessible hosted version, it also offers corporations the ability to experiment with LLMs before committing to a self-hosted deployment. This is perfect for businesses looking to try out AI capabilities before taking full control of their AI infrastructure.
  4. User-Friendly Interface: Built with a simple and intuitive design, Free Chatbot AI mirrors the same ease of use as Open WebUI. This makes it suitable for teams across the organization, from technical users to non-technical departments like customer support or HR, enhancing workflows with AI-powered insights and automation.
  5. No Setup Required: Free Chatbot AI eliminates the need for complex setup processes. Corporations can access the platform directly and begin leveraging the power of AI for their business operations immediately.
Use Cases for Free Chatbot AI:
  • Internal Team Collaboration: Free Chatbot AI enables teams to quickly interact with LLMs to generate ideas, draft content, or automate repetitive tasks such as writing summaries and answering FAQs.
  • AI-Assisted Customer Support: Businesses can test Free Chatbot AI to power customer support bots that deliver accurate, conversational responses to customer queries, all while maintaining data security.
  • Document Processing and Summarization: Teams can upload documents and let Free Chatbot AI generate summaries, extracting relevant information with ease, improving efficiency in knowledge management and decision-making.
How to access Free Chatbot AI?

First, click on this link and create an account by clicking on Sign up.

Fill in the fields and click on Create Account.

After that, select one of the models and have fun!

This is the home page.

You can create images by clicking on Image Gen.

You can type a prompt like “photorealistic image taken with Nikon Z50, 18mm lens, a vast and untouched wilderness, with a winding river flowing through a dense forest, showcasing the pristine beauty of untouched nature, aspect ratio 16:9”.

There are a lot of options to explore in Free Chatbot AI. Try them all, and good luck!

Conclusion

As AI becomes increasingly integral to business operations, ensuring data privacy and control has never been more important. Open WebUI offers corporations a secure, customizable, and user-friendly platform to deploy and interact with Large Language Models, entirely offline. With its range of features, from role-based access to multi-model support and flexible integrations, Open WebUI is the ideal solution for businesses looking to adopt AI while maintaining full control over their data and processes.

For companies aiming to harness the power of AI while ensuring compliance with industry regulations, Open WebUI is a game-changer, offering the perfect balance between innovation and security.

If you have any doubts about how to implement it in your company you can contact me at this link.

That's it for today!

Sources

https://docs.openwebui.com

https://medium.com/@omargohan/open-webui-the-llm-web-ui-66f47d530107

https://medium.com/free-or-open-source-software/open-webui-how-to-build-and-run-locally-with-nodejs-8155c51bcb55

https://openwebui.com/#open-webui-community

Cost-Effective Text Embedding: Leveraging Ollama Local Models with Azure SQL Databases

Embedding text using a local model can provide significant cost advantages and flexibility over cloud-based services. In this blog post, we explore how to set up and use a local model for text embedding and how this approach can be integrated with Azure SQL databases for advanced querying capabilities.

Cost Comparison: OpenAI's text-embedding-ada-002 Pay-per-Use Model vs. Local Model Setup Cost

When choosing between a paid service and setting up a local model for text embedding, it’s crucial to consider the cost implications based on the scale of your data and the frequency of usage. Below is a detailed comparison of the costs of using a paid model versus establishing a local one.

Pay Model Cost Estimate:

OpenAI text-embedding-ada-002:

Using a paid model like OpenAI’s Ada V2 for embedding 1 terabyte of OCR texts would cost around $25,000. This estimation is based on converting every 4 characters into one token, which might vary depending on the content and structure of the OCR texts.
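
The arithmetic behind that estimate is straightforward. The short sketch below reproduces it, assuming roughly one character per byte, 4 characters per token, and an ada-002 price of about $0.0001 per 1,000 tokens (an assumption; verify current pricing before relying on it).

Python
# Rough reproduction of the ~$25,000 estimate above (all figures are assumptions)
ocr_characters = 1_000_000_000_000        # ~1 TB of OCR text, ~1 byte per character
chars_per_token = 4                       # rule of thumb used in this article
price_per_1k_tokens = 0.0001              # assumed ada-002 price in USD

tokens = ocr_characters / chars_per_token
cost = tokens / 1_000 * price_per_1k_tokens
print(f"{tokens:,.0f} tokens -> ${cost:,.0f}")  # 250,000,000,000 tokens -> $25,000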

Local Model Cost Estimate:

Setup Costs:

The initial investment for setting up a local model can range from $4,050 to $12,750, depending on the selection of components, from mid-range to high-end. This one-time cost can be amortized over many uses and datasets, potentially offering a more cost-effective solution in the long run, especially for large data volumes.

Overall Financial Implications

While the upfront cost for a local model might seem high, it becomes significantly more economical with increased data volumes and repeated use. In contrast, the cost of using a pay model like OpenAI’s text-embedding-ada-002 scales linearly with data volume, leading to potentially high ongoing expenses.

Considering these factors, the local model offers a cost advantage and greater control over data processing and security, making it an attractive option for organizations handling large quantities of sensitive data.

Why I Decided to Use a Local Model

Cost and data volume considerations primarily drove the decision to use a local model for text embedding. With over 20 terabytes of data, including 1 terabyte of OCR text to embed, the estimated cost of using a commercial text-embedding model like OpenAI’s text-embedding-ada-002 would be around USD 25,000. By setting up a local model, we can process our data at a fraction of this cost, reducing expenses by 49% to 84%.
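
For reference, the 49% to 84% range quoted above follows directly from the hardware estimates in the previous section; the small sketch below shows the calculation.

Python
cloud_cost = 25_000                      # estimated ada-002 cost for 1 TB (see above)
local_low, local_high = 4_050, 12_750    # local hardware cost range quoted earlier

savings_best = (cloud_cost - local_low) / cloud_cost    # ~0.84
savings_worst = (cloud_cost - local_high) / cloud_cost  # ~0.49
print(f"Savings range: {savings_worst:.0%} to {savings_best:.0%}")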

Exploring Local Models: Testing BGE-M3, MXBAI-EMBED-LARGE, NOMIC-EMBED-TEXT, and OpenAI's text-embedding-ada-002

I encountered some intriguing results in my recent tests with local embedding models BGE-M3 and NOMIC-EMBED-TEXT. Both models showed an accuracy below 0.80 when benchmarked against OpenAI’s “Text-embedding-ada-002.” This comparison has sparked a valuable discussion about the capabilities and limitations of different embedding technologies.

How to Choose the Best Model for Your Needs?

When considering open-source embedding models like NOMIC-EMBED-TEXT, BGE-M3, and MXBAI-EMBED-LARGE, you should weigh the specific strengths and applications that make each of them suitable for different machine learning tasks.

1. NOMIC-EMBED-TEXT: This model is specifically designed for handling long-context text, making it suitable for tasks that involve processing extensive documents or content that benefits from understanding broader contexts. It achieves this by training on full Wikipedia articles and various large-scale question-answering datasets, which helps it capture long-range dependencies.

2. BGE-M3: Part of the BGE (BAAI General Embedding) series from the Beijing Academy of Artificial Intelligence, this model is adapted for sentence similarity tasks. It's built to handle multilingual content effectively, which makes it a versatile choice for applications that require understanding or comparing sentences across different languages.

3. MXBAI-EMBED-LARGE: This model is noted for its feature extraction capabilities, making it particularly useful for tasks that require distilling complex data into simpler, meaningful representations. Its training involves diverse datasets, enhancing its generalization across text types and contexts.

Each model brings unique capabilities, such as handling longer texts or providing robust multilingual support. When choosing among these models, consider the specific needs of your project, such as the length of text you need to process, the importance of multilingual capabilities, and the type of machine learning tasks you aim to perform (e.g., text similarity, feature extraction). Testing them with specific data is crucial to determine which model performs best in your context.
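
As a starting point for that kind of testing, the sketch below embeds the same pair of sentences with each candidate model through a local Ollama server and prints the cosine similarity. It assumes the three models have already been pulled into Ollama; the sentences are placeholders you would replace with text from your own domain.

Python
import requests
import numpy as np

OLLAMA_URL = "http://localhost:11434"   # assumed local Ollama server

def embed(text, model):
    # Ollama's embeddings endpoint returns {"embedding": [...]}
    r = requests.post(f"{OLLAMA_URL}/api/embeddings",
                      json={"model": model, "prompt": text})
    r.raise_for_status()
    return np.array(r.json()["embedding"])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder sentences; swap in representative text from your own data
s1 = "Microsoft develops the Windows operating system."
s2 = "Windows is an operating system made by Microsoft."

for model in ["bge-m3", "nomic-embed-text", "mxbai-embed-large"]:
    sim = cosine(embed(s1, model), embed(s2, model))
    print(f"{model}: cosine similarity = {sim:.3f}")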

In our analysis, we compared the various results and identified which open-source model comes closest to OpenAI's text-embedding-ada-002.

We executed this query using the keyword ‘Microsoft’ to search the vector table and compare the content of Wikipedia articles.

SQL
declare @v nvarchar(max)
select @v = content_vector from dbo.wikipedia_articles_embeddings where title = 'Microsoft'
select w.title, w.text from 
(select top (10) id, title, text, dot_product
from [$vector].find_similar$wikipedia_articles_embeddings$content_vector(@v, 1, 0.25) 
order by dot_product desc) w
order by w.title
go

We utilized the KMeans compute node for text similarity analysis, focusing on a single cluster search. For a detailed, step-by-step guide on creating this dataset, please refer to the article linked at the end of this post.

Here is an overview of the results:

To calculate the percentage of similarity of each model with “Text-embedding-ada-002”, we’ll determine how many keywords match between “Text-embedding-ada-002” and the other models, then express this as a percentage of the total keywords in “Text-embedding-ada-002”. Here’s the updated table with the percentages:

Here is the comparison table:

  1. Text-embedding-ada-002 Keywords Total: 10 (100% is based on these keywords).
  2. Matching Keywords:
       – BGE-M3: Matches 7 out of 10 keywords of Text-embedding-ada-002.
       – NOMIC-EMBED-TEXT: Matches 3 out of 10 keywords of Text-embedding-ada-002.
       – MXBAI-EMBED-LARGE: Matches 1 out of 10 keywords of Text-embedding-ada-002.

This table shows that the BGE-M3 model is the most similar to “Text-embedding-ada-002,” with 70% of the keywords matching, followed by “NOMIC-EMBED-TEXT” at 30% and “MXBAI-EMBED-LARGE” with the least similarity at 10%.
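
The percentages above are simply the match counts divided by the ten reference titles; a tiny sketch of that calculation:

Python
# Match counts taken from the comparison above (out of 10 ada-002 titles)
reference_total = 10
matches = {"BGE-M3": 7, "NOMIC-EMBED-TEXT": 3, "MXBAI-EMBED-LARGE": 1}

for model, n in matches.items():
    print(f"{model}: {n}/{reference_total} = {n / reference_total:.0%}")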

How does it perform when doing an approximate search with 1, 4, 8, and 16 clusters?

To perform this test, we executed the following query in the Azure SQL database for each database and model we used:

SQL
create table #trab ( linha varchar( 200) null )

insert into #trab (linha) values ('Model: mxbai-embed-large')

declare @v nvarchar(max)
select @v = content_vector from dbo.wikipedia_articles_embeddings where title = 'Microsoft'

insert into #trab (linha) values ('')
insert into #trab (linha) values ('Search with 1 cluster')

insert into #trab (linha)
select w.title from 
(select top (10) id, title, text, dot_product
from [$vector].find_similar$wikipedia_articles_embeddings$content_vector(@v, 1, 0.25) 
order by dot_product desc) w
order by w.title
go

declare @v nvarchar(max)
select @v = content_vector from dbo.wikipedia_articles_embeddings where title = 'Microsoft'

insert into #trab (linha) values ('')
insert into #trab (linha) values ('Search with 4 clusters')

insert into #trab (linha)
select w.title from 
(select top (10) id, title, text, dot_product
from [$vector].find_similar$wikipedia_articles_embeddings$content_vector(@v, 4, 0.25) 
order by dot_product desc) w
order by w.title
go

declare @v nvarchar(max)
select @v = content_vector from dbo.wikipedia_articles_embeddings where title = 'Microsoft'

insert into #trab (linha) values ('')
insert into #trab (linha) values ('Search with 8 clusters')

insert into #trab (linha)
select w.title from 
(select top (10) id, title, text, dot_product
from [$vector].find_similar$wikipedia_articles_embeddings$content_vector(@v, 8, 0.25) 
order by dot_product desc) w
order by w.title
go

declare @v nvarchar(max)
select @v = content_vector from dbo.wikipedia_articles_embeddings where title = 'Microsoft'

insert into #trab (linha) values ('')
insert into #trab (linha) values ('Search with 16 clusters')

insert into #trab (linha)
select w.title from 
(select top (10) id, title, text, dot_product
from [$vector].find_similar$wikipedia_articles_embeddings$content_vector(@v, 16, 0.25) 
order by dot_product desc) w
order by w.title
go

select * from #trab

drop table #trab

Here is an overview of the results:

Based on the previous detailed list, here are the calculations for the percentage of similarity:

1. Total Distinct Keywords in Text-embedding-ada-002: 10 (100% based on these keywords)

2. Keywords in each Cluster Search:

   – BGE-M3: 5 keywords (Microsoft, Microsoft Office, Microsoft Windows, Microsoft Word, MSN)

   – NOMIC-EMBED-TEXT: 4 keywords (Microsoft, MSN, Nokia, Outlook.com)

   – MXBAI-EMBED-LARGE: 2 keywords (Microsoft, Nokia)

Here’s the updated table with the percentage similarity for searches with 1, 4, 8, and 16 clusters:

This table shows the similarity percentages for each model across different cluster configurations compared to the “text-embedding-ada-002” model. Each model retains a consistent similarity percentage across all cluster numbers, indicating that the cluster configuration did not affect the keywords searched for in these cases.

Before executing the Python code that embeds the vectors, you first have to install Ollama.

How Do You Set Up a Local Model Using Ollama?

To run an Ollama model with your GPU, you can use the official Docker image provided by Ollama. The Docker image supports Nvidia GPUs and can be installed using the NVIDIA Container Toolkit. Here are the steps to get started:

  1. Install Docker: Download and install Docker Desktop or Docker Engine, depending on your operating system.
  2. Pull the Ollama Docker Image: Pull the official image using the following command: docker pull ollama/ollama. You will choose and download the embedding model itself (such as nomic-embed-text or mxbai-embed-large) in a later step.
  3. Run the Ollama Docker Image: Execute Docker run commands to set up the Ollama container. You can configure it specifically for either CPU or Nvidia GPU environments. Run the Docker container with the following command: docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama.
  4. Download the Embedding Model: Pull the embedding model you want inside the container, for example: docker exec -it ollama ollama pull nomic-embed-text or docker exec -it ollama ollama pull mxbai-embed-large.
  5. Access and Use the Model: The Ollama API is now available at the local address provided (typically http://localhost:11434); the Python script below, as well as tools such as Open WebUI, can call this endpoint to generate embeddings.

Please note that the above commands assume you have already installed Docker on your system. If you haven’t installed Docker yet, you can download it from the official Docker website.

You can also download and install Ollama on Windows:

How Do You Convert Text into Embeddings Using the Local Ollama Model?

After setting up your local model with Ollama, you can use the following Python script to convert text into embeddings:

Python
# Importing necessary libraries and modules
import os
import pyodbc  # SQL connection library for Microsoft databases
import requests  # For making HTTP requests
from dotenv import load_dotenv  # To load environment variables from a .env file
import numpy as np  # Library for numerical operations
from sklearn.preprocessing import normalize  # For normalizing data
import json  # For handling JSON data
from db.utils import NpEncoder  # Custom JSON encoder for numpy data types

# Load environment variables from a .env file located in the same directory
load_dotenv()

# The connection string for the Azure SQL database is read from the environment variables, for example:
#MSSQL='Driver={ODBC Driver 17 for SQL Server};Server=localhost;Database=<DATABASE NAME>;Uid=<USER>;Pwd=<PASSWORD>;Encrypt=No;Connection Timeout=30;'

# Retrieve the database connection string from environment variables
dbconnectstring = os.getenv('MSSQL')

# Establish a connection to the Azure SQL database using the connection string
conn = pyodbc.connect(dbconnectstring)

def get_embedding(text, model):
    # Prepare the input text by truncating it or preprocessing if needed
    truncated_text = text

    # Make an HTTP POST request to a local server API to get embeddings for the input text
    res = requests.post(url='http://localhost:11434/api/embeddings',
                        json={
                            'model': model, 
                            'prompt': truncated_text
                        }
    )
    
    # Extract the embedding from the JSON response
    embeddings = res.json()['embedding']
    
    # Convert the embedding list to a numpy array
    embeddings = np.array(embeddings)    
    
    # Normalize the embeddings array to unit length
    nc = normalize([embeddings])
        
    # Convert the numpy array back to JSON string using a custom encoder that handles numpy types
    return json.dumps(nc[0], cls=NpEncoder )

def update_database(id, title_vector, content_vector):
    # Obtain a new cursor from the database connection
    cursor = conn.cursor()

    # Convert numpy array embeddings to string representations for storing in SQL
    title_vector_str = str(title_vector)
    content_vector_str = str(content_vector)

    # SQL query to update the embeddings in the database
    cursor.execute("""
        UPDATE wikipedia_articles_embeddings
        SET title_vector = ?, content_vector = ?
        WHERE id = ?
    """, (title_vector_str, content_vector_str, id))
    conn.commit()  # Commit the transaction to the database

def embed_and_update(model):
    # Get a cursor from the database connection
    cursor = conn.cursor()
    
    # Retrieve articles from the database that need their embeddings updated
    cursor.execute("select id, title, text from wikipedia_articles_embeddings where title_vector = '' or content_vector = '' order by id desc")
    
    for row in cursor.fetchall():
        id, title, text = row
        
        # Get embeddings for title and text
        title_vector = get_embedding(title, model)
        content_vector = get_embedding(text, model)
        
        # Print the progress with length of the generated embeddings
        print(f"Embedding article {id} - {title}", "len:", len(title_vector), len(content_vector))
        
        # Update the database with new embeddings
        update_database(id, title_vector, content_vector)

# Call the function to update embeddings using the 'nomic-embed-text' model
embed_and_update('nomic-embed-text')

# To use another model, uncomment and call the function with the different model name
# embed_and_update('mxbai-embed-large')

I've also created a GitHub repository with this code; you can access it at this link.

Download the pre-calculated embeddings using OpenAI’s text-embedding-ada-002

The pre-calculated embeddings with OpenAI's text-embedding-ada-002, for both the title and the body of a selection of Wikipedia articles, are made available by OpenAI here:

https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip
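
If you prefer to script the download, here is a minimal sketch using the URL above (the archive is large, so the download can take a while):

Python
import io
import zipfile
import requests

# URL taken from this article (OpenAI's pre-computed Wikipedia embeddings)
URL = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip"

resp = requests.get(URL, timeout=600)
resp.raise_for_status()

# Extract the archive into a local "data" folder and list its contents
with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
    zf.extractall("data")
    print(zf.namelist())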

Once you have successfully embedded your text, I recommend exploring two of my blog posts that detail how to create a vector database for prompting and searching. These posts provide step-by-step guidance on utilizing Azure SQL alongside cosine similarity and KMeans algorithms for efficient and effective data retrieval.

Azure SQL Database now has native vector support

You can sign up for the private preview at this link.

This article, published by Davide Mauri and Pooja Kamath during this week’s Microsoft Build event, provides all the information.

Announcing EAP for Vector Support in Azure SQL Database – Azure SQL Devs’ Corner (microsoft.com)

Conclusion

Embedding text locally using models served by Ollama presents a cost-effective, scalable solution for handling large volumes of data. By integrating these embeddings into Azure SQL databases, organizations can leverage generative AI to enhance their querying capabilities, making it easier to extract meaningful insights from vast datasets. The outlined process delivers significant cost savings while enhancing data security and processing efficiency.

This approach is not merely a technical exercise but a strategic asset that can drive better decision-making and innovation across various data-intensive applications.

That’s it for today!

Sources

GitHub – Azure-Samples/azure-sql-db-vectors-kmeans: Use KMeans clustering to speed up vector search in Azure SQL DB

Vector Similarity Search with Azure SQL database and OpenAI | by Davide Mauri | Microsoft Azure | Medium

Ollama

How to Install and Run Ollama with Docker: A Beginner’s Guide – Collabnix

Leveraging KMeans Compute Node for Text Similarity Analysis through Vector Search in Azure SQL – Tech News & Insights (lawrence.eti.br)

Navigating Vector Operations in Azure SQL for Better Data Insights: A Guide How to Use Generative AI to Prompt Queries in Datasets – Tech News & Insights (lawrence.eti.br)

GitHub – LawrenceTeixeira/embedyourlocalmodel

Revolutionizing Corporate AI with Ollama: How Local LLMs Boost Privacy, Efficiency, and Cost Savings

The integration of Ollama into corporate environments marks a pivotal shift in the deployment and operation of large language models (LLMs). By enabling local hosting of LLMs, Ollama provides companies with enhanced privacy, greater efficiency, and significant cost reductions.

In the evolving world of artificial intelligence, the trend of deploying large language models (LLMs) locally is gaining unprecedented momentum. Traditionally dominated by cloud-based services offered by giants like OpenAI, Google, and Anthropic, LLMs' accessibility has been both a boon and a bane. While these platforms provide easy-to-use interfaces and powerful functionalities, they pose significant privacy concerns, as they can access any data processed through their systems.

In response to these concerns, the landscape is shifting. Companies and individual users prioritizing data security are increasingly turning towards solutions allowing them to operate LLMs on their hardware. This movement was galvanized by the advent of open-source models like the new Llama3, which have democratized access to powerful AI tools without the hefty price tag of proprietary systems.

However, local deployment comes with challenges, primarily regarding resource management and hardware requirements. Early models required significant computational power, making them impractical for standard hardware. Fortunately, technological advancements such as model quantization, which compresses model weights to drastically reduce their size, are making local deployment more feasible and efficient.

This blog post delves into why running LLMs locally is becoming a popular choice. It explores the benefits of enhanced privacy, reduced reliance on internet connectivity, and the potential for lower latency in applications requiring real-time data processing. As we continue to navigate the intricacies of AI deployment, the shift towards local solutions represents a critical step in balancing power and privacy in the digital age.

What is Ollama?

Ollama is an open-source application that facilitates the local operation of large language models (LLMs) directly on personal or corporate hardware. It supports a variety of models from different sources, such as Llama3, Mistral, Openchat, and many others, allowing users to run these models on their local machines without the need for continuous internet connectivity. This local deployment secures sensitive data and provides complete control over the AI models and their operation.
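
To give a feel for how simple local operation is, the sketch below sends a prompt to a local Ollama server over its REST API. It assumes Ollama is running on its default port and that the llama3 model has already been pulled; the prompt is illustrative.

Python
import requests

# Assumed local Ollama server on its default port; the model must already
# have been pulled (for example with `ollama pull llama3`)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3",
          "prompt": "Summarize why local LLM hosting helps with data privacy.",
          "stream": False},
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])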

Enhanced Privacy

Running LLMs locally / on-premise with Ollama ensures that sensitive data remains protected within the corporate firewall, significantly reducing the risks associated with data breaches and unauthorized access often seen in cloud-based solutions. This local control is vital for industries where data governance and privacy are paramount.

Increased Efficiency

Ollama can dramatically improve the performance of LLM-powered applications, in some cases reducing end-to-end inference time by up to 50% compared to traditional cloud-based platforms, depending on the hardware configuration. This is primarily due to eliminating network round trips and data transfer delays, which improves response times for AI-driven applications.

Cost Savings

Ollama is notably cost-effective, eliminating many expenses associated with cloud services. By running models on local infrastructure, companies can avoid continuous subscription costs and reduce their reliance on external data management services.

10 Advantages of Using Ollama in the Corporate Environment

Using Ollama in a corporate environment can offer several distinct advantages, particularly for companies leveraging local large language models (LLMs) for various applications. Here are ten advantages based on the capabilities and features of Ollama:

  1. Local Data Control: Ollama allows for the local running of models, which ensures all data processed remains within the company’s infrastructure, enhancing security and privacy.
  2. Customization and Flexibility: Thanks to Ollama's support for customizable prompts and parameters, companies can tailor models to suit specific needs or requirements.
  3. Cross-Platform Compatibility: Ollama supports multiple operating systems, including Windows, macOS, and Linux, facilitating integration into diverse IT environments.
  4. GPU Acceleration: Ollama can leverage GPU acceleration to speed up model inference, which is particularly useful for computationally intensive tasks.
  5. Ease of Integration: It integrates seamlessly with Python, the leading programming language for data science and machine learning, allowing for easy incorporation into existing projects (see the short sketch after this list).
  6. Support for Multimodal Models: Ollama supports multimodal LLMs, enabling the processing of both text and image data within the same model, which is beneficial for tasks requiring analysis of varied data types.
  7. Community and Open Source: Being an open-source tool, Ollama benefits from community contributions, which continually enhance its capabilities and features.
  8. Enhanced AI Capabilities: Ollama can be paired with tools like Langchain to create sophisticated applications like Retrieval-Augmented Generation systems, enhancing the depth and contextuality of responses.
  9. Web and Desktop Applications: There are numerous open-source clients and frameworks that facilitate the deployment of Ollama on both web and desktop platforms, enhancing accessibility and user interaction.
  10. Retrieval Capabilities: Ollama has robust retrieval features that can be utilized to fetch relevant information from large datasets, which can significantly improve the effectiveness of language models in generating informed and accurate outputs.
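
Regarding advantage 5 above, here is a brief sketch of what Python integration can look like in practice: a multi-turn chat request against Ollama's local REST API with a system prompt, the pattern most internal assistants would follow. The endpoint and port are Ollama defaults; the model name and prompts are illustrative.

Python
import requests

# Assumed local Ollama server; model name and prompts are illustrative
messages = [
    {"role": "system", "content": "You are an internal IT support assistant."},
    {"role": "user", "content": "How do I request access to the finance file share?"},
]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama3", "messages": messages, "stream": False},
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])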

These advantages make Ollama a powerful and versatile choice for organizations looking to leverage advanced AI capabilities while maintaining control over their data and computational infrastructure.

Pros and Cons of Ollama: A Detailed Analysis

Pros of Ollama

  1. Data Privacy:
    Ollama ensures that all sensitive data is processed and stored locally, preventing external access and significantly mitigating the risk of data breaches. This is especially crucial for industries that handle sensitive information, such as healthcare and finance, where data privacy regulations are stringent.
  2. Cost-Effectiveness:
    By hosting LLMs locally, Ollama eliminates the need for costly cloud service subscriptions and data transfer fees. This can result in substantial long-term savings, particularly for organizations that require extensive data processing capabilities.
  3. Customization:
    Ollama provides extensive customization options that allow users to tailor models to specific business needs. This includes adjusting model parameters, integrating unique data sets, and modifying the model’s behavior to better align with organizational goals.
  4. Ease of Setup:
    Despite its advanced capabilities, Ollama offers a user-friendly installation process that is well-documented and supported for macOS and Linux. This simplifies the deployment of LLMs, making it accessible even to those with limited IT infrastructure.

Cons of Ollama

  1. Complexity for Beginners:
    The reliance on command-line interfaces can be a barrier for users without technical expertise. Although powerful, the CLI approach requires a learning curve that might deter non-technical users from fully leveraging the platform’s capabilities.
  2. Hardware Requirements:
    Running LLMs locally requires substantial computational resources, particularly for larger models. These requirements can include high-end GPUs and significant memory allocation, which might be beyond the reach of small- to medium-sized enterprises without the necessary IT infrastructure.
  3. Scalability Challenges:
    While not previously mentioned, scalability can be a concern with Ollama. Unlike cloud services offering on-demand scalability, local deployment means scaling up operations often requires additional physical infrastructure. This can involve considerable investment in hardware and maintenance as needs grow.

Overall, Ollama presents a compelling option for organizations looking to maintain control over their AI operations with a focus on privacy, cost savings, and customization. However, the potential technical and infrastructural challenges must be carefully considered to ensure that they align with the organization’s capabilities and long-term strategy.

Real-World Applications of Ollama in Organizations

  1. Financial Sector – Fraud Detection:
    Banks could use Ollama to run models that analyze transaction patterns on local servers, ensuring sensitive financial data remains secure while detecting potential fraudulent activities in real-time.
  2. Healthcare – Patient Data Analysis:
    Hospitals might deploy Ollama to analyze patient records locally to ensure compliance with health data privacy regulations (like HIPAA in the U.S.), while utilizing AI to predict patient outcomes or personalize treatment plans.
  3. Legal – Document Review:
    Law firms could utilize Ollama for in-house document review systems, allowing lawyers to quickly parse through large volumes of legal documents without exposing client-sensitive information to third-party cloud providers.
  4. Retail – Customer Service Automation:
    Retail companies could implement Ollama to run customer service bots locally, handling inquiries and complaints while ensuring all customer data stays within the company’s control.
  5. Telecommunications – Network Optimization:
    Telecom companies might use Ollama to process data from network traffic locally to predict and prevent outages and optimize network performance without the latency involved in cloud processing.
  6. Manufacturing – Predictive Maintenance:
    Manufacturing firms could deploy Ollama to analyze machinery sensor data on-premises, predicting failures and scheduling maintenance without the need to send potentially sensitive operational data to the cloud.
  7. Education – Personalized Learning:
    Educational institutions might use Ollama to run models that adapt learning content based on student performance data stored and processed locally, enhancing student privacy and data security.
  8. Real Estate – Market Analysis:
    Real estate agencies could employ Ollama to analyze local market trends and client preferences securely on their servers, aiding in personalized property recommendations without exposing client data externally.
  9. Media and Entertainment – Content Recommendation:
    Media companies could use Ollama to host recommendation systems on local servers, processing user data to personalize content recommendations while keeping user preferences confidential and secure.
  10. Automotive – Autonomous Vehicle Development:
    Automotive companies might deploy Ollama locally in research centers to develop and test AI models for autonomous vehicles, processing large volumes of sensor data securely on-premises.

These examples illustrate the versatility of Ollama in various industries, highlighting its benefits in terms of data security, compliance, and operational efficiency.

How to install Ollama?

You can visit the official Ollama website or use the instructions below to set up Ollama using Docker.

Step 1: Install Docker

Before you can run Ollama in a Docker container, you need to have Docker installed on your system. If it’s not already installed, you can download and install Docker from the official Docker website. This process varies depending on your operating system (Windows, macOS, or Linux).

Step 2: Pull the Ollama Docker Image

Once Docker is installed, you can pull the Ollama Docker image from the Docker Hub or any other registry where it’s hosted. Open your terminal or command prompt and run the following command:

docker pull ollama/ollama

This command downloads the Ollama image to your local machine, allowing you to run it inside a Docker container.

Step 3: Run Ollama Using Docker

To start an Ollama container, use the Docker run command. This command creates a new container and starts it. Here’s how you can run the Ollama Docker container:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

The -d flag runs the container in detached (background) mode, --gpus=all gives the container access to your Nvidia GPUs, and -p 11434:11434 publishes the port on which the Ollama API listens, allowing you to interact with Ollama directly.

Model Customization and Advanced Setup

If you need to customize Ollama's behavior, you can modify the Docker run command to mount a directory from your host into the container; this is useful for providing custom Modelfiles or accessing specific datasets. To load and chat with a specific model inside the running container, use docker exec:

docker exec -it ollama ollama run llama3:70b

Using the steps outlined in this guide, you can switch to any model you prefer. Here’s a link to a tutorial that shows you how.

docker exec -it ollama ollama run <model>

Following these steps, you can easily set up and run Ollama in a Docker environment, making it more portable and easier to manage across different machines and platforms.

Build a Chatbot on Llama 3 with Ollama Locally

In this guide, we will walk through the steps to set up Ollama with the Llama 3 model and deploy a local ChatBot interface. This process allows users to interact with the powerful Llama 3 AI model locally, enhancing privacy and customizability. Ollama is a tool designed to simplify the installation and management of large language models on local systems. We'll also cover the setup of a chatbot interface using the Chatbot Ollama tool developed by Ivan.

Pre-Requisites

Before beginning the installation, ensure the following prerequisites are met:

  1. Operating System: Ubuntu 22.04 or a compatible Linux distribution.
  2. Installed Software:
  • Docker: For running containerized applications.
  • Node.js: Latest version, for running JavaScript server-side.
  • npm (Node Package Manager): For managing JavaScript packages.

Step-by-Step Setup

Step 1: Install Ollama

  1. Download Ollama: Use the curl command to download and install Ollama on your local system. If Ollama is already installed, you can skip this step.
   curl -fsSL https://ollama.com/install.sh | sh

Step 2: Verify Installation

  1. Check Installed Software: Ensure Docker, Node.js, and npm are correctly installed by checking their versions.
   docker --version
   node --version
   npm --version
  2. Run ollama list: Verify that Ollama is running and list the installed models.
   ollama list

Step 3: Download and Run Llama 3

  1. Download Llama 3 Model: Use Ollama to download the Llama 3 model.
   ollama run llama3
  2. Wait for Download and Verification: Ollama will download the model and verify its checksum automatically.

Step 4: Deploy the ChatBot Interface

  1. Clone Chatbot Ollama Repository: Clone the repository containing the ChatBot interface.
   git clone https://github.com/ivanfioravanti/chatbot-ollama.git
   cd chatbot-ollama
  2. Install Dependencies: Use npm to install the necessary dependencies.
   npm install
  3. Configure the .env File: Create and configure the .env file to point to your Ollama host IP and port.
   echo "OLLAMA_HOST=http://localhost:11434" > .env
  4. Run the ChatBot Interface:
   npm run dev

Step 5: Access the ChatBot UI

  1. Open a Web Browser: Navigate to http://localhost:3000 to access the ChatBot UI.
  2. Interact with Llama 3: Use the interface to send queries and receive responses from Llama 3.

This project is based on chatbot-ui by Mckay Wrigley.

Here are some other alternatives for running large language models (LLMs) locally besides Ollama

  1. Hugging Face and Transformers: This method involves using the Hugging Face library to run various models like GPT-2. You’ll need to download and set up the model manually using the Transformers library. It’s ideal for experimentation and learning due to its extensive library of models and easy-to-use code snippets.
  2. LangChain: A Python framework that simplifies building AI applications on top of LLMs. It provides useful abstractions and middleware to develop AI applications, making it easier to manage models and integrate AI into your applications.
  3. LM Studio: A comprehensive tool for running LLMs locally, allowing experimentation with different models, usually sourced from the HuggingFace repository. It provides a chat interface and an OpenAI-compatible local server, making it suitable for more advanced users who need a robust environment for LLM experimentation.
  4. GPT4All: This desktop application is user-friendly and supports a variety of models. It includes a GUI for easy interaction and can process local documents for privacy-focused applications. GPT4All is particularly noted for its streamlined user experience.
  5. Google Gemma: A family of lightweight, state-of-the-art open models from Google, designed to help developers build AI responsibly. It is popular among Windows users.
  6. Devin and Devika: Web-based tools that are designed to automate tasks and manage AI-driven projects without requiring extensive coding knowledge. These platforms focus on enhancing productivity and supporting engineers by automating routine tasks.
  7. Private GPT: Focuses on privacy, allowing you to run LLMs on your local environment without an internet connection, ensuring that no data leaves your computer. This makes it ideal for sensitive or proprietary data applications.

These options provide a range of functionalities and environments to suit different needs, whether for development, experimentation, or specific applications like task automation and privacy-focused operations.

Conclusion

Ollama is reshaping how businesses utilize AI by offering a secure, efficient, cost-effective solution for running LLMs locally. As it continues to evolve with more features and broader platform support, Ollama is expected to become a vital tool in corporate AI strategies, enabling businesses to maximize their AI capabilities while maintaining stringent data privacy and operational efficiency.

For additional details on implementing Ollama within your organization, please feel free to reach out to me using this link.

That’s it for today!

Sources:

What is Ollama? A shallow dive into running LLMs locally | Isaac Chung (isaac-chung.github.io)

https://ollama.com

6 Ways to Run LLMs Locally (also how to use HuggingFace) (semaphoreci.com)

Seven Ways of Running Large Language Models (LLMs) Locally (April 2024) (kleiber.me)