Exploring Patent Data with Artificial Intelligence: An Automated Approach Using Langchain, Open AI, Google Search API, and Browserless

Navigating the Challenges of Data Extraction in Intellectual Property: Overcoming Obstacles with the Help of AI for Enhanced Analyses. With the rapid advancement of generative AI, let’s discuss how it’s possible to automate formerly complex tasks such as analyzing a 2023 Excel file from WIPO’s official patent gazette, identifying the top 10 applicants, and generating detailed summaries. Ultimately, we’ll integrate these insights into our Data Lake, but Data Warehouse or database options are also feasible.

Increasingly, we face a range of challenges in extracting data for Intellectual Property analyses, from a lack of standardization to the technological limitations of official bodies. Often, we find ourselves resorting to manual methods to collect specific information that can be integrated into our Data Lake for comprehensive data analysis.

However, thanks to the ongoing advancements of generative Artificial Intelligence (AI), particularly following the popularization of ChatGPT in November 2022, we’re witnessing the growing ease of automating tasks previously considered unreachable through traditional programming.

In this article, I’ll demonstrate how it’s possible to read an Excel file from a 2023 official publication of the World Intellectual Property Organization (OMPI), look for the top ten applicants, and employ a robot to search for these applicants on the internet. The AI will create a summary of each of these applicants, clarifying the type of company, their business lines, global presence, and websites, among others.

The obtained information will be saved in an Excel file. However, it’s worth noting that this data can be easily transferred to a Data Lake, Data Warehouse, or any other database system you prefer to use for your data analysis needs.

What is Google Search API?

Google Search API is a tool that allows developers to create programs that can search the internet and return results. It is like a library of code that developers can use to build their own search engines or other applications that use Google’s search technology. It is an important tool for people who want to build websites or apps that use search functionality.

Website: SerpApi: Google Search API

What is Browserless?

Browserless is a cloud-based platform that allows you to automate web-browser tasks. It uses open-source libraries and REST APIs to collect data, automate sites without APIs, produce PDFs, or run synthetic testing. In other words, it is a browser-as-a-service where you can use all the power of headless Chrome, hassle-free¹. It offers first-class integrations for Puppeteer, Playwright, Selenium’s WebDriver, and a slew of handy REST APIs for doing more common work.

Website: Browserless – #1 Web Automation & Headless Browser Automation Tool

I have created an application meant to test your search for other applicants on the web. Feel free to access it here.

For accessing the OMPI’s official patent gazette file, click here, and to access the Applications_informations file generated automatically by the Python code, click here.

To learn more about Langchain, click on the link to another post provided below:

To learn more about ChatGPT API, click on the link to another post provided below:

Take a look at the Python script that extracts patent applicant information from the web. It’s organized into two distinct sections: the functions section and the app section.

1 – functions.py

Python
import os
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type
from bs4 import BeautifulSoup
import requests
import json
from langchain.schema import SystemMessage

from dotenv import load_dotenv

load_dotenv()

brwoserless_api_key = os.getenv("BROWSERLESS_API_KEY")
serper_api_key = os.getenv("SERP_API_KEY")

# 1. Tool for search

def search(query):
    url = "https://google.serper.dev/search"

    payload = json.dumps({
        "q": query
    })

    headers = {
        'X-API-KEY': serper_api_key,
        'Content-Type': 'application/json'
    }

    response = requests.request("POST", url, headers=headers, data=payload)

    print(response.text)

    return response.text


# 2. Tool for scraping
def scrape_website(objective: str, url: str):
    # scrape website, and also will summarize the content based on objective if the content is too large
    # objective is the original objective & task that user give to the agent, url is the url of the website to be scraped

    print("Scraping website...")
    # Define the headers for the request
    headers = {
        'Cache-Control': 'no-cache',
        'Content-Type': 'application/json',
    }

    # Define the data to be sent in the request
    data = {
        "url": url
    }

    # Convert Python object to JSON string
    data_json = json.dumps(data)

    # Send the POST request
    post_url = f"https://chrome.browserless.io/content?token={brwoserless_api_key}"
    response = requests.post(post_url, headers=headers, data=data_json)

    # Check the response status code
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        text = soup.get_text()
        print("CONTENTTTTTT:", text)

        if len(text) > 10000:
            output = summary(objective, text)
            return output
        else:
            return text
    else:
        print(f"HTTP request failed with status code {response.status_code}")


def summary(objective, content):
    llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k-0613")

    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n"], chunk_size=10000, chunk_overlap=500)
    docs = text_splitter.create_documents([content])
    map_prompt = """
    Write a summary of the following text for {objective}:
    "{text}"
    SUMMARY:
    """
    map_prompt_template = PromptTemplate(
        template=map_prompt, input_variables=["text", "objective"])

    summary_chain = load_summarize_chain(
        llm=llm,
        chain_type='map_reduce',
        map_prompt=map_prompt_template,
        combine_prompt=map_prompt_template,
        verbose=True
    )

    output = summary_chain.run(input_documents=docs, objective=objective)

    return output


class ScrapeWebsiteInput(BaseModel):
    """Inputs for scrape_website"""
    objective: str = Field(
        description="The objective & task that users give to the agent")
    url: str = Field(description="The url of the website to be scraped")


class ScrapeWebsiteTool(BaseTool):
    name = "scrape_website"
    description = "useful when you need to get data from a website url, passing both url and objective to the function; DO NOT make up any url, the url should only be from the search results"
    args_schema: Type[BaseModel] = ScrapeWebsiteInput

    def _run(self, objective: str, url: str):
        return scrape_website(objective, url)

    def _arun(self, url: str):
        raise NotImplementedError("error here")

2 – extract.py

Python
import pandas as pd
from functions import search, ScrapeWebsiteTool
from langchain.agents import initialize_agent, Tool
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.prompts import MessagesPlaceholder
from langchain.memory import ConversationSummaryBufferMemory
from langchain.schema import SystemMessage

# 3. Create langchain agent with the tools above
tools = [
    Tool(
        name="Search",
        func=search,
        description="useful for when you need to answer questions about current events, data. You should ask targeted questions"
    ),
    ScrapeWebsiteTool(),
]

system_message = SystemMessage(
    content="""You are a world class researcher, who can do detailed research on any topic and produce facts based results; 
            you do not make things up, you will try as hard as possible to gather facts & data to back up the research
            
            Please make sure you complete the objective above with the following rules:
            1/ You should do enough research to gather as much information as possible about the objective
            2/ If there are url of relevant links & articles, you will scrape it to gather more information
            3/ After scraping & search, you should think "is there any new things i should search & scraping based on the data I collected to increase research quality?" If answer is yes, continue; But don't do this more than 3 iteratins
            4/ You should not make things up, you should only write facts & data that you have gathered
            5/ In the final output, You should include all reference data & links to back up your research; You should include all reference data & links to back up your research
            6/ In the final output, You should include all reference data & links to back up your research; You should include all reference data & links to back up your research"""
)

agent_kwargs = {
    "extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
    "system_message": system_message,
}

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k-0613")
memory = ConversationSummaryBufferMemory(
    memory_key="memory", return_messages=True, llm=llm, max_token_limit=1000)

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    agent_kwargs=agent_kwargs,
    memory=memory,
)

# Read the excel file using pandas
data = pd.read_excel('https://lawrence.eti.br/wp-content/uploads/2023/07/2023.xlsx')

# Print the first few rows
print(data.head())

# Assuming 'Applicant' is a column in your excel file
top_applicants = data['Applicant'].value_counts().nlargest(10)
print(top_applicants)

# Prepare an empty list to store the results
results = []

# Iterate over each applicant and their count
for applicant_name, count in top_applicants.items():
    first_word = str(applicant_name).split()[0]
    print('First word of applicant: ', first_word)
    
    # You can now use first_word in your agent function
    result = agent({"input": first_word})
    print('Applicant :', applicant_name, 'Information: ',result['output'])

    # Append the result into the results list
    results.append({'Applicant': applicant_name, 'Information': result['output']})

# Convert the results list into a DataFrame
results_df = pd.DataFrame(results)

# Save the DataFrame into an Excel file
results_df.to_excel("Applicants_Informations.xlsx", index=False)

Upon executing this Python script, you’ll observe the following in the console:

PowerShell
(.venv) PS D:\researcher\researcher-gpt> & d:/researcher/researcher-gpt/.venv/Scripts/python.exe d:/researcher/researcher-gpt/extract.py

  Publication Number Publication Date  ...                     Applicant                                                Url
0     WO/2023/272317       2023-01-05  ...  INNIO JENBACHER GMBH & CO OG  http://patentscope.wipo.int/search/en/WO202327...
1     WO/2023/272318       2023-01-05  ...                  STIRTEC GMBH  http://patentscope.wipo.int/search/en/WO202327...
2     WO/2023/272319       2023-01-05  ...                 SENDANCE GMBH  http://patentscope.wipo.int/search/en/WO202327...
3     WO/2023/272320       2023-01-05  ...                  HOMER, Alois  http://patentscope.wipo.int/search/en/WO202327...
4     WO/2023/272321       2023-01-05  ...      TGW LOGISTICS GROUP GMBH  http://patentscope.wipo.int/search/en/WO202327...
[5 rows x 8 columns]

Applicant
HUAWEI TECHNOLOGIES CO., LTD.                           3863
SAMSUNG ELECTRONICS CO., LTD.                           2502
QUALCOMM INCORPORATED                                   1908
GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP., LTD.    1186
LG ELECTRONICS INC.                                     1180
ZTE CORPORATION                                         1134
TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)                  1039
CONTEMPORARY AMPEREX TECHNOLOGY CO., LIMITED             987
LG ENERGY SOLUTION, LTD.                                 967
NIPPON TELEGRAPH AND TELEPHONE CORPORATION               946

Entering new AgentExecutor chain…
Huawei is a Chinese multinational technology company that specializes in telecommunications equipment and consumer electronics. It was founded in 1987 by Ren Zhengfei and is headquartered in Shenzhen, Guangdong, China. Huawei is one of the largest telecommunications equipment manufacturers in the world and is also a leading provider of smartphones and other consumer devices.

Here are some key points about Huawei:

  1. Telecommunications Equipment: Huawei is a major player in the telecommunications industry, providing a wide range of equipment and solutions for network infrastructure, including 5G technology, mobile networks, broadband networks, and optical networks. The company offers products such as base stations, routers, switches, and optical transmission systems.
  2. Consumer Devices: Huawei is known for its smartphones, tablets, smartwatches, and other consumer electronics. The company’s smartphone lineup includes flagship models under the Huawei brand, as well as budget-friendly devices under the Honor brand. Huawei smartphones are known for their advanced camera technology and innovative features.
  3. Research and Development: Huawei invests heavily in research and development (R&D) to drive innovation and technological advancements. The company has established numerous R&D centers worldwide and collaborates with universities and research institutions to develop cutting-edge technologies. Huawei is particularly focused on areas such as 5G, artificial intelligence (AI), cloud computing, and Internet of Things (IoT).
  4. Global Presence: Huawei operates in over 170 countries and serves more than three billion people worldwide. The company has established a strong presence in both developed and emerging markets, offering its products and services to telecommunications operators, enterprises, and consumers.
  5. Controversies: Huawei has faced several controversies and challenges in recent years. The company has been accused by the United States government of posing a national security threat due to concerns over its alleged ties to the Chinese government. As a result, Huawei has faced restrictions and bans in some countries, limiting its access to certain markets.

For more detailed information about Huawei, you can refer to the following sources:

Please let me know if there is anything specific you would like to know about Huawei.

Finished chain.
Applicant : HUAWEI TECHNOLOGIES CO., LTD. Information: Huawei is a Chinese multinational technology company that specializes in telecommunications equipment and consumer electronics. It was founded in 1987 by Ren Zhengfei and is headquartered in Shenzhen, Guangdong, China. Huawei is one of the largest telecommunications equipment manufacturers in the world and is also a leading provider of smartphones and other consumer devices.

Entering new AgentExecutor chain…
Samsung is a South Korean multinational conglomerate that operates in various industries, including electronics, shipbuilding, construction, and more. It was founded in 1938 by Lee Byung-chul and is headquartered in Samsung Town, Seoul, South Korea. Samsung is one of the largest and most well-known technology companies in the world.

Here are some key points about Samsung:

  1. Electronics: Samsung Electronics is the most prominent subsidiary of the Samsung Group and is known for its wide range of consumer electronics products. The company manufactures and sells smartphones, tablets, televisions, home appliances, wearable devices, and other electronic gadgets. Samsung is particularly renowned for its flagship Galaxy smartphones and QLED televisions.
  2. Semiconductor: Samsung is a major player in the semiconductor industry. The company designs and manufactures memory chips, including DRAM (Dynamic Random Access Memory) and NAND flash memory, which are widely used in various electronic devices. Samsung is one of the leading suppliers of memory chips globally.
  3. Display Technology: Samsung is a leader in display technology and is known for its high-quality screens. The company produces a variety of displays, including OLED (Organic Light Emitting Diode) panels, LCD (Liquid Crystal Display) panels, and AMOLED (Active Matrix Organic Light Emitting Diode) panels. Samsung’s displays are used in smartphones, televisions, monitors, and other devices.
  4. Home Appliances: Samsung manufactures a range of home appliances, including refrigerators, washing machines, air conditioners, vacuum cleaners, and kitchen appliances. The company focuses on incorporating innovative features and smart technology into its appliances to enhance user experience and energy efficiency.
  5. Global Presence: Samsung has a strong global presence and operates in numerous countries around the world. The company has manufacturing facilities, research centers, and sales offices in various locations, allowing it to cater to a wide customer base.
  6. Research and Development: Samsung invests heavily in research and development to drive innovation and stay at the forefront of technology. The company has established multiple R&D centers globally and collaborates with universities and research institutions to develop new technologies and products.

For more detailed information about Samsung, you can refer to the following sources:

Please let me know if there is anything specific you would like to know about Samsung.

Finished chain.
Applicant : SAMSUNG ELECTRONICS CO., LTD. Information: Samsung is a South Korean multinational conglomerate that operates in various industries, including electronics, shipbuilding, construction, and more. It was founded in 1938 by Lee Byung-chul and is headquartered in Samsung Town, Seoul, South Korea. Samsung is one of the largest and most well-known technology companies in the world.

Entering new AgentExecutor chain…
Qualcomm Incorporated, commonly known as Qualcomm, is an American multinational semiconductor and telecommunications equipment company. It was founded in 1985 by Irwin M. Jacobs, Andrew Viterbi, Harvey White, and Franklin Antonio. The company is headquartered in San Diego, California, United States.

Here are some key points about Qualcomm:

  1. Semiconductors: Qualcomm is a leading provider of semiconductors and system-on-chip (SoC) solutions for various industries, including mobile devices, automotive, networking, and IoT (Internet of Things). The company designs and manufactures processors, modems, and other semiconductor components that power smartphones, tablets, wearables, and other electronic devices.
  2. Mobile Technologies: Qualcomm is widely recognized for its contributions to mobile technologies. The company has developed numerous innovations in wireless communication, including CDMA (Code Division Multiple Access) technology, which has been widely adopted in mobile networks worldwide. Qualcomm’s Snapdragon processors are widely used in smartphones and tablets, offering high performance and power efficiency.
  3. 5G Technology: Qualcomm is at the forefront of 5G technology development. The company has been instrumental in driving the adoption and commercialization of 5G networks and devices. Qualcomm’s 5G modems and SoCs enable faster data speeds, lower latency, and enhanced connectivity for a wide range of applications.
  4. Licensing and Intellectual Property: Qualcomm holds a significant portfolio of patents related to wireless communication technologies. The company licenses its intellectual property to other manufacturers, generating a substantial portion of its revenue through licensing fees. Qualcomm’s licensing practices have been the subject of legal disputes and regulatory scrutiny in various jurisdictions.
  5. Automotive and IoT: In addition to mobile devices, Qualcomm provides solutions for the automotive industry and IoT applications. The company offers connectivity solutions, processors, and software platforms for connected cars, telematics, and smart home devices. Qualcomm’s technologies enable advanced features such as vehicle-to-vehicle communication, infotainment systems, and autonomous driving capabilities.
  6. Research and Development: Qualcomm invests heavily in research and development to drive innovation and stay competitive in the rapidly evolving technology landscape. The company has research centers and collaborations with academic institutions worldwide, focusing on areas such as wireless communication, AI (Artificial Intelligence), and IoT.

For more detailed information about Qualcomm, you can refer to the following sources:

Please let me know if there is anything specific you would like to know about Qualcomm.

Finished chain.
Applicant : QUALCOMM INCORPORATED Information: Qualcomm Incorporated, commonly known as Qualcomm, is an American multinational semiconductor and telecommunications equipment company. It was founded in 1985 by Irwin M. Jacobs, Andrew Viterbi, Harvey White, and Franklin Antonio. The company is headquartered in San Diego, California, United States.

Entering new AgentExecutor chain…
Guangdong is a province located in the southern part of China. It is one of the most populous and economically prosperous provinces in the country. Here are some key points about Guangdong:

  1. Location and Geography: Guangdong is situated on the southern coast of China, bordering the South China Sea. It is adjacent to Hong Kong and Macau, two Special Administrative Regions of China. The province covers an area of approximately 180,000 square kilometers (69,500 square miles) and has a diverse landscape, including mountains, plains, and coastline.
  2. Population: Guangdong has a large population, making it the most populous province in China. As of 2020, the estimated population of Guangdong was over 115 million people. The province is known for its cultural diversity, with various ethnic groups residing there, including Han Chinese, Cantonese, Hakka, and others.
  3. Economy: Guangdong is one of the economic powerhouses of China. It has a highly developed and diversified economy, contributing significantly to the country’s GDP. The province is known for its manufacturing and export-oriented industries, including electronics, textiles, garments, toys, furniture, and more. Guangdong is home to many multinational corporations and industrial zones, attracting foreign investment and driving economic growth.
  4. Trade and Ports: Guangdong has several major ports that play a crucial role in international trade. The Port of Guangzhou, Port of Shenzhen, and Port of Zhuhai are among the busiest and most important ports in China. These ports facilitate the import and export of goods, connecting Guangdong with global markets.
  5. Tourism: Guangdong offers a rich cultural heritage and natural attractions, attracting tourists from both within China and abroad. The province is known for its historical sites, such as the Chen Clan Ancestral Hall, Kaiping Diaolou and Villages, and the Mausoleum of the Nanyue King. Guangdong also has popular tourist destinations like Shenzhen, Guangzhou, Zhuhai, and the scenic areas of the Pearl River Delta.
  6. Cuisine: Guangdong cuisine, also known as Cantonese cuisine, is renowned worldwide. It is one of the eight major culinary traditions in China. Guangdong dishes are characterized by their freshness, delicate flavors, and emphasis on seafood. Dim sum, roast goose, sweet and sour dishes, and various types of noodles are popular examples of Guangdong cuisine.

For more detailed information about Guangdong, you can refer to the following sources:

Please let me know if there is anything specific you would like to know about Guangdong.

Finished chain.
Applicant : GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP., LTD. Information: Guangdong is a province located in the southern part of China. It is one of the most populous and economically prosperous provinces in the country. Here are some key points about Guangdong:

Entering new AgentExecutor chain…
LG Corporation, formerly known as Lucky-Goldstar, is a multinational conglomerate based in South Korea. It is one of the largest and most well-known companies in the country. Here are some key points about LG:

  1. Company Overview: LG Corporation is a diversified company with operations in various industries, including electronics, chemicals, telecommunications, and more. It was founded in 1947 and has its headquarters in Seoul, South Korea. LG operates through numerous subsidiaries and affiliates, with a global presence in over 80 countries.
  2. Electronics: LG is widely recognized for its consumer electronics products. The company manufactures and sells a wide range of electronic devices, including televisions, refrigerators, washing machines, air conditioners, smartphones, and home appliances. LG’s electronics division is known for its innovative designs, advanced technologies, and high-quality products.
  3. LG Electronics: LG Electronics is a subsidiary of LG Corporation and focuses on the development, production, and sale of consumer electronics. It is one of the leading manufacturers of televisions and smartphones globally. LG’s OLED TVs are highly regarded for their picture quality, and the company’s smartphones have gained popularity for their features and design.
  4. Chemicals: LG also has a significant presence in the chemical industry. The company produces a wide range of chemical products, including petrochemicals, industrial materials, and specialty chemicals. LG Chem, a subsidiary of LG Corporation, is one of the largest chemical companies in the world and is involved in the production of batteries for electric vehicles and energy storage systems.
  5. Home Appliances: LG is a major player in the home appliance market. The company offers a comprehensive range of home appliances, including refrigerators, washing machines, dishwashers, vacuum cleaners, and air purifiers. LG’s home appliances are known for their energy efficiency, smart features, and innovative technologies.
  6. Telecommunications: LG has a presence in the telecommunications industry through its subsidiary, LG Electronics. The company manufactures and sells smartphones, tablets, and other mobile devices. LG smartphones have gained recognition for their unique features, such as dual screens and high-quality cameras.
  7. Research and Development: LG places a strong emphasis on research and development (R&D) to drive innovation and technological advancements. The company invests a significant amount in R&D activities across its various business sectors, focusing on areas such as artificial intelligence, 5G technology, and smart home solutions.

For more detailed information about LG Corporation, you can refer to the following sources:

Please let me know if there is anything specific you would like to know about LG.

Finished chain.
Applicant : LG ELECTRONICS INC. Information: LG Corporation, formerly known as Lucky-Goldstar, is a multinational conglomerate based in South Korea. It is one of the largest and most well-known companies in the country. Here are some key points about LG:

Entering new AgentExecutor chain…
ZTE Corporation is a Chinese multinational telecommunications equipment and systems company. It is one of the largest telecommunications equipment manufacturers in the world. Here are some key points about ZTE:

  1. Company Overview: ZTE Corporation was founded in 1985 and is headquartered in Shenzhen, Guangdong, China. It operates in three main business segments: Carrier Networks, Consumer Business, and Government and Corporate Business. ZTE provides a wide range of products and solutions for telecommunications operators, businesses, and consumers.
  2. Telecommunications Equipment: ZTE is primarily known for its telecommunications equipment and solutions. The company offers a comprehensive portfolio of products, including wireless networks, fixed-line networks, optical transmission, data communication, and mobile devices. ZTE’s equipment is used by telecommunications operators worldwide to build and upgrade their networks.
  3. 5G Technology: ZTE has been actively involved in the development and deployment of 5G technology. The company has made significant contributions to the advancement of 5G networks and has been a key player in the global 5G market. ZTE provides end-to-end 5G solutions, including infrastructure equipment, devices, and software.
  4. Mobile Devices: In addition to its telecommunications equipment business, ZTE also manufactures and sells mobile devices. The company offers a range of smartphones, tablets, and other mobile devices under its own brand. ZTE smartphones are known for their competitive features and affordability.
  5. International Presence: ZTE has a global presence and operates in over 160 countries. The company has established partnerships with telecommunications operators and businesses worldwide, enabling it to expand its reach and market share. ZTE’s international operations contribute significantly to its revenue and growth.
  6. Research and Development: ZTE places a strong emphasis on research and development (R&D) to drive innovation and technological advancements. The company invests a significant amount in R&D activities, focusing on areas such as 5G, artificial intelligence, cloud computing, and Internet of Things (IoT).
  7. Corporate Social Responsibility: ZTE is committed to corporate social responsibility and sustainability. The company actively participates in various social and environmental initiatives, including education, poverty alleviation, disaster relief, and environmental protection.

For more detailed information about ZTE Corporation, you can refer to the following sources:

Please let me know if there is anything specific you would like to know about ZTE.

Finished chain.
Applicant : ZTE CORPORATION Information: ZTE Corporation is a Chinese multinational telecommunications equipment and systems company. It is one of the largest telecommunications equipment manufacturers in the world. Here are some key points about ZTE:

Entering new AgentExecutor chain…
Telefonaktiebolaget LM Ericsson, commonly known as Ericsson, is a Swedish multinational telecommunications company. Here are some key points about Ericsson:

  1. Company Overview: Ericsson was founded in 1876 and is headquartered in Stockholm, Sweden. It is one of the leading providers of telecommunications equipment and services globally. The company operates in four main business areas: Networks, Digital Services, Managed Services, and Emerging Business.
  2. Networks: Ericsson’s Networks business focuses on providing infrastructure solutions for mobile and fixed networks. The company offers a wide range of products and services, including radio access networks, core networks, transport solutions, and network management systems. Ericsson’s network equipment is used by telecommunications operators worldwide to build and operate their networks.
  3. Digital Services: Ericsson’s Digital Services business provides software and services for the digital transformation of telecommunications operators. This includes solutions for cloud infrastructure, digital business support systems, and network functions virtualization. Ericsson helps operators evolve their networks and services to meet the demands of the digital era.
  4. Managed Services: Ericsson offers managed services to telecommunications operators, helping them optimize their network operations and improve efficiency. The company provides services such as network design and optimization, network rollout, and network operations and maintenance. Ericsson’s managed services enable operators to focus on their core business while leveraging Ericsson’s expertise.
  5. Emerging Business: Ericsson’s Emerging Business focuses on exploring new business opportunities and technologies. This includes areas such as Internet of Things (IoT), 5G applications, and industry digitalization. Ericsson collaborates with partners and customers to develop innovative solutions and drive digital transformation in various industries.
  6. Global Presence: Ericsson has a global presence and operates in more than 180 countries. The company works closely with telecommunications operators, enterprises, and governments worldwide to deliver advanced communication solutions. Ericsson’s global reach enables it to serve a diverse range of customers and markets.
  7. Research and Development: Ericsson invests heavily in research and development (R&D) to drive innovation and stay at the forefront of technology. The company has research centers and innovation hubs around the world, focusing on areas such as 5G, IoT, artificial intelligence, and cloud computing. Ericsson’s R&D efforts contribute to the development of cutting-edge telecommunications solutions.

For more detailed information about Ericsson, you can refer to the following sources:

Please let me know if there is anything specific you would like to know about Ericsson.

Finished chain.
Applicant : TELEFONAKTIEBOLAGET LM ERICSSON (PUBL) Information: Telefonaktiebolaget LM Ericsson, commonly known as Ericsson, is a Swedish multinational telecommunications company. Here are some key points about Ericsson:

Entering new AgentExecutor chain…
LG Corporation, formerly known as Lucky-Goldstar, is a South Korean multinational conglomerate. Here are some key points about LG:

  1. Company Overview: LG Corporation was founded in 1947 and is headquartered in Seoul, South Korea. It is one of the largest and most well-known conglomerates in South Korea. LG operates in various industries, including electronics, chemicals, telecommunications, and services.
  2. Electronics: LG Electronics is a subsidiary of LG Corporation and is known for its wide range of consumer electronics products. This includes televisions, home appliances (such as refrigerators, washing machines, and air conditioners), smartphones, audio and video equipment, and computer products. LG Electronics is recognized for its innovative designs and advanced technologies.
  3. Chemicals: LG Chem is another subsidiary of LG Corporation and is involved in the production of various chemical products. It manufactures and supplies a range of products, including petrochemicals, industrial materials, and high-performance materials. LG Chem is known for its focus on sustainability and environmentally friendly solutions.
  4. Telecommunications: LG Corporation has a presence in the telecommunications industry through its subsidiary LG Uplus. LG Uplus is a major telecommunications provider in South Korea, offering mobile, internet, and IPTV services. The company has been actively involved in the development and deployment of 5G technology.
  5. Research and Development: LG Corporation places a strong emphasis on research and development (R&D) to drive innovation and technological advancements. The company invests significant resources in R&D activities across its various business sectors. LG’s R&D efforts have led to the development of cutting-edge products and technologies.
  6. Global Presence: LG Corporation has a global presence and operates in numerous countries worldwide. The company has manufacturing facilities, sales offices, and research centers in various regions, including North America, Europe, Asia, and Latin America. LG’s global reach enables it to cater to a diverse customer base and expand its market share.

For more detailed information about LG Corporation, you can refer to the following sources:

Please let me know if there is anything specific you would like to know about LG.

Finished chain.
Applicant : LG ENERGY SOLUTION, LTD. Information: LG Corporation, formerly known as Lucky-Goldstar, is a South Korean multinational conglomerate. Here are some key points about LG:

Entering new AgentExecutor chain…
“Nippon” is the Japanese word for Japan. It is often used to refer to the country in a more traditional or formal context. Here are some key points about Japan (Nippon):

  1. Location and Geography: Japan is an island country located in East Asia. It is situated in the Pacific Ocean and consists of four main islands: Honshu, Hokkaido, Kyushu, and Shikoku. Japan is known for its diverse geography, including mountains, volcanoes, and coastal areas.
  2. Population: Japan has a population of approximately 126 million people. It is the 11th most populous country in the world. The capital city of Japan is Tokyo, which is one of the most populous cities globally.
  3. Economy: Japan has the third-largest economy in the world by nominal GDP. It is known for its advanced technology, automotive industry, electronics, and manufacturing sectors. Major Japanese companies include Toyota, Honda, Sony, Panasonic, and Nintendo.
  4. Culture and Traditions: Japan has a rich cultural heritage and is known for its traditional arts, such as tea ceremonies, calligraphy, and flower arranging (ikebana). The country is also famous for its cuisine, including sushi, ramen, tempura, and matcha tea. Traditional Japanese clothing includes the kimono and yukata.
  5. Technology and Innovation: Japan is renowned for its technological advancements and innovation. It is a global leader in areas such as robotics, electronics, and high-speed rail. Japanese companies have made significant contributions to the development of consumer electronics and automotive technology.
  6. Tourism: Japan attracts millions of tourists each year who come to experience its unique culture, historical sites, and natural beauty. Popular tourist destinations include Tokyo, Kyoto, Osaka, Hiroshima, Mount Fuji, and the ancient temples and shrines of Nara.

For more detailed information about Japan (Nippon), you can refer to the following sources:

Please let me know if there is anything specific you would like to know about Japan.

Finished chain.
Applicant : NIPPON TELEGRAPH AND TELEPHONE CORPORATION Information: “Nippon” is the Japanese word for Japan. It is often used to refer to the country in a more traditional or formal context. Here are some key points about Japan (Nippon):

Conclusion

The challenges of data extraction in Intellectual Property have always been a roadblock to effective and efficient analyses. However, with the advent of advanced generative AI models, we’re now able to automate complex tasks that used to require manual effort. From analyzing extensive patent gazette files to identifying top applicants and generating comprehensive summaries, AI is revolutionizing the way we handle data extraction in this field.

The integration of tools such as the Google Search API and Browserless illustrates the growing potential of AI to not only enhance the accuracy of our data but also to significantly reduce the time taken for these tasks. Our discussions have shown that whether the data is to be integrated into a Data Lake, Data Warehouse, or other database options, AI capabilities make it all possible and increasingly convenient.

However, it’s important to remember that as we continue to navigate the changing landscape of Intellectual Property, staying adaptive to technological advancements is crucial. AI will continue to evolve, and as it does, the ability to utilize it to its full potential will become an invaluable asset in our field. The challenge, therefore, is not just in overcoming the obstacles of data extraction but also in keeping pace with the rapid evolution of technology, and the many benefits it brings to Intellectual Property analyses.

As we look to the future, the promise of AI in overcoming challenges and enhancing analyses in Intellectual Property is incredibly promising. While we have made significant progress, this is only the beginning of the journey. The full potential of AI in this area is yet to be completely unlocked, and its future applications may very well reshape the entire field of Intellectual Property as we know it today. This rapid evolution of technology is not something to be feared, but rather, it’s an exciting opportunity that we must embrace, and I look forward to witnessing where this journey takes us.

That’s it for today!

OpenAI has unveiled a groundbreaking new feature, the Code Interpreter, accessible to all ChatGPT Plus users. Check out my experiments using the 2739 edition of BRPTO’s Patent Gazette

Code Interpreter is an innovative extension of ChatGPT, now available to all subscribers of the ChatGPT Plus service. This tool boasts the ability to execute code, work with uploaded files, analyze data, create charts, edit files, and carry out mathematical computations. The implications of this are profound, not just for academics and coders, but for anyone looking to streamline their research processes. Code Interpreter transcends the traditional scope of AI assistants, which have primarily been limited to generating text responses. It leverages large language models, the AI technology underpinning ChatGPT, to provide a general-purpose toolbox for problem-solving.

What is the Code Interpreter?

The Code Interpreter Plugin for ChatGPT is a multifaceted addition that provides the AI chatbot with the capacity to handle data and perform a broad range of tasks. This plugin equips ChatGPT with the ability to generate and implement code in natural language, thereby streamlining data evaluation, file conversions, and more. Pioneering users have experienced its effectiveness in activities like generating GIFs and examining musical preferences. The potential of the Code Interpreter Plugin is enormous, having the capability to revolutionize coding processes and unearth novel uses. By capitalizing on ChatGPT’s capabilities, users can harness the power of this plugin, sparking a voyage of discovery and creativity.

Professor Ethan Mollick from the Wharton School of the University of Pennsylvania shares his experiences with using the Code Interpreter

Artificial intelligence is rapidly revolutionizing every aspect of our lives, particularly in the world of data analytics and computational tasks. This transition was recently illuminated by Wharton Professor Ethan Mollick who commented, “Things that took me weeks to master in my PhD were completed in seconds by the AI.” This is not just a statement about time saved or operational efficiency, but it speaks volumes about the growing capabilities of AI technologies, specifically OpenAI’s new tool for ChatGPT – Code Interpreter.

Mollick, an early adopter of AI and an esteemed academic at the Wharton School of the University of Pennsylvania lauded Code Interpreter as the most significant application of AI in the sphere of complex knowledge work. Not only does it complete intricate tasks in record time, but Mollick also noticed fewer errors than those typically expected from human analysts.

One might argue that Code Interpreter transcends the traditional scope of AI assistants, which have primarily been limited to generating text responses. It leverages large language models, the AI technology underpinning ChatGPT, to provide a general-purpose toolbox for problem-solving.

Mollick commended Code Interpreter’s use of Python, a versatile programming language known for its application in software building and data analysis. He pointed out that it closes some of the gaps in language models as the output is not entirely text-based. The code is processed through Python, which promptly flags any errors.

In practice, when given a dataset on superheroes, Code Interpreter could clean and merge the data seamlessly, with an admirable effort to maintain accuracy. This process would have been an arduous task otherwise. Additionally, it allows a back-and-forth interaction during data visualization, accommodating various alterations and enhancements.

Remarkably, Code Interpreter doesn’t just perform pre-set analyses but recommends pertinent analytical approaches. For instance, it conducted predictive modeling to anticipate a hero’s potential powers based on other factors. Mollick was struck by the AI’s human-like reasoning about data, noting the AI’s observation that the powers were often visually noticeable as they derived from the comic book medium.

Beyond its technical capabilities, Code Interpreter democratizes access to complex data analysis, making it accessible to more people, thereby transforming the future of work. It saves time and reduces the tedium of repetitive tasks, enabling individuals to focus on more fulfilling, in-depth work.

Here are 10 examples of how you can use Code Interpreter for data analysis:

  1. Analyzing customer feedback data to identify trends and patterns.
  2. Creating interactive dashboards and reports for business intelligence purposes.
  3. Cleaning and transforming datasets for machine learning models.
  4. Extracting insights from social media data to inform marketing strategies.
  5. Generating charts and graphs to visualize sales data.
  6. Analyzing website traffic data to optimize the user experience.
  7. Creating custom functions and scripts for specific data analysis tasks.
  8. Performing statistical analysis on survey data.
  9. Automating repetitive data analysis tasks with Python scripts.
  10. Creating custom visualizations for presentations and reports.

How to use it? Follow my own experiments

Initially, you must upload a file of your choice. Following this, you are permitted to pose queries concerning it. The interpreter for the code will generate and run a Python script to address nearly all inquiries you have. The entire process is incredibly swift. I conducted a few trials using the XML file from BRTPTO’s 2739 Patent Gazette. The experience was truly incredible. There’s no need for any understanding of coding or Python. The code interpreter simply provides the results. If you wish, you can view the Python script.

By selecting the “Show Work” option, the Python script will become visible to you.

He executed the work flawlessly!

This is the screen of the CSV file.

Here is the Python code generated by the Code Interpreter to create the CSV file:

Python
import pandas as pd
import xml.etree.ElementTree as ET

# Load the XML file
tree = ET.parse("/mnt/data/Patente_2739_04072023.xml")
root = tree.getroot()

# Function to extract data from XML into a dictionary
def extract_data(root):
    data = []
    for despacho in root.findall('despacho'):
        row = {}
        row['despacho_codigo'] = despacho.find('codigo').text
        row['despacho_titulo'] = despacho.find('titulo').text
        row['despacho_comentario'] = despacho.find('comentario').text if despacho.find('comentario') is not None else ""
        processo_patente = despacho.find('processo-patente')
        row['processo_numero'] = processo_patente.find('numero').text
        row['processo_data_deposito'] = processo_patente.find('data-deposito').text if processo_patente.find('data-deposito') is not None else ""
        row['processo_concessao'] = processo_patente.find('concessao').text if processo_patente.find('concessao') is not None else ""
        titulars = processo_patente.findall('titular-lista/titular')
        row['titular_nome_completo'] = "; ".join([titular.find('nome-completo').text for titular in titulars])
        row['titular_endereco_pais'] = "; ".join([titular.find('endereco/pais/sigla').text for titular in titulars])
        data.append(row)
    return data

# Extract data from XML
data = extract_data(root)

# Convert to DataFrame
df = pd.DataFrame(data)

# Save DataFrame to CSV
df.to_csv('/mnt/data/patente.csv', index=False)

df.head()  # Display the first few rows of the DataFrame

# Extract distinct application/patent numbers
distinct_application_numbers = df['processo_numero'].unique()

# Convert to DataFrame
df_distinct_application_numbers = pd.DataFrame(distinct_application_numbers, columns=['processo_numero'])

# Save DataFrame to CSV
df_distinct_application_numbers.to_csv('/mnt/data/distinct_application_numbers.csv', index=False)

df_distinct_application_numbers.head()  # Display the first few rows of the DataFrame

This video can demonstrate the capabilities of the Code Interpreter.

You can find more information on the official Open AI site by clicking here.

Conclusion

Code Interpreter is a powerful tool that is making data analysis accessible for everyone with ChatGPT Plus. By allowing users to run code snippets within their chat sessions, it enables them to perform a wide range of data analysis tasks quickly and easily. Whether you’re analyzing customer feedback data or creating custom visualizations for presentations and reports, Code Interpreter has something to offer everyone.

Code Interpreter invites us to consider how we can leverage such advancements across various sectors impacted by AI. Indeed, Code Interpreter signifies the dawn of a new era in artificial intelligence and computational capabilities. So why not give it a try today?

That’s it for today!

Sources:

Wharton professor sees future of work in new ChatGPT tool | Fortune

https://openai.com/blog/chatgpt-plugins#code-interpreter

https://www.searchenginejournal.com/code-interpreter-chatgpt-plus/490980/#close

https://www.gov.br/inpi/pt-br

Asking questions via chat to the BRPTO’s Basic Manual for Patent Protection PDF, using LangChain, Pinecone, and Open AI

Have you ever wanted to search through your PDF files and find the most relevant information quickly and easily? If you have a lot of PDF documents, such as books, articles, reports, or manuals, you might find it hard to locate the information you need without opening each file and scanning through the pages. Wouldn’t it be nice if you could type in a query and get the best matches from your PDF collection?

In this blog post, I will show you how to build a simple but powerful PDF search engine using LangChain, Pinecone, and Open AI. By combining these tools, we can create a system that can:

  1. Extract text and metadata from PDF files.
  2. Embed the text into vector representations using a language model.
  3. Index and query the vectors using a vector database.
  4. Generate natural language responses using the “text-embedding-ada-002” model from Open AI.

What is LangChain?

LangChain is a framework for developing applications powered by language models. It provides modular abstractions for the components necessary to work with language models, such as data loaders, prompters, generators, and evaluators. It also has collections of implementations for these components and use-case-specific chains that assemble these components in particular ways to accomplish a specific task.

Prompts: This part allows you to create adaptable instructions using templates. It can adjust to different language learning models based on the size of the conversation window and input factors like conversation history, search results, previous answers, and more.

Models: This part serves as a bridge to connect with most third-party language learning models. It has connections to roughly 40 public language learning models, chat, and text representation models.

Memory: This allows the language learning models to remember the conversation history.

Indexes: Indexes are methods to arrange documents so that language learning models can interact with them effectively. This part includes helpful functions for dealing with documents and connections to different database systems for storing vectors (numeric representations of text).

Agents: Some applications don’t just need a set sequence of calls to language learning models or other tools, but possibly an unpredictable sequence based on the user’s input. In these sequences, there’s an agent that has access to a collection of tools. Depending on the user’s input, the agent can decide which tool – if any – to use.

Chains: Using a language learning model on its own is fine for some simple applications, but more complex ones need to link multiple language learning models, either with each other or with other experts. LangChain offers a standard interface for these chains, as well as some common chain setups for easy use.

With LangChain, you can build applications that can:

  • Connect a language model to other sources of data, such as documents, databases, or APIs
  • Allow a language model to interact with its environments, such as chatbots, agents, or generators
  • Optimize the performance and quality of a language model using feedback and reinforcement learning

Some examples of applications that you can build with LangChain are:

  • Question answering over specific documents
  • Chatbots that can access external knowledge or services
  • Agents that can perform tasks or solve problems using language models
  • Generators that can create content or code using language models

You can learn more about LangChain from their documentation or their GitHub repository. You can also find tutorials and demos in different languages, such as Chinese, Japanese, or English.

What is Pinecone?

Pinecone is a vector database for vector search. It makes it easy to build high-performance vector search applications by managing and searching through vector embeddings in a scalable and efficient way. Vector embeddings are numerical representations of data that capture their semantic meaning and similarity. For example, you can embed text into vectors using a language model, such that similar texts have similar vectors.

With Pinecone, you can create indexes that store your vector embeddings and metadata, such as document titles or authors. You can then query these indexes using vectors or keywords, and get the most relevant results in milliseconds. Pinecone also handles all the infrastructure and algorithmic complexities behind the scenes, ensuring you get the best performance and results without any hassle.

Some examples of applications that you can build with Pinecone are:

  • Semantic search: Find documents or products that match the user’s intent or query
  • Recommendations: Suggest items or content that are similar or complementary to the user’s preferences or behavior
  • Anomaly detection: Identify outliers or suspicious patterns in data
  • Generation: Create new content or code that is similar or related to the input

You can learn more about Pinecone from their website or their blog. You can also find pricing details and sign up for a free account here.

Presenting the Python code and explaining its functionality

This code is divided into two parts:

This stage involves preparing the PDF document for querying
This stage pertains to executing queries on the PDF

Below is the Python script that I’ve developed which can be also executed in Google Colab at this link.

PowerShell
# Install the dependencies
pip install langChain
pip install OpenAI
pip install pinecone-client
pip install tiktoken
pip install pypdf
Python
# Provide your OpenAI API key and define the embedding model
OPENAI_API_KEY = "INSERT HERE YOUR OPENAI API KEY"
embed_model = "text-embedding-ada-002"

# Provide your Pinecone API key and specify the environment
PINECONE_API_KEY = "INSERT HERE YOUR PINECONE API KEY"
PINECONE_ENV = "INSERT HERE YOUR PINECONE ENVIRONMENT"

# Import the required modules
import openai, langchain, pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

# Define a text splitter to handle the 4096 token limit of OpenAI
text_splitter = RecursiveCharacterTextSplitter(
    # We set a small chunk size for demonstration
    chunk_size = 2000,
    chunk_overlap  = 0,
    length_function = len,
)

# Initialize Pinecone with your API key and environment
pinecone.init(
        api_key = PINECONE_API_KEY,
        environment = PINECONE_ENV
)

# Define the index name for Pinecone
index_name = 'pine-search'

# Create an OpenAI embedding object with your API key
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Set up an OpenAI LLM model
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)

# Define a PDF loader and load the file
loader = PyPDFLoader("https://lawrence.eti.br/wp-content/uploads/2023/07/ManualdePatentes20210706.pdf")

# Use the text splitter to split the loaded file content into manageable chunks
book_texts = text_splitter.split_documents(file_content)

# Check if the index exists in Pinecone
if index_name not in pinecone.list_indexes():
    print("Index does not exist: ", index_name)

# Create a Pinecone vector search object from the text chunks
book_docsearch = Pinecone.from_texts([t.page_content for t in book_texts], embeddings, index_name = index_name)

# Define your query
query = "Como eu faço para depositar uma patente no Brasil?"

# Use the Pinecone vector search to find documents similar to the query
docs = book_docsearch.similarity_search(query)

# Set up a QA chain with the LLM model and the selected chain type
chain = load_qa_chain(llm, chain_type="stuff")

# Run the QA chain with the found documents and your query to get the answer
chain.run(input_documents=docs, question=query)

Below is the application I developed for real-time evaluation of the PDF Search Engine

You can examine the web application that I’ve designed, enabling you to carry out real-time tests of the PDF search engine. This app provides you with the facility to pose questions about the data contained within BRPTO’S Basic Manual for Patent Protection. Click here to launch the application.

Conclusion

In this blog post, I have shown you how to build a simple but powerful PDF search engine using LangChain, Pinecone, and Open AI. This system can help you find the most relevant information from your PDF files in a fast and easy way. You can also extend this system to handle other types of documents, such as images, audio, or video, by using different data loaders and language models.

I hope you enjoyed this tutorial and learned something new. If you have any questions or feedback, please feel free to leave a comment below or contact me here. Thank you for reading!

That’s it for today!

Sources:

GoodAITechnology/LangChain-Tutorials (github.com)

INPI – Instituto Nacional da Propriedade Industrial — Instituto Nacional da Propriedade Industrial (www.gov.br)

Chatting with your Enterprise data privately and securely through the use of Azure Cognitive Search and Azure Open AI

In an age where data is power, businesses are constantly looking for ways to leverage their vast enterprise data stores. One promising avenue lies in the intersection of AI and search technologies, specifically through the use of Azure Cognitive Search and Azure Open AI. These tools provide powerful ways to converse with enterprise data privately and securely.

Enterprise data can take various forms, from structured database datasets to unstructured documents, emails, and files. Some examples are data about the company’s benefits, internal policies, job descriptions, roles, and much more.

What is Azure Cognitive Search?

Azure Cognitive Search is a cloud-based service provided by Microsoft Azure that enables developers to build sophisticated search experiences into custom applications. It integrates with other Azure Cognitive Services to enable AI-driven content understanding through capabilities such as natural language processing, entity recognition, image analysis, and more.

Here are some of the key benefits of Azure Cognitive Search:

  1. Fully Managed: Azure Cognitive Search is fully managed, meaning you don’t have to worry about infrastructure setup, maintenance, or scaling. You just need to focus on the development of your application.
  2. Rich Search Experiences: It allows for the creation of rich search experiences, including auto-complete, geospatial search, filtering, and faceting.
  3. AI-Enhanced Search Capabilities: When combined with other Azure Cognitive Services, Azure Cognitive Search can provide advanced search features. For example, it can extract key phrases, detect languages, identify entities, and more. It can even index and search unstructured data, like text within documents or images.
  4. Scalability and Performance: Azure Cognitive Search can automatically scale to handle large volumes of data and high query loads. It provides fast, efficient search across large datasets.
  5. Data Integration: It can pull in data from a variety of sources, including Azure SQL Database, Azure Cosmos DB, Azure Blob Storage, and more.
  6. Security: Azure Cognitive Search supports data encryption at rest and in transit. It also integrates with Azure Active Directory for identity and access management.
  7. Developer Friendly: It provides a simple, RESTful API and integrates with popular programming languages and development frameworks. This makes it easier for developers to embed search functionality into applications.
  8. Indexing: The service provides robust indexing capabilities, allowing you to index data from a variety of sources and formats. This allows for a more comprehensive search experience for end-users.

In summary, Azure Cognitive Search can provide powerful, intelligent search capabilities for your applications, allowing users to find the information they need quickly and easily.

What is Azure Open AI?

Azure OpenAI Service is a platform that provides REST API access to OpenAI’s powerful language models, including GPT-3, GPT-4, Codex, and Embeddings. It can be used for tasks such as content generation, summarization, semantic search, and natural language-to-code translation.

The security and safety of enterprise data is a top priority for Azure OpenAI. Here are some key points on how it ensures safety:

  • The Azure OpenAI Service is fully controlled by Microsoft and does not interact with any services operated by OpenAI. Your prompts (inputs) and completions (outputs), your embeddings, and your training data are not available to other customers, OpenAI, or used to improve OpenAI models, any Microsoft or 3rd party products or services, or to automatically improve Azure OpenAI models for your use in your resource. Your fine-tuned Azure OpenAI models are available exclusively for your use.
  • The service processes different types of data including prompts and generated content, augmented data included with prompts, and training & validation data.
  • When generating completions, images, or embeddings, the service evaluates the prompt and completion data in real-time to check for harmful content types. The models are stateless, meaning no prompts or generations are stored in the model, and prompts and generations are not used to train, retrain, or improve the base models.
  • With the “on your data” feature, the service retrieves relevant data from a configured data store and augments the prompt to produce generations that are grounded with your data. The data remains stored in the data source and location you designate. No data is copied into the Azure OpenAI service.
  • Training data uploaded for fine-tuning is stored in the Azure OpenAI resource in the customer’s Azure tenant. It can be double encrypted at rest and can be deleted by the customer at any time. This data is not used to train, retrain, or improve any Microsoft or 3rd party base models.
  • Azure OpenAI includes both content filtering and abuse monitoring features to reduce the risk of harmful use of the service. To detect and mitigate abuse, Azure OpenAI stores all prompts and generated content securely for up to thirty (30) days.
  • The data store where prompts and completions are stored is logically separated by customer resources. Prompts and generated content are stored in the Azure region where the customer’s Azure OpenAI service resource is deployed, within the Azure OpenAI service boundary. Human reviewers can only access the data when it has been flagged by the abuse monitoring system.
  • Customers who meet additional Limited Access eligibility criteria and attest to specific use cases can apply to modify the Azure OpenAI content management features. Suppose Microsoft approves a customer’s request to change abuse monitoring. In that case, Microsoft does not store any prompts and completions associated with the approved Azure subscription for which abuse monitoring is configured.

In conclusion, Azure OpenAI takes numerous measures to ensure that your enterprise data is kept secure and confidential while using its service.

Revolutionize your Enterprise Data with ChatGPT: step by step how to create your own Enterprise Chat

This sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure Open AI Service to access the ChatGPT model (gpt-35-turbo), and Azure Cognitive Search for data indexing and retrieval.

The repo includes sample data so it’s ready to try end-to-end. In this sample application, we use a fictitious company called Contoso Electronics, and the experience allows its employees to ask questions about the benefits, internal policies, as well as job descriptions, and roles.

Features

  • Chat and Q&A interfaces
  • Explores various options to help users evaluate the trustworthiness of responses with citations, tracking of source content, etc.
  • Shows possible approaches for data preparation, prompt construction, and orchestration of interaction between model (ChatGPT) and retriever (Cognitive Search)
  • Settings directly in the UX to tweak the behavior and experiment with options
Chat screen

Getting Started

IMPORTANT: In order to deploy and run this example, you’ll need an Azure subscription with access enabled for the Azure OpenAI service. You can request access here. You can also visit here to get some free Azure credits to get you started.

AZURE RESOURCE COSTS by default this sample will create Azure App Service and Azure Cognitive Search resources that have a monthly cost, as well as Form Recognizer resource that has cost per document page. You can switch them to free versions of each of them if you want to avoid this cost by changing the parameters file under the infra folder (though there are some limits to consider; for example, you can have up to 1 free Cognitive Search resource per subscription, and the free Form Recognizer resource only analyzes the first 2 pages of each document.)

Prerequisites

To Run Locally

  • Azure Developer CLI
  • Python 3+
    • Important: Python and the pip package manager must be in the path in Windows for the setup scripts to work.
    • Important: Ensure you can run python --version from the console. On Ubuntu, you might need to run sudo apt install python-is-python3 to link python to python3.
  • Node.js
  • Git
  • Powershell 7+ (pwsh) – For Windows users only.
    • Important: Ensure you can run pwsh.exe from a PowerShell command. If this fails, you likely need to upgrade PowerShell.

NOTE: Your Azure Account must have Microsoft.Authorization/roleAssignments/write permissions, such as User Access Administrator or Owner.

Installation

Project Initialization

  1. Create a new folder and switch to it in the terminal
  2. Run azd auth login
  3. Run azd init -t azure-search-openai-demo
    • For the target location, the regions that currently support the models used in this sample are East US or South Central US. For an up-to-date list of regions and models, check here
    • note that this command will initialize a git repository and you do not need to clone this repository

Starting from scratch:

Execute the following command, if you don’t have any pre-existing Azure services and want to start from a fresh deployment.

  1. Run azd up – This will provision Azure resources and deploy this sample to those resources, including building the search index based on the files found in the ./data folder.
  2. After the application has been successfully deployed you will see a URL printed to the console. Click that URL to interact with the application in your browser.

For detailed information click here on my GitHub and follow a video from Microsoft talking about the example solution.

You can look at the Chat App that I’ve developed, which I will make available for you to test for a few days.

Firstly, it’s important to understand that you have the ability to replace the PDF files within the “./data” directory with your own business data.

If you wish to examine these files first to gain insights into the types of questions you can make in the chat to test, please click here.

Regrettably, the demo app had to be deactivated due to Azure expenses. If you’d like it to be reactivated, please click here to contact me. Thank you.

You’re able to query any content found within the enterprise PDF files located in the “./data” directory. The chat will respond with citations from the respective PDFs, and you have the option to click through and verify the information directly from the source PDF.

Conclusion

The vast universe of enterprise data, spanning from structured database datasets to unstructured documents, emails, and files, holds a wealth of insights that can drive an organization’s growth and success. Azure Cognitive Search and Azure OpenAI serve as powerful tools that make this data readily accessible, private, and secure. By leveraging these technologies, businesses can tap into the full potential of their internal data, from understanding the intricacies of their benefits and policies to defining roles and job descriptions more effectively. With a future powered by AI and machine learning, the conversations we can have with our data are only just beginning. This is more than just a technological shift; it’s a new era of informed decision-making, driven by data that’s within our reach. This solution provides an array of opportunities to assist businesses in leveraging their corporate data and disseminating it amongst their employees. This method simplifies comprehension, fostering organizational growth and enhancing the company culture. Should you require additional details on this topic, please do not hesitate to reach out to me.

That’s it for today!