It is with great delight that I introduce an innovative AI-powered ChatBot that answers queries concerning the official manuals published by the Brazilian National Institute of Industrial Property (INPI, also known as the BRPTO) on trademarks, patents, industrial designs, and more.
These manuals, provided as PDFs in Brazilian Portuguese, are readily accessible in the respective sections of the INPI website. What is unique about this initiative is an inclusivity that transcends language barriers: in the app, you can ask questions in any language, and the ChatBot will comprehend and respond in the language used in the query.
A vital feature of this AI-powered ChatBot is its ability to cite the exact source of the information it extracts. Each time the ChatBot responds to a query, it presents a hyperlink to the PDF from which the information was derived. This enhances transparency and allows users to delve into more detail if they wish.
Moreover, within the left-side options in the app, you can access the complete manuals utilized to create the AI model. This provides direct access to the information and serves as a testament to the quality and reliability of the data used.
How do you access the BRPTO Chatbot app?
To try out the app, click here to reach out to me. Please include “BRPTO Chatbot” in your message.
Here are a few examples of the questions you can ask:
What is a patent?
What is a trademark?
What are the fees I need to pay to file a patent?
What is a contract?
What is copyright?
What are the rules of geographical indications?
How was intellectual property protection for integrated circuit topographies established?
What are the filing procedures for an industrial design?
What are the limitations?
If the National Institute of Industrial Property (INPI) updates a manual, the AI model must be reprocessed and a new version of the app released.
The chatbot is limited to answering questions related to the manuals listed on the left side of the app.
Any questions outside the context of these manuals will not be answered.
As we are using the free tier of the vector database (Pinecone), the responses may occasionally be imprecise or mix up information.
How was the app created?
The app was created based on the guidelines provided in my previous post that I copy below.
The AI ChatBot is an innovative solution that aims to democratize access to information and make it more convenient for the public to learn and understand essential aspects of industrial property rights. So feel free to explore, ask, and learn!
Code Interpreter is an innovative extension of ChatGPT, now available to all subscribers of the ChatGPT Plus service. This tool boasts the ability to execute code, work with uploaded files, analyze data, create charts, edit files, and carry out mathematical computations. The implications of this are profound, not just for academics and coders, but for anyone looking to streamline their research processes.
Code Interpreter will be available to all ChatGPT Plus users over the next week.
It lets ChatGPT run code, optionally with access to files you've uploaded. You can ask ChatGPT to analyze data, create charts, edit files, perform math, etc.
The Code Interpreter Plugin for ChatGPT is a multifaceted addition that provides the AI chatbot with the capacity to handle data and perform a broad range of tasks. This plugin equips ChatGPT with the ability to generate and implement code in natural language, thereby streamlining data evaluation, file conversions, and more. Pioneering users have experienced its effectiveness in activities like generating GIFs and examining musical preferences. The potential of the Code Interpreter Plugin is enormous, having the capability to revolutionize coding processes and unearth novel uses. By capitalizing on ChatGPT’s capabilities, users can harness the power of this plugin, sparking a voyage of discovery and creativity.
Professor Ethan Mollick from the Wharton School of the University of Pennsylvania shares his experiences with using Code Interpreter.
Artificial intelligence is rapidly revolutionizing every aspect of our lives, particularly in the world of data analytics and computational tasks. This transition was recently illuminated by Wharton Professor Ethan Mollick, who commented, “Things that took me weeks to master in my PhD were completed in seconds by the AI.” This is not just a statement about time saved or operational efficiency; it speaks volumes about the growing capabilities of AI technologies, specifically OpenAI’s new tool for ChatGPT, Code Interpreter.
Mollick, an early adopter of AI and an esteemed academic at the Wharton School of the University of Pennsylvania, lauded Code Interpreter as the most significant application of AI in the sphere of complex knowledge work. Not only does it complete intricate tasks in record time, but Mollick also noticed fewer errors than those typically expected from human analysts.
One might argue that Code Interpreter transcends the traditional scope of AI assistants, which have primarily been limited to generating text responses. It leverages large language models, the AI technology underpinning ChatGPT, to provide a general-purpose toolbox for problem-solving.
Mollick commended Code Interpreter’s use of Python, a versatile programming language known for its application in software building and data analysis. He pointed out that it closes some of the gaps in language models as the output is not entirely text-based. The code is processed through Python, which promptly flags any errors.
In practice, when given a dataset on superheroes, Code Interpreter could clean and merge the data seamlessly, with an admirable effort to maintain accuracy. This process would have been an arduous task otherwise. Additionally, it allows a back-and-forth interaction during data visualization, accommodating various alterations and enhancements.
Remarkably, Code Interpreter doesn’t just perform pre-set analyses but recommends pertinent analytical approaches. For instance, it conducted predictive modeling to anticipate a hero’s potential powers based on other factors. Mollick was struck by the AI’s human-like reasoning about data, noting the AI’s observation that the powers were often visually noticeable as they derived from the comic book medium.
Beyond its technical capabilities, Code Interpreter democratizes access to complex data analysis, making it accessible to more people, thereby transforming the future of work. It saves time and reduces the tedium of repetitive tasks, enabling individuals to focus on more fulfilling, in-depth work.
Here are 10 examples of how you can use Code Interpreter for data analysis (a short sketch of one of these follows the list):
Analyzing customer feedback data to identify trends and patterns.
Creating interactive dashboards and reports for business intelligence purposes.
Cleaning and transforming datasets for machine learning models.
Extracting insights from social media data to inform marketing strategies.
Generating charts and graphs to visualize sales data.
Analyzing website traffic data to optimize the user experience.
Creating custom functions and scripts for specific data analysis tasks.
Performing statistical analysis on survey data.
Automating repetitive data analysis tasks with Python scripts.
Creating custom visualizations for presentations and reports.
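To make this concrete, here is a minimal sketch of the kind of script Code Interpreter typically writes for the fifth item above (charting sales data). The file name and column names are hypothetical; in practice, Code Interpreter infers the real ones from your uploaded file.

Python
import pandas as pd
import matplotlib.pyplot as plt

# Load the uploaded dataset (hypothetical file and column names)
df = pd.read_csv("sales.csv")

# Aggregate revenue per month, preserving the order in the file
monthly = df.groupby("month", sort=False)["revenue"].sum()

# Plot a bar chart and save it as an image, which Code Interpreter returns to the user
monthly.plot(kind="bar", title="Revenue by Month")
plt.xlabel("Month")
plt.ylabel("Revenue")
plt.tight_layout()
plt.savefig("revenue_by_month.png")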
How to use it? Follow my experiments below
Initially, you upload a file of your choice. You can then pose queries about it. Code Interpreter will generate and run a Python script to address nearly any question you ask. The entire process is remarkably swift. I conducted a few trials using the XML file from BRPTO’s Patent Gazette No. 2739, and the experience was truly impressive. No understanding of coding or Python is needed; Code Interpreter simply provides the results. If you wish, you can view the Python script.
Selecting the “Show Work” option makes the Python script visible.
It executed the work flawlessly!
This is the screen of the CSV file.
Here is the Python code generated by the Code Interpreter to create the CSV file:
Python
import pandas as pd
import xml.etree.ElementTree as ET

# Load the XML file
tree = ET.parse("/mnt/data/Patente_2739_04072023.xml")
root = tree.getroot()

# Function to extract data from XML into a dictionary
def extract_data(root):
    data = []
    for despacho in root.findall('despacho'):
        row = {}
        row['despacho_codigo'] = despacho.find('codigo').text
        row['despacho_titulo'] = despacho.find('titulo').text
        row['despacho_comentario'] = despacho.find('comentario').text if despacho.find('comentario') is not None else ""
        processo_patente = despacho.find('processo-patente')
        row['processo_numero'] = processo_patente.find('numero').text
        row['processo_data_deposito'] = processo_patente.find('data-deposito').text if processo_patente.find('data-deposito') is not None else ""
        row['processo_concessao'] = processo_patente.find('concessao').text if processo_patente.find('concessao') is not None else ""
        titulars = processo_patente.findall('titular-lista/titular')
        row['titular_nome_completo'] = "; ".join([titular.find('nome-completo').text for titular in titulars])
        row['titular_endereco_pais'] = "; ".join([titular.find('endereco/pais/sigla').text for titular in titulars])
        data.append(row)
    return data

# Extract data from XML
data = extract_data(root)

# Convert to DataFrame
df = pd.DataFrame(data)

# Save DataFrame to CSV
df.to_csv('/mnt/data/patente.csv', index=False)
df.head()  # Display the first few rows of the DataFrame

# Extract distinct application/patent numbers
distinct_application_numbers = df['processo_numero'].unique()

# Convert to DataFrame
df_distinct_application_numbers = pd.DataFrame(distinct_application_numbers, columns=['processo_numero'])

# Save DataFrame to CSV
df_distinct_application_numbers.to_csv('/mnt/data/distinct_application_numbers.csv', index=False)
df_distinct_application_numbers.head()  # Display the first few rows of the DataFrame
This video demonstrates the capabilities of Code Interpreter.
You can find more information on the official OpenAI site by clicking here.
Conclusion
Code Interpreter is a powerful tool that is making data analysis accessible for everyone with ChatGPT Plus. By allowing users to run code snippets within their chat sessions, it enables them to perform a wide range of data analysis tasks quickly and easily. Whether you’re analyzing customer feedback data or creating custom visualizations for presentations and reports, Code Interpreter has something to offer everyone.
Code Interpreter invites us to consider how we can leverage such advancements across various sectors impacted by AI. Indeed, Code Interpreter signifies the dawn of a new era in artificial intelligence and computational capabilities. So why not give it a try today?
Have you ever wanted to search through your PDF files and find the most relevant information quickly and easily? If you have a lot of PDF documents, such as books, articles, reports, or manuals, you might find it hard to locate the information you need without opening each file and scanning through the pages. Wouldn’t it be nice if you could type in a query and get the best matches from your PDF collection?
In this blog post, I will show you how to build a simple but powerful PDF search engine using LangChain, Pinecone, and OpenAI. By combining these tools, we can create a system that can:
Extract text and metadata from PDF files.
Embed the text into vector representations using the “text-embedding-ada-002” model from OpenAI (see the sketch after this list).
Index and query the vectors using a vector database.
Generate natural language responses using an OpenAI language model.
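As a taste of the embedding step, here is a minimal sketch using the 2023-era openai Python library (the same style as the full script later in this post); the sample sentence is arbitrary.

Python
import openai

openai.api_key = "INSERT HERE YOUR OPENAI API KEY"

# Request an embedding for a single sentence
response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="What is a patent?",
)

# The result is a list of 1536 floats capturing the sentence's meaning
vector = response["data"][0]["embedding"]
print(len(vector))  # 1536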
What is LangChain?
LangChain is a framework for developing applications powered by language models. It provides modular abstractions for the components necessary to work with language models, such as data loaders, prompters, generators, and evaluators. It also has collections of implementations for these components and use-case-specific chains that assemble these components in particular ways to accomplish a specific task.
Prompts: This part allows you to create adaptable instructions using templates. It can adjust to different language models based on the size of the context window and input factors like conversation history, search results, previous answers, and more.
Models: This part serves as a bridge to connect with most third-party language models. It has connections to roughly 40 public LLMs, chat models, and text-embedding models.
Memory: This allows the language models to remember the conversation history.
Indexes: Indexes are methods to arrange documents so that language models can interact with them effectively. This part includes helpful functions for dealing with documents and connections to different database systems for storing vectors (numeric representations of text).
Agents: Some applications don’t just need a set sequence of calls to language models or other tools, but possibly an unpredictable sequence based on the user’s input. In these sequences, there’s an agent that has access to a collection of tools. Depending on the user’s input, the agent can decide which tool, if any, to use.
Chains: Using a language model on its own is fine for some simple applications, but more complex ones need to link multiple language models, either with each other or with other components. LangChain offers a standard interface for these chains, as well as some common chain setups for easy use (a short sketch of these components follows the list).
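Here is a minimal sketch of two of these components working together, Prompts and Chains, using the same 2023-era LangChain API as the full script later in this post; the prompt wording and topic are illustrative.

Python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Prompts: a reusable template with one input variable
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in one short paragraph for a non-lawyer.",
)

# Models: an OpenAI LLM wrapper (temperature=0 for more deterministic answers)
llm = OpenAI(temperature=0, openai_api_key="INSERT HERE YOUR OPENAI API KEY")

# Chains: bind the model and the prompt together, then run with a concrete input
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="trademarks"))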
With LangChain, you can build applications that can:
Connect a language model to other sources of data, such as documents, databases, or APIs
Allow a language model to interact with its environments, such as chatbots, agents, or generators
Optimize the performance and quality of a language model using feedback and reinforcement learning
Some examples of applications that you can build with LangChain are:
Question answering over specific documents
Chatbots that can access external knowledge or services
Agents that can perform tasks or solve problems using language models
Generators that can create content or code using language models
What is Pinecone?
Pinecone is a vector database for vector search. It makes it easy to build high-performance vector search applications by managing and searching through vector embeddings in a scalable and efficient way. Vector embeddings are numerical representations of data that capture their semantic meaning and similarity. For example, you can embed text into vectors using a language model, such that similar texts have similar vectors.
With Pinecone, you can create indexes that store your vector embeddings and metadata, such as document titles or authors. You can then query these indexes using vectors or keywords, and get the most relevant results in milliseconds. Pinecone also handles all the infrastructure and algorithmic complexities behind the scenes, ensuring you get the best performance and results without any hassle.
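As an illustration, here is a minimal sketch using the same pre-serverless Pinecone client that the script later in this post relies on; the index name, vector values, and metadata are placeholders.

Python
import pinecone

pinecone.init(api_key="INSERT HERE YOUR PINECONE API KEY",
              environment="INSERT HERE YOUR PINECONE ENVIRONMENT")

# Create an index sized for text-embedding-ada-002 vectors (1536 dimensions)
if "demo-index" not in pinecone.list_indexes():
    pinecone.create_index("demo-index", dimension=1536, metric="cosine")

index = pinecone.Index("demo-index")

# Upsert one embedding together with its metadata
index.upsert(vectors=[("doc-1", [0.1] * 1536, {"title": "Manual de Patentes"})])

# Query by vector and get the closest matches plus their metadata back
result = index.query(vector=[0.1] * 1536, top_k=3, include_metadata=True)
print(result.matches[0].metadata["title"])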
Some examples of applications that you can build with Pinecone are:
Semantic search: Find documents or products that match the user’s intent or query
Recommendations: Suggest items or content that are similar or complementary to the user’s preferences or behavior
Anomaly detection: Identify outliers or suspicious patterns in data
Generation: Create new content or code that is similar or related to the input
You can learn more about Pinecone from their website or their blog. You can also find pricing details and sign up for a free account here.
Presenting the Python code and explaining its functionality
This code is divided into two parts:
Preparing the PDF document for querying
Executing queries on the PDF
Below is the Python script that I’ve developed, which can also be executed in Google Colab at this link.
# Provide your OpenAI API key and define the embedding model
OPENAI_API_KEY = "INSERT HERE YOUR OPENAI API KEY"
embed_model = "text-embedding-ada-002"

# Provide your Pinecone API key and specify the environment
PINECONE_API_KEY = "INSERT HERE YOUR PINECONE API KEY"
PINECONE_ENV = "INSERT HERE YOUR PINECONE ENVIRONMENT"

# Import the required modules
import openai, langchain, pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

# Define a text splitter to handle the 4096 token limit of OpenAI
text_splitter = RecursiveCharacterTextSplitter(
    # We set a small chunk size for demonstration
    chunk_size = 2000,
    chunk_overlap = 0,
    length_function = len,
)

# Initialize Pinecone with your API key and environment
pinecone.init(
    api_key = PINECONE_API_KEY,
    environment = PINECONE_ENV
)

# Define the index name for Pinecone
index_name = 'pine-search'

# Create an OpenAI embedding object with your API key
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Set up an OpenAI LLM model
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)

# Define a PDF loader and load the file
loader = PyPDFLoader("https://lawrence.eti.br/wp-content/uploads/2023/07/ManualdePatentes20210706.pdf")
file_content = loader.load()

# Use the text splitter to split the loaded file content into manageable chunks
book_texts = text_splitter.split_documents(file_content)

# Check if the index exists in Pinecone
if index_name not in pinecone.list_indexes():
    print("Index does not exist: ", index_name)

# Create a Pinecone vector search object from the text chunks
book_docsearch = Pinecone.from_texts([t.page_content for t in book_texts], embeddings, index_name = index_name)

# Define your query ("How do I file a patent in Brazil?")
query = "Como eu faço para depositar uma patente no Brasil?"

# Use the Pinecone vector search to find documents similar to the query
docs = book_docsearch.similarity_search(query)

# Set up a QA chain with the LLM model and the selected chain type
chain = load_qa_chain(llm, chain_type="stuff")

# Run the QA chain with the found documents and your query to get the answer
chain.run(input_documents=docs, question=query)
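When the script runs successfully, the final chain.run call returns the answer as plain text, composed by the LLM from the manual excerpts that Pinecone retrieved for the query. Note that the “stuff” chain type simply stuffs all retrieved chunks into a single prompt, which is why the text splitter above keeps the chunks small.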
Below is the application I developed for real-time evaluation of the PDF Search Engine
You can examine the web application that I’ve designed, which enables you to carry out real-time tests of the PDF search engine. This app lets you pose questions about the data contained in BRPTO’s Basic Manual for Patent Protection. Click here to launch the application.
Conclusion
In this blog post, I have shown you how to build a simple but powerful PDF search engine using LangChain, Pinecone, and OpenAI. This system can help you find the most relevant information from your PDF files quickly and easily. You can also extend it to handle other types of documents, such as images, audio, or video, by using different data loaders and models.
I hope you enjoyed this tutorial and learned something new. If you have any questions or feedback, please feel free to leave a comment below or contact me here. Thank you for reading!