It is with great delight that I introduce an innovative AI-powered ChatBot that answers queries concerning the official manuals published by the Brazilian National Institute of Industrial Property (INPI, also known as the BRPTO) on trademarks, patents, industrial designs, and more.
These manuals, provided as PDFs in Brazilian Portuguese, are readily accessible in the respective sections of the INPI website. What is unique about this initiative is an inclusivity that transcends language barriers: in the app, you can ask questions in any language, and the ChatBot will comprehend and respond in the language used in the query.
A vital feature of this AI-powered ChatBot is its ability to cite the exact source of the information it extracts. Each time the ChatBot responds to a query, it presents a hyperlink to the PDF from which the information was derived. This enhances transparency and allows users to delve into more detail if they wish.
Moreover, within the left-side options in the app, you can access the complete manuals utilized to create the AI model. This provides direct access to the information and serves as a testament to the quality and reliability of the data used.
How do you access the BRPTO Chatbot app?
To try out the app, click here to reach out to me. Please include “BRPTO Chatbot” in your message.
Here are a few examples of the questions you can ask:
What is a patent?
What is a trademark?
What are the fees I need to pay to file a patent?
What is a contract?
What is copyright?
What are the rules of geographical indications?
How was intellectual property protection for integrated circuit topographies established?
What are the filing procedures for an industrial design?
What are the limitations?
If the National Institute of Industrial Property (INPI) updates a manual, the AI model must be reprocessed and a new version of the app released.
The chatbot is limited to answering questions related to the manuals listed on the left side of the app.
Any questions outside the context of these manuals will not be answered.
As we are using the free tier of the vector database (Pinecone), the responses may occasionally be imprecise or mix up information.
How was the app created?
The app was created based on the guidelines provided in my previous post that I copy below.
The AI ChatBot is an innovative solution that aims to democratize access to information and make it more convenient for the public to learn and understand essential aspects of industrial property rights. So feel free to explore, ask, and learn!
Code Interpreter is an innovative extension of ChatGPT, now available to all subscribers of the ChatGPT Plus service. This tool boasts the ability to execute code, work with uploaded files, analyze data, create charts, edit files, and carry out mathematical computations. The implications of this are profound, not just for academics and coders, but for anyone looking to streamline their research processes.
Code Interpreter will be available to all ChatGPT Plus users over the next week.
It lets ChatGPT run code, optionally with access to files you've uploaded. You can ask ChatGPT to analyze data, create charts, edit files, perform math, etc.
The Code Interpreter Plugin for ChatGPT is a multifaceted addition that provides the AI chatbot with the capacity to handle data and perform a broad range of tasks. This plugin equips ChatGPT with the ability to generate and implement code in natural language, thereby streamlining data evaluation, file conversions, and more. Pioneering users have experienced its effectiveness in activities like generating GIFs and examining musical preferences. The potential of the Code Interpreter Plugin is enormous, having the capability to revolutionize coding processes and unearth novel uses. By capitalizing on ChatGPT’s capabilities, users can harness the power of this plugin, sparking a voyage of discovery and creativity.
Professor Ethan Mollick from the Wharton School of the University of Pennsylvania shares his experiences with using Code Interpreter.
Artificial intelligence is rapidly revolutionizing every aspect of our lives, particularly in the world of data analytics and computational tasks. This transition was recently illuminated by Wharton Professor Ethan Mollick, who commented, “Things that took me weeks to master in my PhD were completed in seconds by the AI.” This is not just a statement about time saved or operational efficiency; it speaks volumes about the growing capabilities of AI technologies, specifically OpenAI’s new tool for ChatGPT, Code Interpreter.
Mollick, an early adopter of AI and an esteemed academic at the Wharton School of the University of Pennsylvania, lauded Code Interpreter as the most significant application of AI in the sphere of complex knowledge work. Not only does it complete intricate tasks in record time, but Mollick also noticed fewer errors than those typically expected from human analysts.
One might argue that Code Interpreter transcends the traditional scope of AI assistants, which have primarily been limited to generating text responses. It leverages large language models, the AI technology underpinning ChatGPT, to provide a general-purpose toolbox for problem-solving.
Mollick commended Code Interpreter’s use of Python, a versatile programming language known for its application in software building and data analysis. He pointed out that it closes some of the gaps in language models as the output is not entirely text-based. The code is processed through Python, which promptly flags any errors.
In practice, when given a dataset on superheroes, Code Interpreter could clean and merge the data seamlessly, with an admirable effort to maintain accuracy. This process would have been an arduous task otherwise. Additionally, it allows a back-and-forth interaction during data visualization, accommodating various alterations and enhancements.
Remarkably, Code Interpreter doesn’t just perform pre-set analyses but recommends pertinent analytical approaches. For instance, it conducted predictive modeling to anticipate a hero’s potential powers based on other factors. Mollick was struck by the AI’s human-like reasoning about data, noting the AI’s observation that the powers were often visually noticeable as they derived from the comic book medium.
Beyond its technical capabilities, Code Interpreter democratizes access to complex data analysis, making it accessible to more people, thereby transforming the future of work. It saves time and reduces the tedium of repetitive tasks, enabling individuals to focus on more fulfilling, in-depth work.
Here are 10 examples of how you can use Code Interpreter for data analysis (a short sketch of one of these follows the list):
Analyzing customer feedback data to identify trends and patterns.
Creating interactive dashboards and reports for business intelligence purposes.
Cleaning and transforming datasets for machine learning models.
Extracting insights from social media data to inform marketing strategies.
Generating charts and graphs to visualize sales data.
Analyzing website traffic data to optimize the user experience.
Creating custom functions and scripts for specific data analysis tasks.
Performing statistical analysis on survey data.
Automating repetitive data analysis tasks with Python scripts.
Creating custom visualizations for presentations and reports.
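To make this concrete, here is a minimal sketch of the kind of script Code Interpreter typically writes for the fifth item above (charting sales data). The file name and column names are hypothetical; in practice, Code Interpreter infers the real ones from your uploaded file.

Python
import pandas as pd
import matplotlib.pyplot as plt

# Load the uploaded dataset (hypothetical file and column names)
df = pd.read_csv("sales.csv")

# Aggregate revenue per month, preserving the order in the file
monthly = df.groupby("month", sort=False)["revenue"].sum()

# Plot a bar chart and save it as an image, which Code Interpreter returns to the user
monthly.plot(kind="bar", title="Revenue by Month")
plt.xlabel("Month")
plt.ylabel("Revenue")
plt.tight_layout()
plt.savefig("revenue_by_month.png")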
How to use it? Follow my experiments below
Initially, you upload a file of your choice. You can then pose queries about it. Code Interpreter will generate and run a Python script to address nearly any question you ask. The entire process is remarkably swift. I conducted a few trials using the XML file from BRPTO’s Patent Gazette No. 2739, and the experience was truly impressive. No understanding of coding or Python is needed; Code Interpreter simply provides the results. If you wish, you can view the Python script.
Selecting the “Show Work” option makes the Python script visible.
It executed the work flawlessly!
This is the screen of the CSV file.
Here is the Python code generated by the Code Interpreter to create the CSV file:
Python
import pandas as pd
import xml.etree.ElementTree as ET

# Load the XML file
tree = ET.parse("/mnt/data/Patente_2739_04072023.xml")
root = tree.getroot()

# Function to extract data from XML into a dictionary
def extract_data(root):
    data = []
    for despacho in root.findall('despacho'):
        row = {}
        row['despacho_codigo'] = despacho.find('codigo').text
        row['despacho_titulo'] = despacho.find('titulo').text
        row['despacho_comentario'] = despacho.find('comentario').text if despacho.find('comentario') is not None else ""
        processo_patente = despacho.find('processo-patente')
        row['processo_numero'] = processo_patente.find('numero').text
        row['processo_data_deposito'] = processo_patente.find('data-deposito').text if processo_patente.find('data-deposito') is not None else ""
        row['processo_concessao'] = processo_patente.find('concessao').text if processo_patente.find('concessao') is not None else ""
        titulars = processo_patente.findall('titular-lista/titular')
        row['titular_nome_completo'] = "; ".join([titular.find('nome-completo').text for titular in titulars])
        row['titular_endereco_pais'] = "; ".join([titular.find('endereco/pais/sigla').text for titular in titulars])
        data.append(row)
    return data

# Extract data from XML
data = extract_data(root)

# Convert to DataFrame
df = pd.DataFrame(data)

# Save DataFrame to CSV
df.to_csv('/mnt/data/patente.csv', index=False)
df.head()  # Display the first few rows of the DataFrame

# Extract distinct application/patent numbers
distinct_application_numbers = df['processo_numero'].unique()

# Convert to DataFrame
df_distinct_application_numbers = pd.DataFrame(distinct_application_numbers, columns=['processo_numero'])

# Save DataFrame to CSV
df_distinct_application_numbers.to_csv('/mnt/data/distinct_application_numbers.csv', index=False)
df_distinct_application_numbers.head()  # Display the first few rows of the DataFrame
This video demonstrates the capabilities of Code Interpreter.
You can find more information on the official OpenAI site by clicking here.
Conclusion
Code Interpreter is a powerful tool that is making data analysis accessible for everyone with ChatGPT Plus. By allowing users to run code snippets within their chat sessions, it enables them to perform a wide range of data analysis tasks quickly and easily. Whether you’re analyzing customer feedback data or creating custom visualizations for presentations and reports, Code Interpreter has something to offer everyone.
Code Interpreter invites us to consider how we can leverage such advancements across various sectors impacted by AI. Indeed, Code Interpreter signifies the dawn of a new era in artificial intelligence and computational capabilities. So why not give it a try today?
Have you ever wanted to search through your PDF files and find the most relevant information quickly and easily? If you have a lot of PDF documents, such as books, articles, reports, or manuals, you might find it hard to locate the information you need without opening each file and scanning through the pages. Wouldn’t it be nice if you could type in a query and get the best matches from your PDF collection?
In this blog post, I will show you how to build a simple but powerful PDF search engine using LangChain, Pinecone, and OpenAI. By combining these tools, we can create a system that can:
Extract text and metadata from PDF files.
Embed the text into vector representations using the “text-embedding-ada-002” model from OpenAI (see the sketch after this list).
Index and query the vectors using a vector database.
Generate natural language responses using an OpenAI language model.
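As a taste of the embedding step, here is a minimal sketch using the 2023-era openai Python library (the same style as the full script later in this post); the sample sentence is arbitrary.

Python
import openai

openai.api_key = "INSERT HERE YOUR OPENAI API KEY"

# Request an embedding for a single sentence
response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="What is a patent?",
)

# The result is a list of 1536 floats capturing the sentence's meaning
vector = response["data"][0]["embedding"]
print(len(vector))  # 1536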
What is LangChain?
LangChain is a framework for developing applications powered by language models. It provides modular abstractions for the components necessary to work with language models, such as data loaders, prompters, generators, and evaluators. It also has collections of implementations for these components and use-case-specific chains that assemble these components in particular ways to accomplish a specific task.
Prompts: This part allows you to create adaptable instructions using templates. It can adjust to different language models based on the size of the context window and input factors like conversation history, search results, previous answers, and more.
Models: This part serves as a bridge to connect with most third-party language models. It has connections to roughly 40 public LLMs, chat models, and text-embedding models.
Memory: This allows the language models to remember the conversation history.
Indexes: Indexes are methods to arrange documents so that language models can interact with them effectively. This part includes helpful functions for dealing with documents and connections to different database systems for storing vectors (numeric representations of text).
Agents: Some applications don’t just need a set sequence of calls to language models or other tools, but possibly an unpredictable sequence based on the user’s input. In these sequences, there’s an agent that has access to a collection of tools. Depending on the user’s input, the agent can decide which tool, if any, to use.
Chains: Using a language model on its own is fine for some simple applications, but more complex ones need to link multiple language models, either with each other or with other components. LangChain offers a standard interface for these chains, as well as some common chain setups for easy use (a short sketch of these components follows the list).
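Here is a minimal sketch of two of these components working together, Prompts and Chains, using the same 2023-era LangChain API as the full script later in this post; the prompt wording and topic are illustrative.

Python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Prompts: a reusable template with one input variable
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in one short paragraph for a non-lawyer.",
)

# Models: an OpenAI LLM wrapper (temperature=0 for more deterministic answers)
llm = OpenAI(temperature=0, openai_api_key="INSERT HERE YOUR OPENAI API KEY")

# Chains: bind the model and the prompt together, then run with a concrete input
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="trademarks"))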
With LangChain, you can build applications that can:
Connect a language model to other sources of data, such as documents, databases, or APIs
Allow a language model to interact with its environments, such as chatbots, agents, or generators
Optimize the performance and quality of a language model using feedback and reinforcement learning
Some examples of applications that you can build with LangChain are:
Question answering over specific documents
Chatbots that can access external knowledge or services
Agents that can perform tasks or solve problems using language models
Generators that can create content or code using language models
What is Pinecone?
Pinecone is a vector database for vector search. It makes it easy to build high-performance vector search applications by managing and searching through vector embeddings in a scalable and efficient way. Vector embeddings are numerical representations of data that capture their semantic meaning and similarity. For example, you can embed text into vectors using a language model, such that similar texts have similar vectors.
With Pinecone, you can create indexes that store your vector embeddings and metadata, such as document titles or authors. You can then query these indexes using vectors or keywords, and get the most relevant results in milliseconds. Pinecone also handles all the infrastructure and algorithmic complexities behind the scenes, ensuring you get the best performance and results without any hassle.
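As an illustration, here is a minimal sketch using the same pre-serverless Pinecone client that the script later in this post relies on; the index name, vector values, and metadata are placeholders.

Python
import pinecone

pinecone.init(api_key="INSERT HERE YOUR PINECONE API KEY",
              environment="INSERT HERE YOUR PINECONE ENVIRONMENT")

# Create an index sized for text-embedding-ada-002 vectors (1536 dimensions)
if "demo-index" not in pinecone.list_indexes():
    pinecone.create_index("demo-index", dimension=1536, metric="cosine")

index = pinecone.Index("demo-index")

# Upsert one embedding together with its metadata
index.upsert(vectors=[("doc-1", [0.1] * 1536, {"title": "Manual de Patentes"})])

# Query by vector and get the closest matches plus their metadata back
result = index.query(vector=[0.1] * 1536, top_k=3, include_metadata=True)
print(result.matches[0].metadata["title"])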
Some examples of applications that you can build with Pinecone are:
Semantic search: Find documents or products that match the user’s intent or query
Recommendations: Suggest items or content that are similar or complementary to the user’s preferences or behavior
Anomaly detection: Identify outliers or suspicious patterns in data
Generation: Create new content or code that is similar or related to the input
You can learn more about Pinecone from their website or their blog. You can also find pricing details and sign up for a free account here.
Presenting the Python code and explaining its functionality
This code is divided into two parts:
Preparing the PDF document for querying
Executing queries on the PDF
Below is the Python script that I’ve developed, which can also be executed in Google Colab at this link.
# Provide your OpenAI API key and define the embedding model
OPENAI_API_KEY = "INSERT HERE YOUR OPENAI API KEY"
embed_model = "text-embedding-ada-002"

# Provide your Pinecone API key and specify the environment
PINECONE_API_KEY = "INSERT HERE YOUR PINECONE API KEY"
PINECONE_ENV = "INSERT HERE YOUR PINECONE ENVIRONMENT"

# Import the required modules
import openai, langchain, pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

# Define a text splitter to handle the 4096 token limit of OpenAI
text_splitter = RecursiveCharacterTextSplitter(
    # We set a small chunk size for demonstration
    chunk_size = 2000,
    chunk_overlap = 0,
    length_function = len,
)

# Initialize Pinecone with your API key and environment
pinecone.init(
    api_key = PINECONE_API_KEY,
    environment = PINECONE_ENV
)

# Define the index name for Pinecone
index_name = 'pine-search'

# Create an OpenAI embedding object with your API key
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Set up an OpenAI LLM model
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)

# Define a PDF loader and load the file
loader = PyPDFLoader("https://lawrence.eti.br/wp-content/uploads/2023/07/ManualdePatentes20210706.pdf")
file_content = loader.load()

# Use the text splitter to split the loaded file content into manageable chunks
book_texts = text_splitter.split_documents(file_content)

# Check if the index exists in Pinecone
if index_name not in pinecone.list_indexes():
    print("Index does not exist: ", index_name)

# Create a Pinecone vector search object from the text chunks
book_docsearch = Pinecone.from_texts([t.page_content for t in book_texts], embeddings, index_name = index_name)

# Define your query ("How do I file a patent in Brazil?")
query = "Como eu faço para depositar uma patente no Brasil?"

# Use the Pinecone vector search to find documents similar to the query
docs = book_docsearch.similarity_search(query)

# Set up a QA chain with the LLM model and the selected chain type
chain = load_qa_chain(llm, chain_type="stuff")

# Run the QA chain with the found documents and your query to get the answer
chain.run(input_documents=docs, question=query)
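When the script runs successfully, the final chain.run call returns the answer as plain text, composed by the LLM from the manual excerpts that Pinecone retrieved for the query. Note that the “stuff” chain type simply stuffs all retrieved chunks into a single prompt, which is why the text splitter above keeps the chunks small.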
Below is the application I developed for real-time evaluation of the PDF Search Engine
You can examine the web application that I’ve designed, which enables you to carry out real-time tests of the PDF search engine. This app lets you pose questions about the data contained in BRPTO’s Basic Manual for Patent Protection. Click here to launch the application.
Conclusion
In this blog post, I have shown you how to build a simple but powerful PDF search engine using LangChain, Pinecone, and OpenAI. This system can help you find the most relevant information from your PDF files quickly and easily. You can also extend it to handle other types of documents, such as images, audio, or video, by using different data loaders and models.
I hope you enjoyed this tutorial and learned something new. If you have any questions or feedback, please feel free to leave a comment below or contact me here. Thank you for reading!