Extracting information from unstructured text is easy with OpenAI. All you need to do is give it the instructions.

How do you successfully extract information from unstructured text? Through natural language processing, or NLP. You may be wondering what that means and how it can help with extraction. This article discusses how NLP facilitates the extraction process and how it is done with supervised and unsupervised learning.

What is OpenAI?

OpenAI is an artificial intelligence research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. It was founded in December 2015 with a collective pledge of $1 billion from Sam Altman, Elon Musk, and several other investors, and its stated goal is to promote friendly artificial intelligence that benefits humanity as a whole.

How does OpenAI Work?

There are many different ways to extract information from unstructured text. The most common is a keyword or keyphrase search: you give OpenAI a specific word or phrase, and it locates every instance of that word or phrase in the text.

Another way is a concept search: you give a general concept or topic, and the model locates the passages in the text that deal with that concept.

The last way is to use a natural language processing model: you provide the text, and the model analyzes its grammar, syntax, and meaning. In all three cases, the results come back in an easily readable format.
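
To make the three approaches concrete, here is a minimal sketch of how each one could be phrased as an instruction for the model. The helper name `build_prompt` and the wording of the instructions are my own assumptions for illustration, not something OpenAI prescribes.

```python
# Hypothetical helper: turns each of the three approaches into an instruction
# that is sent to the model together with the text to analyze.
def build_prompt(approach, target, text):
    instructions = {
        "keyword": f'List every sentence that contains the exact phrase "{target}".',
        "concept": f"List every passage that deals with the topic: {target}.",
        "nlp": f"Extract the following fields from the text: {target}.",
    }
    if approach not in instructions:
        raise ValueError(f"unknown approach: {approach!r}")
    # Instruction first, then the text, separated by a blank line.
    return f"{instructions[approach]}\n\n{text}"


if __name__ == "__main__":
    sample = "The court suspended the effects of the patent."
    print(build_prompt("keyword", "patent", sample))
```

The returned string is what you would pass as the prompt; only the instruction changes between the three approaches, while the text stays the same.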

Creating a System for Extracting Information from Unstructured Text with OpenAI

If you have a lot of unstructured text and you need to extract information from it, OpenAI can help. All you need to do is give the instructions to the model, and it will do the rest.

OpenAI is especially useful for extracting information from unstructured text because it copes with text that originated in various formats. For example, once the text of a PDF document has been extracted, OpenAI can process it further.

OpenAI is also good at dealing with multiple languages. For example, if you have a document in English and another in Portuguese, OpenAI can usually translate between the two languages and extract the desired information.

Putting it to work:

OpenAI makes extracting information from unstructured text easy. All you need to do is give the instructions. Let’s go to an example. I selected the sub-judice patent publications from the 10 latest gazettes of the BRPTO (the Brazilian Patent and Trademark Office). Note that everything is written in Brazilian Portuguese. If you want the dataset I used, you can click here to download it.

Python
import pandas as pd
import openai
import pyodbc

openai.api_key = "INSERT YOUR OPENAI API KEY HERE"

# Send an instruction plus the text to the model and return its answer.
# Note: openai.Completion is the legacy interface (openai package < 1.0).
def OpenAI_Question(question_type, openai_response):
    response = openai.Completion.create(
        engine="text-davinci-003",
        # instruction, newline, text to analyze, newline
        prompt=question_type + "\n" + openai_response + "\n",
        temperature=0.7,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    return response["choices"][0]["text"]

# The instruction is in Portuguese because the texts are. It asks for the
# judicial case number, type of action, court, interested parties, plaintiff and defendants.
def Extract_Process_Information(Text):
    Resultado = OpenAI_Question(
        "Extrair do texto o número do processo judicial, tipo da ação, tribunal, interessados, autor e réus:",
        Text,
    )
    return Resultado
    
# Connect to my experiment database to get the complement of the sub-judice patent publications
server = 'dbserverlaw.database.windows.net'
database = 'db_lawrence_experiments'
username = 'PUT YOUR USER HERE'
password = 'PUT YOUR PASSWORD HERE'
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()

# Select 20 rows from the SQL table and load them into a dataframe.
query = "select top 20 Complemento from Patentes_SubJudce;"
df = pd.read_sql(query, cnxn)

# Show the results. Here you can do anything you want with the extracted information.
print( "Question asked from OpenAI model text-davinci-003: Extrair do texto o número do processo judicial sem ser INPI, tipo da ação, tribunal, interessados, autor e réus.", chr(10))

for i in df.index:
    Extract = Extract_Process_Information(df['Complemento'][i])
    
    print ("Text ", i+1, ":")
    print( df['Complemento'][i], chr(10) )
    print( "Information extracted from the text ",i+1, ":")
    print( Extract.strip(), chr(10) )    

Let’s show the results of this Python script:

Question asked from OpenAI model text-davinci-003: Extrair do texto o número do processo judicial sem ser INPI, tipo da ação, tribunal, interessados, autor e réus.

Text 1 :
Processo SEI Nº: 52402.011406/2022-11 NUP PRINCIPAL: 01032.546858/2021-44 NUP REMISSIVO: 00848.001324/2022-17 PROCESSO Nº: 5019398-85.2021.4.03.0000 AUTOR: ARTIPE PRODUTOS ORTOPEDICOS E ESPORTIVOS LTDA ? ME Acórdão: A Primeira Turma, por unanimidade, deu provimento ao agravo de instrumento para determinar a suspensão dos efeitos da patente de invenção discutida nos autos de origem.

Information extracted from the text 1 :
Número do processo judicial:
5019398-85.2021.4.03.0000
Tipo da ação: Agravo de Instrumento
Tribunal: Primeira Turma
Interessados: ARTIPE PRODUTOS ORTOPEDICOS E ESPORTIVOS LTDA ? ME
Autor: ARTIPE PRODUTOS ORTOPEDICOS E ESPORTIVOS LTDA ? ME
Réus: Não especificado

Text 2 :
Processo INPI nº 52400.000958/2008-57 NUP PRINCIPAL: 00408.005736/2017-48 NUP REMISSIVO: 00848.001319/2022-12 Origem : TRIBUNAL REGIONAL FEDERAL DA 2ª REGIÃO AGRAVANTE : BMZAK BENEFICIAMENTO METAL MECANICO LTDA – ME AGRAVADO : MUNDIAL S.A. – PRODUTOS DE CONSUMO INTERESSADO : INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL Decisão: 1) julgo PROCEDENTE o pedido autoral, resolvendo o mérito, nos termos do art.269, I, do CPC, para decretar a nulidade da patente de modelo de utilidade MU7801576-6 para ?disposição em botão metálico?; 2) reconheço a litispendência e julgo extinto o pedido reconvencional, sem resolução de mérito, nos termos do art.267, V, penúltima figura, do CPC. Deverá o INPI publicar a presente decisão na próxima RPI e em seu site oficial. Trânsito em julgado.

Information extracted from the text 2 :
Número do processo judicial:
00408.005736/2017-48
Tipo da ação: Ação de nulidade de patente
Tribunal: Tribunal Regional Federal da 2ª Região
Interessados: BMZAK Beneficiamento Metal Mecânico Ltda – ME; Mundial S.A – Produtos de Consumo; Instituto Nacional da Propriedade Industrial
Autor: BMZAK Beneficiamento Metal Mecânico Ltda – ME
Réus: Mundial S.A – Produtos de Consumo

Text 3 :
Processo INPI nº 52402.001592/2021-91 13ª Vara Federal do Rio de Janeiro PROCEDIMENTO COMUM Nº 5007472-60.2021.4.02.5101/RJ AUTOR: OTTA SUSHI COMERCIO DE ALIMENTOS LTDA RÉU: INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL RÉU: LKD ALIMENTOS SAUDÁVEIS LTDA. Sentença: Ante o exposto, Julgo improcedente o pedido de nulidade da patente de modelo de utilidade MU 8900712-3 para ?disposição construtiva introduzida em embalagem para acondicionamento de alimentos?, resolvendo o mérito (CPC/2015, art. 487, inciso I). Trânsito em julgado.

Information extracted from the text 3 :
Processo: 5007472-60.2021.4.02.5101/RJ
Tipo da Ação: Procedimento Comum
Tribunal: 13ª Vara Federal do Rio de Janeiro
Interessados: Otta Sushi Comercio de Alimentos Ltda e LKD Alimentos Saudáveis Ltda
Autor: Otta Sushi Comercio de Alimentos Ltda
Réus: INPI-Instituto Nacional da Propriedade Industrial e LKD Alimentos Saudáveis Ltda.

Text 4 :
Processo INPI nº 52402.005814/2019-20 9ª Vara Federal do Rio de Janeiro NUP: 00408.036343/2019-48 (REF. 5025815-75.2019.4.02.5101) EXEQUENTE: IMPLANTICA PATENT LTD (SOCIEDADE) EXECUTADO: INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL Decisão: Isto posto, julgo procedente o pedido, para decretar a nulidade dos atos administrativos do INPI que extinguiram as patentes de invenção PI0108142-0 e PI0108309-0 com base no art. 13 da Resolução INPI n. 113/2013 e determinar a consequente restauração das mesmas, nos moldes da fundamentação acima.

Information extracted from the text 4 :
Número do processo judicial:
00408.036343/2019-48
Tipo da ação: Execução
Tribunal: 9ª Vara Federal do Rio de Janeiro
Interessados: Implantaica Patent Ltd (Sociedade) e INPI-Instituto Nacional da Propriedade Industrial
Autor: Implantaica Patent Ltd (Sociedade)
Réus: INPI-Instituto Nacional da Propriedade Industrial

Text 5 :
Processo INPI nº 52402.005814/2019-20 9ª Vara Federal do Rio de Janeiro NUP: 00408.036343/2019-48 (REF. 5025815-75.2019.4.02.5101) EXEQUENTE: IMPLANTICA PATENT LTD (SOCIEDADE) EXECUTADO: INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL Decisão: Isto posto, julgo procedente o pedido, para decretar a nulidade dos atos administrativos do INPI que extinguiram as patentes de invenção PI0108142-0 e PI0108309-0 com base no art. 13 da Resolução INPI n. 113/2013 e determinar a consequente restauração das mesmas, nos moldes da fundamentação acima.

Information extracted from the text 5 :
Número do processo judicial:
5025815-75.2019.4.02.5101
Tipo da ação: Execução
Tribunal: 9ª Vara Federal do Rio de Janeiro
Interessados: Implantaica Patent LTD (Sociedade) e INPI – Instituto Nacional da Propriedade Industrial
Autor: Implantaica Patent LTD (Sociedade)
Réus: INPI – Instituto Nacional da Propriedade Industrial

Text 6 :
Processo INPI nº 52402.004535/2022-44 21ª Vara Federal Cível da SJDF PROCESSO JUDICIAL: 1006097-47.2022.4.01.3400 NUP: 00424.125631/2022-73 (REF. 1006097-47.2022.4.01.3400) INTERESSADOS: AGÊNCIA NACIONAL DE VIGILÂNCIA SANITÁRIA – ANVISA E OUTROS Decisão: Pelo exposto, DEFIRO o pedido de tutela provisória de urgência para determinar a suspensão dos efeitos do despacho 16.3 (publicado na RPI nº 2.629 de 25/05/21), que reduziu o prazo de vigência das patentes PI0212733-4 e BR 12 2012 023120 7, de modo que estas permaneçam vigentes até a prolação de sentença de mérito ? limitada a compensação de prazo requerida no pedido, qual seja, 663 (seiscentos e sessenta e três) dias para a PI0212733-4 e 1.594 (mil quinhentos e noventa e quatro) dias para a BR 12 2012 023120 7, bem como que o INPI publique, na primeira edição da RPI subsequente a sua intimação, a informação acerca da tutela concedida.

Information extracted from the text 6 :
Processo judicial:
1006097-47.2022.4.01.3400
Tipo da ação: Tutela provisória de urgência
Tribunal: 21ª Vara Federal Cível da SJDF
Interessados: Agência Nacional de Vigilância Sanitária – Anvisa e outros
Autor: Agência Nacional de Vigilância Sanitária – Anvisa e outros
Réus: INPI

Text 7 :
Processo INPI nº 52402.004535/2022-44 21ª Vara Federal Cível da SJDF PROCESSO JUDICIAL: 1006097-47.2022.4.01.3400 NUP: 00424.125631/2022-73 (REF. 1006097-47.2022.4.01.3400) INTERESSADOS: AGÊNCIA NACIONAL DE VIGILÂNCIA SANITÁRIA – ANVISA E OUTROS Decisão: Pelo exposto, DEFIRO o pedido de tutela provisória de urgência para determinar a suspensão dos efeitos do despacho 16.3 (publicado na RPI nº 2.629 de 25/05/21), que reduziu o prazo de vigência das patentes PI0212733-4 e BR 12 2012 023120 7, de modo que estas permaneçam vigentes até a prolação de sentença de mérito ? limitada a compensação de prazo requerida no pedido, qual seja, 663 (seiscentos e sessenta e três) dias para a PI0212733-4 e 1.594 (mil quinhentos e noventa e quatro) dias para a BR 12 2012 023120 7, bem como que o INPI publique, na primeira edição da RPI subsequente a sua intimação, a informação acerca da tutela concedida.

Information extracted from the text 7 :
Processo judicial:
1006097-47.2022.4.01.3400
Tipo da ação: Tutela provisória de urgência
Tribunal: 21ª Vara Federal Cível da SJDF
Interessados: Agência Nacional de Vigilância Sanitária – Anvisa e outros
Autor: Agência Nacional de Vigilância Sanitária – Anvisa e outros
Réus: INPI

Text 8 :
Processo INPI nº 52400.003545/2022-39 NUP: 00408.078470/2022-10 (REF. 0017246-69.2002.4.02.5101) Autor: Formax Quimiplan Componentes Para Calçados Ltda. Reús: Giulini Chemie GmbH e Instituto Nacional da Propriedade Industrial- INPI Sentença: Isto posto, JULGO IMPROCEDENTE o pedido de nulidade da patente de invenção PI 8506015-1, bem como o pedido de nulidade do privilégio decorrente da reivindicação n’ 1 da patente em tela, formulado alternativamente. Trânsito em julgado.

Information extracted from the text 8 :
Número do processo:
00408.078470/2022-10
Tipo da ação: Nulidade de patente
Tribunal: Tribunal Regional Federal da 2ª Região
Interessados: Formax Quimiplan Componentes Para Calçados Ltda. e Giulini Chemie GmbH
Autor:
Formax Quimiplan Componentes Para Calçados Ltda.
Réus: Giulini Chemie GmbH e Instituto Nacional da Propriedade Industrial- INPI

Text 9 :
Processo INPI nº 52402.005638/2020-60 13ª Vara Federal do Rio de PROCEDIMENTO COMUM Nº 5029675-50.2020.4.02.5101/RJ AUTOR: LIBBS FARMACEUTICA LTDA AUTOR: MABXIENCE RESEARCH SL RÉU: GENENTECH, INC RÉU: INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL Decisão: Isto posto, homologo a renúncia ao direito sobre o qual se funda a ação, extinguindo o processo com resolução de mérito (CPC/2015, art. 487, III, ‘c’). Tendo em vista a manifesta ausência de interesse recursal das partes litigantes, o que deriva da própria preclusão lógica inerente à renúncia, certifique-se, desde logo, o trânsito em julgado.

Information extracted from the text 9 :
Número do processo judicial:
5029675-50.2020.4.02.5101/RJ
Tipo da ação: Procedimento comum
Tribunal: 13ª Vara Federal do Rio de Janeiro
Interessados: Libbs Farmacêutica Ltda, Mabxience Research SL, Genentech, Inc. e INPI-Instituto Nacional da Propriedade Industrial
Autor: Libbs Farmacêutica Ltda
Réus: Mabxience Research SL, Genentech, Inc. e INPI-Instituto Nacional da Propriedade Industrial

Text 10 :
INPI nº 52402.011824/2022-08 Origem: JUÍZO FEDERAL DA 9ª VF DO RIO DE JANEIRO (TRF2) Processo Nº: 5076666-16.2022.4.02.5101 NULIDADE DA PATENTE DE MODELO DE UTILIDADE com pedido de Antecipação de Tutela Autor: M.A. ROSSINI LOPES – ME. Réu(s): ANDRÉ LOPES e INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL ? INPI

Information extracted from the text 10 :
Processo Nº: 5076666-16.2022.4.02.5101
Tipo da ação: NULIDADE DE PATENTE DE MODELO DE UTILIDADE com pedido de Antecipação de Tutela
Tribunal: JUÍZO FEDERAL DA 9ª VF DO RIO DE JANEIRO (TRF2)
Interessados: M.A. ROSSINI LOPES – ME., ANDRÉ LOPES e INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL ? INPI
Autor: M.A. ROSSINI LOPES – ME.
Réus: ANDRÉ LOPES e INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL ? INPI

Text 11 :
INPI nº 52402.011451/2022-67 Origem: 25ª Vara Federal do Rio de Janeiro Processo Nº: 5071020-25.2022.4.02.5101/RJ SUBJUDICE com pedido de Antecipação de Tutela Autor: OURO FINO SAUDE ANIMAL LTDA Réu(s): ZOETIS SERVICES LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

Information extracted from the text 11 :
Número do processo judicial:
5071020-25.2022.4.02.5101/RJ
Tipo da ação: SUBJUDICE com pedido de Antecipação de Tutela
Tribunal: 25ª Vara Federal do Rio de Janeiro
Interessados: OURO FINO SAUDE ANIMAL LTDA, ZOETIS SERVICES LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL
Autor: OURO FINO SAUDE ANIMAL LTDA
Réus: ZOETIS SERVICES LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

Text 12 :
INPI nº 52402.011991/2022-41 Origem: 22ª VARA FEDERAL CÍVEL DA SJDF (TRF1) Processo Nº: 1047948-66.2022.4.01.3400 AÇÃO DE PROCEDIMENTO COMUM Autor: EUSA Pharma (UK) Limited Réu(s): INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

Information extracted from the text 12 :
Processo Nº:
1047948-66.2022.4.01.3400
Tipo da Ação: Ação de Procedimento Comum
Tribunal: 22ª Vara Federal Cível da SJDF (TRF1)
Interessados: EUSA Pharma (UK) Limited e Instituto Nacional da Propriedade Industrial
Autor: EUSA Pharma (UK) Limited
Réus: Instituto Nacional da Propriedade Industrial

Text 13 :
INPI nº 52402.010443/2022-01 Origem: 25ª Vara Federal do Rio de Janeiro Processo Nº: 5052162-43.2022.4.02.5101/RJ NULIDADE DA PATENTE DE INVENÇÃO com pedido de Antecipação de Tutela Autor: KOMATSU BRASIL INTERNATIONAL LTDA Réu(s): ESCO GROUP LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

Information extracted from the text 13 :
Número do processo judicial:
5052162-43.2022.4.02.5101/RJ
Tipo da ação: NULIDADE DA PATENTE DE INVENÇÃO com pedido de Antecipação de Tutela
Tribunal: 25ª Vara Federal do Rio de Janeiro
Interessados: KOMATSU BRASIL INTERNATIONAL LTDA, ESCO GROUP LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL
Autor: KOMATSU BRASIL INTERNATIONAL LTDA
Réus: ESCO GROUP LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

Text 14 :
INPI nº 52402.013111/2022-71 Origem: a 22ª VARA CÍVEL FEDERAL DE SÃO PAULO Processo Nº: 5007277-58.2021.4.03.6100 SUBJUDICE com pedido de Antecipação de Tutela Autor: SYNGENTA SEEDS LTDA, SYNGENTA PARTICIPATIONS AG Réu(s): SEMPRE SEMENTES EIRELI, MINISTÉRIO DA AGRICULTURA, PECUÁRIA E ABASTECIMENTO ? MAPA, INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

Information extracted from the text 14 :
Processo nº
5007277-58.2021.4.03.6100
Tipo da Ação: Pedido de Antecipação de Tutela
Tribunal: 22ª Vara Cível Federal de São Paulo
Interessados: Syngenta Seeds Ltda., Syngenta Participations AG, Sempre Sementes Eireli, Ministério da Agricultura, Pecuária e Abastecimento ? MAPA, Instituto Nacional da Propriedade Industrial
Autor: Syngenta Seeds Ltda., Syngenta Participations AG
Réus: Sempre Sementes Eireli, Ministério da Agricultura, Pecuária e Abastecimento ? MAPA, Instituto Nacional da Propriedade Industrial

Text 15 :
“INPI nº 52402.011780/2022-16 Origem: 13ª Vara Federal do Rio de Janeiro Processo Nº: 5047067-32.2022.4.02.5101/RJ NULIDADE DA PATENTE DE INVENÇÃO Autor: COMPANHIA NITRO QUÍMICA BRASILEIRA Réu(s): ICL AMERICA DO SUL S.A. (nova denominação de COMPASS MINERALS AMÉRICA DO SUL INDÚSTRIA E COMÉRCIO LTDA.) e Instituto Nacional da Propriedade Industrial ? INPI”

Information extracted from the text 15 :
Processo Nº:
5047067-32.2022.4.02.5101/RJ
Tipo da ação: NULIDADE DA PATENTE DE INVENÇÃO
Tribunal: 13ª Vara Federal do Rio de Janeiro
Interessados: Companhia Nitro Química Brasileira, ICL America do Sul S.A. (nova denominação de Compass Minerals América do Sul Indústria e Comércio Ltda.) e Instituto Nacional da Propriedade Industrial (INPI).
Autor: Companhia Nitro Química Brasileira
Réus: ICL America do Sul S.A. (nova denominação de Compass Minerals América do Sul Indústria e Comércio Ltda.) e Instituto Nacional da Propriedade Industrial (INPI).

Text 16 :
INPI nº 52402.012620/2022-86 Origem: 1ª Vara Federal de Curitiba Processo Nº: 5061501-95.2022.4.04.7000/PR NULIDADE DA PATENTE DE INVENÇÃO com pedido de Antecipação de Tutela Autor: S. Almeida Eventos Ltda. Réu(s): HOLMES PEDRO GIACOMET JUNIOR E Instituto Nacional da Propriedade Industrial – INPI

Information extracted from the text 16 :
Número do processo judicial:
5061501-95.2022.4.04.7000/PR
Ação: NULIDADE DA PATENTE DE INVENÇÃO
Tribunal: 1ª Vara Federal de Curitiba
Interessados: S. Almeida Eventos Ltda., HOLMES PEDRO GIACOMET JUNIOR E Instituto Nacional da Propriedade Industrial – INPI
Autor: S. Almeida Eventos Ltda.
Réus: HOLMES PEDRO GIACOMET JUNIOR E Instituto Nacional da Propriedade Industrial – INPI

Text 17 :
INPI nº 52402.012852/2022-34 Origem: 2ª Vara Federal de Blumenau Processo Nº: 5021248-32.2022.4.04.7205 NULIDADE DA PATENTE DE INVENÇÃO Autor: PRATIMIX INDUSTRIA E COMERCIO DE ACESSORIOS ELETRICOS E HIDRAULICOS LTDA, Réu(s): LORENZETTI SA INDÚSTRIAS BRASILEIRAS ELETROMETALURGICAS e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

Information extracted from the text 17 :
Processo Nº:
5021248-32.2022.4.04.7205
Tipo da ação: NULIDADE DA PATENTE DE INVENÇÃO
Tribunal: 2ª Vara Federal de Blumenau
Interessados: PRATIMIX INDUSTRIA E COMERCIO DE ACESSORIOS ELETRICOS E HIDRAULICOS LTDA, LORENZETTI SA INDÚSTRIAS BRASILEIRAS ELETROMETALURGICAS e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL
Autor: PRATIMIX INDUSTRIA E COMERCIO DE ACESSORIOS ELETRICOS E HIDRAULICOS LTDA,
Réus: LORENZETTI SA INDÚSTRIAS BRASILEIRAS ELETROMETALURGICAS e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

Text 18 :
INPI nº 52402.009647/2022-91 Origem: 31ª Vara Federal do Rio de Janeiro Processo Nº: 5059924-13.2022.4.02.5101 SUBJUDICE com pedido de Antecipação de Tutela Autor: EMERSON CORDEIRO DE OLIVEIRA Réu(s): MODULARE BRASIL ARTEFATOS PLÁSTICOS LTDA, MARIANAAZAMBUJA SOARES MUNARI e INSTITUTO NACIONAL DA PROPRIEDADEINDUSTRIAL ? INPI.

Information extracted from the text 18 :
Número do processo judicial:
5059924-13.2022.4.02.5101
Tipo da ação: SUBJUDICE com pedido de Antecipação de Tutela
Tribunal: 31ª Vara Federal do Rio de Janeiro
Interessados: Emerson Cordeiro de Oliveira, Modulare Brasil Artefatos Plásticos Ltda., Mariana Azambuja Soares Munari e Instituto Nacional da Propriedade Industrial – INPI.
Autor: Emerson Cordeiro de Oliveira
Réus: Modulare Brasil Artefatos Plásticos Ltda., Mariana Azambuja Soares Munari e Instituto Nacional da Propriedade Industrial – INPI.

Text 19 :
INPI nº 52402.011352/2022-85 Origem: JUÍZO FEDERAL DA 25ª VF DO RIO DE JANEIRO (TRF2) Processo Nº: 5036388-70.2022.4.02.5101 NULIDADE DA PATENTE DE INVENÇÃO com pedido de Antecipação de Tutela Autor: FALCON DISTRIBUICAO, ARMAZENAMENTO E TRANSPORTES S.A. Réu(s): DRYLOCK TECHNOLOGIES NV e INPI – INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

Information extracted from the text 19 :
Número do processo judicial:
5036388-70.2022.4.02.5101
Tipo da ação: Nulidade da patente de invenção com pedido de antecipação de tutela
Tribunal: Juízo Federal da 25ª VF do Rio de Janeiro (TRF2)
Interessados: Falcon Distribuição, Armazenamento e Transportes S.A.
Autor: Falcon Distribuição, Armazenamento e Transportes S.A.
Réus: Drylock Technologies NV e INPI – Instituto Nacional da Propriedade Industrial.

Text 20 :
INPI nº 52402.011631/2022-49Origem: Justiça Federal – 9ª Vara Federal do Rio de JaneiroProcedimento Comum nº 5098088-81.2021.4.02.5101/RJembargos de declaração opostos contra alegado equívoco na decisão proferidaautor: Adama Brasil S/Aréus: United Phosphorus Limited e INPI – Instituto Nacional da Propriedade Industrial

Information extracted from the text 20 :
Processo judicial:
5098088-81.2021.4.02.5101/RJ
Tipo da ação: Embargos de Declaração
Tribunal: Justiça Federal – 9ª Vara Federal do Rio de Janeiro
Interessados: Adama Brasil S/A, United Phosphorus Limited e INPI – Instituto Nacional da Propriedade Industrial
Autor: Adama Brasil S/A
Réus: United Phosphorus Limited e INPI – Instituto Nacional da Propriedade Industrial
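
The answers above come back as plain text, so to store them in a table you first need to split each answer into fields. Here is a minimal sketch, assuming the model keeps answering in "label: value" lines and sometimes puts the value on the line after the label, as it does in several answers above. The function name `parse_fields` is hypothetical, and the parser makes no attempt to normalize the labels, which vary from answer to answer.

```python
def parse_fields(answer):
    """Turn a 'label: value' answer into a dict of label -> value."""
    fields = {}
    pending = None  # label whose value is expected on the next line
    for raw in answer.splitlines():
        line = raw.strip()
        if not line:
            continue
        if pending is not None:
            # Previous line was a label with no value, so this line is the value.
            fields[pending] = line
            pending = None
            continue
        label, sep, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if sep and value:
            fields[label] = value
        else:
            # "Label:" or a bare "Label" line: the value follows on the next line.
            pending = label
    return fields


if __name__ == "__main__":
    answer = (
        "Número do processo judicial:\n"
        "5019398-85.2021.4.03.0000\n"
        "Tipo da ação: Agravo de Instrumento\n"
        "Réus: Não especificado"
    )
    print(parse_fields(answer))
```

In the script above, you could call `parse_fields(Extract.strip())` inside the loop and collect the dictionaries into a dataframe.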

As you can see, a few instructions are enough to perform the extraction quickly, compared with the traditional approach of writing your own algorithm in a programming language. With the traditional method, you would need to account for every variation in the text.
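
To give an idea of what the traditional method looks like, here is a minimal sketch using regular expressions. The case-number pattern is inferred from the judicial numbers in the sample texts above, and the party labels are only the ones that appear in those samples; both are assumptions, and real gazettes will contain variations that these rules miss (for example, a text with two "AUTOR:" entries).

```python
import re

# Pattern taken from the judicial numbers in the sample texts above,
# e.g. 5007472-60.2021.4.02.5101 (7 digits, 2 digits, year, then 3 groups).
CASE_NUMBER = re.compile(r"\b\d{7}-\d{2}\.\d{4}\.\d\.\d{2}\.\d{4}\b")

# Plaintiff labels seen in the samples; the name runs until a defendant label.
PLAINTIFF = re.compile(
    r"\b(?:AUTOR|EXEQUENTE|AGRAVANTE)\s*:\s*(.+?)"
    r"(?=\s+(?:RÉU|EXECUTADO|AGRAVADO)\b|$)",
    re.IGNORECASE,
)


def find_case_numbers(text):
    """Return every judicial case number found in the text."""
    return CASE_NUMBER.findall(text)


def find_plaintiff(text):
    """Return the first plaintiff name, or None when no known label is present."""
    match = PLAINTIFF.search(text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    sample = (
        "PROCEDIMENTO COMUM Nº 5007472-60.2021.4.02.5101/RJ "
        "AUTOR: OTTA SUSHI COMERCIO DE ALIMENTOS LTDA "
        "RÉU: INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL"
    )
    print(find_case_numbers(sample))
    print(find_plaintiff(sample))
```

Every new label or layout needs another rule, which is exactly the maintenance work that the instruction-based approach avoids.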

You can also run your experiment directly in the OpenAI Playground.

Screen extract from the OpenAI Playground

Conclusion

These AI models keep getting more advanced. The example above used GPT-3, and OpenAI is working on GPT-4. GPT-3 has 175 billion parameters; OpenAI has not published the size of GPT-4, but it is expected to achieve better results on a wide range of tasks.

Read about the November 2022 update to GPT-3:

https://arstechnica.com/information-technology/2022/11/openai-conquers-rhyming-poetry-with-new-gpt-3-update/

Read articles about GPT-4, which OpenAI is still developing:

https://towardsdatascience.com/gpt-4-is-coming-soon-heres-what-we-know-about-it-64db058cfd45
https://www.datacamp.com/blog/what-we-know-gpt4
https://www.technologyreview.com/2022/11/30/1063878/openai-still-fixing-gpt3-ai-large-language-model/

That’s it for today!

OpenAI Whisper – The Future of Conversational AI

OpenAI Whisper is an artificial intelligence system that approaches human-level robustness and accuracy in English speech recognition. It was developed by OpenAI, an artificial intelligence research lab, with the goal of improving the quality of speech-to-text systems. Its largest model has about 1.6 billion parameters and can transcribe and translate speech audio from 97 languages. Whisper was trained on 680,000 hours of audio data collected from the web and shows robust zero-shot performance on a wide range of automatic speech recognition (ASR) tasks. This will benefit many applications, such as virtual assistants and smart speakers.

This video can help you understand the benefits of Whisper.

OpenAI introduced Whisper on September 21, 2022, in a post on its blog. It will accelerate the use of artificial intelligence in applications that need to work with speech. Here are some examples:

You record in any language, and the API extracts the text.

In this example, the API extracts text from a YouTube video.

Let’s experiment with the open-source OpenAI Whisper package in Python to extract the text from the YouTube video.

Python
# Author: Lawrence Teixeira
# Date: 02/11/2022

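# Requirements to run this script:
#   pip install git+https://github.com/openai/whisper.git
#   pip install pytube
#   ffmpeg must also be installed and on the PATH (Whisper uses it to decode audio)

# import the necessary packages
import pytube as pt
import whisper

# download the audio track of the YouTube video (Introduction to Whisper: The speech recognition)
# note: the stream keeps its original audio encoding; "audio.mp3" is only a file name, and ffmpeg detects the real format
yt = pt.YouTube("https://www.youtube.com/watch?v=Bf6Z5bjlHcI")
stream = yt.streams.filter(only_audio=True).first()
stream.download(filename="audio.mp3")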

# load the model
model = whisper.load_model("medium")

# transcribe the audio file
result = model.transcribe("audio.mp3")

# print the text extracted from the video
print(result["text"])
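
Besides `result["text"]`, the dictionary returned by `transcribe` also contains a `"segments"` list whose entries carry `start`, `end`, and `text` values. Here is a minimal sketch that turns those segments into timestamped lines; the helper name `format_segments` is hypothetical, and the sample dictionary is a made-up stand-in so the snippet runs without downloading a model.

```python
def format_segments(result):
    """Return one '[mm:ss] text' line per transcribed segment."""
    lines = []
    for seg in result.get("segments", []):
        # Segment times are in seconds; whole seconds are enough for a label.
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
    return lines


if __name__ == "__main__":
    # Made-up stand-in for the dictionary returned by model.transcribe().
    fake_result = {
        "text": " Hello world. Second sentence.",
        "segments": [
            {"start": 0.0, "end": 2.5, "text": " Hello world."},
            {"start": 62.4, "end": 65.0, "text": " Second sentence."},
        ],
    }
    for line in format_segments(fake_result):
        print(line)
```

With the real script, you would pass the `result` variable from above instead of `fake_result`.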

Text extracted from the video “Introduction to Whisper: The speech recognition.”

“Whisper is an open source deep learning model for speech recognition that was released by Oppenai last week. Oppenai’s tests of Whisper show that it can do a good job of transcribing not just English audio, but also audio in a number of other languages. Developers and researchers who have worked with Whisper and seen what it can do are also impressed by it. But the release of Whisper may be just as important for what it tells us about how artificial intelligence AI research is changing, and what kinds of applications we can expect in the future. Whisper from Oppenai is open to all kinds of data. One of the most important things about Whisper is that it was trained with many different kinds of data. Whisper was trained on 680,000 hours of data from the web that was supervised by people who spoke different languages and did different tasks. A third of the training data is made up of audio examples that are not in English. Whisper can reliably transcribe English speech and perform at a state-of-the-art level with about 10 languages, an Oppenai representative told VentraBeat in written comments. It can also translate from those languages into English. Even though the lab’s analysis of languages other than English isn’t complete, people who have used it say it gives good results. Again, the AI research community has become more interested in different kinds of data. This year, Bloom was the first language model to work with 59 different languages. Meta is also working on a model that can translate between 200 different languages. By moving toward more data and language diversity, more people will be able to use and benefit from deep learning’s progress. Make your own test since Whisper is open source. Developers and users can choose to run it on their laptop, desktop workstation, mobile device, or a cloud server. OpenAI made Whisper in five different sizes. 
Each size traded accuracy for speed in a proportional way, with the smallest model being about 60 times faster than the largest. Developers who have used Whisper and seen what it can do are happy with it, and it can make cloud-based ASR services, which have been the main choice until now, less appealing. And Lobs expert Noah Giff told VentraBeat, At first glance, Whisper seems to be much more accurate than other SaaS products. Since it is free and can be programmed, it will probably be a very big problem for services that only do transcription. Whisper was released as an open source model that was already trained, and that anyone can download and run on any computer platform they want. In the past few months, commercial AI research labs have been moving in the direction of being more open to the public. You can make your own apps. There are already a number of ways to make it easier for people who don’t know how to set up and run machine learning models to use Whisper. One example is a project by journalist Peter Stern and GitHub engineer Christina Warren to make a free, secure, and easy to use transcription app for journalists based on Whisper. In the cloud, open source models like Whisper are making new things possible. Platforms like Hugging Face are used by developers to host Whisper and make it accessible through API calls. Jeff Bootyer, growth and product manager at Hugging Face, told VentraBeat, It takes a company 10 minutes to create their own transcription service powered by Whisper and start transcribing calls or audio content, even at a large scale. Hugging Face already has a number of services based on Whisper, such as an app that translates YouTube videos. Or, you can tweak existing apps to fit your needs. And fine-tuning, which is the process of taking a model that has already been trained and making it work best for a new application, is another benefit of open source models like Whisper. 
For example, Whisper can be tweaked to make ASR work better in a language that the current model doesn’t do as well with. Or, it can be tweaked to understand medical or technical terms better. Another interesting idea would be to fine-tune the model for tasks other than ASR, like verifying the speaker, finding sound events, and finding keywords. Hugging Face’s technical lead, Philip Schmidt, told VentraBeat that people have already told them that Whisper can be used as a plug-and-play service to get better results than before. When you put this together with fine-tuning the model, the performance will get even better. Fine-tuning for languages that were not well represented in the pre-training dataset can make a big difference in how well the system works.”

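As you can see, the text closely follows what was spoken, although some proper nouns come out misspelled (“Oppenai” for OpenAI, “VentraBeat” for VentureBeat). Note that in this example, we used the medium model. Here are the models that we can use to increase the accuracy.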

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

For English-only applications, the .en models tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
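
The naming rule above is easy to encode. Here is a minimal sketch of a helper (the name `model_name` is my own) that builds the string you would pass to `whisper.load_model`, assuming the five sizes are `tiny`, `base`, `small`, `medium`, and `large`, and that `large` is the one without an English-only version.

```python
SIZES = ("tiny", "base", "small", "medium", "large")


def model_name(size, english_only=False):
    """Return the Whisper model name for a size, using '.en' when available."""
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    # "large" is the one size without an English-only variant.
    if english_only and size != "large":
        return f"{size}.en"
    return size


if __name__ == "__main__":
    print(model_name("base", english_only=True))
    print(model_name("large", english_only=True))
```

For the Portuguese texts used earlier in this blog, you would keep `english_only=False`, since the `.en` models only handle English.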

Whisper’s performance varies widely depending on the language. The figure below shows a WER (word error rate) breakdown by language on the FLEURS dataset using the large model. More WER and BLEU scores corresponding to the other models and datasets can be found in Appendix D of the paper.

The image is taken from the official Whisper documentation.
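
If you have not met WER before: it is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. Here is a minimal sketch of that calculation. It only lowercases and splits on whitespace, which is a simplification of my own; published scores normally apply a more careful text normalization first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "released by OpenAI last week"
    transcribed = "released by Oppenai last week"
    print(word_error_rate(spoken, transcribed))
```

In this example, one word out of five is wrong, so the WER is 0.2. Lower is better, and 0 means a perfect transcript.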

Conclusion: Although there is still some debate about how well Whisper works, the concept behind it is worth thinking about. With more and more businesses moving towards automated marketing and customer service, Whisper could be a valuable tool for those looking to get ahead in the industry. Have you tried Whisper or any similar tools? Let us know in the comments!

Follow the official Whisper references:

Project link: https://openai.com/blog/whisper/
Code: https://github.com/openai/whisper

That’s it for today!

Where is the most relevant information for data analysis in Law and Intellectual Property?

Strategic information relevant to data-based decision-making in law and intellectual property is most often stored in PDF documents. Examples include which judge decided a lawsuit and the reason for a rejection or, in the case of patents, which examiners signed a technical examination report or decision, the reasons they gave, and which articles were used as the basis for rejecting the patent.


Information is usually stored in an unstructured way, and a simple OCR procedure is often not enough. Today, many AI-based APIs can extract this information in a structured form. Form-aware APIs are one example: they can extract a table from a PDF document as a table, keeping its rows and columns. There are several solutions on the market. The ones I’ve had the opportunity to test are Google Document AI and Azure Form Recognizer.

Let’s take a look at the pros and cons of each option to help you decide.

Google Document AI Pros:

  • integrates with Google Drive, making it easy to use for businesses that already use Google products
  • offers a free tier with limited features for businesses on a budget
  • an easy-to-use interface makes it quick to get started with little training required

Google Document AI Cons:

  • lacks some of the more advanced features offered by competitors, making it less suitable for businesses with complex needs
  • not as widely used as some competitors, making it harder to find support and resources if you encounter problems
  • pricing can be expensive for businesses that need more than the free tier offers

Azure Form Recognizer Pros:

  • offers more advanced features than Google Document AI, making it better suited for businesses with complex needs
  • widely used, meaning there’s plenty of support and resources available if you encounter problems
  • pricing is based on usage, so you only pay for what you need

Azure Form Recognizer Cons:

  • not as easy to use as Google Document AI so it may require more training for employees
  • doesn’t integrate with other Microsoft products as seamlessly as Google Document AI integrates with Google products

I tested the Azure Form Recognizer API on a patent technical examination report downloaded from the Brazilian Patent and Trademark Office (BRPTO). These documents normally follow the format below. If you want to see the full file, click here.

If we simply perform an OCR on these tables, the data looks like this:

Quadro 2 – Considerações referentes aos Artigos 10, 18, 22 e 32 da Lei n.o 9.279 de 14 demaio de 1996 – LPI Artigos da LPISim NãoA matéria enquadra-se no art. 10 da LPI (não se considera invenção)XA matéria enquadra-se no art. 18 da LPI (não é patenteável)XO pedido apresenta Unidade de Invenção (art. 22 da LPI)XO pedido está de acordo com disposto no art. 32 da LPIXComentários/Justificativas

Quadro 3 – Considerações referentes aos Artigos 24 e 25 da LPIArtigos da LPISim NãoO relatório descritivo está de acordo com disposto no art. 24 da LPIXO quadro reivindicatório está de acordo com disposto no art. 25 da LPIX

We could not efficiently and accurately identify the options indicated in the tables. So the best solution is to use an API that recognizes tables as shown below:


You can see that the columns in the tables are recognized perfectly, and we extracted the data exactly as it is in the table converted to JSON format.

If you want, you can download the JSON file here.

From these form recognition APIs, we can create an algorithm to perform a mass reading and save the structured information in a Data Lake, Database, or whatever format you need to use in your data analysis.
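
As a rough sketch of the first step of such an algorithm, the code below turns one recognized table into rows and then reads which option was marked with an "X". The input shape (`rowCount`, `columnCount`, and a `cells` list with `rowIndex`, `columnIndex`, and `content`) is an assumption modeled on the table layout in Azure Form Recognizer's JSON output; the function names and the sample values are made up for illustration and are not taken from the real report.

```python
from typing import Any, Optional


def table_to_rows(table: dict[str, Any]) -> list[list[str]]:
    """Place every recognized cell into a rowCount x columnCount grid."""
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"].strip()
    return grid


def marked_answers(rows: list[list[str]]) -> dict[str, Optional[str]]:
    """Map each row label to the header of the column marked with 'X'."""
    header = rows[0]
    answers: dict[str, Optional[str]] = {}
    for row in rows[1:]:
        marked = [header[j] for j in range(1, len(row)) if row[j].upper() == "X"]
        answers[row[0]] = marked[0] if marked else None
    return answers


# Hypothetical sample in the assumed shape (values invented for illustration).
sample_table = {
    "rowCount": 3,
    "columnCount": 3,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Artigos da LPI"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Sim"},
        {"rowIndex": 0, "columnIndex": 2, "content": "Não"},
        {"rowIndex": 1, "columnIndex": 0, "content": "art. 24"},
        {"rowIndex": 1, "columnIndex": 1, "content": "X"},
        {"rowIndex": 2, "columnIndex": 0, "content": "art. 25"},
        {"rowIndex": 2, "columnIndex": 2, "content": "X"},
    ],
}

rows = table_to_rows(sample_table)
answers = marked_answers(rows)
print(answers)
```

Each dictionary produced this way can then be written as one record to a database or a file in the data lake.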


If you liked the post and want me to make an example of the algorithm in Python, write below in the comments that I will be happy to share it with you.

That’s it for today!

How to use Python in Google Colab integrated directly with Power BI to analyze patent data

This blog post will show you how to load and transform patent data and connect Power BI with Google Colab. Google Colab is a free cloud service that allows you to run Jupyter notebooks in the cloud. Jupyter notebooks are a great way to share your code and data analysis with others. Power BI is a business intelligence tool that allows you to visualize your data and create reports. Connecting Power BI with Google Colab allows you to easily share your data visualizations with others. Let’s get started!

What is a patent?

A patent is an exclusive right granted for an invention, which is a product or a process that provides, in general, a new way of doing something or offers a new technical solution to a problem. To get a patent, technical information about the invention must be disclosed to the public in a patent application.

What is WIPO?

WIPO is the global forum for intellectual property (IP) services, policy, information, and cooperation. WIPO’s activities include hosting forums to discuss and shape international IP rules and policies, providing global services that register and protect IP in different countries, resolving transboundary IP disputes, helping connect IP systems through uniform standards and infrastructure, and serving as a general reference database on all IP matters; this includes providing reports and statistics on the state of IP protection or innovation both globally and in specific countries. WIPO also works with governments, nongovernmental organizations (NGOs), and individuals to utilize IP for socioeconomic development. If you need more information about WIPO, click here.

This video demonstrates the Power BI functionality we will use today.

Now that you understand what a patent is and what WIPO is, let’s start our experiment!

First, we will load the patent data from WIPO. In this experiment, we will use the authority file from 2022.

Python
from powerbiclient import Report, models
from powerbiclient.authentication import DeviceCodeLoginAuthentication
import pandas as pd
from google.colab import drive
from google.colab import output
import zipfile
import requests

# Mount Google Drive
drive.mount('/content/gdrive')

file_url = "https://patentscope.wipo.int/search/static/authority/2022.zip"
zip_path = "/content/gdrive/My Drive/2022.zip"

# Stream the download so the whole archive is never held in memory
r = requests.get(file_url, stream=True)
r.raise_for_status()  # stop here if the download failed

with open(zip_path, "wb") as file:
    for block in r.iter_content(chunk_size=1024):
        if block:
            file.write(block)

# Read the CSV file directly from the zip archive
compressed_file = zipfile.ZipFile(zip_path)

csv_file = compressed_file.open('2022.csv')

data = pd.read_csv(csv_file, delimiter=";", names=["Publication Number","Publication Date","Title","Kind Code","Application No","Classification","Applicant","Url"])

#Show the head data
data.head()

Now that we have the data, let’s apply some transformations to prepare it for loading into the Power BI report.

Python
# Transformations of the CSV file downloaded from WIPO

# Remove the first two lines (copy to avoid modifying a slice of the original)
data = data.iloc[2:].copy()

# Create a new column with the first letter of the classification (its section)
data["Classification_Name"] = data["Classification"].str[:1]

# Replace the section letter with the classification description
data["Classification_Name"] = data["Classification_Name"].replace({
    'A': 'Human Necessities',
    'B': 'Performing Operations and Transporting',
    'C': 'Chemistry and Metallurgy',
    'D': 'Textiles and Paper',
    'E': 'Fixed Constructions',
    'F': 'Mechanical Engineering',
    'G': 'Physics',
    'H': 'Electricity'
})

# Show the head of the data again
data.head()

# Save the Excel file in Google Drive to share with the Power BI report
# (the "datasets" folder must already exist in your Drive)
data.to_excel("/content/gdrive/MyDrive/datasets/Result_WIPO2022.xlsx")
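
To check the classification step without downloading the full file, the same idea can be tried on a few rows. This is only a sketch: the classification codes below are illustrative sample values, not rows from the WIPO file, and it uses `map` instead of `replace`, so any letter missing from the dictionary becomes NaN rather than being left unchanged.

```python
import pandas as pd

# Section letters and their descriptions, as used in the transformation above
SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations and Transporting",
    "C": "Chemistry and Metallurgy",
    "D": "Textiles and Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering",
    "G": "Physics",
    "H": "Electricity",
}

# Illustrative sample values (not taken from the WIPO file)
sample = pd.DataFrame(
    {"Classification": ["A61K 31/00", "H04L 9/32", "G06F 17/30", "H01L 21/02"]}
)

# The first character of the classification is the section letter
sample["Classification_Name"] = sample["Classification"].str[:1].map(SECTIONS)

# Count how many publications fall into each section
counts = sample["Classification_Name"].value_counts()
print(counts)
```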

After that, we will connect to Power BI and show the report inside Google Colab.

Python
# Authenticate against Power BI by starting the Microsoft device code login
device_auth = DeviceCodeLoginAuthentication()

group_id="YOU HAVE TO PUT HERE YOUR POWER BI GROUP ID OR WORKSPACE ID"
report_id="YOU HAVE TO PUT HERE YOUR POWER BI REPORT ID"

report = Report(group_id=group_id, report_id=report_id, auth=device_auth)
report.set_size(1024, 1600)
output.enable_custom_widget_manager()

# Show the power BI report with the wipo downloaded data.
report

Click here to see this report in full-screen mode.

The Google Colab file with the Python code is available here. If you want the Power BI report, click here.

Conclusion

In this blog post, we showed you how to load data from an external dataset, transform it, and display it in a Power BI report inside Google Colab. By following these steps, you can start using Google Colab and Power BI to analyze your data with Python and easily share it with others!

That’s it for today!