Implementing Data Governance in Power BI: A Step-by-Step Guide

As data plays a crucial role in decision-making and data-driven insights, organizations require a robust data governance framework to manage and monitor their data assets. Power BI offers various features and tools that aid in implementing data governance and ensuring data accuracy, reliability, and security.

As data becomes increasingly critical to organizations of all sizes and industries, managing it effectively and securely becomes just as important. A crucial aspect of data management is data governance: defining and enforcing the policies, procedures, and standards that control how data is handled. This article covers the basics of data governance, how to implement it in Power BI, and the advantages of using Power BI Premium.

What is Data Governance?

Data governance is the set of processes, policies, and standards organizations use to manage their data effectively. It encompasses everything from data quality and security to data privacy and retention. Effective data governance is crucial for organizations to ensure that their data is accurate, secure, and accessible. In addition, it helps organizations make informed decisions, reduce risks associated with poor data quality, and maintain compliance with legal and regulatory requirements.

How to Implement Data Governance in Power BI

Power BI provides several features and tools to help implement data governance: Dataflows, Datamarts, Sensitivity labels, Endorsement, Discovery, and Row-Level Security (RLS). Dataflows allow organizations to connect to, clean, and transform data, while Datamarts provide a centralized data repository. Sensitivity labels help classify and protect sensitive data, while Endorsement lets organizations flag the content that meets their quality standards. Discovery helps organizations find, manage, and understand their data assets, and RLS restricts which rows of data a given user can see. Each is described below.

Dataflows

A dataflow is a collection of tables created and managed in workspaces in the Power BI service. A table is a set of columns used to store data, much like a table within a database. You can add and edit tables in your dataflow and manage data refresh schedules directly from the workspace in which your dataflow was created.

As data volume grows, so does the challenge of wrangling that data into well-formed, actionable information. We want data ready for analytics to populate visuals, reports, and dashboards, so we can quickly turn our volumes of data into actionable insights. With self-service data prep for big data in Power BI, you can go from data to Power BI insights with just a few actions.

When to use dataflows

Dataflows are designed to support the following scenarios:

Create reusable transformation logic that many datasets and reports inside Power BI can share. Dataflows promote the reusability of the underlying data elements, preventing the need to create separate connections with your cloud or on-premises data sources.

Expose the data in your Azure Data Lake Gen 2 storage, enabling you to connect other Azure services to the raw underlying data.

Create a single source of truth by forcing analysts to connect to the dataflows rather than connecting to the underlying systems. This single source gives you control over which data is accessed and how data is exposed to report creators. You can also map the data to industry standard definitions, enabling you to create tidy curated views, which can work with other services and products in the Power Platform.

If you want to work with large data volumes and perform ETL at scale, dataflows with Power BI Premium scale more efficiently and give you more flexibility. Dataflows support a wide range of cloud and on-premises sources.

Prevent analysts from having direct access to the underlying data source. Since report creators can build on top of dataflows, it might be more convenient for you to allow access to underlying data sources only to a few individuals and then provide access to the dataflows for analysts to build on. This approach reduces the load to the underlying systems and gives administrators finer control of when the systems get loaded from refreshes.
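The idea of reusable transformation logic can be illustrated outside Power BI. The following is a minimal Python sketch, not dataflow code: the table, the `clean_orders` function, and the two report functions are all hypothetical names chosen for illustration. It shows one cleaning step defined once and consumed by two different "reports", which is the role a dataflow plays for datasets and reports.

```python
# Hypothetical raw data, as it might arrive from a source system.
RAW_ORDERS = [
    {"customer": " Contoso ", "amount": "120.50"},
    {"customer": "fabrikam", "amount": "80"},
    {"customer": "CONTOSO", "amount": "30"},
]


def clean_orders(rows):
    """Shared transformation: normalize customer names and parse amounts."""
    return [
        {"customer": r["customer"].strip().title(), "amount": float(r["amount"])}
        for r in rows
    ]


def revenue_by_customer(rows):
    """First consumer: total revenue per customer."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals


def largest_order(rows):
    """Second consumer: the single largest order."""
    return max(rows, key=lambda r: r["amount"])


# Both consumers read the curated rows, never the raw source.
curated = clean_orders(RAW_ORDERS)
print(revenue_by_customer(curated))
print(largest_order(curated))
```

Because both consumers depend only on the curated rows, a fix to the cleaning logic is made in one place, and the raw source is read once rather than once per report.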

    You can use Power BI Desktop and the Power BI service with dataflows to create datasets, reports, dashboards, and apps that use the Common Data Model. You can gain deep insights into your business activities from these resources. Dataflow refresh scheduling is managed directly from the workspace in which your dataflow was created, just like your datasets.


    Datamarts

    Datamarts are self-service analytics solutions that enable users to store and explore data in a fully managed database.

    When to use Datamarts

    Datamarts are targeted toward interactive data workloads for self-service scenarios. For example, if you work in accounting or finance, you can build your own data models and collections, which you can then use to self-serve business questions and answers through T-SQL and visual query experiences. You can still use those data collections for more traditional Power BI reporting experiences. Datamarts are recommended for customers who need domain-oriented, decentralized data ownership and architecture, such as users who need data as a product or a self-service data platform.

    Datamarts are designed to support the following scenarios:

    Departmental self-service data: Centralize small to moderate data volume (approximately 100 GB) in a self-service fully managed SQL database. Datamarts enable you to designate a single store for self-service departmental downstream reporting needs (such as Excel, Power BI reports, and others), thereby reducing the infrastructure in self-service solutions.

    Relational database analytics with Power BI: Access a datamart’s data using external SQL clients. Azure Synapse and other services/tools that use T-SQL can also use datamarts in Power BI.

    End-to-end semantic models: Enable Power BI creators to build end-to-end solutions without dependencies on other tooling or IT teams. Datamarts eliminate the need to manage orchestration between dataflows and datasets through auto-generated datasets, while providing visual experiences for querying data and ad-hoc analysis, all backed by Azure SQL DB.


    Sensitivity labels

    Sensitivity labels are tags that users can apply to content in Power BI Desktop or the Power BI service. They are essentially digital stamps that classify a resource and restrict how critical content can be used when it is shared outside Power BI.


    Endorsement

    Power BI provides two ways to endorse your valuable, high-quality content and increase its visibility: promotion and certification.

    Promotion: Promotion is a way to highlight content you think is valuable and worthwhile for others to use. It encourages the collaborative use and spread of content within an organization. Any content owner, or any member with write permissions on the workspace where the content is located, can promote content when they think it is good enough to share.

    Certification: Certification means that the content meets the organization’s quality standards and can be regarded as reliable, authoritative, and ready for use. Only authorized reviewers (defined by the Power BI administrator) can certify content. Content owners who want their content certified but are not authorized to certify it themselves must follow their organization’s guidelines for getting content certified.
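The difference between the two endorsement levels comes down to who is allowed to apply each one. The rules described above can be sketched as a small Python model. This is an illustration only, not a Power BI API: the `Workspace` and `Content` classes, the `promote` and `certify` functions, and the user names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    # Users with write permission on the workspace (hypothetical model).
    writers: set = field(default_factory=set)


@dataclass
class Content:
    name: str
    owner: str
    workspace: Workspace
    endorsement: str = "none"  # "none", "promoted", or "certified"


def promote(content, user):
    """The owner or any workspace writer may promote content."""
    if user != content.owner and user not in content.workspace.writers:
        raise PermissionError(f"{user} cannot promote {content.name}")
    content.endorsement = "promoted"


def certify(content, user, authorized_reviewers):
    """Only reviewers authorized by the administrator may certify content."""
    if user not in authorized_reviewers:
        raise PermissionError(f"{user} is not an authorized reviewer")
    content.endorsement = "certified"


ws = Workspace(writers={"dana"})
model = Content("Sales model", owner="omar", workspace=ws)
promote(model, "dana")                      # allowed: dana has write permission
certify(model, "rita", {"rita"})            # allowed: rita is an authorized reviewer
print(model.endorsement)
```

In this sketch a workspace writer who is not on the reviewer list can promote but not certify, which mirrors the split between self-service promotion and administrator-controlled certification.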


    Dataset Discovery

    The Power BI dataset discovery hub empowers Power BI and Microsoft Teams users to discover and re-use organizational and curated datasets and answer their business questions in Power BI or Excel. The hub will empower data owners to manage their assets in a central location.


    Row-Level Security (RLS)

    Row-level security (RLS) in Power BI restricts data access for given users. Filters restrict data access at the row level, and you define those filters within roles. Note that in the Power BI service, RLS applies only to users with the Viewer role: workspace Admins, Members, and Contributors have access to the datasets in the workspace, and RLS doesn’t restrict that access.
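To make the idea of role-based row filters concrete, here is a minimal Python sketch. It is an illustration only: in Power BI itself, roles and their filters are defined on the data model rather than in Python, and the table, role names, and user addresses below are hypothetical. The sketch assumes that a user who belongs to several roles sees the union of the rows those roles allow, and that a user in no role sees nothing.

```python
# Rows of a hypothetical sales table.
SALES = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 250},
    {"region": "East", "amount": 75},
]

# Each role carries a row filter: a predicate that decides whether a row is visible.
ROLE_FILTERS = {
    "EastSales": lambda row: row["region"] == "East",
    "WestSales": lambda row: row["region"] == "West",
}

# Role membership (hypothetical users).
ROLE_MEMBERS = {
    "EastSales": {"alice@example.com"},
    "WestSales": {"bob@example.com"},
}


def visible_rows(user, rows=SALES):
    """Return only the rows that at least one of the user's roles allows."""
    filters = [
        ROLE_FILTERS[role]
        for role, members in ROLE_MEMBERS.items()
        if user in members
    ]
    # any() over an empty filter list is False, so users without a role see no rows.
    return [row for row in rows if any(f(row) for f in filters)]


print(visible_rows("alice@example.com"))
print(visible_rows("bob@example.com"))
```

The report query is the same for every user; only the filter attached to the user's role changes which rows come back.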


    What Is Self-Service in Power BI?

    Self-service business intelligence (BI) is a data analytics method that allows business users (e.g., business analysts, managers, and executives) to access and explore datasets without experience in BI, data mining, and statistical analysis. Users can run queries and customize data visualization, dashboards, and reports to support real-time data-driven decision-making.

    Power BI offers robust self-service capabilities. You can tap into data from on-premises and cloud-based data sources (e.g., Dynamics 365, Salesforce, Azure SQL Data Warehouse, Excel, SharePoint), then filter, sort, analyze, and visualize the information without the help of a BI or IT team.

    Using the Power Query experience, business analysts can directly ingest, transform, integrate, and enrich big data in the Power BI web service. The ingested data can then be shared with other users across various Power BI models, reports, and dashboards.

    How vital is Self-Service in Power BI?

    In many businesses, productivity and agility suffer due to a lengthy process for BI-related data requests. For example, when Alice asks Bob a question, Bob has to wait for the BI/IT team to pull the data. This can take several weeks and multiple meetings, slowing the decision-making process.

    But with Power BI self-service, Bob can quickly retrieve real-time data, and Alice can immediately drill down into relevant datasets during the first meeting. This results in a more efficient discussion and a potential solution that can be implemented immediately.

    The significance of Power BI self-service goes beyond just real-time insights, collaboration, and data reuse. It helps business users develop the habit of relying on data when making decisions. Without easy access to data analytics, they may rely on instincts or experience, leading to suboptimal outcomes. But with real-time data at their fingertips, users can make data-driven decisions, establishing a pattern of data-informed decision-making.

    Implementing Effective Data Governance in a Power BI Self-Service Environment

    Data governance is critical to building a self-service culture in Power BI because it provides a framework for defining, maintaining, and enforcing data management policies. The following are the key components of a data governance plan in Power BI:

    1. Data Quality: Define data quality and accuracy standards to ensure that the data used is reliable and trustworthy.
    2. Data Security: Implement security measures to ensure that sensitive data is protected and only accessible by authorized users.
    3. Data Lineage: Define the lineage of the data sources used in Power BI to ensure that the data can be traced back to its source.
    4. Data Ownership: Assign ownership of data sources and ensure that data owners are responsible for maintaining the accuracy of their data.
    5. Data Stewardship: Designate data stewards responsible for maintaining data quality and ensuring compliance with data management policies.
    6. Data Access Control: Implement access controls to ensure that only authorized users can access sensitive data.
    7. Data Auditing: Implement auditing and monitoring processes to track changes to the data and ensure compliance with data management policies.

    By implementing these key components, organizations can establish a strong foundation for a self-service culture in Power BI while ensuring that the data is secure, accurate, and trustworthy.

    Maximizing Your Data Governance with Power BI Premium

    From scalability to security, Power BI Premium offers a range of features that can help organizations manage their data more effectively. With dedicated capacity, IT departments can ensure consistent performance for their teams. Advanced security features also help protect data privacy. Below are ten advantages of implementing data governance with Power BI Premium:

    1. Scalability: Power BI Premium can handle large amounts of data and high concurrent usage.
    2. Dedicated Capacity: Dedicated resources for Power BI Premium ensure consistent performance.
    3. IT Governance: IT departments can centrally manage and govern Power BI deployments.
    4. Data Privacy & Security: Advanced security features ensure data privacy and protection.
    5. Shared Workspaces: Teams can collaborate on data and reports in a secure environment.
    6. Unrestricted Data Sources: Power BI Premium supports a broader range of data sources than Power BI Pro.
    7. Dynamic Row-Level Security: Secure access to sensitive data can be managed dynamically.
    8. On-Premises Data Connectivity: Power BI Premium supports connectivity to on-premises data sources.
    9. Long-Term Data Retention: Power BI Premium enables organizations to retain data for extended periods.
    10. Lower TCO: For large audiences, Power BI Premium can provide a lower total cost of ownership than purchasing individual Power BI Pro licenses for every user.

    10 Effective Strategies for Implementing Data Governance in Power BI

    1. Creating Dataflows for cleaning and transforming data.
    2. Implementing Sensitivity labels to classify and protect sensitive data.
    3. Using Datamarts for centralizing data and improving data management.
    4. Enforcing data quality standards with Endorsement.
    5. Monitoring data assets with Discovery.
    6. Implementing data privacy and security with Power BI Premium.
    7. Improving report refresh times and performance with Power BI Premium.
    8. Sharing reports and dashboards with a larger audience with Power BI Premium.
    9. Utilizing Power BI Premium’s increased capacity for large datasets.
    10. Improving collaboration and data sharing with Power BI Premium’s multi-user authoring feature.


    Conclusion:

    Data governance is an essential aspect of data management, helping organizations to ensure that their data is accurate, secure, and accessible. Power BI provides several features to help organizations implement data governance, including Power BI Premium, dataflows, and Datamarts. With these features, organizations can automate collecting and transforming data, reduce the risk of manual errors, and maintain compliance with legal and regulatory requirements. Whether you’re just starting to explore Power BI or are already using it to manage your data, implementing data governance is a crucial step toward effective data management.

    The Power BI adoption roadmap is also well worth a look.

    Matthew Roche’s blog (he works at Microsoft) is an extensive reference on data culture and governance, and it covers dataflows in depth.

    If you have any questions about the topics discussed in this post or need help, feel free to contact me.

    That’s it for today!

      These 5 Tech Skills Will Be In Demand In 2023

      Technology keeps changing, and knowing the top five tech skills in demand in 2023 improves your chances of landing a good job. If you want to stay on the cutting edge of the job market, these skills will give you an edge over other applicants.


      The skills most in demand in 2023 go beyond computing itself: an organization’s management has a big say in the skill set its employees need, and those needs can change from year to year. Find out what to add to your resume if you want to apply for one of the hottest jobs on the market!

      With technology advancing rapidly, the skills needed to succeed in tech are shifting as well. So what are those skills? Let’s find out!

      What Will Be Future Jobs In 2023?

      In 2023, the most in-demand jobs will likely be in artificial intelligence (AI), big data, and cloud computing. These three areas are experiencing the most rapid growth, and that growth is expected to continue for the foreseeable future.

      AI is already being used in various ways, such as to create personal assistant applications, improve search engine results, and target online ads. The potential uses for AI are virtually limitless, and as its capabilities continue to increase, so will the number of businesses and industries adopting it.

      Big data is another area with a lot of potential. Companies are just beginning to scratch the surface of what they can do with all the data they collect. Currently, it is mainly used for marketing purposes, but it could also be used to predict consumer behavior, improve product design, or identify new business opportunities.

      Data Communicator / Storyteller

      As technology continues to evolve, so do the skills that employers are looking for in their employees. In the coming years, one of the most essential skills in demand is communicating data effectively.

      With the ever-increasing amount of data collected and stored, it is becoming more and more difficult for businesses to make sense of it all. That’s where data communicators come in. Data communicators are experts at taking complex data sets and communicating them in a way that is easy to understand.

      Not only do they need to be able to understand and interpret data, but they also need to be able to tell a story with it. The best data communicators can take data and turn it into an engaging story that can help organizations make better decisions.

      If you have strong communication skills and are interested in working with data, then a career as a data communicator may be right for you!

      Data Analyst: A data analyst analyzes, processes, and interprets data to find trends, patterns, and insights. Data analysts use their skills to help organizations make better decisions by providing them with actionable information.

      Data storytellers use various communication methods, such as written communication and visualizations, to convey insights. Tools like Power BI, QlikView, MicroStrategy, Google Data Studio, and Tableau help them find the most effective and accurate ways of conveying information.

      To be a successful data analyst, you must have strong analytical and problem-solving skills. You must also be able to communicate your findings effectively to others. Below are some key data communicator and storyteller skills.

      Data visualization: Data communicators and storytellers should be skilled in creating visualizations that clearly and effectively communicate data insights. This includes choosing the appropriate chart or graph type, using adequate labeling and formatting, and selecting an appropriate color scheme.

      Writing: Writing clearly and concisely is essential for communicating data insights to a wide range of audiences. This includes explaining complex concepts in simple terms and using appropriate language for the audience.

      Storytelling: Data communicators and storytellers should be skilled in using storytelling techniques to engage and inform their audience. This includes understanding how to structure a story, use compelling narratives to convey data insights, and use visual aids to support the story.

      Presentation skills: Data communicators and storytellers should be skilled in presenting data insights effectively, whether in person or online. This includes understanding how to use visual aids, engage with the audience, and adapt the presentation to different audiences and contexts.

      Data literacy: Understanding and interpreting data is essential for data communicators and storytellers. This includes understanding key concepts such as statistical significance and being able to critically evaluate data sources and methods.

      If you are interested in a career that combines your love of numbers with your communication skills, then a career as a data analyst may be the perfect fit for you!

      UX Design / Web Development

      User experience (UX) design and the closely related field of user interface (UI) design will become increasingly valuable skills as businesses worldwide transform into tech companies. No matter your role on a team, you’re expected to know how to use technology. UX is what makes technology work for everyone, even when they don’t have coding knowledge. This becomes even more important in low-code/no-code environments, where businesses can build applications without hiring an engineer. Enterprises realize that good experiences lead to more engaged customers and employees. This isn’t just a trend that helps designers—it will help business owners retain their customers and make their employees happier going through their daily tasks.

      The field of web development is constantly changing, with new technologies and trends always emerging. But some core skills will always be in demand. If you’re looking to get into web development, or move up in your career, make sure you have these skills:

      1. HTML and CSS: These are the foundation languages of the web. Every website is built with HTML and CSS, so if you want to be a web developer, you need to know them inside out.

      2. JavaScript: JavaScript is a programming language that helps make websites interactive. It’s used to add features like menus, forms, and animations.

      3. Web Standards: Websites must be built using web standards to work correctly on all devices and browsers. This includes proper code structure and formatting, semantic markup, and ensuring your CSS is compatible with different browsers.

      4. Responsive Design: With more people than ever accessing the internet on mobile devices, websites must be designed to be responsive – that is, they look good and work well on any screen size. This means using flexible layouts, media queries, and other techniques to ensure your site looks great on any device.

      5. User Experience (UX): A good user experience is essential for any website or app. As a web developer, you must understand how users interact with websites and design your sites accordingly. This includes things

      Cyber Security

      Cyber security is one of the most in-demand tech skills of the future. With the increasing amount of data being stored and shared online, companies are looking for ways to protect their information from cyber attacks. As a result, the demand for cybersecurity professionals is expected to grow.

      Information extracted from this article.

      Cyber security specialists are responsible for developing and implementing security measures to protect computer networks and systems from unauthorized access or damage. They may also be required to monitor networks for suspicious activity and respond to incidents when they occur.

      Here are some Cybersecurity skills.

      Network security: This involves protecting networks, devices, and data from unauthorized access or attacks. It includes understanding how to secure networks and devices, as well as how to detect and respond to security threats.

      Security protocols: Cybersecurity professionals should be familiar with various security protocols, including encryption, access control, and authentication, to protect data and systems from cyber threats.

      Risk assessment and management: Cybersecurity professionals need to be able to identify potential security risks and implement strategies to mitigate them. This includes understanding how to conduct risk assessments and develop risk management plans.

      Security incident response: When a security incident occurs, it is important for cybersecurity professionals to respond quickly and effectively. This includes understanding how to identify the cause of an incident, contain it, and restore affected systems.

      Compliance: Cybersecurity professionals must be familiar with relevant laws, regulations, and industry standards to ensure that their organization complies with all relevant requirements. This includes understanding data protection laws and industry-specific regulations.

      To succeed in this field, you must have strong technical skills and be up-to-date on the latest security threats. You will also need to be able to think creatively to develop new solutions to address evolving security challenges.

      Digital Marketing

      Digital marketing is one of the most in-demand tech skills today. With the rise of online marketing and the growth of the digital economy, businesses are increasingly looking for candidates with strong digital marketing skills.

      There are several reasons why digital marketing skills are in high demand. First, the growth of the internet and mobile devices has made it easier for businesses to reach their target audiences through digital channels. Second, as more businesses move into the online space, they need skilled marketers to help them navigate the complex world of digital marketing. Finally, as traditional advertising channels become less effective, businesses are turning to digital marketing to reach their customers and grow their business.

      Many skills are essential for developing a solid foundation in digital marketing. Here are five key skills that can help you succeed in this field:

      Data analysis and interpretation: Digital marketing relies heavily on data to guide strategy and measure the effectiveness of campaigns. Therefore, analyzing and interpreting data accurately is a crucial skill.

      Content creation and management: Compelling, relevant content is crucial for attracting and retaining customers. This includes writing copy for websites and social media and creating visual content such as images and videos.

      SEO: Search engine optimization (SEO) involves optimizing a website and its content to improve its ranking in search engine results pages. This includes researching and using relevant keywords and ensuring that a website is mobile-friendly and has fast loading times.

      Advertising: Digital marketing includes advertising on platforms such as Google and social media. This includes understanding how to create and target ads and measuring their effectiveness.

      Social media marketing: Social media is a powerful tool for connecting with customers and building brand awareness. Developing expertise in social media marketing involves understanding how to create and manage social media profiles and creating and sharing content that resonates with specific audiences.

      If you’re looking to start or enhance your career in tech, developing solid digital marketing skills is a great place to start. Here are some tips to get you started:

      1. Familiarize yourself with different digital marketing channels.
      2. Learn how to create effective campaigns using different digital marketing tools.
      3. Understand how to measure and analyze your results to optimize your campaigns.
      4. Stay up-to-date on the latest trends and technologies in digital marketing.
      5. Get experience by working on projects for real businesses or organizations.

      Artificial Intelligence

      Artificial intelligence plays a crucial role in all of the skills mentioned above, specifically through the ability to work alongside AI in what is commonly described as “augmented working.” Data communicators have tools that suggest the most effective forms of visualization and storytelling to communicate their insights. Cyber security professionals can use AI to analyze network traffic and spot potential attacks before they cause damage. UX designers use AI-assisted user behavior analytics to determine which features and functionality should be emphasized. Finally, digital marketers have many AI tools for predicting audience behavior and developing copy and content.

      In recent years, there has been a lot of hype surrounding artificial intelligence (AI). And with good reason – AI has the potential to revolutionize several industries, from healthcare and finance to manufacturing and logistics.

      But what does AI entail? And what skills do you need to get a job in this field?

      Here’s a quick overview of AI, along with some of the most in-demand AI jobs and skills:

      What is artificial intelligence?

      At its core, artificial intelligence is all about using computers to simulate or carry out human tasks. This can involve anything from understanding natural language and recognizing objects to making decisions and planning actions.

      There are different types of AI, but some of the most common are machine learning, deep learning, natural language processing, and computer vision.

      AI jobs in demand

      As AI continues gaining traction, the demand for AI-related jobs is rising. According to Indeed, job postings for AI roles have increased by 119% since 2015. And LinkedIn’s 2018 Emerging Jobs Report found that roles related to machine learning are among the fastest-growing jobs in the US.

      Some of the most in-demand AI jobs include:

      Data Scientist: A data scientist is a professional responsible for collecting, analyzing, and interpreting large amounts of data to identify trends and patterns. They use statistical methods, machine learning techniques, and domain knowledge to extract valuable insights from data and communicate their findings to stakeholders through reports, presentations, and visualizations.

      Machine Learning Engineer: A machine learning engineer designs, builds and maintains machine learning systems. They work closely with data scientists to understand the requirements of a machine-learning project and use their programming skills to implement and deploy machine-learning models. They may also be responsible for evaluating these models’ performance and making necessary improvements.

      Research Scientist: A research scientist is a professional who conducts research in a particular field, such as computer science, biology, or physics. They may work in academia, government, or industry and use a variety of methods, including experimentation, simulation, and data analysis, to advance the state of knowledge in their field.

      Data Analyst: A data analyst is a professional responsible for collecting, processing, and analyzing data to support decision-making and strategic planning. They may use various tools and techniques, such as SQL, Excel, and statistical software, to manipulate and visualize data and communicate their findings through reports and visualizations.

      Business Intelligence Analyst: A business intelligence analyst is a professional responsible for collecting, analyzing, and interpreting data to support business decision-making. They may use various tools and techniques, such as SQL, Excel, and business intelligence software, to extract and analyze data from various sources and present their findings to stakeholders through reports, dashboards, and visualizations.

      Bernard Marr has a video on YouTube about these skills.

      Video extract from this Forbes article.

      Conclusion

      As the world progresses, so too does the technology we use. It’s crucial to stay ahead of the curve and learn new skills that will be in demand in future years. The skills listed in this article will be in high demand in 2023, so start learning them now! Who knows, you might even be able to get a head start on your competition.

      As technology rapidly evolves, keeping your skills up-to-date is essential to stay ahead. The five tech skills mentioned in this article will be in high demand in 2023, so if you don’t have them already, now is the time to start learning. With these skills under your belt, you’ll be well-positioned to take advantage of the many opportunities coming your way in the next few years. Do you have any of these tech skills? Are there other skills you think will be in high demand in 2023? Let us know in the comments below!

      That’s it for today!

      Extracting information from unstructured text is easy with OpenAI. All you need to do is give the instructions.

      How do you successfully extract information from unstructured text? Through natural language processing, or NLP. You may be wondering what that means or how it can help with extraction. With a large language model, you describe what you want in plain-language instructions, and the model does the rest. This article discusses how NLP facilitates the extraction process and walks through a working example.

      What is OpenAI?

      OpenAI is an artificial intelligence research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. It was founded in December 2015, with $1 billion in funding pledged by Sam Altman and several other investors. Its stated goal is to promote friendly artificial intelligence that benefits humanity as a whole.

      How does OpenAI Work?

      There are several ways to extract information from unstructured text. The most common is keyword or key-phrase matching: you supply a specific word or phrase, and the software locates every instance of it in the text and returns the matches in an easily readable format.
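
      To make the keyword approach concrete, here is a minimal sketch in plain Python. It does not call OpenAI at all; the function name find_keyword and the sample sentence are made up for illustration. It simply shows what "locate every instance of a word" means in code.

```python
import re
from typing import List, Tuple


def find_keyword(text: str, keyword: str) -> List[Tuple[int, str]]:
    """Return (position, surrounding context) for every case-insensitive match."""
    matches = []
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        # Keep up to 20 characters on each side so the match is readable in context
        start = max(m.start() - 20, 0)
        end = min(m.end() + 20, len(text))
        matches.append((m.start(), text[start:end]))
    return matches


# Hypothetical sample text, loosely modelled on the gazette entries used later
sample = "Autor: OTTA SUSHI COMERCIO DE ALIMENTOS LTDA Réu: INPI. O autor pede a nulidade da patente."
for position, context in find_keyword(sample, "autor"):
    print(position, repr(context))
```

      A keyword search only finds the words you already know to look for, which is why the instruction-based approach used later in this article is more flexible.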

      Another way is concept search: you supply a general concept or topic, and the software locates the passages that relate to it and returns them in an easily readable format.

      The third way is to use a natural language processing model: you provide the text, and the model analyzes its grammar, syntax, and meaning before returning the results in an easily readable format.

      Creating a System for Extracting Information from Unstructured Text with OpenAI

      If you have a lot of unstructured text and need to extract information from it, OpenAI can help. You give the model the instructions, and it does the rest.

      OpenAI is especially useful for this because it copes with text that originated in various formats. For example, once a PDF document has been converted to plain text, OpenAI can process that text further.

      OpenAI is also good at dealing with multiple languages. For example, if you have a document in English and another in Portuguese, OpenAI can usually translate between the two languages and extract the desired information.

      Putting it to work:

      OpenAI makes extracting information from unstructured text easy. All you need to do is give the instructions. Let’s look at an example. I selected the sub-judice patent publications from the 10 latest gazettes of the BRPTO (the Brazilian Patent and Trademark Office). Note that everything is written in Brazilian Portuguese. If you want the dataset I used, you can click here to download it.

      Python
      import pandas as pd
      import openai
      import pyodbc
      
      # Note: this script uses the legacy (pre-1.0) interface of the openai package.
      openai.api_key = "INSERT YOUR OPENAI API KEY HERE"
      
      # Define the function that sends the instruction plus the text and returns the answer
      def OpenAI_Question(question_type, openai_response):
          response = openai.Completion.create(
              engine="text-davinci-003",
              prompt=question_type + "\n" + openai_response + "\n",
              temperature=0.7,
              max_tokens=256,
              top_p=1,
              frequency_penalty=0,
              presence_penalty=0
          )
          return response['choices'][0]['text']
      
      # Ask (in Portuguese) for the case number, type of action, court,
      # interested parties, plaintiff, and defendants
      def Extract_Process_Information(Text):
          Resultado = OpenAI_Question("Extrair do texto o número do processo judicial, tipo da ação, tribunal, interessados, autor e réus:", Text)
          return Resultado
      
      # Connect to my experiment database to get the complement of the sub-judice patent publications
      server = 'dbserverlaw.database.windows.net'
      database = 'db_lawrence_experiments'
      username = 'INSERT YOUR USER HERE'
      password = 'INSERT YOUR PASSWORD HERE'
      cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
      
      # Select 20 rows from the SQL table into a DataFrame
      query = "select top 20 Complemento from Patentes_SubJudce;"
      df = pd.read_sql(query, cnxn)
      
      # Show the results. Here you can do anything you want with the extracted information.
      print("Question asked from OpenAI model text-davinci-003: Extrair do texto o número do processo judicial sem ser INPI, tipo da ação, tribunal, interessados, autor e réus.", "\n")
      
      for i in df.index:
          Extract = Extract_Process_Information(df['Complemento'][i])
          
          print("Text ", i+1, ":")
          print(df['Complemento'][i], "\n")
          print("Information extracted from the text ", i+1, ":")
          print(Extract.strip(), "\n")

      Let’s show the results of this Python script:

      Question asked from OpenAI model text-davinci-003: “Extrair do texto o número do processo judicial sem ser INPI, tipo da ação, tribunal, interessados, autor e réus.”

      Text 1 :
      Processo SEI Nº: 52402.011406/2022-11 NUP PRINCIPAL: 01032.546858/2021-44 NUP REMISSIVO: 00848.001324/2022-17 PROCESSO Nº: 5019398-85.2021.4.03.0000 AUTOR: ARTIPE PRODUTOS ORTOPEDICOS E ESPORTIVOS LTDA ? ME Acórdão: A Primeira Turma, por unanimidade, deu provimento ao agravo de instrumento para determinar a suspensão dos efeitos da patente de invenção discutida nos autos de origem.

      Information extracted from the text 1 :
      Número do processo judicial:
      5019398-85.2021.4.03.0000
      Tipo da ação: Agravo de Instrumento
      Tribunal: Primeira Turma
      Interessados: ARTIPE PRODUTOS ORTOPEDICOS E ESPORTIVOS LTDA ? ME
      Autor: ARTIPE PRODUTOS ORTOPEDICOS E ESPORTIVOS LTDA ? ME
      Réus: Não especificado

      Text 2 :
      Processo INPI nº 52400.000958/2008-57 NUP PRINCIPAL: 00408.005736/2017-48 NUP REMISSIVO: 00848.001319/2022-12 Origem : TRIBUNAL REGIONAL FEDERAL DA 2ª REGIÃO AGRAVANTE : BMZAK BENEFICIAMENTO METAL MECANICO LTDA – ME AGRAVADO : MUNDIAL S.A. – PRODUTOS DE CONSUMO INTERESSADO : INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL Decisão: 1) julgo PROCEDENTE o pedido autoral, resolvendo o mérito, nos termos do art.269, I, do CPC, para decretar a nulidade da patente de modelo de utilidade MU7801576-6 para ?disposição em botão metálico?; 2) reconheço a litispendência e julgo extinto o pedido reconvencional, sem resolução de mérito, nos termos do art.267, V, penúltima figura, do CPC. Deverá o INPI publicar a presente decisão na próxima RPI e em seu site oficial. Trânsito em julgado.

      Information extracted from the text 2 :
      Número do processo judicial:
      00408.005736/2017-48
      Tipo da ação: Ação de nulidade de patente
      Tribunal: Tribunal Regional Federal da 2ª Região
      Interessados: BMZAK Beneficiamento Metal Mecânico Ltda – ME; Mundial S.A – Produtos de Consumo; Instituto Nacional da Propriedade Industrial
      Autor: BMZAK Beneficiamento Metal Mecânico Ltda – ME
      Réus: Mundial S.A – Produtos de Consumo

      Text 3 :
      Processo INPI nº 52402.001592/2021-91 13ª Vara Federal do Rio de Janeiro PROCEDIMENTO COMUM Nº 5007472-60.2021.4.02.5101/RJ AUTOR: OTTA SUSHI COMERCIO DE ALIMENTOS LTDA RÉU: INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL RÉU: LKD ALIMENTOS SAUDÁVEIS LTDA. Sentença: Ante o exposto, Julgo improcedente o pedido de nulidade da patente de modelo de utilidade MU 8900712-3 para ?disposição construtiva introduzida em embalagem para acondicionamento de alimentos?, resolvendo o mérito (CPC/2015, art. 487, inciso I). Trânsito em julgado.

      Information extracted from the text 3 :
      Processo: 5007472-60.2021.4.02.5101/RJ
      Tipo da Ação: Procedimento Comum
      Tribunal: 13ª Vara Federal do Rio de Janeiro
      Interessados: Otta Sushi Comercio de Alimentos Ltda e LKD Alimentos Saudáveis Ltda
      Autor: Otta Sushi Comercio de Alimentos Ltda
      Réus: INPI-Instituto Nacional da Propriedade Industrial e LKD Alimentos Saudáveis Ltda.

      Text 4 :
      Processo INPI nº 52402.005814/2019-20 9ª Vara Federal do Rio de Janeiro NUP: 00408.036343/2019-48 (REF. 5025815-75.2019.4.02.5101) EXEQUENTE: IMPLANTICA PATENT LTD (SOCIEDADE) EXECUTADO: INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL Decisão: Isto posto, julgo procedente o pedido, para decretar a nulidade dos atos administrativos do INPI que extinguiram as patentes de invenção PI0108142-0 e PI0108309-0 com base no art. 13 da Resolução INPI n. 113/2013 e determinar a consequente restauração das mesmas, nos moldes da fundamentação acima.

      Information extracted from the text 4 :
      Número do processo judicial:
      00408.036343/2019-48
      Tipo da ação: Execução
      Tribunal: 9ª Vara Federal do Rio de Janeiro
      Interessados: Implantaica Patent Ltd (Sociedade) e INPI-Instituto Nacional da Propriedade Industrial
      Autor: Implantaica Patent Ltd (Sociedade)
      Réus: INPI-Instituto Nacional da Propriedade Industrial

      Text 5 :
      Processo INPI nº 52402.005814/2019-20 9ª Vara Federal do Rio de Janeiro NUP: 00408.036343/2019-48 (REF. 5025815-75.2019.4.02.5101) EXEQUENTE: IMPLANTICA PATENT LTD (SOCIEDADE) EXECUTADO: INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL Decisão: Isto posto, julgo procedente o pedido, para decretar a nulidade dos atos administrativos do INPI que extinguiram as patentes de invenção PI0108142-0 e PI0108309-0 com base no art. 13 da Resolução INPI n. 113/2013 e determinar a consequente restauração das mesmas, nos moldes da fundamentação acima.

      Information extracted from the text 5 :
      Número do processo judicial:
      5025815-75.2019.4.02.5101
      Tipo da ação: Execução
      Tribunal: 9ª Vara Federal do Rio de Janeiro
      Interessados: Implantaica Patent LTD (Sociedade) e INPI – Instituto Nacional da Propriedade Industrial
      Autor: Implantaica Patent LTD (Sociedade)
      Réus: INPI – Instituto Nacional da Propriedade Industrial

      Text 6 :
      Processo INPI nº 52402.004535/2022-44 21ª Vara Federal Cível da SJDF PROCESSO JUDICIAL: 1006097-47.2022.4.01.3400 NUP: 00424.125631/2022-73 (REF. 1006097-47.2022.4.01.3400) INTERESSADOS: AGÊNCIA NACIONAL DE VIGILÂNCIA SANITÁRIA – ANVISA E OUTROS Decisão: Pelo exposto, DEFIRO o pedido de tutela provisória de urgência para determinar a suspensão dos efeitos do despacho 16.3 (publicado na RPI nº 2.629 de 25/05/21), que reduziu o prazo de vigência das patentes PI0212733-4 e BR 12 2012 023120 7, de modo que estas permaneçam vigentes até a prolação de sentença de mérito ? limitada a compensação de prazo requerida no pedido, qual seja, 663 (seiscentos e sessenta e três) dias para a PI0212733-4 e 1.594 (mil quinhentos e noventa e quatro) dias para a BR 12 2012 023120 7, bem como que o INPI publique, na primeira edição da RPI subsequente a sua intimação, a informação acerca da tutela concedida.

      Information extracted from the text 6 :
      Processo judicial:
      1006097-47.2022.4.01.3400
      Tipo da ação: Tutela provisória de urgência
      Tribunal: 21ª Vara Federal Cível da SJDF
      Interessados: Agência Nacional de Vigilância Sanitária – Anvisa e outros
      Autor: Agência Nacional de Vigilância Sanitária – Anvisa e outros
      Réus: INPI

      Text 7 :
      Processo INPI nº 52402.004535/2022-44 21ª Vara Federal Cível da SJDF PROCESSO JUDICIAL: 1006097-47.2022.4.01.3400 NUP: 00424.125631/2022-73 (REF. 1006097-47.2022.4.01.3400) INTERESSADOS: AGÊNCIA NACIONAL DE VIGILÂNCIA SANITÁRIA – ANVISA E OUTROS Decisão: Pelo exposto, DEFIRO o pedido de tutela provisória de urgência para determinar a suspensão dos efeitos do despacho 16.3 (publicado na RPI nº 2.629 de 25/05/21), que reduziu o prazo de vigência das patentes PI0212733-4 e BR 12 2012 023120 7, de modo que estas permaneçam vigentes até a prolação de sentença de mérito ? limitada a compensação de prazo requerida no pedido, qual seja, 663 (seiscentos e sessenta e três) dias para a PI0212733-4 e 1.594 (mil quinhentos e noventa e quatro) dias para a BR 12 2012 023120 7, bem como que o INPI publique, na primeira edição da RPI subsequente a sua intimação, a informação acerca da tutela concedida.

      Information extracted from the text 7 :
      Processo judicial:
      1006097-47.2022.4.01.3400
      Tipo da ação: Tutela provisória de urgência
      Tribunal: 21ª Vara Federal Cível da SJDF
      Interessados: Agência Nacional de Vigilância Sanitária – Anvisa e outros
      Autor: Agência Nacional de Vigilância Sanitária – Anvisa e outros
      Réus: INPI

      Text 8 :
      Processo INPI nº 52400.003545/2022-39 NUP: 00408.078470/2022-10 (REF. 0017246-69.2002.4.02.5101) Autor: Formax Quimiplan Componentes Para Calçados Ltda. Reús: Giulini Chemie GmbH e Instituto Nacional da Propriedade Industrial- INPI Sentença: Isto posto, JULGO IMPROCEDENTE o pedido de nulidade da patente de invenção PI 8506015-1, bem como o pedido de nulidade do privilégio decorrente da reivindicação n’ 1 da patente em tela, formulado alternativamente. Trânsito em julgado.

      Information extracted from the text 8 :
      Número do processo:
      00408.078470/2022-10
      Tipo da ação: Nulidade de patente
      Tribunal: Tribunal Regional Federal da 2ª Região
      Interessados: Formax Quimiplan Componentes Para Calçados Ltda. e Giulini Chemie GmbH
      Autor:
      Formax Quimiplan Componentes Para Calçados Ltda.
      Réus: Giulini Chemie GmbH e Instituto Nacional da Propriedade Industrial- INPI

      Text 9 :
      Processo INPI nº 52402.005638/2020-60 13ª Vara Federal do Rio de PROCEDIMENTO COMUM Nº 5029675-50.2020.4.02.5101/RJ AUTOR: LIBBS FARMACEUTICA LTDA AUTOR: MABXIENCE RESEARCH SL RÉU: GENENTECH, INC RÉU: INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL Decisão: Isto posto, homologo a renúncia ao direito sobre o qual se funda a ação, extinguindo o processo com resolução de mérito (CPC/2015, art. 487, III, ‘c’). Tendo em vista a manifesta ausência de interesse recursal das partes litigantes, o que deriva da própria preclusão lógica inerente à renúncia, certifique-se, desde logo, o trânsito em julgado.

      Information extracted from the text 9 :
      Número do processo judicial:
      5029675-50.2020.4.02.5101/RJ
      Tipo da ação: Procedimento comum
      Tribunal: 13ª Vara Federal do Rio de Janeiro
      Interessados: Libbs Farmacêutica Ltda, Mabxience Research SL, Genentech, Inc. e INPI-Instituto Nacional da Propriedade Industrial
      Autor: Libbs Farmacêutica Ltda
      Réus: Mabxience Research SL, Genentech, Inc. e INPI-Instituto Nacional da Propriedade Industrial

      Text 10 :
      INPI nº 52402.011824/2022-08 Origem: JUÍZO FEDERAL DA 9ª VF DO RIO DE JANEIRO (TRF2) Processo Nº: 5076666-16.2022.4.02.5101 NULIDADE DA PATENTE DE MODELO DE UTILIDADE com pedido de Antecipação de Tutela Autor: M.A. ROSSINI LOPES – ME. Réu(s): ANDRÉ LOPES e INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL ? INPI

      Information extracted from the text 10 :
      Processo Nº: 5076666-16.2022.4.02.5101
      Tipo da ação: NULIDADE DE PATENTE DE MODELO DE UTILIDADE com pedido de Antecipação de Tutela
      Tribunal: JUÍZO FEDERAL DA 9ª VF DO RIO DE JANEIRO (TRF2)
      Interessados: M.A. ROSSINI LOPES – ME., ANDRÉ LOPES e INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL ? INPI
      Autor: M.A. ROSSINI LOPES – ME.
      Réus: ANDRÉ LOPES e INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL ? INPI

      Text 11 :
      INPI nº 52402.011451/2022-67 Origem: 25ª Vara Federal do Rio de Janeiro Processo Nº: 5071020-25.2022.4.02.5101/RJ SUBJUDICE com pedido de Antecipação de Tutela Autor: OURO FINO SAUDE ANIMAL LTDA Réu(s): ZOETIS SERVICES LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

      Information extracted from the text 11 :
      Número do processo judicial:
      5071020-25.2022.4.02.5101/RJ
      Tipo da ação: SUBJUDICE com pedido de Antecipação de Tutela
      Tribunal: 25ª Vara Federal do Rio de Janeiro
      Interessados: OURO FINO SAUDE ANIMAL LTDA, ZOETIS SERVICES LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL
      Autor: OURO FINO SAUDE ANIMAL LTDA
      Réus: ZOETIS SERVICES LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

      Text 12 :
      INPI nº 52402.011991/2022-41 Origem: 22ª VARA FEDERAL CÍVEL DA SJDF (TRF1) Processo Nº: 1047948-66.2022.4.01.3400 AÇÃO DE PROCEDIMENTO COMUM Autor: EUSA Pharma (UK) Limited Réu(s): INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

      Information extracted from the text 12 :
      Processo Nº:
      1047948-66.2022.4.01.3400
      Tipo da Ação: Ação de Procedimento Comum
      Tribunal: 22ª Vara Federal Cível da SJDF (TRF1)
      Interessados: EUSA Pharma (UK) Limited e Instituto Nacional da Propriedade Industrial
      Autor: EUSA Pharma (UK) Limited
      Réus: Instituto Nacional da Propriedade Industrial

      Text 13 :
      INPI nº 52402.010443/2022-01 Origem: 25ª Vara Federal do Rio de Janeiro Processo Nº: 5052162-43.2022.4.02.5101/RJ NULIDADE DA PATENTE DE INVENÇÃO com pedido de Antecipação de Tutela Autor: KOMATSU BRASIL INTERNATIONAL LTDA Réu(s): ESCO GROUP LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

      Information extracted from the text 13 :
      Número do processo judicial:
      5052162-43.2022.4.02.5101/RJ
      Tipo da ação: NULIDADE DA PATENTE DE INVENÇÃO com pedido de Antecipação de Tutela
      Tribunal: 25ª Vara Federal do Rio de Janeiro
      Interessados: KOMATSU BRASIL INTERNATIONAL LTDA, ESCO GROUP LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL
      Autor: KOMATSU BRASIL INTERNATIONAL LTDA
      Réus: ESCO GROUP LLC e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

      Text 14 :
      INPI nº 52402.013111/2022-71 Origem: a 22ª VARA CÍVEL FEDERAL DE SÃO PAULO Processo Nº: 5007277-58.2021.4.03.6100 SUBJUDICE com pedido de Antecipação de Tutela Autor: SYNGENTA SEEDS LTDA, SYNGENTA PARTICIPATIONS AG Réu(s): SEMPRE SEMENTES EIRELI, MINISTÉRIO DA AGRICULTURA, PECUÁRIA E ABASTECIMENTO ? MAPA, INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

      Information extracted from the text 14 :
      Processo nº
      5007277-58.2021.4.03.6100
      Tipo da Ação: Pedido de Antecipação de Tutela
      Tribunal: 22ª Vara Cível Federal de São Paulo
      Interessados: Syngenta Seeds Ltda., Syngenta Participations AG, Sempre Sementes Eireli, Ministério da Agricultura, Pecuária e Abastecimento ? MAPA, Instituto Nacional da Propriedade Industrial
      Autor: Syngenta Seeds Ltda., Syngenta Participations AG
      Réus: Sempre Sementes Eireli, Ministério da Agricultura, Pecuária e Abastecimento ? MAPA, Instituto Nacional da Propriedade Industrial

      Text 15 :
      “INPI nº 52402.011780/2022-16 Origem: 13ª Vara Federal do Rio de Janeiro Processo Nº: 5047067-32.2022.4.02.5101/RJ NULIDADE DA PATENTE DE INVENÇÃO Autor: COMPANHIA NITRO QUÍMICA BRASILEIRA Réu(s): ICL AMERICA DO SUL S.A. (nova denominação de COMPASS MINERALS AMÉRICA DO SUL INDÚSTRIA E COMÉRCIO LTDA.) e Instituto Nacional da Propriedade Industrial ? INPI”

      Information extracted from the text 15 :
      Processo Nº:
      5047067-32.2022.4.02.5101/RJ
      Tipo da ação: NULIDADE DA PATENTE DE INVENÇÃO
      Tribunal: 13ª Vara Federal do Rio de Janeiro
      Interessados: Companhia Nitro Química Brasileira, ICL America do Sul S.A. (nova denominação de Compass Minerals América do Sul Indústria e Comércio Ltda.) e Instituto Nacional da Propriedade Industrial (INPI).
      Autor: Companhia Nitro Química Brasileira
      Réus: ICL America do Sul S.A. (nova denominação de Compass Minerals América do Sul Indústria e Comércio Ltda.) e Instituto Nacional da Propriedade Industrial (INPI).

      Text 16 :
      INPI nº 52402.012620/2022-86 Origem: 1ª Vara Federal de Curitiba Processo Nº: 5061501-95.2022.4.04.7000/PR NULIDADE DA PATENTE DE INVENÇÃO com pedido de Antecipação de Tutela Autor: S. Almeida Eventos Ltda. Réu(s): HOLMES PEDRO GIACOMET JUNIOR E Instituto Nacional da Propriedade Industrial – INPI

      Information extracted from the text 16 :
      Número do processo judicial:
      5061501-95.2022.4.04.7000/PR
      Ação: NULIDADE DA PATENTE DE INVENÇÃO
      Tribunal: 1ª Vara Federal de Curitiba
      Interessados: S. Almeida Eventos Ltda., HOLMES PEDRO GIACOMET JUNIOR E Instituto Nacional da Propriedade Industrial – INPI
      Autor: S. Almeida Eventos Ltda.
      Réus: HOLMES PEDRO GIACOMET JUNIOR E Instituto Nacional da Propriedade Industrial – INPI

      Text 17 :
      INPI nº 52402.012852/2022-34 Origem: 2ª Vara Federal de Blumenau Processo Nº: 5021248-32.2022.4.04.7205 NULIDADE DA PATENTE DE INVENÇÃO Autor: PRATIMIX INDUSTRIA E COMERCIO DE ACESSORIOS ELETRICOS E HIDRAULICOS LTDA, Réu(s): LORENZETTI SA INDÚSTRIAS BRASILEIRAS ELETROMETALURGICAS e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

      Information extracted from the text 17 :
      Processo Nº:
      5021248-32.2022.4.04.7205
      Tipo da ação: NULIDADE DA PATENTE DE INVENÇÃO
      Tribunal: 2ª Vara Federal de Blumenau
      Interessados: PRATIMIX INDUSTRIA E COMERCIO DE ACESSORIOS ELETRICOS E HIDRAULICOS LTDA, LORENZETTI SA INDÚSTRIAS BRASILEIRAS ELETROMETALURGICAS e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL
      Autor: PRATIMIX INDUSTRIA E COMERCIO DE ACESSORIOS ELETRICOS E HIDRAULICOS LTDA,
      Réus: LORENZETTI SA INDÚSTRIAS BRASILEIRAS ELETROMETALURGICAS e INPI-INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

      Text 18 :
      INPI nº 52402.009647/2022-91 Origem: 31ª Vara Federal do Rio de Janeiro Processo Nº: 5059924-13.2022.4.02.5101 SUBJUDICE com pedido de Antecipação de Tutela Autor: EMERSON CORDEIRO DE OLIVEIRA Réu(s): MODULARE BRASIL ARTEFATOS PLÁSTICOS LTDA, MARIANAAZAMBUJA SOARES MUNARI e INSTITUTO NACIONAL DA PROPRIEDADEINDUSTRIAL ? INPI.

      Information extracted from the text 18 :
      Número do processo judicial:
      5059924-13.2022.4.02.5101
      Tipo da ação: SUBJUDICE com pedido de Antecipação de Tutela
      Tribunal: 31ª Vara Federal do Rio de Janeiro
      Interessados: Emerson Cordeiro de Oliveira, Modulare Brasil Artefatos Plásticos Ltda., Mariana Azambuja Soares Munari e Instituto Nacional da Propriedade Industrial – INPI.
      Autor: Emerson Cordeiro de Oliveira
      Réus: Modulare Brasil Artefatos Plásticos Ltda., Mariana Azambuja Soares Munari e Instituto Nacional da Propriedade Industrial – INPI.

      Text 19 :
      INPI nº 52402.011352/2022-85 Origem: JUÍZO FEDERAL DA 25ª VF DO RIO DE JANEIRO (TRF2) Processo Nº: 5036388-70.2022.4.02.5101 NULIDADE DA PATENTE DE INVENÇÃO com pedido de Antecipação de Tutela Autor: FALCON DISTRIBUICAO, ARMAZENAMENTO E TRANSPORTES S.A. Réu(s): DRYLOCK TECHNOLOGIES NV e INPI – INSTITUTO NACIONAL DA PROPRIEDADE INDUSTRIAL

      Information extracted from the text 19 :
      Número do processo judicial:
      5036388-70.2022.4.02.5101
      Tipo da ação: Nulidade da patente de invenção com pedido de antecipação de tutela
      Tribunal: Juízo Federal da 25ª VF do Rio de Janeiro (TRF2)
      Interessados: Falcon Distribuição, Armazenamento e Transportes S.A.
      Autor: Falcon Distribuição, Armazenamento e Transportes S.A.
      Réus: Drylock Technologies NV e INPI – Instituto Nacional da Propriedade Industrial.

      Text 20 :
      INPI nº 52402.011631/2022-49Origem: Justiça Federal – 9ª Vara Federal do Rio de JaneiroProcedimento Comum nº 5098088-81.2021.4.02.5101/RJembargos de declaração opostos contra alegado equívoco na decisão proferidaautor: Adama Brasil S/Aréus: United Phosphorus Limited e INPI – Instituto Nacional da Propriedade Industrial

      Information extracted from the text 20 :
      Processo judicial:
      5098088-81.2021.4.02.5101/RJ
      Tipo da ação: Embargos de Declaração
      Tribunal: Justiça Federal – 9ª Vara Federal do Rio de Janeiro
      Interessados: Adama Brasil S/A, United Phosphorus Limited e INPI – Instituto Nacional da Propriedade Industrial
      Autor: Adama Brasil S/A
      Réus: United Phosphorus Limited e INPI – Instituto Nacional da Propriedade Industrial
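
      The answers above come back as free text in a loose “label: value” layout, and sometimes the value sits on the line after its label. If you want to store them in a table, a small parser along these lines may help. This is only a sketch: parse_response is a hypothetical helper, not part of the script above, and it assumes that values themselves contain no colon.

```python
def parse_response(response: str) -> dict:
    """Turn 'Label: value' lines into a dict; a value may follow on the next line."""
    fields = {}
    last_label = None
    for raw_line in response.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if ":" in line:
            # Split on the first colon only, so the label is everything before it
            label, value = (part.strip() for part in line.split(":", 1))
            fields[label] = value
            last_label = label
        elif last_label is None:
            # A label written without a colon, such as "Processo nº"
            fields[line] = ""
            last_label = line
        else:
            # A line without a colon continues (or supplies) the previous value
            fields[last_label] = (fields[last_label] + " " + line).strip()
    return fields


# The answer returned for Text 1, shortened to three fields
answer = """Número do processo judicial:
5019398-85.2021.4.03.0000
Tipo da ação: Agravo de Instrumento
Tribunal: Primeira Turma"""

parsed = parse_response(answer)
print(parsed)
```

      Note that the model does not always use the same label for the same field (“Número do processo judicial”, “Processo Nº”, “Processo judicial”), so the keys still need to be normalized before loading them into a database.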

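      As you can see, with a few instructions you can perform this kind of task quickly, compared with the traditional approach of writing your own extraction algorithm in a programming language, where you would need to account for every variation in the text. The output still deserves a review, though: for Texts 4 and 8 the model returned the NUP number instead of the court case number, even though Text 5, which is identical to Text 4, yielded the court case number.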
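
      A lightweight sanity check on the extracted case numbers could look like the sketch below. It assumes that court case numbers follow the CNJ unified format NNNNNNN-DD.AAAA.J.TR.OOOO, as the ones in the texts above do; older or administrative numbers will not match. The helper name check_case_number is made up for this example.

```python
import re

# CNJ unified numbering: NNNNNNN-DD.AAAA.J.TR.OOOO
CNJ_PATTERN = re.compile(r"\b\d{7}-\d{2}\.\d{4}\.\d\.\d{2}\.\d{4}\b")


def check_case_number(extracted: str, source_text: str) -> bool:
    """True when `extracted` holds a CNJ-style number that also occurs in the source text."""
    match = CNJ_PATTERN.search(extracted)
    return match is not None and match.group(0) in source_text


# Text 1: the model returned the court case number
text_1 = "PROCESSO Nº: 5019398-85.2021.4.03.0000 AUTOR: ARTIPE PRODUTOS ORTOPEDICOS E ESPORTIVOS LTDA"
print(check_case_number("5019398-85.2021.4.03.0000", text_1))

# Text 4: the model returned the NUP number, which is not a CNJ-style number
text_4 = "NUP: 00408.036343/2019-48 (REF. 5025815-75.2019.4.02.5101)"
print(check_case_number("00408.036343/2019-48", text_4))
```

      Checking that the number also appears in the source text guards against the model inventing or altering digits, which a format check alone would not catch.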

      You can test your experiment directly in the OpenAI Playground.

      Screen extract from OpenAI Playground

      Conclusion

      These AI models keep getting more advanced. The example in this article used GPT-3, which has 175 billion parameters. OpenAI is now working on GPT-4. OpenAI has not published the size of GPT-4, but it is expected to achieve better results than GPT-3 on various tasks.

      For information about the November 2022 update to GPT-3, see:

      https://arstechnica.com/information-technology/2022/11/openai-conquers-rhyming-poetry-with-new-gpt-3-update/

      For articles about GPT-4, which OpenAI is still developing, see:

      https://towardsdatascience.com/gpt-4-is-coming-soon-heres-what-we-know-about-it-64db058cfd45
      https://www.datacamp.com/blog/what-we-know-gpt4
      https://www.technologyreview.com/2022/11/30/1063878/openai-still-fixing-gpt3-ai-large-language-model/

      That’s it for today!

      OpenAI Whisper – The Future of Conversational AI

      OpenAI Whisper is a new artificial intelligence system that approaches human-level performance in speech recognition. It was developed by OpenAI, an artificial intelligence research lab, with the goal of improving the quality of speech-to-text systems. Its largest model has about 1.6 billion parameters and can transcribe and translate speech audio in 97 languages. Whisper was trained on 680,000 hours of audio data collected from the web and shows robust zero-shot performance on a wide range of automatic speech recognition (ASR) tasks. This will benefit many applications, such as virtual assistants and smart speakers.

      This video can help you understand the benefits of Whisper.

      OpenAI introduced Whisper on September 21, 2022, in this article. It should accelerate the adoption of artificial intelligence in applications that rely on speech recognition. Here are some examples:

      You record in any language, and the API extracts the text.

      In this example, the API extracts text from a YouTube video.

      Let’s experiment with the open-source Whisper package in Python to extract the text from a YouTube video.

      Python
      # Author: Lawrence Teixeira
      # Date: 02/11/2022
      
      # Requirements to run this script (ffmpeg must also be installed):
      # pip install git+https://github.com/openai/whisper.git
      # pip install pytube
      
      # import the necessary packages
      import pytube as pt
      import whisper
      
      # download the audio track of the YouTube video (Introduction to Whisper: The speech recognition)
      yt = pt.YouTube("https://www.youtube.com/watch?v=Bf6Z5bjlHcI")
      stream = yt.streams.filter(only_audio=True).first()
      # the file keeps its original audio encoding despite the .mp3 name; ffmpeg detects the real format
      stream.download(filename="audio.mp3")
      
      # load the model
      model = whisper.load_model("medium")
      
      # transcribe the audio file
      result = model.transcribe("audio.mp3")
      
      # print the text extracted from the video
      print(result["text"])

      Text extracted from the video “Introduction to Whisper: The speech recognition.”

      “Whisper is an open source deep learning model for speech recognition that was released by Oppenai last week. Oppenai’s tests of Whisper show that it can do a good job of transcribing not just English audio, but also audio in a number of other languages. Developers and researchers who have worked with Whisper and seen what it can do are also impressed by it. But the release of Whisper may be just as important for what it tells us about how artificial intelligence AI research is changing, and what kinds of applications we can expect in the future. Whisper from Oppenai is open to all kinds of data. One of the most important things about Whisper is that it was trained with many different kinds of data. Whisper was trained on 680,000 hours of data from the web that was supervised by people who spoke different languages and did different tasks. A third of the training data is made up of audio examples that are not in English. Whisper can reliably transcribe English speech and perform at a state-of-the-art level with about 10 languages, an Oppenai representative told VentraBeat in written comments. It can also translate from those languages into English. Even though the lab’s analysis of languages other than English isn’t complete, people who have used it say it gives good results. Again, the AI research community has become more interested in different kinds of data. This year, Bloom was the first language model to work with 59 different languages. Meta is also working on a model that can translate between 200 different languages. By moving toward more data and language diversity, more people will be able to use and benefit from deep learning’s progress. Make your own test since Whisper is open source. Developers and users can choose to run it on their laptop, desktop workstation, mobile device, or a cloud server. OpenAI made Whisper in five different sizes. 
Each size traded accuracy for speed in a proportional way, with the smallest model being about 60 times faster than the largest. Developers who have used Whisper and seen what it can do are happy with it, and it can make cloud-based ASR services, which have been the main choice until now, less appealing. And Lobs expert Noah Giff told VentraBeat, At first glance, Whisper seems to be much more accurate than other SaaS products. Since it is free and can be programmed, it will probably be a very big problem for services that only do transcription. Whisper was released as an open source model that was already trained, and that anyone can download and run on any computer platform they want. In the past few months, commercial AI research labs have been moving in the direction of being more open to the public. You can make your own apps. There are already a number of ways to make it easier for people who don’t know how to set up and run machine learning models to use Whisper. One example is a project by journalist Peter Stern and GitHub engineer Christina Warren to make a free, secure, and easy to use transcription app for journalists based on Whisper. In the cloud, open source models like Whisper are making new things possible. Platforms like Hugging Face are used by developers to host Whisper and make it accessible through API calls. Jeff Bootyer, growth and product manager at Hugging Face, told VentraBeat, It takes a company 10 minutes to create their own transcription service powered by Whisper and start transcribing calls or audio content, even at a large scale. Hugging Face already has a number of services based on Whisper, such as an app that translates YouTube videos. Or, you can tweak existing apps to fit your needs. And fine-tuning, which is the process of taking a model that has already been trained and making it work best for a new application, is another benefit of open source models like Whisper. 
For example, Whisper can be tweaked to make ASR work better in a language that the current model doesn’t do as well with. Or, it can be tweaked to understand medical or technical terms better. Another interesting idea would be to fine-tune the model for tasks other than ASR, like verifying the speaker, finding sound events, and finding keywords. Hugging Face’s technical lead, Philip Schmidt, told VentraBeat that people have already told them that Whisper can be used as a plug-and-play service to get better results than before. When you put this together with fine-tuning the model, the performance will get even better. Fine-tuning for languages that were not well represented in the pre-training dataset can make a big difference in how well the system works.”

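      As you can see, the transcription follows the spoken audio closely, although a few proper nouns come out misspelled (for example, “Oppenai” for OpenAI and “VentraBeat” for VentureBeat). Note that in this example, we use the intermediate model. The other available models, described below, trade speed for accuracy.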

      Available models and languages

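      There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed, as listed in the Whisper README.

      Size     Parameters   English-only model   Multilingual model   Required VRAM   Relative speed
      tiny     39 M         tiny.en              tiny                 ~1 GB           ~32x
      base     74 M         base.en              base                 ~1 GB           ~16x
      small    244 M        small.en             small                ~2 GB           ~6x
      medium   769 M        medium.en            medium               ~5 GB           ~2x
      large    1550 M       N/A                  large                ~10 GB          1x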

      For English-only applications, the .en models tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
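
      The choice between a multilingual model and its .en counterpart can be sketched in a few lines of Python. This is a minimal sketch, not the library's own logic: pick_model_name and transcribe_file are hypothetical helpers introduced here for illustration, and the audio path passed to transcribe_file is whatever file you want to transcribe. The only Whisper calls used are whisper.load_model and model.transcribe, and the package is imported lazily so the name helper works even where Whisper is not installed.

```python
# Model sizes as named in the Whisper README; "large" has no English-only variant.
MULTILINGUAL_SIZES = ("tiny", "base", "small", "medium", "large")
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def pick_model_name(size: str, english_only: bool = False) -> str:
    """Hypothetical helper: map a size to a model name, using `.en` when available."""
    if size not in MULTILINGUAL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    # Fall back to the multilingual model (the only option for "large").
    return size


def transcribe_file(audio_path: str, size: str = "base", english_only: bool = False) -> str:
    """Hypothetical helper: load the chosen model and return the transcribed text."""
    import whisper  # imported here so pick_model_name has no heavy dependency

    model = whisper.load_model(pick_model_name(size, english_only))
    result = model.transcribe(audio_path)
    return result["text"]
```

      For an English podcast on a machine with little memory, pick_model_name("base", english_only=True) returns "base.en", while asking for an English-only "large" model falls back to the multilingual "large".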

      Whisper’s performance varies widely depending on the language. The figure below shows a WER (word error rate) breakdown by language on the Fleurs dataset using the large model. More WER and BLEU scores corresponding to the other models and datasets can be found in Appendix D of the paper.

      The image is taken from the official Whisper documentation.
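
      To make the WER figures concrete, here is a minimal sketch of how a word error rate can be computed for a single transcript. It assumes the common definition, WER = (substitutions + deletions + insertions) / number of reference words, with plain whitespace tokenization and no text normalization. The Whisper paper normalizes text before scoring, so numbers from this sketch are not directly comparable to the published ones. The function name word_error_rate is introduced here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One misspelled word out of five counts as a single substitution.
print(word_error_rate("released by OpenAI last week",
                      "released by Oppenai last week"))  # 0.2
```

      A lower WER is better; 0.0 means the hypothesis matches the reference word for word, and values above 1.0 are possible when the hypothesis contains many extra words.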

      Conclusion: Although there is still some debate about how well Whisper works, the concept behind it is worth thinking about. With more and more businesses moving towards automated marketing and customer service, Whisper could be a valuable tool for those looking to get ahead in the industry. Have you tried using Whisper or any similar tools? Let us know in the comments!

      Follow the official Whisper references:

      Project link: https://openai.com/blog/whisper/
      Code: https://github.com/openai/whisper

      That’s it for today!