Open Interpreter’s ChatGPT-Like Interface: The Future of Natural Language Code Execution

There’s an ever-increasing need for practical, adaptable, and user-friendly tools in the dynamic world of software development and programming. Developers are looking for solutions that allow them to operate effortlessly in their local development setups without being hampered by the confines of online hosted platforms. This article delves into a revolutionary answer to these demands – Open Interpreter.

While the Code Interpreter from OpenAI, based on the GPT-4 architecture, is undeniably transformative, it has limitations. Being a hosted solution, OpenAI’s offering runs in a distant, regulated environment, restricting developers from harnessing the full capabilities of their local configurations. Among the challenges faced are no access to the internet, a confined list of pre-loaded packages, a cap on uploads at 100 MB, and a runtime restriction of 120 seconds. Moreover, if the session terminates, any produced files or links vanish, creating obstacles for extended assignments.

With Open Interpreter, developers can fully utilize their computer’s resources, side-stepping the limitations of online services.

Understanding Open Interpreter

At its core, Open Interpreter is a bridge between human language and computer code. Instead of manually writing out every line of code, users can describe their intentions in natural language, and Open Interpreter translates that into executable code. This offers an entirely new programming approach, simplifying and streamlining the process.

Features of Open Interpreter

  1. Code Interpretation from Natural Language: This is the tool’s primary selling point. Users describe a task in plain English, and it converts the description into executable code, significantly lowering the barrier to programming for non-developers.
  2. ChatGPT-Like Interface: With a user-friendly, chatbot-like interface, users can have interactive programming sessions, making the coding experience more intuitive.
  3. Data Handling Capabilities: Beyond simple code interpretation, Open Interpreter is a powerhouse for data-related tasks. It provides a comprehensive toolset for data professionals, from data visualization to in-depth analysis.
  4. Open-Source Nature: Open Interpreter is a gift to the developer community. As an open-source project, developers can contribute to its growth, refining its features and expanding its capabilities.
  5. Local Execution: In an age where data privacy is paramount, Open Interpreter allows for local code execution, ensuring that sensitive data remains on the user’s machine.
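
Regarding local execution (item 5 in the list above), this mode is exposed directly on the command line. As the tool’s own tip later in this article shows, you can start a session backed by a locally hosted model such as Code Llama instead of the OpenAI API:

PowerShell
interpreter --local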

What can I do with Open Interpreter?

  1. Educational Purposes: This can revolutionize teaching programming and other computer-related tasks. Students can communicate with their machines more intuitively, eliminating the steep learning curve associated with traditional programming.
  2. Data Science & Analysis: As it can plot, clean, and analyze large datasets, researchers and data scientists can streamline their workflow using natural language to manipulate and visualize their data.
  3. Automation: For individuals who aren’t traditionally programmers, Open Interpreter might offer a more accessible avenue for automating tasks on their computers, from simple file manipulations to more complex functions like web scraping.
  4. Content Creation: The ability to create and edit photos, videos, and PDFs using natural language commands could revolutionize how content creators work. Imagine telling your computer how to edit a video or photo without manually going through the process.
  5. Integration with Other Tools: Open Interpreter works with Code Llama locally or with GPT-4 via an OpenAI API key, so it can be integrated with other tools and workflows, increasing its utility.
  6. Accessibility: For people with disabilities, communicating with a computer using natural language could make computing more accessible, especially if paired with voice recognition software.
  7. Security: As with any powerful tool, there are potential risks. Running code or commands without understanding the implications can lead to unintended consequences. Users should be educated about the risks, especially when executing commands that can modify their system or data.
  8. Open Source Community: The rapid growth in stars on GitHub indicates a strong community interest. This suggests that developers worldwide might contribute to its development, making the tool even more robust and versatile.

How do you install and use Open Interpreter?

At the moment, you have to install it inside a Python environment.

  1. First, install Python; in my case, I have Python 3.11.2 installed.
  2. This step isn’t strictly necessary, but I recommend installing Visual Studio Code. Press Shift+Ctrl+P and create a Python virtual environment.
  3. I created a requirements.txt file with the libraries the ChatGPT Code Interpreter commonly uses:
Plaintext
pandas
numpy
scipy
xarray
#Machine Learning:
scikit-learn
xgboost
keras
torch
#Natural Language Processing:
nltk
spacy
textblob
gensim
#Data Visualization:
matplotlib
seaborn
plotly
bokeh
#Web Scraping and Network:
requests
urllib3
aiohttp
beautifulsoup4
#Deep Learning:
keras
torch
theano
#Image Processing:
pillow
imageio
opencv-python
scikit-image
#Audio Processing:
librosa
pyaudio
soundfile
#File Format:
openpyxl
xlrd
pyPDF2
python-docx
#Database:
sqlalchemy
psycopg2
mysql-connector-python
#Web Development:
flask
django
tornado
quart
#Others:
pytest
joblib
pytz
pyyaml

python-docx
pdfkit
wkhtmltopdf
pandoc 
pdfplumber  
reportlab
open-interpreter

Type pip install -r requirements.txt to install the most used Python libraries.

PowerShell
pip install -r requirements.txt

After that, install Open Interpreter.

PowerShell
pip install open-interpreter

After installation, run interpreter:

PowerShell
interpreter -y

Now, you are free to use and test Open Interpreter. If you want more detailed information, go to the official GitHub page.

Insert the prompt:

create a csv file with a year from 2000 to 2023 and total of sales amount column with random numbers

Note that a file called sales_data.csv was automatically created in the folder.
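
Under the hood, Open Interpreter writes and runs ordinary Python for you. The exact code varies from run to run, but for this prompt it might generate something along the lines of the sketch below (the column names here are just illustrative):

Python
import numpy as np
import pandas as pd

# Build one row per year (2000-2023) with a random total sales amount
years = list(range(2000, 2024))
sales = np.random.randint(10_000, 1_000_000, size=len(years))

df = pd.DataFrame({'Year': years, 'Total Sales Amount': sales})
df.to_csv('sales_data.csv', index=False)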

Now type this prompt: create a line visual to show this sales_data.csv and plot on the screen.
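
Again, the generated code differs slightly on every run, but a plausible sketch of what the interpreter executes for this prompt (reusing the column names assumed above) is:

Python
import pandas as pd
import matplotlib.pyplot as plt

# Load the file created in the previous step and plot sales per year
df = pd.read_csv('sales_data.csv')
plt.plot(df['Year'], df['Total Sales Amount'])
plt.title('Total Sales Amount per Year')
plt.xlabel('Year')
plt.ylabel('Total Sales Amount')
plt.show()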

You can do whatever you want. You can also write code that uses Open Interpreter programmatically, like the example below.

Python
import interpreter

interpreter.chat("Plot AAPL and META's normalized stock prices") # Executes a single command
interpreter.chat() # Starts an interactive chat

Type py teste.py to execute this code.
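
If you use a hosted model such as GPT-4, the OPENAI_API_KEY environment variable must be available to the script. A minimal variant of teste.py that reads the key from the environment (the placeholder below is not a real key) might look like this:

Python
import os
import interpreter

# Use the key from the environment; replace the placeholder only for quick local tests
os.environ.setdefault("OPENAI_API_KEY", "<your-openai-api-key>")

interpreter.chat("Plot AAPL and META's normalized stock prices")  # Executes a single command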

PowerShell
PS D:\interpreter> & d:/interpreter/.venv/Scripts/Activate.ps1
(.venv) PS D:\interpreter> py teste.py

▌ Model set to GPT-4
Tip: To run locally, use interpreter --local

Open Interpreter will require approval before running code. Use interpreter -y to bypass this.

Press CTRL-C to exit.

  To plot AAPL and META's normalized stock prices, we will need to follow these steps:

   1 Install necessary packages. We will need pandas, yfinance, and matplotlib for Python.
   2 Download the stock prices for AAPL and META using yfinance.
   3 Normalize the stock prices.
   4 Plot the normalized stock prices.

  Let's start with the first step, which is installing the necessary packages.

  import pkg_resources                                                                                                                                                          
                                                                                                                                                                                
  REQUIRED_PACKAGES = ['pandas', 'yfinance', 'matplotlib']                                                                                                                      
                                                                                                                                                                                
  for package in REQUIRED_PACKAGES:                                                                                                                                             
      try:                                                                                                                                                                      
          dist = pkg_resources.get_distribution(package)                                                                                                                        
          print('{} ({}) is installed'.format(dist.key, dist.version))                                                                                                          
      except pkg_resources.DistributionNotFound:                                                                                                                                
          print('{} is NOT installed'.format(package))                                                                                                                          


  pandas (2.1.0) is installed
  yfinance (0.2.28) is installed
  matplotlib (3.8.0) is installed

  Great! All the necessary packages are already installed.

  Now, let's move on to the next step: downloading the stock prices for AAPL (Apple Inc.) and META (Meta Platforms Inc., formerly Facebook Inc.) using the yfinance package.    
  We will download the last 5 years of daily closing prices.

  import yfinance as yf                                                                                                                                                         
                                                                                                                                                                                
  # Define the ticker symbols                                                                                                                                                   
  symbols = ['AAPL', 'META']                                                                                                                                                    
                                                                                                                                                                                
  # Download the stock data                                                                                                                                                     
  stock_data = yf.download(symbols, start='2017-01-01', end='2022-12-31')['Close']                                                                                              
                                                                                                                                                                                
  # Print the first few rows of the data                                                                                                                                        
  stock_data.head()                                                                                                                                                             

  [*********************100%%**********************]  2 of 2 completed
  AAPL        META
  Date
  2017-01-03  29.037500  116.860001
  2017-01-04  29.004999  118.690002
  2017-01-05  29.152500  120.669998
  2017-01-06  29.477501  123.410004
  2017-01-09  29.747499  124.900002

  We have successfully downloaded the stock prices for AAPL and META.

  The next step is to normalize these stock prices. Normalizing the stock prices allows us to compare the two stocks on the same scale. We will normalize the prices such that  
  the first price is 1 and all subsequent prices are relative to the first price.

  normalized_data = stock_data / stock_data.iloc[0]                                                                                                                             
                                                                                                                                                                                
  # Print the first few rows of the normalized data                                                                                                                             
  normalized_data.head()                                                                                                                                                        

  AAPL      META
  Date
  2017-01-03  1.000000  1.000000
  2017-01-04  0.998881  1.015660
  2017-01-05  1.003960  1.032603
  2017-01-06  1.015153  1.056050


  The stock prices for AAPL and META have been successfully normalized.

  Now, let's move on to the final step: plotting the normalized stock prices. We will use the matplotlib package for this.

  import matplotlib.pyplot as plt                                                                                                                                               
                                                                                                                                                                                
  # Create a new figure                                                                                                                                                         
  plt.figure(figsize=(14, 7))                                                                                                                                                   
                                                                                                                                                                                
  # Plot the normalized stock prices                                                                                                                                            
  for symbol in symbols:                                                                                                                                                        
      plt.plot(normalized_data.index, normalized_data[symbol], label=symbol)                                                                                                    
                                                                                                                                                                                
  # Add a title and labels                                                                                                                                                      
  plt.title('Normalized Stock Prices: AAPL vs META')                                                                                                                            
  plt.xlabel('Date')                                                                                                                                                            
  plt.ylabel('Normalized Price')                                                                                                                                                
  plt.legend()                                                                                                                                                                  
                                                                                                                                                                                
  # Save the plot as an image                                                                                                                                                   
  plt.savefig('normalized_stock_prices.png')                                                                                                                                    
                                                                                                                                                                                
  # Close the plot                                                                                                                                                              
  plt.close()                                                                                                                                                                   

  <Figure size 1400x700 with 0 Axes>
  [<matplotlib.lines.Line2D object at 0x0000024DE75D0B90>]
  [<matplotlib.lines.Line2D object at 0x0000024DE7604990>]
  Text(0.5, 1.0, 'Normalized Stock Prices: AAPL vs META')
  Text(0.5, 0, 'Date')
  Text(0, 0.5, 'Normalized Price')
  <matplotlib.legend.Legend object at 0x0000024DE75DAB50>

Let’s try asking it to search the internet: What are the last 10 BBC news headlines?

PowerShell
(.venv) PS D:\interpreter> interpreter -y
> What are the last 10 BBC news headlines?

  To get the last 10 BBC news headlines, we can use web scraping techniques. Python has several libraries for this purpose, such as BeautifulSoup and requests. Here is the     
  plan:

   1 Install necessary Python packages (requests and BeautifulSoup) if they are not already installed.
   2 Fetch the BBC News homepage HTML using requests.
   3 Parse the HTML using BeautifulSoup to find the news headlines.
   4 Extract and return the last 10 news headlines.

  Let's start with the first step, which is to check if the necessary Python packages are installed. If not, we will install them.



  import pkg_resources                                                                                                                                                          
                                                                                                                                                                                
  REQUIRED_PACKAGES = ['beautifulsoup4', 'requests']                                                                                                                            
                                                                                                                                                                                
  for package in REQUIRED_PACKAGES:                                                                                                                                             
      try:                                                                                                                                                                      
          dist = pkg_resources.get_distribution(package)                                                                                                                        
          print('{} ({}) is installed'.format(dist.key, dist.version))                                                                                                          
      except pkg_resources.DistributionNotFound:                                                                                                                                
          print('{} is NOT installed'.format(package))                                                                                                                          


  beautifulsoup4 (4.12.2) is installed
  requests (2.31.0) is installed


  Great! Both 'beautifulsoup4' and 'requests' packages are already installed.

  Now, let's move to the next step of the plan: Fetch the BBC News homepage HTML using requests. We will send a GET request to the BBC News homepage URL and print the status   
  of the response to ensure we are able to connect to the page.

  import requests                                                                                                                                                               
                                                                                                                                                                                
  url = 'https://www.bbc.co.uk/news'                                                                                                                                            
  response = requests.get(url)                                                                                                                                                  
                                                                                                                                                                                
  print('Status code:', response.status_code)                                                                                                                                   

  Status code: 200


  The status code 200 indicates that we successfully fetched the HTML of the BBC News homepage.

  Now, let's move to the next step of the plan: Parse the HTML using BeautifulSoup to find the news headlines. We will parse the HTML and find the elements that contain the    
  news headlines. For this, we need to know the structure of the BBC News homepage. The headlines are usually contained in HTML elements with specific class names. We will     
  try to find these elements and print some of them to verify that we are correctly extracting the headlines.

  from bs4 import BeautifulSoup

  soup = BeautifulSoup(response.text, 'html.parser')

  # BBC News headlines are usually in h3 elements with class 'gs-c-promo-heading__title'
  headlines = soup.find_all('h3', class_='gs-c-promo-heading__title')

  # Print the first 5 headlines to verify
  for headline in headlines[:5]:
      print(headline.text.strip())

  Russell Brand accused of rape and sexual assault
  Russell Brand accused of rape and sexual assault
  A barren wasteland with a lingering smell of death
  Diamonds and diplomacy: How a US envoy fell from grace
  'Everything has just been completely ripped away'


  We have successfully extracted the news headlines from the BBC News homepage. The headlines printed above are a sample of the headlines we found.

  Now, let's move to the final step of the plan: Extract and return the last 10 news headlines. We will extract the text from the last 10 headline elements and return them.    

  # Get the last 10 headlines                                                                                                                                                   
  last_10_headlines = [headline.text.strip() for headline in headlines[-10:]]                                                                                                   
                                                                                                                                                                                
  last_10_headlines                                                                                                                                                             


  ['How old is too old to lead?', 'The companies staying fully remote', "A 'beach towel revolt' sweeping Greece", 'Sexton breaks record as eight-try Ireland beat Tonga',       
  'Wales struggle past Portugal at World Cup', 'Brighton beat Man Utd as fans boo Ten Hag', 'Barcelona crush Betis to go top', 'Sainz on Singapore pole as angry Verstappen     
  struggles', "'We have a good vibe' - Ten Hag avoids Sancho questioning", 'Newcastle beat Brentford to ease pressure on Howe']

  Here are the last 10 BBC news headlines:

    1 'How old is too old to lead?'
    2 'The companies staying fully remote'
    3 "A 'beach towel revolt' sweeping Greece"
    4 'Sexton breaks record as eight-try Ireland beat Tonga'
    5 'Wales struggle past Portugal at World Cup'
    6 'Brighton beat Man Utd as fans boo Ten Hag'
    7 'Barcelona crush Betis to go top'
    8 'Sainz on Singapore pole as angry Verstappen struggles'
    9 "'We have a good vibe' - Ten Hag avoids Sancho questioning"
   10 'Newcastle beat Brentford to ease pressure on Howe'

To finish the test, let’s ask it to create a PDF file with these 10 BBC news headlines.

Python
> can you crate a pdf file with this 10 BBC news headlines?

  Sure, we can create a PDF file with these headlines. We will use the Python library reportlab for this purpose. Here is the plan:

   1 Install the necessary Python package (reportlab) if it is not already installed.
   2 Create a PDF file using reportlab and add the headlines to it.
   3 Save the PDF file.

  Let's start with the first step, which is to check if the necessary Python package is installed. If not, we will install it.


  REQUIRED_PACKAGES = ['reportlab']                                                                                                                                             
                                                                                                                                                                                
  for package in REQUIRED_PACKAGES:                                                                                                                                             
      try:                                                                                                                                                                      
          dist = pkg_resources.get_distribution(package)                                                                                                                        
          print('{} ({}) is installed'.format(dist.key, dist.version))                                                                                                          
      except pkg_resources.DistributionNotFound:                                                                                                                                
          print('{} is NOT installed'.format(package))                                                                                                                          


  reportlab (4.0.4) is installed


  Great! The 'reportlab' package is already installed.

  Now, let's move to the next step of the plan: Create a PDF file using reportlab and add the headlines to it. We will create a new PDF file, add each headline as a separate   
  paragraph, and then save the PDF file.



  from reportlab.lib.pagesizes import letter                                                                                                                                    
  from reportlab.pdfgen import canvas                                                                                                                                           
  from reportlab.lib.units import inch                                                                                                                                          
                                                                                                                                                                                
  # Create a new PDF file                                                                                                                                                       
  c = canvas.Canvas('BBC_headlines.pdf', pagesize=letter)                                                                                                                       
                                                                                                                                                                                
  # Set the font and size                                                                                                                                                       
  c.setFont('Helvetica', 12)                                                                                                                                                    
                                                                                                                                                                                
  # Add each headline as a separate paragraph                                                                                                                                   
  y = 750  # Start from the top of the page                                                                                                                                     
  for i, headline in enumerate(last_10_headlines, 1):                                                                                                                           
      c.drawString(50, y, f'{i}. {headline}')                                                                                                                                   
      y -= 25  # Move down for the next headline                                                                                                                                
                                                                                                                                                                                
  # Save the PDF file                                                                                                                                                           
  c.save()                                                                                                                                                                      

  The PDF file 'BBC_headlines.pdf' with the last 10 BBC news headlines has been created successfully. You can find it in the current working directory (D:\interpreter).        

It’s fantastic. Can you imagine how many things you can do with this?

They also have a Google Colab version with many examples of use.

I created a GitHub repository with this project. If you want, you can download or clone it from this link: https://github.com/LawrenceTeixeira/OpenInterpreter.git

The official Open Interpreter website is below.

https://openinterpreter.com/


How Does Open Interpreter Stand Out?

While numerous code interpreters are available, Open Interpreter sets itself apart in its approach to bridging human language and computer code. The emphasis on natural language processing combined with the power of local execution ensures that it’s both accessible to newcomers and powerful enough for seasoned developers.

Conclusion

Open Interpreter stands as a monumental advancement in local development platforms. Its superior features equip developers to operate productively and with adaptability, unfettered by the confines of online services. With Open Interpreter by your side, step into the next era of software development and discover unparalleled avenues in your local development odyssey. It is more than just a tool; it’s a step forward in the evolution of programming. Combining the powers of natural language processing, AI, and code interpretation paves the way for a more inclusive and streamlined coding environment. As the world embraces digital solutions, tools like Open Interpreter make technology accessible to everyone.

That’s it for today!

Sources:

The Open Interpreter Project

KillianLucas/open-interpreter: OpenAI’s Code Interpreter in your terminal, running locally (github.com)

https://beebom.com/how-run-open-interpreter-run-chatgpt-code-interpreter-locally/

https://openaimaster.com/what-is-open-interpreter/

Unlocking the Power of GPT-4: An Introduction to Free ChatBot AI

As technology advances at an unprecedented rate, there is an increasing demand for tools that can simplify, innovate, and augment human experiences. One such technological marvel is ChatGPT, based on OpenAI’s GPT-4 architecture. With the introduction of the Free ChatBot AI, a complimentary version of ChatGPT, everyone can now experience the power of conversational AI without breaking the bank. In this post, we will explore what this free ChatBot AI is and how you can harness its capabilities.

What is Free ChatBot AI?

Free ChatBot AI is a conversational model based on OpenAI’s cutting-edge GPT-4 technology. It retains much of the paid version’s functionality, accuracy, and contextual understanding but is offered without a price tag. The motivation behind introducing a free version was to democratize access to AI, ensuring that businesses, developers, students, and hobbyists alike can taste what state-of-the-art AI conversational models can achieve.

How to use Free ChatBot AI?

Using Free ChatBot AI is a straightforward process:

Access: Navigate to the official website of the Free ChatBot AI version. You don’t need to create an account to use it.

Prompt: Start by entering a prompt or a question. For instance, you might type, “Tell me a fun fact about dolphins.” The more specific and clear your prompt, the better and more accurate the response you can expect.

Response: After inputting your prompt, the AI will process the information and provide an answer in seconds. Seeing the model generate responses that often feel incredibly human-like is fascinating.

Refinement: If the answer isn’t quite what you expected, you can refine your question or ask follow-up questions to get the desired information.

Begin with any prompt you choose. Let’s try this: “Write a persuasive email to convince potential customers to try our service. My service is IT consulting”.

You can ask Free ChatBot AI to create code. Let’s try this: “Create a Python function that takes in a list of numbers and returns the average, median, and mode of the list. The function should be able to handle large datasets and return the results as variables”.
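
For reference, a correct answer to that prompt is a small function along these lines; this sketch uses Python’s built-in statistics module, and the chatbot’s actual output will of course vary:

Python
from statistics import mean, median, mode

def describe(numbers):
    """Return the average, median, and mode of a list of numbers."""
    return mean(numbers), median(numbers), mode(numbers)

average, med, most_common = describe([1, 2, 2, 3, 4, 5])
print(average, med, most_common)  # 2.8333..., 2.5, 2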

You can create prompts to ask Free ChatBot AI to act like you want. Let’s try this: click “+ New Prompt” and write, “I want you to act as an English translator, spelling corrector and improver. I will speak to you in any language and you will detect the language, translate it and answer in the corrected and improved version of my text, in English. I want you to replace my simplified A0-level words and sentences with more beautiful and elegant, upper level English words and sentences. Keep the meaning same, but make them more literary. I want you to only reply the correction, the improvements and nothing else, do not write explanations.”

Click save.

Now the prompt is saved. If you type “/” in the text bar, your saved prompts will appear. Select one of them to start from it.

You can import and export all your prompt history and configurations to a file. I saved my prompts to share with you at this link.

You can also search Google by clicking the icon below and selecting “Google Search”. After that, you can ask GPT to generate text based on the results.

There are many other options: Clear the conversations, change the theme to light or dark mode, create folders to organize your chats and prompts, and much more.

Conclusion

The Free ChatBot AI, based on the GPT-4 architecture, is a testament to our strides in the AI world. It provides a unique opportunity for many to interact with and benefit from advanced AI without any financial commitment. Whether you’re a curious individual, a student, a developer, or a business owner, this tool can revolutionize how you seek information, automate tasks, or engage with users. Explore the fascinating world of conversational AI with Free ChatBot AI. What do you think about it? I would be happy to hear from you!

What are you waiting for? Go to the Free ChatBot AI app and have fun!

That’s it for today!

Data Wrangler in Microsoft Fabric: A New Tool for Accelerating Data Preparation. Experience the Power Query Feel but with Python Code Output

In the modern digital era, the importance of streamlined data preparation cannot be emphasized enough. For data scientists and analysts, a large portion of time is dedicated to data cleansing and preparation, often termed ‘wrangling.’ Microsoft’s introduction of Data Wrangler in its Fabric suite seems like an answer to this age-old challenge. It promises the intuitiveness of Power Query with the flexibility of Python code output. Dive in to uncover the magic of this new tool.

Data preparation is a time-consuming and error-prone task. It often involves cleaning, transforming, and merging data from multiple sources. This can be a daunting task, even for experienced data scientists.

What is Data Wrangler?

Data Wrangler is a state-of-the-art tool Microsoft offers in its Fabric suite explicitly designed for data professionals. At its core, it aims to simplify the data preparation process by automating tedious tasks. Much like Power Query, it offers a user-friendly interface, but what sets it apart is that it can generate Python code as an output. As users interact with the GUI, Python code snippets are generated behind the scenes, making integrating various data science workflows easier.

Advantages of Data Wrangler

  1. User-Friendly Interface: Offers an intuitive GUI for those not comfortable with coding.
  2. Python Code Output: Generates Python code in real-time, allowing flexibility and easy integration.
  3. Time-Saving: Reduces the time spent on data preparation dramatically.
  4. Replicability: Since Python code is generated, it ensures replicable data processing steps.
  5. Integration with Fabric Suite: Can be effortlessly integrated with other tools within the Microsoft Fabric suite.
  6. No-code to Low-code Transition: Ideal for those wanting to transition from a no-code environment to a more code-centric one.

How to use Data Wrangler?

Click on Data Science inside the Power BI service.

Select the Notebook button.

After uploading the CSV file to the Lakehouse, insert the code below.

Python
import pandas as pd

# Read the CSV from the Lakehouse into a pandas DataFrame
df = pd.read_csv("/lakehouse/default/Files/Top_1000_Companies_Dataset.csv")

Click on “Launch Data Wrangler” and then select the data frame “df”.

On this screen, you can apply all the transformations you need.

In the end, code like this will be generated:

Python
# Code generated by Data Wrangler for pandas DataFrame

def clean_data(df):
    # Drop columns: 'company_name', 'url' and 6 other columns
    df = df.drop(columns=['company_name', 'url', 'city', 'state', 'country', 'employees', 'linkedin_url', 'founded'])
    # Drop columns: 'GrowjoRanking', 'Previous Ranking' and 10 other columns
    df = df.drop(columns=['GrowjoRanking', 'Previous Ranking', 'job_openings', 'keywords', 'LeadInvestors', 'Accelerator', 'valuation', 'btype', 'total_funding', 'product_url', 'growth_percentage', 'contact_info'])
    # Drop column: 'indeed_url'
    df = df.drop(columns=['indeed_url'])
    # Performed 1 aggregation grouped on column: 'Industry'
    df = df.groupby(['Industry']).agg(estimated_revenues_sum=('estimated_revenues', 'sum')).reset_index()
    # Sort by column: 'estimated_revenues_sum' (descending)
    df = df.sort_values(['estimated_revenues_sum'], ascending=[False])
    return df

df_clean = clean_data(df.copy())
df_clean.head()

After that, you can add this notebook to a pipeline or schedule it so the transformation runs automatically.

Data Wrangler Extension for Visual Studio Code

Data Wrangler is a code-centric data cleaning tool integrated into VS Code and Jupyter Notebooks. Data Wrangler aims to increase the productivity of data scientists doing data cleaning by providing a rich user interface that automatically generates Pandas code and shows insightful column statistics and visualizations.

This document will cover how to:

  • Install and setup Data Wrangler
  • Launch Data Wrangler from a notebook
  • Use Data Wrangler to explore your data
  • Perform operations on your data
  • Edit and export code for data wrangling to a notebook
  • Troubleshooting and providing feedback

Setting up your environment

  1. If you have not already done so, install Python.
    IMPORTANT: Data Wrangler only supports Python version 3.8 or higher.
  2. Install Visual Studio Code.
  3. Install the Data Wrangler extension for VS Code from the Visual Studio Marketplace. For additional details on installing extensions, see Extension Marketplace. The extension is named “Data Wrangler” and is published by Microsoft.

When you launch Data Wrangler for the first time, it will ask you which Python kernel you would like to connect to. It will also check your machine and environment to see if any required Python packages are installed (e.g., Pandas).

Here is a list of the required versions for Python and Python packages, along with whether they are automatically installed by Data Wrangler:

Name     Minimum required version   Automatically installed
Python   3.8                        No
pandas   0.25.2                     Yes
regex*   2020.11.13                 Yes

* We use the open-source regex package to be able to use Unicode properties (for example, /\p{Lowercase_Letter}/), which aren’t supported by Python’s built-in regex module (re). Unicode properties make it easier and cleaner to support foreign characters in regular expressions.

If they are not found in your environment, Data Wrangler will attempt to install them for you via pip. If Data Wrangler cannot install the dependencies, the easiest workaround is to run the pip install manually and then relaunch Data Wrangler. These dependencies are required so that Data Wrangler can generate Python and Pandas code.
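
In that case, the manual install is just the usual pip command for the packages listed above, run in the environment that your selected kernel uses:

PowerShell
pip install pandas regex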

Connecting to a Python kernel

There are currently two ways to connect to a Python kernel, presented in a quick pick when Data Wrangler starts.

1. Connect using a local Python interpreter

If this option is selected, the kernel connection is created using the Jupyter and Python extensions. We recommend this option for a simple setup and a quick way to start with Data Wrangler.

2. Connect using Jupyter URL and token

A kernel connection is created using JupyterLab APIs if this option is selected. Note that this option has performance benefits since it bypasses some initialization and kernel discovery processes. However, it will also require separate Jupyter Notebook server user management. We recommend this option generally in two cases: 1) if there are blocking issues in the first method and 2) for power users who would like to reduce the cold-start time of Data Wrangler.

To set up a Jupyter Notebook server and use it with this option, follow the steps below:

  1. Install Jupyter. We recommend installing the accessible version of Anaconda with Jupyter installed. Alternatively, follow the official instructions to install it.
  2. In the appropriate environment (e.g., in an Anaconda prompt if Anaconda is used), launch the server with the following command (replace the jupyter token with your secure token):
    jupyter notebook --no-browser --NotebookApp.token='<your-jupyter-token>'
  3. In Data Wrangler, connect using the address of the spawned server. E.g., http://localhost:8888, and pass in the token used in the previous step. Once configured, this information is cached locally and can automatically be reused for future connections.

Launching Data Wrangler

Once Data Wrangler has been successfully installed, there are 2 ways to launch it in VS Code.

Launching Data Wrangler from a Jupyter Notebook

If you are in a Jupyter Notebook working with Pandas data frames, you’ll now see a “Launch Data Wrangler” button appear after running specific operations on your data frame, such as df.head(). Clicking the button will open a new tab in VS Code with the Data Wrangler interface in a sandboxed environment.

Important note:
We currently only accept the following formats for launching:

  • df
  • df.head()
  • df.tail()

Where df is the name of the data frame variable. The code above should appear at the end of a cell without any comments or other code after it.
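
In other words, a notebook cell that makes the button appear looks like this (the CSV path is just an example):

Python
import pandas as pd

# Load any DataFrame; df.head() must be the last statement in the cell
df = pd.read_csv('titanic.csv')
df.head()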


Launching Data Wrangler directly from a CSV file

You can also launch Data Wrangler directly from a local CSV file. To do so, open any VS Code folder with the CSV dataset you’d like to explore. In the File Explorer panel, right-click the CSV dataset and click “Open in Data Wrangler.”


Using Data Wrangler


The Data Wrangler interface is divided into 6 components, described below.

The Quick Insights header lets you quickly see valuable information about each column. Depending on the column’s datatype, Quick Insights will show the distribution of the data, the frequency of data points, and missing and unique values.

The Data Grid gives you a scrollable pane to view your entire dataset. Additionally, when selecting an operation to perform, a preview will be illustrated in the data grid, highlighting the modified columns.

The Operations Panel is where you can search through Data Wrangler’s built-in data operations. The operations are organized by their top-level category.

The Summary Panel shows detailed summary statistics for your dataset or a specific column if one is selected. Depending on the data type, it will show information such as min, max values, datatype of the column, skew, and more.

The Operation History Panel shows a human-readable list of all the operations previously applied in the current Data Wrangling session. It enables users to undo specific operations or edit the most recent operation. Selecting a step will highlight the data grid changes and show the generated code associated with that operation.

The Code Preview section will show the Python and Pandas code that Data Wrangler has generated when an operation is selected. It will remain blank when no operation is selected. The code can even be edited by the user, and the data grid will highlight the effect on the data.

Example: Filtering a column

Let’s go through a simple example using Data Wrangler with the Titanic dataset to filter adult passengers on the ship.

We’ll start by looking at the quick insights of the Age column, and we’ll notice the distribution of the ages and that the minimum age is 0.42. For more information, we can glance at the Summary panel to see that the datatype is a float, along with additional statistics such as the passengers’ mean and median age.


To filter for only adult passengers, we can go to the Operation Panel and search for the keyword “Filter” to find the Filter operation. (You can also expand the “Sort and filter” category to find it.)


Once we select an operation, we are brought into the Operation Preview state, where parameters can be modified to see how they affect the underlying dataset before applying the operation. In this example, we want to filter the dataset only to include adults, so we’ll want to filter the Age column to only include values greater than or equal to 18.


Once the parameters are entered in the operation panel, we can see a preview of what will happen to the data. We’ll notice that the minimum value in age is now 18 in the Quick Insights, along with a visual preview of the rows that are being removed, highlighted in red. Finally, we’ll also notice the Code Preview section automatically shows the code that Data Wrangler produced to execute this Filter operation. We can edit this code by changing the filtered age to 21, and the data grid will automatically update accordingly.

After confirming that the operation has the intended effect, we can click Apply.
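
For this Filter step, the code shown in the Code Preview boils down to a one-line pandas boolean filter along these lines (the generated code may be wrapped slightly differently):

Python
# Keep only rows where 'Age' is greater than or equal to 18
df = df[df['Age'] >= 18]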

Editing and exporting code

Each step of the generated code can be modified. Changes to the data will be highlighted in the grid view as you make changes.

Once you’re done with your data cleaning steps in Data Wrangler, there are 3 ways to export your cleaned dataset from Data Wrangler.

  1. Export code back to Notebook and exit: This creates a new cell in your Jupyter Notebook with all the data cleaning code you generated packaged into a clean Python function.
  2. Export data as CSV: This saves the cleaned dataset as a new CSV file onto your machine.
  3. Copy code to clipboard: This copies all the code generated by Data Wrangler for the data cleaning operations.

Note: If you launched Data Wrangler directly from a CSV, the first export option will be to export the code into a new Jupyter Notebook.

Data Wrangler operations

These are the Data Wrangler operations currently supported in the initial launch of Data Wrangler (with many more to be added soon).

Operation                        Description
Sort values                      Sort column(s) ascending or descending
Filter                           Filter rows based on one or more conditions
Calculate text length            Create a new column with values equal to the length of each string value in a text column
One-hot encode                   Split categorical data into a new column for each category
Multi-label binarizer            Split categorical data into a new column for each category using a delimiter
Create column from formula       Create a column using a custom Python formula
Change column type               Change the data type of a column
Drop column                      Delete one or more columns
Select column                    Choose one or more columns to keep and delete the rest
Rename column                    Rename one or more columns
Drop missing values              Remove rows with missing values
Drop duplicate rows              Drop all rows that have duplicate values in one or more columns
Fill missing values              Replace cells with missing values with a new value
Find and replace                 Find and replace cells with an exact matching value
Group by column and aggregate    Group by columns and aggregate results
Strip whitespace                 Remove whitespace from the beginning and end of the text
Split text                       Split a column into several columns based on a user-defined delimiter
Convert text to capital case     Capitalize the first character of a string, with the option to apply to all words
Convert text to lowercase        Convert text to lowercase
Convert text to uppercase        Convert text to UPPERCASE
String transform by example      Automatically perform string transformations when a pattern is detected from the examples you provide
DateTime formatting by example   Automatically perform DateTime formatting when a pattern is detected from the examples you provide
New column by example            Automatically create a column when a pattern is detected from the examples you provide
Scale min/max values             Scale a numerical column between a minimum and maximum value
Custom operation                 Automatically create a new column based on examples and the derivation of existing column(s)

Limitations

Data Wrangler currently supports only Pandas DataFrames. Support for Spark DataFrames is in progress.
Data Wrangler’s display works better on large monitors, although different interface portions can be minimized or hidden to accommodate smaller screens.
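
Until Spark support arrives, a common workaround in Fabric notebooks is to convert the Spark DataFrame to pandas first and launch Data Wrangler on the converted frame. This sketch assumes an existing Spark DataFrame named spark_df and enough driver memory to hold the data:

Python
# Convert to pandas so Data Wrangler can open it, then launch from df
df = spark_df.toPandas()
df.head()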

Conclusion

Data Wrangler in Microsoft Fabric is undeniably a game-changer in data preparation. It combines the best of both worlds by offering the simplicity of Power Query with the robustness and flexibility of Python. As data continues to grow in importance, tools like Data Wrangler that simplify and expedite the data preparation process will be indispensable for organizations aiming to stay ahead.

That’s it for today!

Sources:

https://medium.com/towards-data-engineering/data-wrangler-in-fabric-simplifying-data-prep-with-no-code-ab4fe7429b49

https://radacad.com/fabric-data-wrangler-a-tool-for-data-scientist

https://learn.microsoft.com/en-us/fabric/data-science/data-wrangler

https://marketplace.visualstudio.com/items?itemName=ms-toolsai.datawrangler

https://github.com/microsoft/vscode-data-wrangler

Presenting the First Edition of the New ChatBot for Making Questions about Trademarks, Patents, and More from the Brazilian Patent and Trademark Office (BRPTO)

It is with great delight that I introduce this innovative ChatBot, developed with artificial intelligence to answer queries about the official manuals on trademarks, patents, industrial designs, and more provided by Brazil’s National Institute of Industrial Property (INPI), the Brazilian Patent and Trademark Office (BRPTO).

These manuals, presented in Brazilian Portuguese PDF format, are readily accessible in the respective sections of the INPI website. What’s unique about this initiative is the inclusivity that transcends language barriers. In the app, you can ask questions in any language, and the ChatBot will comprehend and respond in the language used in the query.

A vital feature of this AI-powered ChatBot is its ability to provide the exact source of the information extracted. Each time the ChatBot responds to a query, it will present the hyperlink to the PDF from where the information was derived. This enhances transparency and allows users to delve into more details if they wish.

Moreover, within the left-side options in the app, you can access the complete manuals utilized to create the AI model. This provides direct access to the information and serves as a testament to the quality and reliability of the data used.

How do you get to the BRPTO’s Chatbot app?

To try out the app, click here to reach out to me. Please include “BRPTO Chatbot” in your message.

Here are a few examples of the questions you can ask:

  1. What is a patent?
  2. What is a trademark?
  3. What are the fees I need to pay to file a patent?
  4. What is a contract?
  5. What is copyright?
  6. What are the rules of geographical indications?
  7. How was the intellectual protection of circuit topology established?
  8. What are the filing procedures for an industrial design?

What are the limitations?

  • If the National Institute of Industrial Property (INPI) makes any updates to a manual, it will be necessary to reprocess the Artificial Intelligence model and roll out a new update for the app.
  • The chatbot is only limited to answering questions related to the manuals specified on the left side of the app.
  • Any questions outside the context of these manuals will not be answered.
  • As we are using a free version of the vector database (Pinecone), there might be some confusion in the responses.

How was the app created?

The app was created based on the guidelines provided in my previous post that I copy below.

Conclusion

The AI ChatBot is an innovative solution that aims to democratize access to information and make it more convenient for the public to learn and understand essential aspects of industrial property rights. So feel free to explore, ask, and learn!

That’s it for today!