Data Wrangler in Microsoft Fabric: A New Tool for Accelerating Data Preparation. Experience the Power Query Feel but with Python Code Output

In the modern digital era, the importance of streamlined data preparation cannot be emphasized enough. For data scientists and analysts, a large portion of time is dedicated to data cleansing and preparation, often termed ‘wrangling.’ Microsoft’s introduction of Data Wrangler in its Fabric suite seems like an answer to this age-old challenge. It promises Power Query’s intuitiveness and Python code outputs’ flexibility. Dive in to uncover the magic of this new tool.

Data preparation is a time-consuming and error-prone task. It often involves cleaning, transforming, and merging data from multiple sources. This can be a daunting task, even for experienced data scientists.

What is Data Wrangler?

Data Wrangler is a state-of-the-art tool Microsoft offers in its Fabric suite explicitly designed for data professionals. At its core, it aims to simplify the data preparation process by automating tedious tasks. Much like Power Query, it offers a user-friendly interface, but what sets it apart is that it can generate Python code as an output. As users interact with the GUI, Python code snippets are generated behind the scenes, making integrating various data science workflows easier.

Advantages of Data Wrangler

  1. User-Friendly Interface: Offers an intuitive GUI for those not comfortable with coding.
  2. Python Code Output: Generates Python code in real-time, allowing flexibility and easy integration.
  3. Time-Saving: Reduces the time spent on data preparation dramatically.
  4. Replicability: Since Python code is generated, it ensures replicable data processing steps.
  5. Integration with Fabric Suite: Can be effortlessly integrated with other tools within the Microsoft Fabric suite.
  6. No-code to Low-code Transition: Ideal for those wanting to transition from a no-code environment to a more code-centric one.

How to use Data Wrangler?

You have to click on Data Science inside the Power BI Service.

You have to select the Notebook button.

You have to insert this code above after the upload of the CSV file in the LakeHouse.

import pandas as pd

# Read a CSV into a Pandas DataFrame from e.g. a public blob store
df = pd.read_csv("/lakehouse/default/Files/Top_1000_Companies_Dataset.csv")

You have to click in the Lauch Data Wrangler and then select the data frame “df”.

On this screen, you can do all transformations you need.

In the end this code will be generate.

# Code generated by Data Wrangler for pandas DataFrame

def clean_data(df):
    # Drop columns: 'company_name', 'url' and 6 other columns
    df = df.drop(columns=['company_name', 'url', 'city', 'state', 'country', 'employees', 'linkedin_url', 'founded'])
    # Drop columns: 'GrowjoRanking', 'Previous Ranking' and 10 other columns
    df = df.drop(columns=['GrowjoRanking', 'Previous Ranking', 'job_openings', 'keywords', 'LeadInvestors', 'Accelerator', 'valuation', 'btype', 'total_funding', 'product_url', 'growth_percentage', 'contact_info'])
    # Drop column: 'indeed_url'
    df = df.drop(columns=['indeed_url'])
    # Performed 1 aggregation grouped on column: 'Industry'
    df = df.groupby(['Industry']).agg(estimated_revenues_sum=('estimated_revenues', 'sum')).reset_index()
    # Sort by column: 'estimated_revenues_sum' (descending)
    df = df.sort_values(['estimated_revenues_sum'], ascending=[False])
    return df

df_clean = clean_data(df.copy())

After that, you can create or add to a pipeline or schedule a moment to execute this transformation automatically.

Data Wrangler Extension for Visual Studio Code

Data Wrangler is a code-centric data cleaning tool integrated into VS Code and Jupyter Notebooks. Data Wrangler aims to increase the productivity of data scientists doing data cleaning by providing a rich user interface that automatically generates Pandas code and shows insightful column statistics and visualizations.

This document will cover how to:

  • Install and setup Data Wrangler
  • Launch Data Wrangler from a notebook
  • Use Data Wrangler to explore your data
  • Perform operations on your data
  • Edit and export code for data wrangling to a notebook
  • Troubleshooting and providing feedback

Setting up your environment

  1. If you have not already done so, install Python.
    IMPORTANT: Data Wrangler only supports Python version 3.8 or higher.
  2. Install Visual Studio Code.
  3. Install the Data Wrangler extension for VS Code from the Visual Studio Marketplace. For additional details on installing extensions, see Extension Marketplace. The Data Wrangler extension is named Data Wrangler, and Microsoft publishes it.

When you launch Data Wrangler for the first time, it will ask you which Python kernel you would like to connect to. It will also check your machine and environment to see if any required Python packages are installed (e.g., Pandas).

Here is a list of the required versions for Python and Python packages, along with whether they are automatically installed by Data Wrangler:

NameMinimum required versionAutomatically installed

* We use the open-source regex package to be able to use Unicode properties (for example, /\p{Lowercase_Letter}/), which aren’t supported by Python’s built-in regex module (re). Unicode properties make it easier and cleaner to support foreign characters in regular expressions.

If they are not found in your environment, Data Wrangler will attempt to install them for you via pip. If Data Wrangler cannot install dependencies, the easiest workaround is to run the pip install and then relaunch Data Wrangler manually. These dependencies are required for Data Wrangler such that it can generate Python and Pandas code.

Connecting to a Python kernel

There are currently two ways to connect to a Python kernel, as shown in the quick pick below.

1. Connect using a local Python interpreter

If this option is selected, the kernel connection is created using the Jupyter and Python extensions. We recommend this option for a simple setup and a quick way to start with Data Wrangler.

2. Connect using Jupyter URL and token

A kernel connection is created using JupyterLab APIs if this option is selected. Note that this option has performance benefits since it bypasses some initialization and kernel discovery processes. However, it will also require separate Jupyter Notebook server user management. We recommend this option generally in two cases: 1) if there are blocking issues in the first method and 2) for power users who would like to reduce the cold-start time of Data Wrangler.

To set up a Jupyter Notebook server and use it with this option, follow the steps below:

  1. Install Jupyter. We recommend installing the accessible version of Anaconda with Jupyter installed. Alternatively, follow the official instructions to install it.
  2. In the appropriate environment (e.g., in an Anaconda prompt if Anaconda is used), launch the server with the following command (replace the jupyter token with your secure token):
    jupyter notebook --no-browser --NotebookApp.token='<your-jupyter-token>'
  3. In Data Wrangler, connect using the address of the spawned server. E.g., http://localhost:8888, and pass in the token used in the previous step. Once configured, this information is cached locally and can automatically be reused for future connections.

Launching Data Wrangler

Once Data Wrangler has been successfully installed, there are 2 ways to launch it in VS Code.

Launching Data Wrangler from a Jupyter Notebook

If you are in a Jupyter Notebook working with Pandas data frames, you’ll now see a “Launch Data Wrangler” button appear after running specific operations on your data frame, such as df.head(). Clicking the button will open a new tab in VS Code with the Data Wrangler interface in a sandboxed environment.

Important note:
We currently only accept the following formats for launching:

  • df
  • df.head()
  • df.tail()

Where df is the name of the data frame variable. The code above should appear at the end of a cell without any comments or other code after it.


Launching Data Wrangler directly from a CSV file

You can also launch Data Wrangler directly from a local CSV file. To do so, open any VS Code folder with the CSV dataset you’d like to explore. In the File Explorer panel, right-click the. CSV dataset and click “Open in Data Wrangler.”


Using Data Wrangler


The Data Wrangler interface is divided into 6 components, described below.

The Quick Insights header lets you quickly see valuable information about each column. Depending on the column’s datatype, Quick Insights will show the distribution of the data, the frequency of data points, and missing and unique values.

The Data Grid gives you a scrollable pane to view your entire dataset. Additionally, when selecting an operation to perform, a preview will be illustrated in the data grid, highlighting the modified columns.

The Operations Panel is where you can search through Data Wrangler’s built-in data operations. The operations are organized by their top-level category.

The Summary Panel shows detailed summary statistics for your dataset or a specific column if one is selected. Depending on the data type, it will show information such as min, max values, datatype of the column, skew, and more.

The Operation History Panel shows a human-readable list of all the operations previously applied in the current Data Wrangling session. It enables users to undo specific operations or edit the most recent operation. Selecting a step will highlight the data grid changes and show the generated code associated with that operation.

The Code Preview section will show the Python and Pandas code that Data Wrangler has generated when an operation is selected. It will remain blank when no operation is selected. The code can even be edited by the user, and the data grid will highlight the effect on the data.

Example: Filtering a column

Let’s go through a simple example using Data Wrangler with the Titanic dataset to filter adult passengers on the ship.

We’ll start by looking at the quick insights of the Age column, and we’ll notice the distribution of the ages and that the minimum age is 0.42. For more information, we can glance at the Summary panel to see that the datatype is a float, along with additional statistics such as the passengers’ mean and median age.


To filter for only adult passengers, we can go to the Operation Panel and search for the keyword “Filter” to find the Filter operation. (You can also expand the “Sort and filter” category to find it.)


Once we select an operation, we are brought into the Operation Preview state, where parameters can be modified to see how they affect the underlying dataset before applying the operation. In this example, we want to filter the dataset only to include adults, so we’ll want to filter the Age column to only include values greater than or equal to 18.


Once the parameters are entered in the operation panel, we can see a preview of what will happen to the data. We’ll notice that the minimum value in age is now 18 in the Quick Insights, along with a visual preview of the rows that are being removed, highlighted in red. Finally, we’ll also notice the Code Preview section automatically shows the code that Data Wrangler produced to execute this Filter operation. We can edit this code by changing the filtered age to 21, and the data grid will automatically update accordingly.

After confirming that the operation has the intended effect, we can click Apply.

Editing and exporting code

Each step of the generated code can be modified. Changes to the data will be highlighted in the grid view as you make changes.

Once you’re done with your data cleaning steps in Data Wrangler, there are 3 ways to export your cleaned dataset from Data Wrangler.

  1. Export code back to Notebook and exit: This creates a new cell in your Jupyter Notebook with all the data cleaning code you generated packaged into a clean Python function.
  2. Export data as CSV: This saves the cleaned dataset as a new CSV file onto your machine.
  3. Copy code to clipboard: This copies all the code generated by Data Wrangler for the data cleaning operations.

Note: If you launched Data Wrangler directly from a CSV, the first export option will be to export the code into a new Jupyter Notebook.

Data Wrangler operations

These are the Data Wrangler operations currently supported in the initial launch of Data Wrangler (with many more to be added soon).

Sort valuesSort column(s) ascending or descending
FilterFilter rows based on one or more conditions
Calculate text lengthCreate new column with values equal to the length of each string value in a text column
One-hot encodeSplit categorical data into a new column for each category
Multi-label binarizerSplit categorical data into a new column for each category using a delimiter
Create column from formulaCreate a column using a custom Python formula
Change column typeChange the data type of a column
Drop columnDelete one or more columns
Select columnChoose one or more columns to keep and delete the rest
Rename columnRename one or more columns
Drop missing valuesRemove rows with missing values
Drop duplicate rowsDrops all rows that have duplicate values in one or more columns
Fill missing valuesReplace cells with missing values with a new value
Find and replaceSplit a column into several columns based on a user-defined delimiter
Group by column and aggregateGroup by columns and aggregate results
Strip whitespaceCapitalize the first character of a string with the option to apply to all words.
Split textRemove whitespace from the beginning and end of the text
Convert text to capital caseAutomatically create a column when a pattern is detected from your examples.
Convert text to lowercaseConvert text to lowercase
Convert text to uppercaseConvert text to UPPERCASE
String transform by exampleAutomatically perform string transformations when a pattern is detected from the examples you provide
DateTime formatting by exampleAutomatically perform DateTime formatting when a pattern is detected from the examples you provide
New column by exampleAutomatically create a column when a pattern is detected from the examples you provide.
Scale min/max valuesScale a numerical column between a minimum and maximum value
Custom operationAutomatically create a new column based on examples and the derivation of existing column(s)


Data Wrangler currently supports only Pandas DataFrames. Support for Spark DataFrames is in progress.
Data Wrangler’s display works better on large monitors, although different interface portions can be minimized or hidden to accommodate smaller screens.


Data Wrangler in Microsoft Fabric is undeniably a game-changer in data preparation. It combines the best of both worlds by offering the simplicity of Power Query with the robustness and flexibility of Python. As data continues to grow in importance, tools like Data Wrangler that simplify and expedite the data preparation process will be indispensable for organizations aiming to stay ahead.

That’s it for today!


OpenAI has unveiled a groundbreaking new feature, the Code Interpreter, accessible to all ChatGPT Plus users. Check out my experiments using the 2739 edition of BRPTO’s Patent Gazette

Code Interpreter is an innovative extension of ChatGPT, now available to all subscribers of the ChatGPT Plus service. This tool boasts the ability to execute code, work with uploaded files, analyze data, create charts, edit files, and carry out mathematical computations. The implications of this are profound, not just for academics and coders, but for anyone looking to streamline their research processes. Code Interpreter transcends the traditional scope of AI assistants, which have primarily been limited to generating text responses. It leverages large language models, the AI technology underpinning ChatGPT, to provide a general-purpose toolbox for problem-solving.

What is the Code Interpreter?

The Code Interpreter Plugin for ChatGPT is a multifaceted addition that provides the AI chatbot with the capacity to handle data and perform a broad range of tasks. This plugin equips ChatGPT with the ability to generate and implement code in natural language, thereby streamlining data evaluation, file conversions, and more. Pioneering users have experienced its effectiveness in activities like generating GIFs and examining musical preferences. The potential of the Code Interpreter Plugin is enormous, having the capability to revolutionize coding processes and unearth novel uses. By capitalizing on ChatGPT’s capabilities, users can harness the power of this plugin, sparking a voyage of discovery and creativity.

Professor Ethan Mollick from the Wharton School of the University of Pennsylvania shares his experiences with using the Code Interpreter

Artificial intelligence is rapidly revolutionizing every aspect of our lives, particularly in the world of data analytics and computational tasks. This transition was recently illuminated by Wharton Professor Ethan Mollick who commented, “Things that took me weeks to master in my PhD were completed in seconds by the AI.” This is not just a statement about time saved or operational efficiency, but it speaks volumes about the growing capabilities of AI technologies, specifically OpenAI’s new tool for ChatGPT – Code Interpreter.

Mollick, an early adopter of AI and an esteemed academic at the Wharton School of the University of Pennsylvania lauded Code Interpreter as the most significant application of AI in the sphere of complex knowledge work. Not only does it complete intricate tasks in record time, but Mollick also noticed fewer errors than those typically expected from human analysts.

One might argue that Code Interpreter transcends the traditional scope of AI assistants, which have primarily been limited to generating text responses. It leverages large language models, the AI technology underpinning ChatGPT, to provide a general-purpose toolbox for problem-solving.

Mollick commended Code Interpreter’s use of Python, a versatile programming language known for its application in software building and data analysis. He pointed out that it closes some of the gaps in language models as the output is not entirely text-based. The code is processed through Python, which promptly flags any errors.

In practice, when given a dataset on superheroes, Code Interpreter could clean and merge the data seamlessly, with an admirable effort to maintain accuracy. This process would have been an arduous task otherwise. Additionally, it allows a back-and-forth interaction during data visualization, accommodating various alterations and enhancements.

Remarkably, Code Interpreter doesn’t just perform pre-set analyses but recommends pertinent analytical approaches. For instance, it conducted predictive modeling to anticipate a hero’s potential powers based on other factors. Mollick was struck by the AI’s human-like reasoning about data, noting the AI’s observation that the powers were often visually noticeable as they derived from the comic book medium.

Beyond its technical capabilities, Code Interpreter democratizes access to complex data analysis, making it accessible to more people, thereby transforming the future of work. It saves time and reduces the tedium of repetitive tasks, enabling individuals to focus on more fulfilling, in-depth work.

Here are 10 examples of how you can use Code Interpreter for data analysis:

  1. Analyzing customer feedback data to identify trends and patterns.
  2. Creating interactive dashboards and reports for business intelligence purposes.
  3. Cleaning and transforming datasets for machine learning models.
  4. Extracting insights from social media data to inform marketing strategies.
  5. Generating charts and graphs to visualize sales data.
  6. Analyzing website traffic data to optimize the user experience.
  7. Creating custom functions and scripts for specific data analysis tasks.
  8. Performing statistical analysis on survey data.
  9. Automating repetitive data analysis tasks with Python scripts.
  10. Creating custom visualizations for presentations and reports.

How to use it? Follow my own experiments

Initially, you must upload a file of your choice. Following this, you are permitted to pose queries concerning it. The interpreter for the code will generate and run a Python script to address nearly all inquiries you have. The entire process is incredibly swift. I conducted a few trials using the XML file from BRTPTO’s 2739 Patent Gazette. The experience was truly incredible. There’s no need for any understanding of coding or Python. The code interpreter simply provides the results. If you wish, you can view the Python script.

By selecting the “Show Work” option, the Python script will become visible to you.

He executed the work flawlessly!

This is the screen of the CSV file.

Here is the Python code generated by the Code Interpreter to create the CSV file:

import pandas as pd
import xml.etree.ElementTree as ET

# Load the XML file
tree = ET.parse("/mnt/data/Patente_2739_04072023.xml")
root = tree.getroot()

# Function to extract data from XML into a dictionary
def extract_data(root):
    data = []
    for despacho in root.findall('despacho'):
        row = {}
        row['despacho_codigo'] = despacho.find('codigo').text
        row['despacho_titulo'] = despacho.find('titulo').text
        row['despacho_comentario'] = despacho.find('comentario').text if despacho.find('comentario') is not None else ""
        processo_patente = despacho.find('processo-patente')
        row['processo_numero'] = processo_patente.find('numero').text
        row['processo_data_deposito'] = processo_patente.find('data-deposito').text if processo_patente.find('data-deposito') is not None else ""
        row['processo_concessao'] = processo_patente.find('concessao').text if processo_patente.find('concessao') is not None else ""
        titulars = processo_patente.findall('titular-lista/titular')
        row['titular_nome_completo'] = "; ".join([titular.find('nome-completo').text for titular in titulars])
        row['titular_endereco_pais'] = "; ".join([titular.find('endereco/pais/sigla').text for titular in titulars])
    return data

# Extract data from XML
data = extract_data(root)

# Convert to DataFrame
df = pd.DataFrame(data)

# Save DataFrame to CSV
df.to_csv('/mnt/data/patente.csv', index=False)

df.head()  # Display the first few rows of the DataFrame

# Extract distinct application/patent numbers
distinct_application_numbers = df['processo_numero'].unique()

# Convert to DataFrame
df_distinct_application_numbers = pd.DataFrame(distinct_application_numbers, columns=['processo_numero'])

# Save DataFrame to CSV
df_distinct_application_numbers.to_csv('/mnt/data/distinct_application_numbers.csv', index=False)

df_distinct_application_numbers.head()  # Display the first few rows of the DataFrame

This video can demonstrate the capabilities of the Code Interpreter.

You can find more information on the official Open AI site by clicking here.


Code Interpreter is a powerful tool that is making data analysis accessible for everyone with ChatGPT Plus. By allowing users to run code snippets within their chat sessions, it enables them to perform a wide range of data analysis tasks quickly and easily. Whether you’re analyzing customer feedback data or creating custom visualizations for presentations and reports, Code Interpreter has something to offer everyone.

Code Interpreter invites us to consider how we can leverage such advancements across various sectors impacted by AI. Indeed, Code Interpreter signifies the dawn of a new era in artificial intelligence and computational capabilities. So why not give it a try today?

That’s it for today!


Wharton professor sees future of work in new ChatGPT tool | Fortune

The Future of Data Analytics: An Introduction to Microsoft Fabric

Microsoft Fabric, launched on May 24-25 of 2023 at the Microsoft Build event, is an end-to-end data and analytics platform that combines Microsoft’s OneLake data lake, Power BI, Azure Synapse, and Azure Data Factory into a unified software as a service (SaaS) platform. It’s a one-stop solution designed to serve various data professionals including data engineers, data warehousing professionals, data scientists, data analysts, and business users, enabling them to collaborate effectively within the platform to foster a healthy data culture across their organizations​​.

What are the Microsoft Fabric key features?

Data Factory – Microsoft’s Azure Data Factory is a powerful tool that combines the simplicity of Power Query with Azure Data Factory’s scale. It provides over 200 native connectors for data linkage from on-premises and cloud-based sources. Data Factory enables the scheduling and orchestration of notebooks and Spark jobs.

Data Engineering – Leveraging the extensive capabilities of Spark, data engineering in Microsoft Fabric provides premier authoring experiences and facilitates large-scale data transformations. It plays a crucial role in democratizing data through the lakehouse model. Moreover, integration with

Data Science – The data science capability in Microsoft Fabric aids in building, deploying, and operationalizing machine learning models within the Fabric framework. It interacts with Azure Machine Learning for built-in experiment tracking and model registry, empowering data scientists to enhance organizational data with predictions that business analysts can incorporate into their BI reports, thereby transitioning from descriptive to predictive insights.

Data Warehouse – The data warehousing component of Microsoft Fabric offers top-tier SQL performance and scalability. It features a full separation of computing and storage for independent scaling and native data storage in the open Delta Lake format.

Real-Time Analytics – Observational data, acquired from diverse sources like apps, IoT devices, human interactions, and more, represents the fastest-growing data category. This semi-structured, high-volume data, often in JSON or Text format with varying schemas, presents challenges for conventional data warehousing platforms. However, Microsoft Fabric’s Real-Time Analytics offers a superior solution for analyzing such data.

Power BI – Recognised as a leading Business Intelligence platform worldwide, Power BI in Microsoft Fabric enables business owners to access all Fabric data swiftly and intuitively for data-driven decision-making.

What are the Advantages of Microsoft Fabric?

Unified Platform: Microsoft Fabric provides a unified platform for different data analytics workloads such as data integration, engineering, warehousing, data science, real-time analytics, and business intelligence. This can foster a well-functioning data culture across the organization as data engineers, warehousing professionals, data scientists, data analysts, and business users can collaborate within Fabric​​.

Multi-cloud Support: Fabric is designed with a multi-cloud approach in mind, with support for data in Amazon S3 and (soon) Google Cloud Platform. This means that users are not restricted to using data only from Microsoft’s ecosystem, providing flexibility​.

Accessibility: Microsoft Fabric is currently available in public preview, and anyone can try the service without providing their credit card information. Starting from July 1, Fabric will be enabled for all Power BI tenants​.

AI Integration: The private preview of Copilot in Power BI will combine advanced generative AI with data, enabling users to simply describe the insights they need or ask a question about their data, and Copilot will analyze and pull the correct data into a report, turning data into actionable insights instantly​​.

Microsoft Fabric – Licensing and Pricing

Microsoft Fabric capacities are available for purchase in the Azure portal. These capacities provide the compute resources for all the experiences in Fabric from the Data Factory to ingest and transform to Data Engineering, Data Science, Data Warehouse, Real-Time Analytics, and all the way to Power BI for data visualization. A single capacity can power all workloads concurrently and does not need to be pre-allocated across the workloads. Moreover, a single capacity can be shared among multiple users and projects, without any limitations on the number of workspaces or creators that can utilize it.

To gain access to Microsoft Fabric, you have three options:

  1. Leverage your existing Power BI Premium subscription by turning on the Fabric preview switch. All Power BI Premium capacities can instantly power all the Fabric workloads with no additional action required. If you already have a Power BI Premium subscription, you can simply turn on the Fabric preview switch. This means you can enable Microsoft Fabric’s capabilities as part of your existing Power BI Premium subscription without having to do anything else. All the capacities you have with your Power BI Premium subscription can be used to power the full range of workloads in Microsoft Fabric. In other words, you can use your existing Power BI Premium resources to run all of the data and analytics tasks that Microsoft Fabric can handle.
  2. Start a Fabric trial if your tenant supports trials. If you’re not sure about committing to Microsoft Fabric yet, you can start a trial if your tenant (an instance of Azure Active Directory) supports it. A trial allows you to test the service before deciding to purchase. During the trial period, you can explore the full capabilities of Microsoft Fabric, such as data ingestion, data transformation, data engineering, data science, data warehouse operations, real-time analytics, and data visualization with Power BI.
  3. Purchase a Fabric pay-as-you-go capacity from the Azure portal. If you decide that Microsoft Fabric suits your needs and you don’t have a Power BI Premium subscription, you can directly purchase a Fabric capacity on a pay-as-you-go basis from the Azure portal. The pay-as-you-go model is flexible because it allows you to pay for only the compute and storage resources you use. Microsoft Fabric capacities come in different sizes, from F2 to F2048, representing 2 – 2048 Capacity Units (CU). Your bill will be determined by the amount of computing you provision (i.e., the size of the capacity you choose) and the amount of storage you use in OneLake, the data lake built into Microsoft Fabric. This model also allows you to easily scale your capacities up and down to adjust their computing power, and even pause your capacities when not in use to save on your bills​​.

Microsoft Fabric is a unified product for all your data and analytics workloads. Rather than provisioning and managing separate compute for each workload, with Fabric, your bill is determined by two variables: the amount of compute you provision and the amount of storage you use.

Follow the capacities that you can buy in the Azure portal:

Check out this video from Guy and Cube which breaks down the details on pricing and licensing.

How to activate the Microsoft Fabric Trial version?

Step 1

Login to Microsoft Power BI with your Developer Account

You will observe that asides from the OneLake icon at the top left, everything looks normal if you are familiar with Power BI Service.

Step 2

Enable Microsoft Fabric for your Tenant

Your Screen will Look like this

So far, we’ve only enabled Microsoft Fabric at the tenant level. This doesn’t give full access to Fabric resources as can be seen in the illustration below

So, Let’s upgrade the Power BI License to Microsoft Fabric Trial

For a smoother experience, You should create a new Workspace and add Microsoft Fabric Trial License as can be seen below

As you can see, while creating a new Workspace, you can now Assign Fabric Trial License to it. Upon creation, we are able to take full advantage of Microsoft Fabric

This video by Guy and Cube explains the steps for getting the Microsoft Fabric Trial.


Microsoft Fabric is currently in preview but already represents a significant advancement in the field of data and analytics, offering a unified platform that brings together various tools and services. It enables a smooth and collaborative experience for a variety of data professionals, fostering a data-driven culture within organizations. Let´s wait for the next steps from Microsoft.

That’s it for today!

Implementing Data Governance in Power BI: A Step-by-Step Guide

As data plays a crucial role in decision-making and data-driven insights, organizations require a robust data governance framework to manage and monitor their data assets. Power BI offers various features and tools that aid in implementing data governance and ensuring data accuracy, reliability, and security.

As data becomes increasingly critical to organizations of all sizes and industries, managing this data effectively and securely becomes just as important. A crucial aspect of data management is data governance, which is defining and enforcing policies, procedures, and standards for data management. This article will explore data governance basics, how to implement it in Power BI, and the advantages of using Power BI Premium.

What is Data Governance?

Data governance is the set of processes, policies, and standards organizations use to manage their data effectively. It encompasses everything from data quality and security to data privacy and retention. Effective data governance is crucial for organizations to ensure that their data is accurate, secure, and accessible. In addition, it helps organizations make informed decisions, reduce risks associated with poor data quality, and maintain compliance with legal and regulatory requirements.

How to Implement Data Governance in Power BI

Power BI provides various features and tools to help implement data governance. These include Dataflows, Datamarts, Sensitivity labels, Endorsement, Discovery, and Row-Level-Security(RLS). Dataflows allow organizations to connect, clean, and transform data, while Datamarts provide a centralized data repository. Sensitivity labels help to classify and protect sensitive data, while Endorsement allows organizations to enforce data quality standards. Finally, Discovery helps organizations manage, monitor, and understand their data assets. Let’s explain each of them.


dataflow is a collection of tables created and managed in workspaces in the Power BI service. A table is a set of columns used to store data, much like a table within a database. You can add and edit tables in your dataflow and manage data refresh schedules directly from the workspace in which your dataflow was created.

As data volume grows, so does the challenge of wrangling that data into well-formed, actionable information. We want data ready for analytics to populate visuals, reports, and dashboards, so we can quickly turn our volumes of data into actionable insights. With self-service data prep for big data in Power BI, you can go from data to Power BI insights with just a few actions.

When to use dataflows

Dataflows are designed to support the following scenarios:

Create reusable transformation logic that many datasets and reports inside Power BI can share. Dataflows promote the reusability of the underlying data elements, preventing the need to create separate connections with your cloud or on-premises data sources.

Expose the data in your Azure Data Lake Gen 2 storage, enabling you to connect other Azure services to the raw underlying data.

Create a single source of truth by forcing analysts to connect to the dataflows rather than connecting to the underlying systems. This single source gives you control over which data is accessed and how data is exposed to report creators. You can also map the data to industry standard definitions, enabling you to create tidy curated views, which can work with other services and products in the Power Platform.

Suppose you want to work with large data volumes and perform ETL at scale; dataflows with Power BI Premium scale more efficiently and give you more flexibility. Dataflows support a wide range of cloud and on-premises sources.

Prevent analysts from having direct access to the underlying data source. Since report creators can build on top of dataflows, it might be more convenient for you to allow access to underlying data sources only to a few individuals and then provide access to the dataflows for analysts to build on. This approach reduces the load to the underlying systems and gives administrators finer control of when the systems get loaded from refreshes.

    You can use Power BI Desktop and the Power BI service with dataflows to create datasets, reports, dashboards, and apps that use the Common Data Model. You can gain deep insights into your business activities from these resources. Dataflow refresh scheduling is managed directly from the workspace in which your dataflow was created, just like your datasets.

    Click here to learn how to create a Dataflow in Power BI.


    Datamarts are self-service analytics solutions that enable users to store and explore data in a fully managed database.

    When to use Datamarts

    Datamarts are targeted toward interactive data workloads for self-service scenarios. For example, suppose you’re working in accounting or finance. In that case, you can build your data models and collections, which you can then use to self-serve business questions and answers through T-SQL and visual query experiences. In addition, you can still use those data collections for more traditional Power BI reporting experiences. Datamarts are recommended for customers who need domain-oriented, decentralized data ownership and architecture, such as users who need data as a product or a self-service data platform.

    Datamarts are designed to support the following scenarios:

    Departmental self-service data: Centralize small to moderate data volume (approximately 100 GB) in a self-service fully managed SQL database. Datamarts enable you to designate a single store for self-service departmental downstream reporting needs (such as Excel, Power BI reports, and others), thereby reducing the infrastructure in self-service solutions.

    Relational database analytics with Power BI: Access a datamart’s data using external SQL clients. Azure Synapse and other services/tools that use T-SQL can also use datamarts in Power BI.

    End-to-end semantic models: Enable Power BI creators to build end-to-end solutions without dependencies on other tooling or IT teams. Datamarts eliminates managing orchestration between dataflows and datasets through auto-generated datasets while providing visual experiences for querying data and ad-hoc analysis, all backed by Azure SQL DB.

    Click here if you want to know how to create a Datamart.

    Sensitivity labels

    A Sensitivity label is an information icon that users can apply in the Power BI Desktop or the Power BI Service. They are essentially digital stamps that can be applied to a resource to classify and restrict critical content when shared outside Power BI.

    Click here if you want more information about implementing sensitivity labels.


    Power BI provides two ways to endorse your valuable, high-quality content to increase its visibility: promotion and certification.
    Promotion: Promotion is a way to highlight the content you think is valuable and worthwhile for others to use. It encourages the collaborative use and spread of content within an organization.
    Any content owner and member with write permissions on the workspace where the content is located can promote the content when they think it’s good enough for sharing.
    Certification: Certification means that the content meets the organization’s quality standards and can be regarded as reliable, authoritative, and ready for use.
    Only authorized reviewers (defined by the Power BI administrator) can certify content. Content owners who wish to see their content certified and are not authorized to certify it themselves must follow their organization’s guidelines about getting their content certified.

    Click here to learn how to endorse your content in Power BI.

    Dataset Discovery

    The Power BI dataset discovery hub empowers Power BI and Microsoft Teams users to discover and re-use organizational and curated datasets and answer their business questions in Power BI or Excel. The hub will empower data owners to manage their assets in a central location.

    Click here to learn more about dataset discovery.

    Row-Level-Security (RLS)

    Row-level security (RLS) with Power BI can be used to restrict data access for given users. Filters restrict data access at the row level, and you can define filters within roles. In the Power BI service, members of a workspace have access to datasets in the workspace. RLS doesn’t restrict this data access.

    Click here to learn more about Row-level security

    What Is Self-Service in Power BI?

    Self-service business intelligence (BI) is a data analytics method that allows business users (e.g., business analysts, managers, and executives) to access and explore datasets without experience in BI, data mining, and statistical analysis. Users can run queries and customize data visualization, dashboards, and reports to support real-time data-driven decision-making.

    Power BI offers robust self-service capabilities. You can tap into data from on-premise, and cloud-based data sources (e.g., Dynamics 365, Salesforce, Azure SQL Data Warehouse, Excel, SharePoint), then filter, sort, analyze, and visualize the information without the help of a BI or IT team.

    Using the Power Query experience, business analysts can directly ingest, transform, integrate, and enrich big data in the Power BI web service. The ingested data can then be shared with other users across various Power BI models, reports, and dashboards.

    How vital is Self-Service in Power BI?

    In many businesses, productivity and agility suffer due to a lengthy process for BI-related data requests. For example, when Alice asks Bob a question, Bob has to wait for the BI/IT team to pull the data. This can take several weeks and multiple meetings, slowing the decision-making process.

    But with Power BI self-service, Bob can quickly retrieve real-time data, and Alice can immediately drill down into relevant datasets during the first meeting. This results in a more efficient discussion and a potential solution that can be implemented immediately.

    The significance of Power BI self-service goes beyond just real-time insights, collaboration, and data reuse. It helps business users develop the habit of relying on data when making decisions. Without easy access to data analytics, they may rely on instincts or experience, leading to suboptimal outcomes. But with real-time data at their fingertips, users can make data-driven decisions, establishing a pattern of data-informed decision-making.

    Implementing Effective Data Governance in a Power BI Self-Service Environment

    Data Governance is critical in implementing a self-service culture in Power BI as it provides a framework for defining, maintaining, and enforcing data management policies. The following are critical components of a data governance plan in Power BI:

    1. Data Quality: Define data quality and accuracy standards to ensure that the data used is reliable and trustworthy.
    2. Data Security: Implement security measures to ensure that sensitive data is protected and only accessible by authorized users.
    3. Data Lineage: Define the lineage of the data sources used in Power BI to ensure that the data can be traced back to its source.
    4. Data Ownership: Assign ownership of data sources and ensure that data owners are responsible for maintaining the accuracy of their data.
    5. Data Stewardship: Designate data stewards responsible for maintaining data quality and ensuring compliance with data management policies.
    6. Data Access Control: Implement access controls to ensure that only authorized users can access sensitive data.
    7. Data Auditing: Implement auditing and monitoring processes to track changes to the data and ensure compliance with data management policies.

    By implementing these key components, organizations can establish a strong foundation for a self-service culture in Power BI while ensuring that the data is secure, accurate, and trustworthy.

    Maximizing Your Data Governance with Power BI Premium

    From scalability to security, Power BI Premium offers a range of features that can help organizations manage their data more effectively. With dedicated capacity, IT departments can ensure consistent performance for their teams. Advanced security features also guarantee data privacy and protection. Follow below the ten advantages of implementing data governance with Power Bi Premium:

    1. Scalability: Power BI Premium can handle large amounts of data and high concurrent usage.
    2. Dedicated Capacity: Dedicated resources for Power BI Premium ensure consistent performance.
    3. IT Governance: IT departments can centrally manage and govern Power BI deployments.
    4. Data Privacy & Security: Advanced security features ensure data privacy and protection.
    5. Shared Workspaces: Teams can collaborate on data and reports in a secure environment.
    6. Unrestricted Data Sources: Power BI Premium supports a broader range of data sources than Power BI Pro.
    7. Dynamic Row-Level Security: Secure access to sensitive data can be managed dynamically.
    8. On-Premises Data Connectivity: Power BI Premium supports connectivity to on-premises data sources.
    9. Long-Term Data Retention: Power BI Premium enables organizations to retain data for extended periods.
    10. Lower TCO: Power BI Premium can provide lower total ownership costs than purchasing individual Power BI Pro licenses.

    10 Effective Strategies for Implementing Data Governance in Power BI

    1. Creating Dataflows for cleaning and transforming data.
    2. Implementing Sensitivity labels to classify and protect sensitive data.
    3. Using Datamarts for centralizing data and improving data management.
    4. Enforcing data quality standards with Endorsement.
    5. Monitoring data assets with Discovery.
    6. Implementing data privacy and security with Power BI Premium.
    7. Improving report refresh times and performance with Power BI Premium.
    8. Sharing reports and dashboards with a larger audience with Power BI Premium.
    9. Utilizing Power BI Premium’s increased capacity for large datasets.
    10. Improving collaboration and data sharing with Power BI Premium’s multi-user authoring feature.

    Video talking about Building a Data Governance Plan for Your Power BI Environment.


    Data governance is an essential aspect of data management, helping organizations to ensure that their data is accurate, secure, and accessible. Power BI provides several features to help organizations implement data governance, including Power BI Premium, dataflows, and Datamarts. With these features, organizations can automate collecting and transforming data, reduce the risk of manual errors, and maintain compliance with legal and regulatory requirements. Whether you’re just starting to explore Power BI or are already using it to manage your data, implementing data governance is a crucial step toward effective data management.

    It’s very interesting to look at the Power BI adoption roadmap.

    Matthew Roche’s Blog from Microsoft is a massive reference to Data Culture and Governance. This guy explains everything about Dataflows here.

    If you have any questions discussed in this post or need help, feel free to contact me at this link.

    That’s it for today!