Microsoft Fabric evolves from data lake to application platform


If there’s one thing a modern business needs, it’s data—as much of it as possible. Starting with data warehouses and now with data lakes, we’re using on-premises and cloud tools to manage and analyze that data, putting it in shape to deliver necessary business insights.

Data is increasingly important today, as it’s now used to train and fine-tune custom AI models, or to provide essential grounding for existing AI applications. Microsoft’s Fabric is a hosted analytics platform that builds on top of existing data tools like Azure Synapse, so it’s not surprising that Microsoft used its AI-focused BUILD 2024 event to unveil new features that are targeted at supporting the at-scale analytics and data requirements of modern AI applications.

Microsoft has been describing Fabric as a platform that takes the complexity out of working with substantial amounts of data, allowing you to instead focus on analytics and getting value from that data. That can be by using tools like Power BI to build and share data-powered dashboards, or using that data to train, test, and operate custom AIs or to ground existing generative AI foundation models.

Wrapping Icebergs in Fabric

One of the more important new features is support for more data formats, which helps integrate Microsoft Fabric with other large-scale data platforms. Until now, Fabric was built on the Delta Parquet data format, which comes from the Delta Lake project managed by the Linux Foundation and is used by many different lakehouse-based platforms. This open source data storage technology pairs a transaction log with at-scale cloud object stores. There’s no need to use specialized data stores; instead, your choice of data engine can simply work with a Delta Lake table that’s stored in Azure Blob Storage.
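To make the pairing of a transaction log with plain object storage more concrete, here is a minimal Python sketch of how a reader might work out which Parquet files make up the current version of a table. It assumes the general Delta Lake layout, where commits are stored as numbered JSON files whose lines record actions such as adding or removing a data file. The file names and the in-memory COMMITS dictionary are hypothetical stand-ins for objects in a blob store, and real log entries carry many more fields than shown here.

```python
import json

# Hypothetical commit files as they might appear in a table's log directory.
# Each string is one JSON action; real entries include sizes, stats, and more.
COMMITS = {
    "00000000000000000000.json": [
        '{"add": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0001.parquet"}}',
    ],
    "00000000000000000001.json": [
        '{"remove": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0002.parquet"}}',
    ],
}


def live_files(commits):
    """Replay commits in version order and return the current data files."""
    files = set()
    for name in sorted(commits):  # zero-padded names sort by version
        for line in commits[name]:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


print(live_files(COMMITS))
```

Because the data files themselves are never edited in place, any engine that can read the log and the Parquet files can reconstruct the same snapshot, which is what makes a general-purpose object store sufficient.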

It’s an important data format, but it’s not the only one used to manage large amounts of data. One popular platform is Snowflake’s managed cloud data platform, which uses the Apache Iceberg open table format. This uses SQL-like tools to manage your big data, allowing you to quickly update large tables and change your current schema.

If Microsoft Fabric is to be the hub for AI data on Azure, then it needs to support as many data sources as possible. So, one of the more significant data platform announcements at BUILD was support for Iceberg in Microsoft Fabric’s OneLake data environment alongside Delta Parquet, as well as tools for a two-way link between Microsoft Fabric and Snowflake, letting you work with the tools you prefer.

One key aspect of Fabric’s support for Iceberg is using shortcuts to translate metadata between the two formats, allowing queries and analytical tools to treat them as a single source, no matter where they are hosted. This should allow organizations with existing large data sets hosted in Snowflake or other Iceberg environments to take advantage of Microsoft Fabric and its integration with tools like Azure AI Studio. It should also simplify the process of training AI models on data held in Snowflake’s cloud, without having to store it in two separate places.

That same approach is being taken with both Adobe’s cloud-based marketing tools and with Azure Databricks. Since they use Microsoft Fabric’s shortcut tools, you’ll be able to bring existing Databricks catalogs into Fabric, and at the same time, your OneLake data will be visible as a catalog in Azure Databricks. This allows you to use the tool that’s best for the task you need, with workflows that cross different tool sets without compromising your data.

Improved real-time data support

Although Microsoft Fabric had basic support for one key data type—real-time streamed data—it required two different tools to use that data effectively. Running analytics over live data from your business systems and from industrial Internet of Things systems can provide rapid insights that help you catch issues before they affect your business, especially when tied to tools that can trigger alerts and actions when your data indicates problems.

The new Real-Time Intelligence tool provides a hub for working with streamed data. You can think of it as the equivalent of a data lake for your real-time data, bringing it in from multiple sources and providing a set of tools to manage and transform that data. The result is a no-code development environment that uses the familiar connector metaphor to help construct paths for your data, extracting information and routing the streamed data into a data lake for further analysis. Streamed data can come from inside Azure and from other external data sources.

This approach helps you extract the maximum value from your streamed data. By triggering on outlying events, you can respond quickly, trapping fraud in an ecommerce platform or spotting incipient failures in instrumented machinery. Data becomes a tool for training new AI models that can automate those processes.
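As a rough illustration of what triggering on outlying events means, the Python sketch below flags any reading that sits far from the average of the readings just before it. This is a generic rolling-window check, not how Real-Time Intelligence is implemented; the function name, window size, threshold, and sample temperatures are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev


def outliers(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the trailing-window mean."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        # Only judge a reading once a full window of history is available.
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        # The reading joins the window afterward, flagged or not.
        recent.append(value)


# Hypothetical sensor temperatures with one obvious spike.
temps = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 95.4, 70.2]
for index, value in outliers(temps):
    print(f"alert: reading {index} = {value}")
```

In a production pipeline the alert would feed an action such as a notification or a workflow rather than a print statement, and a flagged value would likely be kept out of the window so that it does not distort later comparisons.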

Natural language queries with Copilots

Microsoft has been adding a natural language interface to Fabric in the shape of its own Copilot. This is intended to enable users to ask quick questions about their time-series data, generating the underlying Kusto Query Language (KQL) needed to repeat or refine the query. Usefully, this approach helps you learn to use KQL. You can quickly see how a KQL query relates to your initial question, which allows inexperienced users to pick up necessary data analysis skills.

That same underlying Copilot is used to build Microsoft Fabric’s new AI skills feature. Here you start by selecting a data source and then, using natural language questions and no additional configuration, quickly build complex queries, adding further sources and tables as necessary. Again, the AI tool will show you the query it’s built, allowing you to make edits and share the result with colleagues. Microsoft intends to make these skills available to Copilot Studio, giving you an end-to-end, no-code development environment for data and workflows.

Adding application APIs to Microsoft Fabric analytics

Microsoft Fabric is an important analytical tool, and it also offers a hub for managing and controlling your big data, ready for use in other applications. What’s needed is a way to attach APIs to that data so that Fabric endpoints can be built into your code. Until now all the Fabric APIs were RESTful management APIs, for building your own administrative tools. This latest set of updates lets you add your own GraphQL APIs to your data.

Data lakes and lakehouses can contain many different schemas, so using GraphQL’s type-based API definitions makes it possible to construct APIs that work across all your Fabric data, returning data from all your sources in a single JSON object. There’s no need for your code to have any knowledge of the data in your Fabric environment; the Fabric query engine provides all the necessary abstraction.

Creating an API is an uncomplicated process. Inside the Microsoft Fabric management environment, start by naming your API. Then choose your sources and the tables you want to expose. This creates the GraphQL schema, and you can work in the built-in schema explorer to define the queries and any necessary relationships between tables. Not all Fabric data sources are supported at the moment, but you should be able to get started with the current set of analytics endpoints, which lets you deliver access to existing analytics data. This allows Microsoft Fabric to store data, run analytics queries, store results in tables, and then offer API access to those results.

Once your API is ready, all you need to do is copy the resulting endpoint and pass it to your application developers. They’ll need to include appropriate authorizations, ensuring that only approved users get access (especially important if your API allows data to be modified).
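For developers on the receiving end, calling the endpoint might look something like the Python sketch below. It follows the common GraphQL-over-HTTP convention of posting a JSON body with a query field and assumes bearer-token authorization. The endpoint URL, the sales type and its fields, and the token placeholder are all hypothetical; the real values come from your own Fabric workspace, schema, and identity provider. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the one copied from your Fabric API.
ENDPOINT = "https://example.invalid/graphql"

# Hypothetical query: the type and field names depend on your own schema.
QUERY = """
query {
  sales(first: 5) {
    items { region total }
  }
}
"""


def build_request(endpoint, query, token):
    """Build (but do not send) an authorized GraphQL POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_request(ENDPOINT, QUERY, token="<access-token>")
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen() and decode the
# JSON response body with json.loads().
```

Keeping the token out of the source code, for example by reading it from an environment variable or a secrets store, matters even more when the API exposes mutations that can modify data.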

These latest updates to Microsoft Fabric fill many of the platform’s obvious gaps. Easier access to alternative data formats and to streamed data lets you build on existing investments, while support for GraphQL APIs offers the opportunity to build applications that work with big data while Fabric handles the underlying queries behind the scenes.

By offering a way to abstract away from the complexity associated with data at scale, and by providing AI agents, Microsoft Fabric is demonstrating how a managed data platform can let you go from raw data to analytical applications no matter your skills. All you need to do is ask questions.

Copyright © 2024 IDG Communications, Inc.


