Prompt Flow: Experiment with LLM(s)

Azure Machine Learning Prompt Flow is a development tool designed to simplify the creation of AI applications that interact with Large Language Models (LLMs). It streamlines both experimentation and deployment: developers can easily test prompt variants, evaluate their performance, and deploy endpoints directly from the studio.

In this article, I will explore the features of Prompt Flow through an example of Enterprise Search. We will discuss testing the “Bring Your Own Data Chat QnA” quickstart available in the studio.

Prerequisites

Here are the resources you might require:

  1. Azure Machine Learning, which is essential.
  2. Azure OpenAI. It’s worth noting that when you deploy an OpenAI model through Azure Machine Learning, it automatically creates an Azure OpenAI resource on your behalf, even if you already have one.
  3. Optional – Azure Cognitive Search: If you plan to use your own data, you will need to create a Vector Index using Cognitive Search (another option is using a Faiss index).

On the Azure Machine Learning homepage, you can see a carousel of Prompt Flow quickstarts, including the one we’re interested in, allowing us to query our own documents (see image below). It is possible to clone the quickstart from this page.

Hands on Prompt Flow

After cloning the project, you will notice a new flow available in the Prompt Flow tab. From this page, you can:

  1. Configure “Connections” to Azure OpenAI, vector databases (such as Weaviate or Qdrant), and more…
  2. Associate a “Runtime” with your flow(s), either by creating a new one or linking a pre-configured runtime. Microsoft recommends using memory-optimized compute for this purpose.
  3. Create a “Vector Index” from local files or from a dataset you’ve previously created. You must specify whether your vector store will be an Azure Cognitive Search Index or a Faiss Index (see the sketch after this list). Note: this feature only supports specific file types: .md, .txt, .html, .py, .doc, .ppt, .pdf, and .xls.
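
The studio builds the vector index for you, but the Faiss option conceptually boils down to embedding your document chunks and adding them to an index. Here is a minimal sketch of that idea in Python; the endpoint, embedding deployment name, and sample chunks are assumptions for illustration, not what the Vector Index wizard actually runs:

```python
# Minimal sketch of what a Faiss-based vector index amounts to.
# Assumptions: an Azure OpenAI embedding deployment named "text-embedding-ada-002",
# plus the openai>=1.0 and faiss-cpu packages installed.
import faiss
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder endpoint
    api_key="<your-key>",
    api_version="2023-05-15",
)

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of text chunks with the Azure OpenAI embedding deployment."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in response.data], dtype="float32")

# Chunks extracted from your .md/.txt/.pdf... files (the wizard does this for you).
chunks = ["Chunk of documentation text...", "Another chunk..."]
vectors = embed(chunks)

index = faiss.IndexFlatL2(vectors.shape[1])  # flat L2 index, one vector per chunk
index.add(vectors)
```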

Once Prompt Flow is configured, you can open the cloned flow, which is presented as follows:

  • An interface for configuring the flow, testing prompt variations, and possibly incorporating Python code, such as with LangChain or Semantic Kernel (highlighted in the blue box in the screenshot); a sketch of such a Python node follows this list.
  • A pipeline graph that closely resembles Azure Machine Learning Designer (red box).
  • A “Chat” button that enables you to simulate a chat experience for the user (green box).
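
Python nodes in a flow are ordinary functions exposed as tools, and this is where you could drop in LangChain or Semantic Kernel calls. A minimal sketch, assuming the promptflow package's @tool decorator; the function name and logic are placeholders:

```python
# Sketch of a custom Python node as it could appear in a Prompt Flow flow.
# Assumption: the promptflow package's @tool decorator; the logic is a placeholder.
from promptflow import tool

@tool
def clean_question(question: str) -> str:
    """Normalize the user's question before it is embedded."""
    return " ".join(question.strip().split())
```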

Upon closer examination of the flow, we can see multiple stages that correspond to those observed in the diagram. For the project we’ve cloned, these stages include (a Python sketch of the same chain follows the list):

  1. “Inputs” / “Outputs”: Establishing the connection with the chat interface.
  2. “embed_the_question”: Embedding the user’s question using an embedding model deployed in AOAI (Azure OpenAI).
  3. “search_question_from_indexed_docs”: Querying the vector index (either your custom one or the sample index) and retrieving the top_k most relevant documents (2 in my case).
  4. “generate_prompt_context”: Utilizing the retrieved content to generate the prompt context.
  5. “prompt_variants”: Testing prompt variations.
  6. “chat_with_context”: Generating the response from the retrieved context and returning it to the chat interface.
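
Outside the studio, the same chain of stages can be sketched in a few lines, reusing the client, embed helper, index, and chunks from the indexing sketch above; the chat deployment name and top_k value are assumptions for illustration:

```python
# Sketch of the flow's stages run by hand, reusing `client`, `embed`, `index`
# and `chunks` from the indexing sketch above. Deployment names are assumptions.
def answer(question: str, top_k: int = 2) -> str:
    # embed_the_question
    query_vector = embed([question])

    # search_question_from_indexed_docs: top_k nearest chunks from the Faiss index
    _, neighbor_ids = index.search(query_vector, top_k)

    # generate_prompt_context: concatenate the retrieved chunks
    context = "\n\n".join(chunks[i] for i in neighbor_ids[0])

    # chat_with_context: ask the chat deployment to answer from the context
    completion = client.chat.completions.create(
        model="gpt-35-turbo",  # name of your chat deployment (assumption)
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content

print(answer("How do I create a vector index?"))
```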

Testing Prompts and Plugins

Within the “Prompt_variants” box, you can do some prompt engineering and test different prompts. For instance, if you have generated 3 prompt variants and want to see how each influences the response, you can “Run” (button in the upper right corner) your flow, which executes the entire flow you built.
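
Prompt variants are simply alternative templates for the same node, which Prompt Flow renders with Jinja. A toy sketch of what comparing two variants amounts to, reusing the client from the earlier sketches; the template wording and deployment name are illustrative assumptions:

```python
# Sketch of testing two prompt variants against the same question and context,
# reusing `client` from the earlier sketches. Requires the jinja2 package.
from jinja2 import Template

variants = {
    "variant_0": "Answer the question using the context.\n"
                 "Context: {{ context }}\nQuestion: {{ question }}",
    "variant_1": "You are a concise assistant. Cite the context when answering.\n"
                 "Context: {{ context }}\nQuestion: {{ question }}",
}

question = "How do I create a vector index?"
context = "Chunk of documentation text..."

for name, template in variants.items():
    prompt = Template(template).render(context=context, question=question)
    completion = client.chat.completions.create(
        model="gpt-35-turbo",  # chat deployment name (assumption)
        messages=[{"role": "user", "content": prompt}],
    )
    print(name, "->", completion.choices[0].message.content)
```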

If you take a look at the diagram after the run, it now indicates that each step was completed 3 times. You can then “View full output” (corresponding button) step by step for each prompt variant.

Once your flow meets your requirements and passes all your tests, you can deploy an endpoint directly from the Prompt Flow interface. This endpoint will enable you to seamlessly integrate your development into applications that utilize generative AI.
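
Once deployed, the flow is reachable as a standard Azure Machine Learning online endpoint. Here is a hedged sketch of calling it from an application; the scoring URI, key, and input field names depend on your flow and are assumptions, so check the endpoint's “Consume” tab in the studio for the real values:

```python
# Sketch of calling the deployed Prompt Flow endpoint from an application.
# The scoring URI, key and input field names are assumptions; check the
# endpoint's "Consume" tab in the studio for the real values.
import requests

scoring_uri = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
api_key = "<endpoint-key>"

payload = {"question": "How do I create a vector index?", "chat_history": []}
response = requests.post(
    scoring_uri,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
)
response.raise_for_status()
print(response.json())
```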

Prompt Flow also enables testing of plugins through the use of Semantic Kernel. In the following documentation, Microsoft explains the integration and testing of plugins using the example of the MathPlugin to address mathematical problems: Evaluate your plugins with Prompt flow.

If we want to delve further into testing, there is an “Evaluation Flow” feature available to assess the outputs against specific criteria. The evaluation is based on the execution results but can also make use of an optional dataset containing ground truth data.

A performance measure of the flow, based on the individual scores, is then calculated and used to evaluate it. In the image below (from the documentation), an accuracy score is derived by combining the individual scores. Microsoft provides documentation on this topic: Evaluation flow in Prompt flow (preview).
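
To make the aggregation concrete, here is a toy sketch of how per-row scores against ground truth roll up into an overall accuracy figure; the exact-match rule is a simplistic stand-in for the built-in evaluation metrics, and the sample rows are invented for illustration:

```python
# Toy sketch of an evaluation flow: score each output against ground truth,
# then aggregate the individual scores into one accuracy number.
# The exact-match rule is a stand-in for the real evaluation metrics.
rows = [
    {"answer": "Use the Vector Index wizard.", "ground_truth": "Use the Vector Index wizard."},
    {"answer": "Deploy from the studio.",      "ground_truth": "Deploy from Prompt Flow."},
]

scores = [1.0 if r["answer"].strip() == r["ground_truth"].strip() else 0.0 for r in rows]
accuracy = sum(scores) / len(scores)
print(f"accuracy = {accuracy:.2f}")  # 0.50 for this toy dataset
```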

Prompt Flow VS Code Extension

Prompt Flow is also available as an extension for Visual Studio Code. Microsoft documentation on Prompt Flow and its integration into VS Code is available here: Prompt flow documentation.
