Real-Time AI Streaming with ASP.NET Core 10 and Azure OpenAI

This article walks through building a token-streaming technical chat application using ASP.NET Core 10, the new Microsoft.Extensions.AI abstraction layer, and Azure OpenAI. The server streams each response token to the browser as it is generated, and a progress bar advances in real time to show the user that work is happening.

1. Why Streaming Matters for AI Applications

Large language models generate text one token at a time. A complete answer to a technical question may contain several hundred tokens and take multiple seconds to produce. Without streaming, the server silently waits for every token, then sends the full response in one HTTP reply. From the user's perspective the application appears frozen for those seconds.

With streaming, each token is forwarded to the browser the moment it leaves the model, so the answer builds up visibly in real time. There are three concrete reasons this matters:

Perceived performance. The user sees the response start within a fraction of a second, even when the full answer takes several seconds to complete.
Timeout avoidance. Long-running buffered responses risk being cut off by proxies or load balancers. A streaming connection keeps the socket active throughout.
Progressive reading. The user can start reading the beginning of the answer before it is finished, and can stop early if the first paragraph already solves the problem.

This application uses Server-Sent Events (SSE) to deliver tokens. SSE is a lightweight, one-way HTTP protocol designed exactly for this: the server holds a connection open and pushes data frames as they become available, and the browser processes each frame as it arrives.

2. Application Architecture

The diagram below shows how a question travels from the browser to Azure OpenAI and how each token travels back.

The four stages are:

The browser sends the question as JSON in a POST request to /chat/stream on the ASP.NET Core 10 server.
The server builds a message list and calls GetStreamingResponseAsync() on the IChatClient, which sends the messages to the Azure OpenAI GPT-4o deployment.
Azure OpenAI streams tokens back to the server one at a time via the SDK's IAsyncEnumerable.
The server writes each token as an SSE event (data: {"text":"..."}\n\n) and flushes immediately. The browser's JavaScript reads these events and appends each token to the page.
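
The four stages can be mimicked in plain JavaScript without a server or a model. This is only a sketch: an async generator stands in for the IAsyncEnumerable token stream, and fakeModelStream, toSseFrames, and consume are hypothetical names that do not appear in the application.

```javascript
// Stand-in for the model: yields tokens one at a time (made-up sample data).
async function* fakeModelStream() {
    for (const token of ['The ', 'answer ', 'is 42.']) yield token;
}

// Stand-in for the server: wraps each token in an SSE frame, then sends the sentinel.
async function* toSseFrames(tokens) {
    for await (const token of tokens) {
        yield `data: ${JSON.stringify({ text: token })}\n\n`;
    }
    yield 'data: {"done":true}\n\n';
}

// Stand-in for the browser: appends each token until the sentinel arrives.
async function consume(frames) {
    let text = '';
    for await (const frame of frames) {
        const event = JSON.parse(frame.slice(6)); // drop the "data: " prefix
        if (event.done) break;
        text += event.text;
    }
    return text;
}

consume(toSseFrames(fakeModelStream())).then(text => console.log(text));
```

Each stage pulls from the previous one, so nothing is buffered beyond the token currently in flight. The real application has the same shape, with HTTP between the second and third stage.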

3. NuGet Packages

Package	Version	Role
`Azure.AI.OpenAI`	2.1.0	Provides `AzureOpenAIClient`. Handles authentication, retry, and Azure-specific request shaping for hosted OpenAI deployments.
`Microsoft.Extensions.AI`	10.x	Defines `IChatClient`, `ChatMessage`, `ChatRole`, and `GetStreamingResponseAsync()`, and provides `AddChatClient()` for DI registration. Makes the application provider-agnostic.
`Microsoft.Extensions.AI.OpenAI`	10.3.0	Bridges `OpenAI.Chat.ChatClient` to `IChatClient` via `AsIChatClient()`.
`Azure.Core`	1.44.1 (transitive)	Provides `AzureKeyCredential` for API key authentication. Pulled in automatically by `Azure.AI.OpenAI`.
`OpenAI`	2.8.0 (transitive)	Official OpenAI .NET client. `Azure.AI.OpenAI` builds on top of it; `GetChatClient()` returns an `OpenAI.Chat.ChatClient`.

Client-side libraries loaded from CDN (no NuGet required):

Library	Used for
Bootstrap 5.3.3	Grid, progress bar, cards, buttons, spinner
Bootstrap Icons 1.11.3	SVG icon font
marked.js	Converts the full streamed response to HTML after streaming ends
highlight.js 11.9.0	Syntax-highlights code blocks inside the rendered markdown

4. Program.cs — Server Setup and the Streaming Endpoint

The entire server is configured in a single Program.cs using top-level statements. It has two responsibilities: registering the IChatClient with DI and defining the /chat/stream SSE endpoint.

4.1 Registering the IChatClient

using Azure;
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllersWithViews();

builder.Services.AddChatClient(
    new AzureOpenAIClient(
        new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
        new AzureKeyCredential(builder.Configuration["AzureOpenAI:ApiKey"]!)
    ).GetChatClient(builder.Configuration["AzureOpenAI:DeploymentName"]!)
     .AsIChatClient()
);

The three-step chain works as follows:

new AzureOpenAIClient(endpoint, credential) — authenticates to the Azure OpenAI resource. Credentials are read from appsettings.json.
.GetChatClient(deploymentName) — scopes the client to a specific model deployment such as gpt-4o, returning an OpenAI.Chat.ChatClient.
.AsIChatClient() — wraps the concrete Azure client in the IChatClient abstraction. The rest of the application depends only on this interface.

4.2 The SSE Streaming Endpoint

app.MapPost("/chat/stream", async (ChatStreamRequest req, IChatClient chatClient, HttpResponse response) =>
{
    // Tell the browser to keep the connection open and treat each frame as an SSE event
    response.Headers.ContentType = "text/event-stream";
    response.Headers.CacheControl = "no-cache";
    response.Headers.Connection  = "keep-alive";

    var messages = new List<ChatMessage>
    {
        new ChatMessage(ChatRole.System,
            "You are a senior technical expert. Answer only technical questions..."),
        new ChatMessage(ChatRole.User, req.Question)
    };

    await foreach (var update in chatClient.GetStreamingResponseAsync(messages))
    {
        if (!string.IsNullOrEmpty(update.Text))
        {
            // JSON-encode each token so special characters do not break the SSE wire format
            var json = System.Text.Json.JsonSerializer.Serialize(new { text = update.Text });
            await response.WriteAsync($"data: {json}\n\n");
            // Flush immediately — without this, tokens would buffer and the browser
            // would not receive them until the internal write buffer fills up
            await response.Body.FlushAsync();
        }
    }

    // Sentinel event — tells the browser the stream is finished
    await response.WriteAsync("data: {\"done\":true}\n\n");
    await response.Body.FlushAsync();

}).DisableAntiforgery();  // fetch() JSON POST has no antiforgery cookie

record ChatStreamRequest(string Question);

Key points:

text/event-stream — the Content-Type that keeps the connection open and tells the browser to process frames as they arrive.
GetStreamingResponseAsync(): the Microsoft.Extensions.AI method that returns IAsyncEnumerable<ChatResponseUpdate>. The await foreach loop processes one update at a time without ever buffering the full response in memory.
FlushAsync() — critical for streaming. Each call pushes buffered bytes to the network so the browser receives each token immediately.
{"done":true} sentinel — a final event after the loop ends. The browser JavaScript uses this to know the stream is complete and trigger markdown rendering.
DisableAntiforgery() — this endpoint is called with fetch() and a JSON body, not an HTML form POST, so no antiforgery token is present in the request.
record ChatStreamRequest — ASP.NET Core's minimal API model binder automatically deserialises the JSON body {"question":"..."} into this record.

5. Index.cshtml — The Razor View

The Razor view contains HTML structure only. No server-side logic runs beyond setting the page title. All interactivity is handled by chat.js.

5.1 The Progress Bar Markup

<!-- Hidden by default. chat.js removes d-none when streaming starts -->
<div id="progressSection" class="mb-4 d-none">

    <span id="progressLabel">Connecting to Azure OpenAI...</span>
    <span id="tokenCount">0 tokens</span>

    <div class="progress" role="progressbar">
        <!-- Width driven by chat.js using an asymptotic formula -->
        <div id="progressBar"
             class="progress-bar progress-bar-striped progress-bar-animated bg-info"
             style="width: 0%">
        </div>
    </div>
</div>

5.2 The Dual-mode Response Card

<div id="responseCard" class="card d-none">
    <div class="card-body">
        <!-- Visible DURING streaming: raw tokens appended here by chat.js -->
        <pre id="streamingText"></pre>

        <!-- Visible AFTER streaming: marked.js renders the full markdown here -->
        <div id="renderedResponse" class="d-none"></div>
    </div>
</div>

During streaming, chat.js sets streamingText.textContent to the accumulated token text on each event. When {"done":true} is received, streamingText is hidden and renderedResponse receives the fully parsed HTML from marked.parse().

5.3 Scripts Section

@section Scripts {
    <script src="~/js/chat.js" asp-append-version="true"></script>
}

The @section Scripts Razor directive injects the script tag into the placeholder defined at the bottom of _Layout.cshtml, ensuring the script loads after the DOM is parsed. The asp-append-version Tag Helper appends a content hash to the URL so browsers always load the latest version of the file after deployment.

6. chat.js — Client-Side Streaming Logic

chat.js is plain JavaScript with no framework. It covers the full client lifecycle: sending the request, reading the SSE byte stream, updating the DOM token by token, animating the progress bar, and rendering the final markdown.

6.1 State Variables

let isStreaming      = false;  // blocks a second request while one is in flight
let fullResponseText = '';    // accumulates all tokens for markdown + clipboard
let tokenCounter     = 0;    // drives the live "N tokens" counter label
let progressValue    = 0;    // current progress bar percentage
let progressAnimFrame = null; // setTimeout handle — cancelled on stream end

6.2 Sending the Request and Reading the Stream

async function askQuestion() {
    const question = questionInput().value.trim();
    if (!question || isStreaming) return;

    isStreaming = true;
    // ... reset UI, show progress bar ...

    try {
        const response = await fetch('/chat/stream', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ question })
        });
        // Surface HTTP failures instead of trying to parse an error page as SSE
        if (!response.ok) throw new Error(`Request failed with status ${response.status}`);

        const reader  = response.body.getReader();
        const decoder = new TextDecoder();
        let buffer = '';
        let finished = false;

        while (!finished) {
            const { done, value } = await reader.read();
            if (done) break;

            buffer += decoder.decode(value, { stream: true });

            const lines = buffer.split('\n');
            buffer = lines.pop(); // hold incomplete last line for next chunk

            for (const line of lines) {
                if (!line.startsWith('data: ')) continue;
                const parsed = JSON.parse(line.slice(6).trim());

                // Sentinel: stop the outer read loop too, not just this batch of lines
                if (parsed.done)  { renderMarkdown(); completeProgress(); finished = true; break; }
                if (parsed.error) { throw new Error(parsed.error); }
                if (parsed.text)  {
                    fullResponseText += parsed.text;
                    streamingText().textContent = fullResponseText;
                    tokenCounter++;
                }
            }
        }
    } finally {
        isStreaming = false; // allow the next question, even after an error
    }
}

fetch() is used instead of the browser's built-in EventSource because EventSource supports only GET requests and cannot send a request body. fetch() gives access to the raw ReadableStream via response.body.getReader(). The buffer variable is necessary because the network does not guarantee that each reader.read() call returns a complete SSE line; a chunk boundary may fall in the middle of a data: ... line, so the incomplete tail is held back until the next chunk arrives.
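
The buffering rule is easier to see in isolation. The sketch below pulls it out into a standalone helper that can be run in Node.js without a browser; createSseLineParser is a hypothetical name, not a function from chat.js, and it assumes every event is a single data: line carrying JSON, as in this application.

```javascript
// Hypothetical helper: splits incoming text chunks into complete SSE "data:"
// payloads and holds back any partial line until more text arrives.
function createSseLineParser() {
    let buffer = '';
    return function feed(chunk) {
        buffer += chunk;
        const lines = buffer.split('\n');
        buffer = lines.pop(); // incomplete tail waits for the next chunk
        const events = [];
        for (const line of lines) {
            if (!line.startsWith('data: ')) continue; // skips blank separator lines
            events.push(JSON.parse(line.slice(6)));
        }
        return events;
    };
}

// One event split across two chunks, in the middle of its JSON payload
const feed = createSseLineParser();
const first  = feed('data: {"text":"Hel');
const second = feed('lo"}\n\ndata: {"done":true}\n\n');
console.log(first);   // []
console.log(second);  // [ { text: 'Hello' }, { done: true } ]
```

Without the held-back tail, the first call would try to parse the truncated JSON and throw.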

6.3 The Asymptotic Progress Bar

function advanceProgress() {
    if (progressValue < 90) {
        // Each tick advances 4% of the remaining gap to 90%, with a floor of 0.3
        // Result: fast at the start, slowing to a steady crawl near 90%
        const increment = (90 - progressValue) * 0.04;
        // Clamp so the 0.3 floor cannot carry the bar past 90%
        setProgress(Math.min(90, progressValue + Math.max(increment, 0.3)));
    }
    progressAnimFrame = setTimeout(advanceProgress, 150); // repeat every 150 ms
}

function completeProgress() {
    stopProgressAnimation();
    progressBar().classList.remove('progress-bar-animated', 'progress-bar-striped');
    setProgress(100, 'Response complete');
    setTimeout(() => progressSection().classList.add('d-none'), 1200);
}

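The bar never reaches 100% on its own. It decelerates as it approaches 90%, and because each tick adds at least 0.3 points it settles at roughly 90% after a finite number of ticks rather than creeping toward it forever. It then waits there until completeProgress() jumps it to 100% when the server sends the {"done":true} sentinel.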
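
To see the shape of the curve without a browser, the update rule can be written as a pure function and stepped in a loop. This is a sketch: nextProgress and simulate are hypothetical names, and the Math.min clamp is there so the 0.3 minimum step cannot push the value past 90.

```javascript
// Pure version of the per-tick update rule: no DOM, no timers.
function nextProgress(value) {
    if (value >= 90) return 90;
    const increment = (90 - value) * 0.04;   // 4% of the remaining gap
    return Math.min(90, value + Math.max(increment, 0.3));
}

// Runs the rule for a number of 150 ms ticks, starting from 0.
function simulate(ticks) {
    let value = 0;
    for (let i = 0; i < ticks; i++) value = nextProgress(value);
    return value;
}

for (const ticks of [1, 10, 50, 100]) {
    const seconds = (ticks * 150 / 1000).toFixed(1);
    console.log(`${seconds}s -> ${simulate(ticks).toFixed(1)}%`);
}
```

The first tick adds 3.6 points (4% of 90). Once the remaining gap drops below 7.5 points, 4% of it is less than 0.3, so the minimum step takes over and the bar moves linearly until it reaches 90.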

6.4 Markdown Rendering After Streaming

function renderMarkdown() {
    marked.setOptions({ breaks: true, gfm: true });
    const html = marked.parse(fullResponseText);
    renderedResp().innerHTML = html;

    // Syntax-highlight every code block produced by marked.js
    renderedResp().querySelectorAll('pre code').forEach(block => {
        hljs.highlightElement(block);
    });

    // Swap: hide raw text, show rendered HTML
    streamingText().classList.add('d-none');
    renderedResp().classList.remove('d-none');
}

Markdown is rendered only after streaming ends. Parsing incrementally would produce unstable output while a construct is still open: a fenced code block starting with ```csharp has no closing fence until the matching ``` arrives later, so everything after the opening fence would be rendered as code in the meantime. Waiting for the full text guarantees that marked.parse() sees complete markdown.
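
The open-fence problem can be checked with a few lines of code. The helper below is a sketch, not part of chat.js: hasUnclosedFence is a hypothetical name, and it deliberately simplifies the markdown rules by treating any line that starts with three backticks as a fence toggle.

```javascript
// Built with repeat() so the sample does not contain a literal fence marker.
const FENCE = '`'.repeat(3);

// Returns true when the text so far ends inside an open fenced code block.
function hasUnclosedFence(markdown) {
    let open = false;
    for (const line of markdown.split('\n')) {
        if (line.trimStart().startsWith(FENCE)) open = !open;
    }
    return open;
}

const partial  = FENCE + 'csharp\nvar x = 1;';   // stream interrupted mid-block
const complete = partial + '\n' + FENCE + '\n';  // closing fence has arrived
console.log(hasUnclosedFence(partial));   // true
console.log(hasUnclosedFence(complete));  // false
```

An application that wanted live rendering could use a check like this to decide when a re-render is safe; this one avoids the question by rendering once at the end.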

7. The SSE Wire Format

Each token event on the wire looks like this:

data: {"text":"The "}\n
\n
data: {"text":"answer"}\n
\n
data: {"done":true}\n
\n

Rules of the SSE protocol:

Each line starts with a field name (data) followed by a colon and value.
A blank line (double \n) marks the end of one event.
The browser processes events as they arrive without waiting for the connection to close.

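The token text is JSON-encoded before being placed in the data field. The character that matters for SSE framing is the newline: a raw newline inside a token would end the data line early, and two in a row would end the event. JSON encoding escapes it as the two characters \n, so every payload stays on a single line. Quotes and backslashes are escaped as well, which keeps the payload parseable on the client.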
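
A quick way to confirm the framing is to build a frame by hand and count its raw newlines. This sketch mirrors the server's format in JavaScript; formatSseEvent is a hypothetical helper and the token value is made up.

```javascript
// Hypothetical helper mirroring the server's framing: one JSON payload on a
// single "data:" line, terminated by a blank line.
function formatSseEvent(payload) {
    return `data: ${JSON.stringify(payload)}\n\n`;
}

const token = 'line one\nline "two"';   // contains a raw newline and quotes
const frame = formatSseEvent({ text: token });

// The only raw newlines left in the frame are the two terminators
console.log(frame.split('\n').length);                    // 3
// The client recovers the original token unchanged
console.log(JSON.parse(frame.slice(6)).text === token);   // true
```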

8. Configuration

{
  "AzureOpenAI": {
    "Endpoint":       "https://your-resource.openai.azure.com/",
    "ApiKey":         "your-azure-openai-key",
    "DeploymentName": "gpt-4o"
  }
}

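builder.Configuration["AzureOpenAI:Endpoint"] reads these values using a colon-separated key path. The ! null-forgiving operator only suppresses the compiler's nullable warning; it performs no check at runtime. All three values are nevertheless required: if Endpoint is missing, for example, the indexer returns null and new Uri(null) throws an ArgumentNullException at startup. That exception does not name the missing configuration key, so validating the values explicitly before building the client gives a clearer error.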

Once you run the application, the result will be shown as follows:

9. Summary

The application combines three building blocks, two from the .NET 10 ecosystem and one from the browser:

Microsoft.Extensions.AI abstraction. IChatClient decouples the application from the Azure OpenAI SDK. Switching to a different provider requires changing only the DI registration in Program.cs.
Minimal API SSE endpoint. A single app.MapPost() lambda creates the streaming endpoint with no controller class, no action method, and direct access to HttpResponse.
Fetch API ReadableStream consumption. chat.js consumes the SSE stream using response.body.getReader(), which supports POST-based streaming and full control over chunk buffering and parsing.

Together these three pieces produce a responsive real-time AI chat interface in roughly 150 lines of server code and 200 lines of JavaScript, delivering the same token-by-token streaming experience as commercially hosted AI products.

The code for this article can be downloaded from this link.
