ASP.NET Core 10: Real-Time AI Streaming with ASP.NET Core 10 and Azure OpenAI
This article walks through building a token-streaming technical chat application using
ASP.NET Core 10, the new Microsoft.Extensions.AI abstraction layer, and Azure OpenAI.
The server streams each response token to the browser as it is generated, and a progress bar
advances in real time to show the user that work is happening.
1. Why Streaming Matters for AI Applications
Large language models generate text one token at a time. A complete answer to a technical question may contain several hundred tokens and take multiple seconds to produce. Without streaming, the server silently waits for every token, then sends the full response in one HTTP reply. From the user's perspective the application appears frozen for those seconds.
With streaming, each token is forwarded to the browser the moment it leaves the model, so the answer builds up visibly in real time. There are three concrete reasons this matters:
- Perceived performance. The user sees the response start within a fraction of a second, even when the full answer takes several seconds to complete.
- Timeout avoidance. Long-running buffered responses risk being cut off by proxies or load balancers. A streaming connection keeps the socket active throughout.
- Progressive reading. The user can start reading the beginning of the answer before it is finished, and can stop early if the first paragraph already solves the problem.
This application uses Server-Sent Events (SSE) to deliver tokens. SSE is a lightweight, one-way HTTP protocol designed exactly for this: the server holds a connection open and pushes data frames as they become available, and the browser processes each frame as it arrives.
2. Application Architecture
The diagram below shows how a question travels from the browser to Azure OpenAI and how each token travels back.
The four stages are:
- The browser POSTs the question as JSON to POST /chat/stream on the ASP.NET Core 10 server.
- The server builds a message list and calls GetStreamingResponseAsync() on the IChatClient, which sends the messages to the Azure OpenAI GPT-4o deployment.
- Azure OpenAI streams tokens back to the server one at a time via the SDK's IAsyncEnumerable.
- The server writes each token as an SSE event (data: {"text":"..."}\n\n) and flushes immediately. The browser's JavaScript reads these events and appends each token to the page.
3. NuGet Packages
| Package | Version | Role |
|---|---|---|
| Azure.AI.OpenAI | 2.1.0 | Provides AzureOpenAIClient. Handles authentication, retry, and Azure-specific request shaping for hosted OpenAI deployments. |
| Microsoft.Extensions.AI | 10.x | Defines IChatClient, ChatMessage, ChatRole, and GetStreamingResponseAsync(). Makes the application provider-agnostic. |
| Microsoft.Extensions.AI.OpenAI | 10.3.0 | Bridges OpenAI.Chat.ChatClient to IChatClient via AsIChatClient(). Also provides AddChatClient() for DI registration. |
| Azure.Core | 1.44.1 (transitive) | Provides AzureKeyCredential for API key authentication. Pulled in automatically by Azure.AI.OpenAI. |
| OpenAI | 2.8.0 (transitive) | Official OpenAI .NET client. Azure.AI.OpenAI builds on top of it; GetChatClient() returns an OpenAI.Chat.ChatClient. |
Client-side libraries loaded from CDN (no NuGet required):
| Library | Used for |
|---|---|
| Bootstrap 5.3.3 | Grid, progress bar, cards, buttons, spinner |
| Bootstrap Icons 1.11.3 | SVG icon font |
| marked.js | Converts the full streamed response to HTML after streaming ends |
| highlight.js 11.9.0 | Syntax-highlights code blocks inside the rendered markdown |
4. Program.cs — Server Setup and the Streaming Endpoint
The entire server is configured in a single Program.cs written with C# top-level
statements. It has two responsibilities: registering the IChatClient with DI and
defining the /chat/stream SSE endpoint.
4.1 Registering the IChatClient
using Azure;
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllersWithViews();

builder.Services.AddChatClient(
    new AzureOpenAIClient(
        new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
        new AzureKeyCredential(builder.Configuration["AzureOpenAI:ApiKey"]!)
    ).GetChatClient(builder.Configuration["AzureOpenAI:DeploymentName"]!)
     .AsIChatClient()
);
The three-step chain works as follows:
- new AzureOpenAIClient(endpoint, credential): authenticates to the Azure OpenAI resource. Credentials are read from appsettings.json.
- .GetChatClient(deploymentName): scopes the client to a specific model deployment such as gpt-4o, returning an OpenAI.Chat.ChatClient.
- .AsIChatClient(): wraps the concrete Azure client in the IChatClient abstraction. The rest of the application depends only on this interface.
4.2 The SSE Streaming Endpoint
app.MapPost("/chat/stream", async (ChatStreamRequest req, IChatClient chatClient, HttpResponse response) =>
{
    // Tell the browser to keep the connection open and treat each frame as an SSE event
    response.Headers.ContentType = "text/event-stream";
    response.Headers.CacheControl = "no-cache";
    response.Headers.Connection = "keep-alive";

    var messages = new List<ChatMessage>
    {
        new ChatMessage(ChatRole.System,
            "You are a senior technical expert. Answer only technical questions..."),
        new ChatMessage(ChatRole.User, req.Question)
    };

    await foreach (var update in chatClient.GetStreamingResponseAsync(messages))
    {
        if (!string.IsNullOrEmpty(update.Text))
        {
            // JSON-encode each token so special characters do not break the SSE wire format
            var json = System.Text.Json.JsonSerializer.Serialize(new { text = update.Text });
            await response.WriteAsync($"data: {json}\n\n");

            // Flush immediately; without this, tokens would buffer and the browser
            // would not receive them until the internal write buffer fills up
            await response.Body.FlushAsync();
        }
    }

    // Sentinel event: tells the browser the stream is finished
    await response.WriteAsync("data: {\"done\":true}\n\n");
    await response.Body.FlushAsync();
}).DisableAntiforgery(); // fetch() JSON POST has no antiforgery cookie

record ChatStreamRequest(string Question);
Key points:
- text/event-stream: the Content-Type that identifies the response as an SSE stream and tells the browser to process frames as they arrive.
- GetStreamingResponseAsync(): the Microsoft.Extensions.AI method that returns IAsyncEnumerable<ChatResponseUpdate>. The await foreach loop processes one update at a time without ever buffering the full response in memory.
- FlushAsync(): critical for streaming. Each call pushes buffered bytes to the network so the browser receives each token immediately.
- {"done":true} sentinel: a final event after the loop ends. The browser JavaScript uses this to know the stream is complete and trigger markdown rendering.
- DisableAntiforgery(): this endpoint is called with fetch() and a JSON body, not an HTML form POST, so no antiforgery token is present in the request.
- record ChatStreamRequest: ASP.NET Core's minimal API model binder automatically deserialises the JSON body {"question":"..."} into this record.
5. Index.cshtml — The Razor View
The Razor view contains HTML structure only. No server-side logic runs beyond setting the page title.
All interactivity is handled by chat.js.
5.1 The Progress Bar Markup
<!-- Hidden by default. chat.js removes d-none when streaming starts -->
<div id="progressSection" class="mb-4 d-none">
  <span id="progressLabel">Connecting to Azure OpenAI...</span>
  <span id="tokenCount">0 tokens</span>
  <div class="progress" role="progressbar">
    <!-- Width driven by chat.js using an asymptotic formula -->
    <div id="progressBar"
         class="progress-bar progress-bar-striped progress-bar-animated bg-info"
         style="width: 0%">
    </div>
  </div>
</div>
5.2 The Dual-mode Response Card
<div id="responseCard" class="card d-none">
  <div class="card-body">
    <!-- Visible DURING streaming: raw tokens appended here by chat.js -->
    <pre id="streamingText"></pre>
    <!-- Visible AFTER streaming: marked.js renders the full markdown here -->
    <div id="renderedResponse" class="d-none"></div>
  </div>
</div>
During streaming, chat.js sets streamingText.textContent to the accumulated
token text on each event. When {"done":true} is received, streamingText is hidden
and renderedResponse receives the fully parsed HTML from marked.parse().
5.3 Scripts Section
@section Scripts {
    <script src="~/js/chat.js" asp-append-version="true"></script>
}
The @section Scripts Razor directive injects the script tag into the placeholder defined at the
bottom of _Layout.cshtml, ensuring the script loads after the DOM is parsed. The
asp-append-version Tag Helper appends a content hash to the URL so browsers always load the
latest version of the file after deployment.
6. chat.js — Client-Side Streaming Logic
chat.js is plain JavaScript with no framework. It covers the full client lifecycle: sending
the request, reading the SSE byte stream, updating the DOM token by token, animating the progress bar,
and rendering the final markdown.
6.1 State Variables
let isStreaming = false; // blocks a second request while one is in flight
let fullResponseText = ''; // accumulates all tokens for markdown + clipboard
let tokenCounter = 0; // drives the live "N tokens" counter label
let progressValue = 0; // current progress bar percentage
let progressAnimFrame = null; // setTimeout handle — cancelled on stream end
6.2 Sending the Request and Reading the Stream
async function askQuestion() {
  const question = questionInput().value.trim();
  if (!question || isStreaming) return;
  isStreaming = true;
  // ... reset UI, show progress bar ...
  try {
    const response = await fetch('/chat/stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ question })
    });
    if (!response.ok) throw new Error(`Request failed with status ${response.status}`);

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';
    let finished = false; // set once the {"done":true} sentinel arrives

    while (!finished) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // hold incomplete last line for next chunk
      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const parsed = JSON.parse(line.slice(6).trim());
        // A bare break here would leave only the for loop, so a flag ends the outer loop too
        if (parsed.done) { renderMarkdown(); completeProgress(); finished = true; break; }
        if (parsed.error) { throw new Error(parsed.error); }
        if (parsed.text) {
          fullResponseText += parsed.text;
          streamingText().textContent = fullResponseText;
          tokenCounter++;
        }
      }
    }
  } finally {
    isStreaming = false; // allow the next question, even if this request failed
  }
}
fetch() is used instead of the browser's built-in EventSource because
EventSource supports only GET requests and cannot send a request body. fetch() gives
access to the raw ReadableStream via response.body.getReader(). The buffer
variable is necessary because the network does not guarantee that each reader.read() call
returns a complete SSE line; a chunk boundary may fall in the middle of a data: ... line.
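The buffering rule is easier to see in isolation. The following is a minimal sketch, not code from chat.js: createSseLineParser and feed are hypothetical names, and the chunks are plain strings rather than decoded bytes. It shows that an event split across two chunks is only emitted once its line is complete.

```javascript
// Hypothetical helper (not part of the article's chat.js): accumulates text
// chunks and returns the parsed JSON payload of every complete "data: " line.
function createSseLineParser() {
  let buffer = '';
  return function feed(chunkText) {
    buffer += chunkText;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    const events = [];
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue; // skips the blank separator lines
      events.push(JSON.parse(line.slice(6)));
    }
    return events;
  };
}

// Usage: one SSE event arrives split across two network chunks.
const feed = createSseLineParser();
const first = feed('data: {"text":"Hel');                  // line is incomplete
const second = feed('lo"}\n\ndata: {"done":true}\n\n');    // completes it, adds the sentinel
console.log(first);  // []
console.log(second); // [ { text: 'Hello' }, { done: true } ]
```

Keeping the parser free of DOM access makes it possible to exercise the chunk-splitting behaviour outside the browser.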
6.3 The Asymptotic Progress Bar
function advanceProgress() {
  if (progressValue < 90) {
    // Each tick advances 4% of the remaining gap to 90%
    // Result: fast at the start, slows to a crawl near 90%
    const increment = (90 - progressValue) * 0.04;
    setProgress(progressValue + Math.max(increment, 0.3));
  }
  progressAnimFrame = setTimeout(advanceProgress, 150); // repeat every 150 ms
}

function completeProgress() {
  stopProgressAnimation();
  progressBar().classList.remove('progress-bar-animated', 'progress-bar-striped');
  setProgress(100, 'Response complete');
  setTimeout(() => progressSection().classList.add('d-none'), 1200);
}
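The bar never reaches 100% on its own. It decelerates toward 90%; because each tick moves at
least 0.3 points, it is not strictly asymptotic but arrives at roughly 90% after about 13 seconds
and holds there. completeProgress() jumps it to 100% when the server sends the {"done":true} sentinel.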
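To see how the bar behaves over time, the tick rule can be run without a browser. This is a minimal, DOM-free restatement of the arithmetic in advanceProgress(); the nextProgress name, the 100-tick run, and the sampled ticks are assumptions made for illustration only.

```javascript
// Pure version of the tick rule from advanceProgress(), with no DOM access.
// nextProgress is a hypothetical name used only in this sketch.
function nextProgress(current) {
  if (current >= 90) return current;          // the bar holds once it reaches 90%
  const increment = (90 - current) * 0.04;    // 4% of the remaining gap
  return current + Math.max(increment, 0.3);  // never less than 0.3 points per tick
}

// Simulate 100 ticks at the article's 150 ms interval and sample a few of them.
let value = 0;
let previous = 0;
let monotonic = true;
const samples = [];
for (let tick = 1; tick <= 100; tick++) {
  previous = value;
  value = nextProgress(value);
  if (value < previous) monotonic = false;
  if (tick === 1 || tick === 10 || tick === 50 || tick === 100) {
    samples.push({
      tick,
      seconds: (tick * 150) / 1000,
      percent: Number(value.toFixed(1))
    });
  }
}
console.log(samples);
```

The first tick moves the bar by 3.6 points, and the steps shrink from there until the 0.3-point floor takes over near the top of the range.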
6.4 Markdown Rendering After Streaming
function renderMarkdown() {
  marked.setOptions({ breaks: true, gfm: true });
  const html = marked.parse(fullResponseText);
  renderedResp().innerHTML = html;

  // Syntax-highlight every code block produced by marked.js
  renderedResp().querySelectorAll('pre code').forEach(block => {
    hljs.highlightElement(block);
  });

  // Swap: hide raw text, show rendered HTML
  streamingText().classList.add('d-none');
  renderedResp().classList.remove('d-none');
}
Markdown is rendered only after streaming ends. Parsing incrementally would produce unstable
output on every token: a fenced code block starting with ```csharp stays unterminated until the
closing ``` is received later, so in the meantime the parser treats everything after the opening
fence as code. Waiting for the full text guarantees that marked.parse() sees complete markdown.
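The problem with partial input can be illustrated with a small check for an open code fence. This is a deliberately simplified sketch: hasUnclosedFence is a hypothetical helper, it only recognises backtick fences at the start of a line, and it ignores details such as tilde fences or fences of different lengths.

```javascript
// Hypothetical helper: reports whether the text ends inside an unclosed
// backtick-fenced code block. Each fence line toggles the open/closed state.
function hasUnclosedFence(markdown) {
  let open = false;
  for (const line of markdown.split('\n')) {
    if (line.trimStart().startsWith('```')) open = !open;
  }
  return open;
}

// Mid-stream text: the opening fence has arrived, the closing one has not.
const partial = 'Use this:\n```csharp\nvar x = 1;';
// The same text once the stream has finished.
const complete = partial + '\n```\n';

console.log(hasUnclosedFence(partial));  // true
console.log(hasUnclosedFence(complete)); // false
```

While the check returns true, any rendering of the accumulated text would show the remainder of the answer as one code block, which is the flicker that deferring the render avoids.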
7. The SSE Wire Format
Each token event on the wire looks like this:
data: {"text":"The "}\n
\n
data: {"text":"answer"}\n
\n
data: {"done":true}\n
\n
Rules of the SSE protocol:
- Each line starts with a field name (data) followed by a colon and value.
- A blank line (double \n) marks the end of one event.
- The browser processes events as they arrive without waiting for the connection to close.
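The token text is JSON-encoded before being placed in the data field. JSON encoding escapes a
line break inside a token as the two characters \n, so a token can never introduce a raw newline
that would end the data line or the event early. Quotes and backslashes are escaped as well, so
the client recovers the exact token text with JSON.parse().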
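The effect of the encoding can be checked with a short round trip. This sketch mirrors the server's framing in JavaScript for illustration; formatSseEvent is a hypothetical helper, not a function from the article's code, and the sample token is invented.

```javascript
// Hypothetical helper mirroring the server's framing: JSON-encode the payload,
// prefix it with "data: ", and terminate the event with a blank line.
function formatSseEvent(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// A token containing a line break, quotes, and a backslash.
const original = 'line one\nline two "quoted" C:\\temp';
const frame = formatSseEvent({ text: original });

// Everything before the two terminating newlines is the single data line.
const dataLine = frame.slice(0, -2);
console.log(dataLine.includes('\n')); // false: the line break was escaped

// The client reverses the encoding and gets the exact token back.
const decoded = JSON.parse(dataLine.slice('data: '.length));
console.log(decoded.text === original); // true
```

Without the JSON step, the raw line break in the token would split the payload across two lines and the client's line-based parser would see a truncated event.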
8. Configuration
{
  "AzureOpenAI": {
    "Endpoint": "https://your-resource.openai.azure.com/",
    "ApiKey": "your-azure-openai-key",
    "DeploymentName": "gpt-4o"
  }
}
builder.Configuration["AzureOpenAI:Endpoint"] reads these values using a colon-separated
key path. The ! null-forgiving operator only silences the compiler's nullable warning; it performs
no runtime check. If a key is missing, the lookup returns null and startup fails with an exception
from the code that receives it (for example, new Uri(null) throws ArgumentNullException), rather than silently failing later.
Once you run the application, the result looks as follows:
9. Summary
The application demonstrates three capabilities that are new or improved in the .NET 10 ecosystem:
- Microsoft.Extensions.AI abstraction. IChatClient decouples the application from the Azure OpenAI SDK. Switching to a different provider requires changing only the DI registration in Program.cs.
- Minimal API SSE endpoint. A single app.MapPost() lambda creates the streaming endpoint with no controller class, no action method, and direct access to HttpResponse.
- Fetch API ReadableStream consumption. chat.js consumes the SSE stream using response.body.getReader(), which supports POST-based streaming and full control over chunk buffering and parsing.
Together these three pieces produce a responsive real-time AI chat interface in roughly 150 lines of server code and 200 lines of JavaScript, delivering the same token-by-token streaming experience as commercially hosted AI products.
The code for this article can be downloaded from this link.
