Lesson CH04-L02

Streamed Output

Receive partial structures incrementally via streaming and push them to the UI.

Reading time: 10 min
Colab time: 16 min
Total: 26 min
Prerequisite: output_type

One-line summary

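agent.run_stream(...) is a streaming API that delivers partial text or structured data as it is generated, instead of waiting for the whole response. Use it to show a "typing…" effect in a web UI or to make long responses feel faster.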

At a glance: all-at-once vs streaming

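The usual run / run_sync return only after the whole response is complete. run_stream lets you consume the output while it is still being generated, so the wait the user perceives shrinks from the full generation time to roughly the time until the first chunk arrives.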


Figure 1. Perceived latency of run vs run_stream
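To make Figure 1 concrete without an API key, the sketch below replaces the model with a hypothetical fake_token_stream that emits one token every 20 ms, then measures when the first chunk arrives versus when the whole text is available. It is a plain asyncio simulation, not pydantic_ai code: a streaming UI could start rendering at the first timestamp, while an all-at-once call only returns at the second.

```python
import asyncio
import time


# Hypothetical stand-in for a model: yields one token every `delay` seconds.
async def fake_token_stream(tokens: list[str], delay: float = 0.02):
    for tok in tokens:
        await asyncio.sleep(delay)
        yield tok


async def measure(tokens: list[str]) -> tuple[float, float]:
    """Return (seconds until the first chunk, seconds until the full text)."""
    start = time.perf_counter()
    first = None
    parts: list[str] = []
    async for tok in fake_token_stream(tokens):
        if first is None:
            # A streaming UI can start drawing here.
            first = time.perf_counter() - start
        parts.append(tok)
    # An all-at-once call would only hand back the text here.
    total = time.perf_counter() - start
    return first, total


first, total = asyncio.run(
    measure(['Pydantic', 'AI ', 'streams ', 'text ', 'incrementally.'])
)
print(f'first chunk after {first:.2f}s, full response after {total:.2f}s')
```

The absolute numbers are made up by the sleep delay; what carries over to real models is that the first timestamp depends on the first token, the second on the last.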

Concept: what can be streamed

Streaming comes in two kinds.

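Kind	API	Contents
Text streaming	`response.stream_text()`	A string that grows as new text arrives
Structured streaming	`response.stream()`	A progressively filled output_type value, e.g. a Pydantic model (some fields may still be missing or incomplete, and values are provisional)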

Both are used the same way: open the response with async with agent.run_stream(...) as response:, then consume chunks with async for chunk in ....

Code: three patterns

Pattern 1: incremental text display (typing effect)

import asyncio
from pydantic_ai import Agent

agent = Agent('google-gla:gemini-3-flash-preview', instructions='Answer briefly.')

async def main():
    async with agent.run_stream('Tell me three features of PydanticAI') as response:
        async for text in response.stream_text(delta=True):
            print(text, end='', flush=True)  # print only the newly arrived text
        print()  # trailing newline

# In Colab/Jupyter an event loop is already running: use `await main()` there instead.
asyncio.run(main())

Key points:

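stream_text(delta=True) returns only the newly arrived text (the idiom from Ch0-L02)
delta=False (the default) returns the accumulated text so far on every iteration
In a terminal you can also use \r to move the cursor back and redraw the line, but in Colab/Jupyter the updates may show up as separate lines instead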
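With delta=False each iteration gives you the whole text so far, so printing it directly repeats earlier text. The stdlib-only sketch below shows the bookkeeping that delta=True spares you: a hypothetical fake_cumulative_stream stands in for stream_text(delta=False), and a hypothetical to_deltas helper yields only the newly added suffix. No pydantic_ai or API key is involved.

```python
import asyncio


# Hypothetical stand-in for stream_text(delta=False): each item is the whole text so far.
async def fake_cumulative_stream():
    for snapshot in ['Pyd', 'Pydantic', 'Pydantic AI', 'Pydantic AI streams.']:
        yield snapshot


async def to_deltas(cumulative):
    """Turn cumulative snapshots into just the newly added text, like delta=True."""
    previous = ''
    async for snapshot in cumulative:
        if snapshot.startswith(previous):
            delta = snapshot[len(previous):]
        else:
            # The text was revised rather than extended: fall back to the full snapshot.
            delta = snapshot
        previous = snapshot
        if delta:
            yield delta


async def main() -> list[str]:
    chunks: list[str] = []
    async for delta in to_deltas(fake_cumulative_stream()):
        print(delta, end='', flush=True)
        chunks.append(delta)
    print()
    return chunks


chunks = asyncio.run(main())
```

Joining the deltas reproduces the final cumulative text, which is a quick sanity check when you switch between the two modes.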

Pattern 2: streaming a structured response

If you open an Agent that has output_type set with run_stream, you receive progressively filled instances of that Pydantic model.

import asyncio
from pydantic import BaseModel
from pydantic_ai import Agent

class Recipe(BaseModel):
    title: str
    ingredients: list[str]
    steps: list[str]

agent_r = Agent(
    'google-gla:gemini-3-flash-preview',
    output_type=Recipe,
    instructions='Return the recipe as structured data.',
)

async def main():
    async with agent_r.run_stream('Give me a simple omelette recipe') as response:
        async for partial in response.stream():
            # partial is a Recipe built from the output received so far (may be incomplete)
            print(f'\rtitle={partial.title or "(pending)"} / ingredients={len(partial.ingredients or [])}', end='', flush=True)
        print()
        # Once the loop ends, fetch the final, fully validated output
        final = await response.get_output()
        print(f'\nFINAL: {final}')

# In Colab/Jupyter, use `await main()` instead of asyncio.run(main()).
asyncio.run(main())

Key points:

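response.stream() yields progressively filled instances of output_type (recent pydantic_ai releases also offer this as stream_output() and mark stream() as deprecated; check your installed version)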
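Early instances may have only some fields filled, e.g. a title with ingredients=[]. Pydantic's partial validation has limited support for BaseModel, though, so with a model like Recipe you may see fewer intermediate states; the pydantic_ai docs suggest a TypedDict with total=False when you want finer-grained partials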
The final result comes from await response.get_output() (fully validated)
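When output_type is a TypedDict with total=False (the form the pydantic_ai docs suggest for streaming model-like structures), each partial is a plain dict in which any key may still be missing, so display code should not assume fields exist. The stdlib-only sketch below renders such partials as a status line and uses \r only when stdout is a terminal. PartialRecipe, status_line, show and the snapshots are hypothetical illustrations; no model is called.

```python
import sys
from typing import TypedDict


# Assumption: output_type is a TypedDict with total=False, so any key may be missing mid-stream.
class PartialRecipe(TypedDict, total=False):
    title: str
    ingredients: list[str]
    steps: list[str]


def status_line(partial: PartialRecipe) -> str:
    """Summarize a possibly incomplete recipe without assuming any key exists."""
    title = partial.get('title') or '(pending)'
    n_ingredients = len(partial.get('ingredients', []))
    n_steps = len(partial.get('steps', []))
    return f'title={title} / ingredients={n_ingredients} / steps={n_steps}'


def show(line: str, interactive: bool) -> None:
    if interactive:
        # In a terminal, a carriage return jumps to the line start so the line is overwritten.
        print('\r' + line, end='', flush=True)
    else:
        # In notebooks and logs, one line per update is the predictable choice.
        print(line)


# Hand-written snapshots standing in for what a stream might yield (not real model output).
snapshots: list[PartialRecipe] = [
    {},
    {'title': 'Omelette'},
    {'title': 'Omelette', 'ingredients': ['eggs', 'butter']},
    {'title': 'Omelette', 'ingredients': ['eggs', 'butter', 'salt'], 'steps': ['Beat the eggs']},
]

interactive = sys.stdout.isatty()
for snap in snapshots:
    show(status_line(snap), interactive)
if interactive:
    print()  # finish the overwritten line
```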

Pattern 3: pushing partial results to a web UI (assuming FastAPI)

In real projects, the standard approach is to forward the stream directly as HTTP SSE (Server-Sent Events).

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic_ai import Agent

app = FastAPI()
agent = Agent('google-gla:gemini-3-flash-preview', instructions='Answer briefly.')

def sse(text: str) -> str:
    # A blank line ends an SSE event, so each line of the payload gets its own "data:" field
    return ''.join(f'data: {line}\n' for line in text.split('\n')) + '\n'

@app.get('/chat')
async def chat(q: str):
    async def gen():
        async with agent.run_stream(q) as response:
            async for text in response.stream_text(delta=True):
                yield sse(text)
        yield sse('[DONE]')  # sentinel telling the client the stream is finished
    return StreamingResponse(gen(), media_type='text/event-stream')

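On the browser side, receive the events with EventSource and render the text as it arrives; close the EventSource when it sees [DONE], since EventSource otherwise reconnects automatically when the server ends the response and sends the request again. That gives you a "ChatGPT-style" typing display.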
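A delta can contain newlines, and in SSE a blank line ends an event, so writing f'data: {text}\n\n' for a multi-line delta splits or corrupts the event. The stdlib-only sketch below frames text so that each line gets its own data: field, and decodes it the way EventSource rebuilds event.data (data fields only: no event:, id:, retry: or CR line endings). sse_event and parse_sse are hypothetical helpers for checking the framing, not part of FastAPI or pydantic_ai.

```python
def sse_event(text: str) -> str:
    """Frame text as one SSE event: one 'data:' field per line, then a blank line."""
    return ''.join(f'data: {line}\n' for line in text.split('\n')) + '\n'


def parse_sse(stream: str) -> list[str]:
    """Minimal decoder for 'data:' fields, mirroring how EventSource builds event.data."""
    events: list[str] = []
    buffer: list[str] = []
    for line in stream.split('\n'):
        if line == '':
            # Blank line: dispatch the event collected so far (if any).
            if buffer:
                events.append('\n'.join(buffer))
            buffer = []
        elif line.startswith('data:'):
            value = line[len('data:'):]
            # The spec strips exactly one leading space after the colon.
            if value.startswith(' '):
                value = value[1:]
            buffer.append(value)
    return events


wire = sse_event('Line one\nLine two') + sse_event(' leading space') + sse_event('[DONE]')
print(parse_sse(wire))
```

The round trip preserves both the embedded newline and the leading space of a delta, which matters because token deltas often start with a space.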


Figure 2. Lifecycle of a streamed run

When to use streaming

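Situation	Choose
You just want the result in Colab / a CLI	`run_sync`
Batch processing	`run`
Show a "typing…" effect in a web UI	`run_stream` + `stream_text(delta=True)`
Show the fields of a structured response filling in, in the UI	`run_stream` + `stream()`
Feed text to TTS / speech synthesis as it arrives	`run_stream` + `stream_text(delta=True)`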

Streaming improves perceived responsiveness but makes the code somewhat more complex. It is safest to adopt it only where waiting time drives the UX.

⚠️ Common pitfalls

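Forgetting async with: agent.run_stream(...) is an async context manager. The streamed result is only available inside async with, and leaving that block is what releases the internal resources; without it they are not cleaned up.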

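Forgetting the delta flag on stream_text: with the default delta=False you get the accumulated text on every iteration. Either compute the difference from the previous value yourself or pass delta=True.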

Expecting the final result without calling get_output: finishing the loop does not hand you a result object. Call await response.get_output() explicitly to get it.

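Relying on @output_validator during streaming: treat partial outputs as provisional. Depending on the pydantic_ai version, output validators may run only on the final result or also on partial outputs, so do not count on them to vet each chunk; the validated result is the one from await response.get_output().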

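\r not working as expected in Colab: Colab's output area is not a terminal, so \r may not move the cursor back, and each chunk can end up listed on its own line. If the code behaves as intended in production (FastAPI, etc.), there is nothing to fix.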
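The first pitfall is easier to see with a stand-in. The stdlib-only sketch below uses contextlib.asynccontextmanager; fake_run_stream is hypothetical and only mimics the shape of agent.run_stream(...) (pydantic_ai's real cleanup logic is its own). It shows that cleanup runs even when the loop stops early, and that the context manager object itself is not something you can iterate.

```python
import asyncio
from contextlib import asynccontextmanager

log: list[str] = []


# Hypothetical stand-in for agent.run_stream(): "opens" a stream and always "closes" it.
@asynccontextmanager
async def fake_run_stream(prompt: str):
    log.append('open')

    async def chunks():
        for word in prompt.split():
            await asyncio.sleep(0)
            yield word

    try:
        yield chunks()
    finally:
        log.append('close')  # runs on normal exit, on break, and on exceptions


async def main() -> None:
    async with fake_run_stream('one two three four') as stream:
        async for word in stream:
            log.append(word)
            if word == 'two':
                break  # stop early, e.g. the user pressed "stop"
    log.append('after')


asyncio.run(main())
print(log)

# Without async with you only hold the context manager, not the stream.
cm = fake_run_stream('never entered')
cm_is_iterable = hasattr(cm, '__aiter__')
print(cm_is_iterable)
```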

Summary

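Open agent.run_stream(...) with async with to receive the response incrementally
stream_text(delta=True) for text deltas, stream() for a progressively filled structured output
Get the final result with await response.get_output()
Use it where waiting time drives the UX, such as UIs and TTS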

The next lesson expresses several output_type candidates as a Union type and lets the LLM choose which type to return.

Run it on Colab

A notebook that runs this lesson's code on Google Colab is available. Use the button below to open it in your own Colab environment (requires a Google account and a GOOGLE_API_KEY).

Open in Colab

notebooks/ch04/02-streamed-output.ipynb