Qwen API

Archived

Unofficial Python SDK for Qwen AI

Qwen API is an unofficial Python SDK for accessing Qwen AI models, including qwen-max, qwen-plus, and qwq-32b. It supports real-time streaming, synchronous and asynchronous calls, web search integration, file upload, function calling (tools), reasoning modes (thinking and web development), and LlamaIndex integration. The package is fully typed with Pydantic models.

Python, Pydantic, LlamaIndex, httpx, Alibaba Cloud OSS

Features

Multiple Model Support

Supports 10+ Qwen models including qwen-max-latest, qwen-plus-latest, qwq-32b, qwen-turbo, and specialized vision/coder models.

Streaming & Async

Real-time token-by-token streaming with both synchronous and asynchronous API support.

Function Calling

Extend AI capabilities with custom tools and function calling for complex workflows.

LlamaIndex Integration

Native support for the LlamaIndex framework through the dedicated qwen-llamaindex package.

Documentation

Getting Started

Installation

pip install qwen-api

LlamaIndex Integration

pip install qwen-llamaindex

Quick Start

from qwen_api import Qwen
from qwen_api.core.types.chat import ChatMessage

client = Qwen()

messages = [ChatMessage(
    role="user",
    content="What is artificial intelligence?",
    web_search=False,
    thinking=False
)]

response = client.chat.create(
    messages=messages,
    model="qwen-max-latest"
)

print(response.choices.message.content)

Streaming Response

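# Reuses `client` and `messages` from the Quick Start example.
stream = client.chat.create(
    messages=messages,
    model="qwen-max-latest",
    stream=True
)

# Print each text fragment as it arrives; skip chunks with no content.
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)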
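
When the complete reply is needed as a single string rather than printed piece by piece, the loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the SDK: `iter_text`, `collect_text`, and `fake_chunk` are hypothetical names, and the stand-in chunks only mimic the `chunk.choices[0].delta.content` shape used in the streaming example so the code runs without network access.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator, Optional


def iter_text(stream: Iterable) -> Iterator[str]:
    """Yield each non-empty text fragment from a stream of chunks."""
    for chunk in stream:
        text = chunk.choices[0].delta.content
        if text:  # skips None and empty strings
            yield text


def collect_text(stream: Iterable) -> str:
    """Join all streamed fragments into one string."""
    return "".join(iter_text(stream))


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Hypothetical stand-in that only mimics the chunk shape shown above."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Demo with stand-in chunks; a real stream would presumably be passed the same way.
demo_stream = [fake_chunk("Hello"), fake_chunk(None), fake_chunk(", world")]
full_text = collect_text(demo_stream)
print(full_text)
```

Keeping the generator separate from the join means the same helper can feed a UI incrementally or build the final string, depending on the caller.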

Async Usage

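import asyncio

from qwen_api import Qwen
from qwen_api.core.types.chat import ChatMessage

async def main():
    client = Qwen()
    # Enable web search and thinking mode for this message.
    messages = [ChatMessage(
        role="user",
        content="Explain quantum computing",
        web_search=True,
        thinking=True
    )]
    # acreate is the awaitable counterpart of create.
    response = await client.chat.acreate(
        messages=messages,
        model="qwen-max-latest"
    )
    print(response.choices.message.content)

asyncio.run(main())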
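
One reason to use the async API is to send several prompts concurrently. The sketch below shows the general pattern with `asyncio.gather`. It is an illustration under stated assumptions: `ask_all` and `fake_ask` are hypothetical helpers, and `fake_ask` stands in for a coroutine that would wrap `client.chat.acreate` and return the reply text, so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List


async def ask_all(
    ask: Callable[[str], Awaitable[str]], prompts: List[str]
) -> List[str]:
    """Run one request per prompt concurrently.

    asyncio.gather returns results in the same order as the inputs,
    regardless of which request finishes first.
    """
    return await asyncio.gather(*(ask(p) for p in prompts))


async def fake_ask(prompt: str) -> str:
    """Hypothetical stand-in for a coroutine that calls the model."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"answer to: {prompt}"


answers = asyncio.run(
    ask_all(fake_ask, ["What is AI?", "Explain quantum computing"])
)
print(answers)
```

In real use, `ask` would build the `ChatMessage` list and await the client call; whether the service tolerates many simultaneous requests is not stated in the source, so a concurrency limit may be advisable.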