API Documentation

Complete reference for the Weatmood AI API. Integrate powerful AI models into your applications with our REST-compatible endpoints.

OpenAI Compatible. Weatmood is fully compatible with the OpenAI API specification. Use your existing OpenAI SDKs with a single base URL change.

GLM-4.7

glm-4.7

Advanced reasoning model with chain-of-thought thinking. Best for complex analysis.

203K context · Reasoning

GLM-5.2

glm-5.2

Next-generation model with enhanced reasoning and expanded 256K context.

256K context · Reasoning

Qwen3.6-27B

qwen3.6-27b

Efficient open-weight model optimized for code generation and multilingual tasks.

262K context · Code · Multilingual

Quick Start

Get up and running with the Weatmood API in under 5 minutes.

Step 1 — Get Your API Key

weatmood-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Step 2 — Make Your First Request

The simplest way to test your key:

bash
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Step 3 — Check Available Models

bash
curl https://weatmood.ru/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

Step 4 — Health Check

bash

curl https://weatmood.ru/health

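Tip: When using the OpenAI SDK, set the base URL to https://weatmood.ru/v1. The SDK appends paths such as /chat/completions to the base URL, so the /v1 prefix is needed to reach the endpoints documented here. No other configuration is required.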

Authentication

All API requests must include a valid API key in the Authorization header.

Bearer Token

Include your API key as a Bearer token in the Authorization header:

Header
Authorization: Bearer weatmood-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Keep your API key secret. Never expose your API key in client-side code, public repositories, or logs. If compromised, revoke it immediately from the admin panel.
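One common way to keep the key out of source code is to read it from an environment variable at startup. The sketch below assumes a variable named WEATMOOD_API_KEY; that name is hypothetical and chosen for illustration, not something Weatmood defines.

```python
import os


def auth_headers(env_var="WEATMOOD_API_KEY"):
    """Build request headers from a key stored in the environment.

    The default variable name is an assumption made for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing fast when the variable is missing tends to be easier to debug than sending a request with an empty Bearer token and receiving a 401.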

Key Rotation

You can create multiple API keys and revoke old ones at any time from the admin panel. We recommend rotating keys periodically.

Rate Limits

Each API key has configurable rate limits (default: 60 requests per minute). Contact us for higher limits.

List Models GET

Returns a list of all available models and their capabilities.

GET https://weatmood.ru/v1/models

Response

JSON
{
  "object": "list",
  "data": [
    {
      "id": "glm-4.7",
      "object": "model",
      "context_length": 203000,
      "reasoning": true,
      "supports_files": true,
      "supports_images": true,
      "created": 1700000000
    },
    ...
  ]
}
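As an illustration of how a client might use this response, the sketch below filters the parsed JSON by context length and reasoning support. The helper name pick_models and the second sample entry are made up for the example; only the field names come from the response shown above.

```python
from typing import Optional


def pick_models(models_response, min_context=0, reasoning: Optional[bool] = None):
    """Return the IDs of models that meet the given requirements."""
    selected = []
    for model in models_response.get("data", []):
        if model.get("context_length", 0) < min_context:
            continue
        if reasoning is not None and model.get("reasoning", False) != reasoning:
            continue
        selected.append(model["id"])
    return selected


# Sample shaped like the response above; "example-small" is invented for the demo.
sample = {
    "object": "list",
    "data": [
        {"id": "glm-4.7", "object": "model", "context_length": 203000, "reasoning": True},
        {"id": "example-small", "object": "model", "context_length": 8000, "reasoning": False},
    ],
}

print(pick_models(sample, min_context=100_000, reasoning=True))  # ['glm-4.7']
```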

Chat Completions POST

The core endpoint for generating AI chat completions. Works with the OpenAI chat format.

POST https://weatmood.ru/v1/chat/completions

Request Body

Field	Type	Required	Description
`model`*	string	Yes	Model ID: `glm-4.7`, `glm-5.2`, or `qwen3.6-27b`
`messages`*	array	Yes	Array of message objects with `role` and `content`
`max_tokens`	integer	No	Maximum tokens to generate (default: 16384)
`temperature`	number	No	Sampling temperature (default: 0.7, range: 0–2)
`stream`	boolean	No	Enable streaming responses (default: false)
`top_p`	number	No	Nucleus sampling threshold
`stop`	array/string	No	Stop sequences to end generation
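The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Weatmood SDK: build_chat_payload is a hypothetical helper, and the checks mirror only what the table states (required fields and the 0 to 2 temperature range).

```python
def build_chat_payload(model, messages, *, max_tokens=None, temperature=None, stream=False):
    """Assemble a request body, validating the fields described in the table."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    payload = {"model": model, "messages": messages}
    # Optional fields are omitted unless set, so the server defaults apply.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    return payload
```

The returned dictionary can be serialized with json.dumps and sent as the request body.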

Message Roles

Role	Description
`system`	Instructions that set the behavior of the assistant
`user`	Messages from the human
`assistant`	Previous responses from the assistant (for multi-turn conversations)

Minimal Request

JSON Body
{
  "model": "glm-5.2",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}

Full Request Example

JSON Body
{
  "model": "glm-4.7",
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate fibonacci numbers"}
  ],
  "max_tokens": 1024,
  "temperature": 0.7
}

Response

JSON
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1719000000,
  "model": "glm-4.7",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Here's a Python function...",
      "reasoning": "Let me think step by step..."  // GLM reasoning models only
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 342,
    "total_tokens": 367
  }
}

Reasoning models (glm-4.7, glm-5.2): Responses include a reasoning field containing the model's chain-of-thought thinking process, followed by the final content answer.
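Because only the GLM reasoning models return the reasoning field, client code should tolerate its absence. A small sketch, assuming the response has already been parsed into a dictionary with the shape shown above (split_reasoning is a hypothetical helper name):

```python
def split_reasoning(response):
    """Return (reasoning, content) from the first choice of a parsed response.

    reasoning is None when the model did not return that field.
    """
    message = response["choices"][0]["message"]
    return message.get("reasoning"), message.get("content", "")


# Trimmed-down sample based on the response format above.
sample = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "Here's a Python function...",
            "reasoning": "Let me think step by step...",
        },
        "finish_reason": "stop",
    }]
}

reasoning, content = split_reasoning(sample)
print(reasoning)  # Let me think step by step...
print(content)    # Here's a Python function...
```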

IDE Mode POST

Use the IDE endpoint for simple chat applications. The response omits the separate reasoning field and returns everything in content, so clients only need to read one field.

POST https://weatmood.ru/ide/v1/chat/completions

When to use IDE mode: If you don't need the reasoning/thinking trace and want a simple clean response — use this endpoint. Works great for chatbots, IDE plugins, and simple Q&A applications.

bash
curl https://weatmood.ru/ide/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Explain recursion in one sentence"}]
  }'

Note: The /ide/ prefix is used to distinguish from standard chat completions. The same models are available.
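Since the two modes differ only by the /ide prefix, a client can switch between them with a single flag. The helper below is a hypothetical sketch; the two URLs it produces are the ones documented above.

```python
BASE_URL = "https://weatmood.ru"


def chat_completions_url(ide_mode=False):
    """Return the chat completions URL for standard or IDE mode."""
    prefix = "/ide" if ide_mode else ""
    return f"{BASE_URL}{prefix}/v1/chat/completions"


print(chat_completions_url())               # https://weatmood.ru/v1/chat/completions
print(chat_completions_url(ide_mode=True))  # https://weatmood.ru/ide/v1/chat/completions
```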

Health Check GET

Check if the API is operational. No authentication required.

GET https://weatmood.ru/health

JSON Response
{
  "status": "ok",
  "version": "1.0",
  "models": {
    "glm-4.7": "operational",
    "glm-5.2": "operational",
    "qwen3.6-27b": "operational"
  }
}

You can also check weatmood.ru/status for a visual dashboard with model latency and availability metrics.
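A monitoring script could reduce the health response to a list of models that need attention. This is a sketch under one assumption: any status string other than "operational" is treated as a problem. The documentation shows only the "operational" value, so the "degraded" string in the sample is invented.

```python
def degraded_models(health):
    """List models whose status is anything other than 'operational'."""
    models = health.get("models", {})
    return sorted(name for name, status in models.items() if status != "operational")


def is_healthy(health):
    """True when the overall status is 'ok' and every model is operational."""
    return health.get("status") == "ok" and not degraded_models(health)


# The "degraded" value below is hypothetical, used only to exercise the check.
sample = {
    "status": "ok",
    "version": "1.0",
    "models": {
        "glm-4.7": "operational",
        "glm-5.2": "degraded",
        "qwen3.6-27b": "operational",
    },
}

print(degraded_models(sample))  # ['glm-5.2']
print(is_healthy(sample))       # False
```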

Python SDK

The fastest way to integrate Weatmood into your Python project.

Installation

bash

pip install openai

Basic Chat

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)
# Paris is the capital and largest city of France.

Multi-turn Conversation

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru/v1",
    api_key="YOUR_API_KEY"
)

messages = [
    {"role": "system", "content": "You are a Python expert."},
]

while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="qwen3.6-27b",
        messages=messages,
        max_tokens=2048
    )
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(f"Assistant: {answer}")

Streaming Responses

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru/v1",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="glm-5.2",
    messages=[{"role": "user", "content": "Write a haiku about coding:"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Reasoning Model with Thinking

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "What is 17 * 23?"}]
)

msg = response.choices[0].message
# reasoning is not a standard OpenAI field, so read it defensively
print("Thinking:", getattr(msg, "reasoning", None))
print("Answer:", msg.content)

cURL Examples

Direct HTTP requests using cURL. Works in any terminal without installing SDKs.

Basic Chat Completion

bash
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

With System Prompt and Parameters

bash
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {"role": "system", "content": "You are a concise technical assistant."},
      {"role": "user", "content": "Explain what a REST API is"}
    ],
    "max_tokens": 512,
    "temperature": 0.5
  }'

Streaming Response

bash
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [{"role": "user", "content": "Count from 1 to 5"}],
    "stream": true
  }'

Streaming responses use Server-Sent Events (SSE). Each chunk arrives as a data: {...} line, and the stream ends with data: [DONE].

IDE Mode (No Reasoning)

bash
curl https://weatmood.ru/ide/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

Code Generation with Qwen

bash
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [
      {"role": "system", "content": "You are an expert Python developer."},
      {"role": "user", "content": "Write a function that checks if a string is a palindrome"}
    ],
    "max_tokens": 512,
    "temperature": 0.3
  }'

List Available Models

bash
curl https://weatmood.ru/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

JavaScript / Node.js

Use the OpenAI SDK in Node.js or browser environments.

Installation

bash

npm install openai

Basic Chat

JavaScript (Node.js)
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://weatmood.ru/v1',
  apiKey: 'YOUR_API_KEY',
});

const response = await client.chat.completions.create({
  model: 'glm-5.2',
  messages: [
    {role: 'user', content: 'Hello!'}
  ],
});

console.log(response.choices[0].message.content);

Streaming in Node.js

JavaScript (Node.js)
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://weatmood.ru/v1',
  apiKey: 'YOUR_API_KEY',
});

const stream = await client.chat.completions.create({
  model: 'glm-5.2',
  messages: [{role: 'user', content: 'Write a story about a robot.'}],
  stream: true,
});

for await (const chunk of stream) {
  if (chunk.choices[0].delta.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}

Browser / Fetch API

JavaScript (Browser)
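// Direct fetch (no SDK), browser compatible.
// Do not ship a real API key in client-side code; in production,
// route this request through your own backend.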
const response = await fetch('https://weatmood.ru/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'glm-5.2',
    messages: [{role: 'user', content: 'Hello!'}],
    max_tokens: 1024
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

Go

Make HTTP requests directly in Go without an SDK.

Go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    payload := map[string]any{
        "model": "glm-5.2",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello!"},
        },
    }
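    body, err := json.Marshal(payload)
    if err != nil {
        fmt.Println("Error encoding payload:", err)
        return
    }

    req, err := http.NewRequest("POST", "https://weatmood.ru/v1/chat/completions", bytes.NewBuffer(body))
    if err != nil {
        fmt.Println("Error building request:", err)
        return
    }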
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer resp.Body.Close()

    result, _ := io.ReadAll(resp.Body)
    fmt.Println(string(result))
}

Streaming Responses

Enable real-time token-by-token responses by setting stream: true. Each chunk arrives as a Server-Sent Event (SSE).

Use case: Chat UIs, real-time text generation, code autocompletion, and anywhere you want the response to appear progressively.

cURL Streaming

bash
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-5.2","messages":[{"role":"user","content":"Write a short story:"}],"stream":true}'

Sample SSE Chunks

Server-Sent Events
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"glm-5.2","choices":[{"index":0,"delta":{"content":"Once"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"glm-5.2","choices":[{"index":0,"delta":{"content":" upon"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"glm-5.2","choices":[{"index":0,"delta":{"content":" a"},"logprobs":null,"finish_reason":null}]}

... more chunks ...

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"glm-5.2","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}

data: [DONE]
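When reading the stream without an SDK, the client has to strip the data: prefix, stop at [DONE], and concatenate the delta.content pieces. A minimal sketch of that loop follows; it works on any iterable of text lines, and the sample chunks are shortened versions of the ones above.

```python
import json


def collect_stream_text(lines):
    """Join the delta.content pieces from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # Skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # End-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

print(collect_stream_text(sample_lines))  # Once upon a
```

In a live client the lines would come from the HTTP response body rather than a list, and each piece would typically be printed as it arrives instead of being collected.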

Python Streaming

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru/v1",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="glm-5.2",
    messages=[{"role": "user", "content": "Count: 1, 2, 3"}],
    stream=True
)

for chunk in stream:
    token = chunk.choices[0].delta.content
    if token:
        print(token, end="", flush=True)

OpenAI SDK Compatibility

Weatmood is a drop-in replacement for OpenAI. Just change the base URL and API key.

Compatible with: Python, Node.js, Go, Ruby, Java, C#, PHP, and any language with an OpenAI SDK or HTTP client.

SDK Configuration

Python
# OpenAI
client = OpenAI(
    base_url="https://api.openai.com/v1",
    api_key="sk-..."
)

# Weatmood — just swap the URL and key
client = OpenAI(
    base_url="https://weatmood.ru/v1",
    api_key="weatmood-YOUR_KEY"
)

Supported OpenAI Features

Feature	Status	Notes
Chat Completions	Supported	All parameters
Streaming	Supported	SSE format
System Messages	Supported
Multi-turn Conversations	Supported	Pass full message history
Temperature	Supported	0–2 range
Max Tokens	Supported	Default: 16384
Stop Sequences	Supported
Token Usage Reporting	Supported	In response usage object
List Models	Supported	GET /v1/models
Fine-tuning	N/A	We provide hosted models only
Images / Multimodal	Limited	Check per-model capabilities

Pricing Guide

Transparent, pay-per-use pricing. No subscriptions or hidden fees.

Token Pricing

Prices are per 1,000,000 (1M) tokens. Both input (prompt) and output (completion) tokens are counted and charged separately.

GLM-4.7 (glm-4.7) · Reasoning Model

$0.30 per 1M input

$1.30 per 1M output

GLM-5.2 (glm-5.2) · Latest Generation

$0.70 per 1M input

$2.50 per 1M output

Qwen3.6-27B (qwen3.6-27b) · Code & Multilingual

$0.20 per 1M input

$1.50 per 1M output

Token Counting: Every API response includes a usage object showing prompt_tokens, completion_tokens, and total_tokens. You are billed based on the actual tokens used.

Example Cost Calculation

A single chat request with:
  - 500 input tokens (your prompt)
  - 1000 output tokens (AI response)

GLM-5.2 cost:
  Input:  500 / 1,000,000 × $0.70 = $0.00035
  Output: 1000 / 1,000,000 × $2.50 = $0.00250
  Total:  $0.00285 per request
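The same arithmetic can be applied to the usage object of any response. The sketch below hard-codes the prices listed above; request_cost is a hypothetical helper, and the table would need updating whenever prices change.

```python
# USD per 1M tokens as (input, output), copied from the Token Pricing list.
PRICES = {
    "glm-4.7": (0.30, 1.30),
    "glm-5.2": (0.70, 2.50),
    "qwen3.6-27b": (0.20, 1.50),
}


def request_cost(model, prompt_tokens, completion_tokens):
    """Return the cost in USD of one request for the given token counts."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


print(f"${request_cost('glm-5.2', 500, 1000):.5f}")  # $0.00285
```

Passing usage["prompt_tokens"] and usage["completion_tokens"] from a response gives the cost of that request.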

What's Included

All available models — no per-model surcharges
Streaming responses at no extra cost
Multi-turn conversations (you pay for total tokens in the conversation)
System prompts count as input tokens
Context windows are upper limits; you pay only for the tokens you actually use

Custom Pricing

For high-volume usage, dedicated infrastructure, or custom models, contact us at Fillsites0@gmail.com.

Models Guide

Choose the right model for your use case.

Model Comparison

	GLM-4.7	GLM-5.2	Qwen3.6-27B
Context	203,000 tokens	256,000 tokens	262,000 tokens
Type	Reasoning	Reasoning	General / Code
Use case	Complex analysis, math, logic	Advanced reasoning, long documents	Code, multilingual, efficiency
Input price	$0.30/1M	$0.70/1M	$0.20/1M
Output price	$1.30/1M	$2.50/1M	$1.50/1M
Streaming	✔	✔	✔
Files	✔	✔	✔

When to Use Each Model

GLM-4.7 — Reasoning Model

Best for tasks requiring step-by-step thinking: mathematical proofs, logical analysis, strategic planning, debugging complex code, research synthesis. The model shows its reasoning process before giving the final answer.

GLM-5.2 — Latest Generation

Enhanced capabilities over GLM-4.7 with a larger context window. Choose it when you need the highest reasoning quality or have to process very long documents or conversations. It costs more per token than the other models.

Qwen3.6-27B — Efficient Coder

Open-weight model optimized for code generation, translation, and multilingual tasks. Lower input cost makes it economical for high-volume applications. Great for chatbots, content generation, and repetitive tasks.

IDE Mode vs Standard Mode

// Standard mode (/v1/chat/completions) — for GLM reasoning models:
{
  "message": {
    "content": "391",                    // Final answer
    "reasoning": "17 x 23 = 391"          // Thinking trace
  }
}

// IDE mode (/ide/v1/chat/completions) — reasoning is moved into content:
{
  "message": {
    "content": "17 x 23 = 391"             // Clean answer only
  }
}

Error Handling

Weatmood returns standard HTTP status codes with JSON error bodies.

Error Response Format

JSON
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

HTTP Status Codes

Code	Meaning	Common Cause
200	OK	Request successful
400	Bad Request	Invalid JSON, missing required fields
401	Unauthorized	Invalid or missing API key
429	Rate Limited	Too many requests per minute
500	Server Error	Upstream provider issue — retry with backoff
503	Unavailable	Model temporarily offline
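A client can use the status code to decide whether a retry makes sense and the error body to produce a readable message. In this sketch, treating 429, 500, and 503 as retryable is an assumption drawn from the "Common Cause" column, not a documented guarantee; the helper names are hypothetical.

```python
# Assumption: these codes describe temporary conditions worth retrying.
RETRYABLE_STATUS = {429, 500, 503}


def should_retry(status_code):
    """True for status codes that usually indicate a temporary problem."""
    return status_code in RETRYABLE_STATUS


def describe_error(status_code, body):
    """Summarize an error response in one line."""
    error = body.get("error", {}) if isinstance(body, dict) else {}
    code = error.get("code", "unknown")
    message = error.get("message", "no message")
    action = "retry with backoff" if should_retry(status_code) else "fix the request"
    return f"HTTP {status_code} [{code}]: {message} ({action})"


body = {
    "error": {
        "message": "Invalid API key",
        "type": "authentication_error",
        "code": "invalid_api_key",
    }
}
print(describe_error(401, body))
# HTTP 401 [invalid_api_key]: Invalid API key (fix the request)
```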

Retry Strategy

Python
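import time

import openai

# Retry only transient failures: 429, 5xx, and network errors.
# Errors such as 400 or 401 will not succeed on retry, so they propagate at once.
RETRYABLE = (
    openai.RateLimitError,
    openai.InternalServerError,
    openai.APIConnectionError,
)

def call_with_retry(client, payload, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**payload)
        except RETRYABLE:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
            time.sleep(wait)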

Rate Limits

Rate limits protect the API from abuse and ensure fair access for all users.

Default Limits

Plan	Requests/Minute	Tokens/Minute
Default (free)	60	120,000
Enterprise	Custom	Custom

Rate limit exceeded (HTTP 429): Wait before retrying. Implement exponential backoff in your client. Contact us to increase limits for production workloads.

Headers

Each response includes rate limit information in headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1719000060  // Unix timestamp when limit resets
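These headers let a client pause until the window resets instead of guessing a delay. The sketch below assumes the headers have been copied into a plain dictionary with the exact capitalization shown above (real HTTP libraries usually offer case-insensitive lookups); seconds_until_reset is a hypothetical helper.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request; 0 if requests remain."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", 0))
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - current)


exhausted = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1719000060",
}
print(seconds_until_reset(exhausted, now=1719000000))  # 60.0
```

The now parameter exists so the function can be tested with a fixed clock; in normal use it is omitted and the current time is used.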