API Documentation
Complete reference for the Weatmood AI API. Integrate powerful AI models into your applications with our REST-compatible endpoints.
GLM-4.7
Advanced reasoning model with chain-of-thought thinking. Best for complex analysis.
GLM-5.2
Next-generation model with enhanced reasoning and expanded 256K context.
Qwen3.6-27B
Efficient open-weight model optimized for code generation and multilingual tasks.
Quick Start
Get up and running with the Weatmood API in under 5 minutes.
Step 1 — Get Your API Key
Sign up at weatmood.ru/admin to receive your personal API key. Your API key looks like:
weatmood-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Step 2 — Make Your First Request
The simplest way to test your key:
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Step 3 — Check Available Models
curl https://weatmood.ru/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
Step 4 — Health Check
curl https://weatmood.ru/health
Set the base URL to https://weatmood.ru and use the OpenAI SDK; no additional configuration is needed.
Authentication
All API requests must include a valid API key in the Authorization header.
Bearer Token
Include your API key as a Bearer token in the Authorization header:
Authorization: Bearer weatmood-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
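As a minimal sketch, the header can be attached with Python's standard library alone. The helper name `build_request` is hypothetical (it is not part of any Weatmood SDK), and the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://weatmood.ru"  # base URL used throughout this documentation


def build_request(api_key: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the Bearer token.

    Hypothetical helper for illustration only.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        # A JSON body turns the request into a POST.
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


req = build_request("YOUR_API_KEY", "/v1/models")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the result to `urllib.request.urlopen(req)` would perform the call; keeping construction separate makes the header logic easy to check without network access.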
Key Rotation
You can create multiple API keys and revoke old ones at any time from the admin panel. We recommend rotating keys periodically.
Rate Limits
Each API key has configurable rate limits (default: 60 requests per minute). Contact us for higher limits.
List Models GET
Returns a list of all available models and their capabilities.
Response
{
"object": "list",
"data": [
{
"id": "glm-4.7",
"object": "model",
"context_length": 203000,
"reasoning": true,
"supports_files": true,
"supports_images": true,
"created": 1700000000
},
...
]
}
Chat Completions POST
The core endpoint for generating AI chat completions. Works with the OpenAI chat format.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID: glm-4.7, glm-5.2, or qwen3.6-27b |
| messages | array | Yes | Array of message objects with role and content |
| max_tokens | integer | No | Maximum tokens to generate (default: 16384) |
| temperature | number | No | Sampling temperature (default: 0.7, range: 0–2) |
| stream | boolean | No | Enable streaming responses (default: false) |
| top_p | number | No | Nucleus sampling threshold |
| stop | array/string | No | Stop sequences to end generation |
Message Roles
| Role | Description |
|---|---|
| system | Instructions that set the behavior of the assistant |
| user | Messages from the human |
| assistant | Previous responses from the assistant (for multi-turn conversations) |
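The field and role tables above can be turned into a small client-side pre-flight check. This is a sketch under assumptions: `validate_payload` is a hypothetical helper, and the server's own validation rules may be stricter or looser than what is shown.

```python
ALLOWED_MODELS = {"glm-4.7", "glm-5.2", "qwen3.6-27b"}  # model IDs from the Request Body table
ALLOWED_ROLES = {"system", "user", "assistant"}          # roles from the Message Roles table


def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload passed these checks."""
    problems = []
    if payload.get("model") not in ALLOWED_MODELS:
        problems.append("model must be one of: " + ", ".join(sorted(ALLOWED_MODELS)))

    messages = payload.get("messages")
    if not isinstance(messages, list):
        problems.append("messages must be an array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"messages[{i}].role must be system, user, or assistant")
            elif "content" not in msg:
                problems.append(f"messages[{i}] is missing content")

    temperature = payload.get("temperature")
    if temperature is not None:
        # The documented range is 0 to 2 inclusive.
        if not isinstance(temperature, (int, float)) or not 0 <= temperature <= 2:
            problems.append("temperature must be a number between 0 and 2")
    return problems


print(validate_payload({"model": "glm-5.2", "messages": [{"role": "user", "content": "Hello!"}]}))
```

Running the check before sending avoids spending a request on a payload that would likely come back as 400 Bad Request.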
Minimal Request
{
"model": "glm-5.2",
"messages": [
{"role": "user", "content": "Hello!"}
]
}
Full Request Example
{
"model": "glm-4.7",
"messages": [
{"role": "system", "content": "You are a helpful coding assistant."},
{"role": "user", "content": "Write a Python function to calculate fibonacci numbers"}
],
"max_tokens": 1024,
"temperature": 0.7
}
Response
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1719000000,
"model": "glm-5.2",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Here's a Python function...",
"reasoning": "Let me think step by step..." // GLM reasoning models only
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 342,
"total_tokens": 367
}
}
For GLM reasoning models, the response includes a reasoning field containing the model's chain-of-thought, followed by the final answer in content.
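A short sketch of reading these fields from a decoded response body. The sample JSON below is illustrative data shaped like the Response example, and `summarize` is a hypothetical helper; using `.get()` for reasoning is a defensive assumption for models that do not return that field.

```python
import json

# Illustrative body shaped like the Response example; the values are sample data.
RAW = """
{
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "391",
      "reasoning": "17 x 23 = 391"
    },
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 25, "completion_tokens": 342, "total_tokens": 367}
}
"""


def summarize(response: dict) -> dict:
    """Pull out the answer, the optional reasoning trace, and token usage."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage", {})
    return {
        "answer": message.get("content"),
        # None when the model returned no reasoning field.
        "reasoning": message.get("reasoning"),
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize(json.loads(RAW))
print(summary["answer"], summary["total_tokens"])
```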
IDE Mode POST
Use the IDE endpoint for simple chat applications. The response has no separate reasoning field: reasoning is moved into content, so clients read a single clean string.
curl https://weatmood.ru/ide/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Explain recursion in one sentence"}]
  }'
The /ide/ prefix distinguishes this endpoint from standard chat completions. The same models are available.
Health Check GET
Check if the API is operational. No authentication required.
{
"status": "ok",
"version": "1.0",
"models": {
"glm-4.7": "operational",
"glm-5.2": "operational",
"qwen3.6-27b": "operational"
}
}
Python SDK
The fastest way to integrate Weatmood into your Python project.
Installation
pip install openai
Basic Chat
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)
# Paris is the capital and largest city of France.
Multi-turn Conversation
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru",
    api_key="YOUR_API_KEY"
)

messages = [
    {"role": "system", "content": "You are a Python expert."},
]

while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="qwen3.6-27b",
        messages=messages,
        max_tokens=2048
    )
    answer = response.choices[0].message.content
    # Keep the assistant turn in the history so the next request has full context.
    messages.append({"role": "assistant", "content": answer})
    print(f"Assistant: {answer}")
Streaming Responses
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="glm-5.2",
    messages=[{"role": "user", "content": "Write a haiku about coding:"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Reasoning Model with Thinking
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "What is 17 * 23?"}]
)

msg = response.choices[0].message
print("Thinking:", msg.reasoning)
print("Answer:", msg.content)
cURL Examples
Direct HTTP requests using cURL. Works in any terminal without installing SDKs.
Basic Chat Completion
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
With System Prompt and Parameters
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {"role": "system", "content": "You are a concise technical assistant."},
      {"role": "user", "content": "Explain what a REST API is"}
    ],
    "max_tokens": 512,
    "temperature": 0.5
  }'
Streaming Response
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [{"role": "user", "content": "Count from 1 to 5"}],
    "stream": true
  }'
Streamed responses arrive in data: {...} format, ending with data: [DONE].
IDE Mode (No Reasoning)
curl https://weatmood.ru/ide/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
Code Generation with Qwen
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [
      {"role": "system", "content": "You are an expert Python developer."},
      {"role": "user", "content": "Write a function that checks if a string is a palindrome"}
    ],
    "max_tokens": 512,
    "temperature": 0.3
  }'
List Available Models
curl https://weatmood.ru/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
JavaScript / Node.js
Use the OpenAI SDK in Node.js or browser environments.
Installation
npm install openai
Basic Chat
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://weatmood.ru',
  apiKey: 'YOUR_API_KEY',
});

const response = await client.chat.completions.create({
  model: 'glm-5.2',
  messages: [
    {role: 'user', content: 'Hello!'}
  ],
});

console.log(response.choices[0].message.content);
Streaming in Node.js
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://weatmood.ru',
  apiKey: 'YOUR_API_KEY',
});

const stream = await client.chat.completions.create({
  model: 'glm-5.2',
  messages: [{role: 'user', content: 'Write a story about a robot.'}],
  stream: true,
});

for await (const chunk of stream) {
  if (chunk.choices[0].delta.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
Browser / Fetch API
// Direct fetch (no SDK) - browser compatible
const response = await fetch('https://weatmood.ru/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'glm-5.2',
    messages: [{role: 'user', content: 'Hello!'}],
    max_tokens: 1024
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
Go
Make HTTP requests directly in Go without an SDK.
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    payload := map[string]any{
        "model": "glm-5.2",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello!"},
        },
    }
    body, _ := json.Marshal(payload)

    req, _ := http.NewRequest("POST", "https://weatmood.ru/v1/chat/completions", bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer resp.Body.Close()

    result, _ := io.ReadAll(resp.Body)
    fmt.Println(string(result))
}
Streaming Responses
Enable real-time token-by-token responses by setting stream: true. Each chunk arrives as a Server-Sent Event (SSE).
cURL Streaming
curl https://weatmood.ru/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-5.2","messages":[{"role":"user","content":"Write a short story:"}],"stream":true}'
Sample SSE Chunks
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"glm-5.2","choices":[{"index":0,"delta":{"content":"Once"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"glm-5.2","choices":[{"index":0,"delta":{"content":" upon"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"glm-5.2","choices":[{"index":0,"delta":{"content":" a"},"logprobs":null,"finish_reason":null}]}

... more chunks ...

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"glm-5.2","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}

data: [DONE]
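For clients that read the stream without an SDK, the chunks can be reassembled roughly as follows. This is a minimal sketch assuming each event is a single `data:` line as in the sample; `collect_stream_text` is a hypothetical helper, and the sample lines are trimmed copies of the chunks shown here.

```python
import json


def collect_stream_text(lines) -> str:
    """Concatenate delta.content from SSE lines, stopping at data: [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data event
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # the final chunk carries an empty delta
                parts.append(text)
    return "".join(parts)


sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Once upon a
```

In a real client the `lines` argument would be an iterator over the HTTP response body, so text can be printed as each chunk arrives instead of being joined at the end.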
Python Streaming
from openai import OpenAI

client = OpenAI(
    base_url="https://weatmood.ru",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="glm-5.2",
    messages=[{"role": "user", "content": "Count: 1, 2, 3"}],
    stream=True
)

for chunk in stream:
    token = chunk.choices[0].delta.content
    if token:
        print(token, end="", flush=True)
OpenAI SDK Compatibility
Weatmood is a drop-in replacement for OpenAI. Just change the base URL and API key.
SDK Configuration
# OpenAI
client = OpenAI(
    base_url="https://api.openai.com/v1",
    api_key="sk-..."
)

# Weatmood: just swap the URL and key
client = OpenAI(
    base_url="https://weatmood.ru",
    api_key="weatmood-YOUR_KEY"
)
Supported OpenAI Features
| Feature | Status | Notes |
|---|---|---|
| Chat Completions | Supported | All parameters |
| Streaming | Supported | SSE format |
| System Messages | Supported | |
| Multi-turn Conversations | Supported | Pass full message history |
| Temperature | Supported | 0–2 range |
| Max Tokens | Supported | Default: 16384 |
| Stop Sequences | Supported | |
| Token Usage Reporting | Supported | In response usage object |
| List Models | Supported | GET /v1/models |
| Fine-tuning | N/A | We provide hosted models only |
| Images / Multimodal | Limited | Check per-model capabilities |
Pricing Guide
Transparent, pay-per-use pricing. No subscriptions or hidden fees.
Token Pricing
Prices are per 1,000,000 (1M) tokens. Both input (prompt) and output (completion) tokens are counted and charged separately.
Each response includes a usage object showing prompt_tokens, completion_tokens, and total_tokens. You are billed based on the actual tokens used.
Example Cost Calculation
A single chat request with:
- 500 input tokens (your prompt)
- 1000 output tokens (AI response)

GLM-5.2 cost:
Input: 500 / 1,000,000 × $0.70 = $0.00035
Output: 1000 / 1,000,000 × $2.50 = $0.00250
Total: $0.00285 per request
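The same arithmetic can be scripted for any model. This sketch copies the per-1M-token prices from the Model Comparison table in this documentation; `estimate_cost` is a hypothetical helper, and the result is an estimate rather than an invoice.

```python
# USD per 1,000,000 tokens, copied from the Model Comparison table.
PRICES = {
    "glm-4.7": {"input": 0.30, "output": 1.30},
    "glm-5.2": {"input": 0.70, "output": 2.50},
    "qwen3.6-27b": {"input": 0.20, "output": 1.50},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request; input and output are priced separately."""
    price = PRICES[model]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / 1_000_000


print(f"${estimate_cost('glm-5.2', 500, 1000):.5f}")  # $0.00285
```

Feeding in `prompt_tokens` and `completion_tokens` from a response's usage object gives a per-request figure that can be summed over a session.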
What's Included
- All available models — no per-model surcharges
- Streaming responses at no extra cost
- Multi-turn conversations (you pay for total tokens in the conversation)
- System prompts count as input tokens
- Context windows are maximums; you pay only for the tokens you use
Custom Pricing
For high-volume usage, dedicated infrastructure, or custom models, contact us at Fillsites0@gmail.com.
Models Guide
Choose the right model for your use case.
Model Comparison
| | GLM-4.7 | GLM-5.2 | Qwen3.6-27B |
|---|---|---|---|
| Context | 203,000 tokens | 256,000 tokens | 262,000 tokens |
| Type | Reasoning | Reasoning | General / Code |
| Use case | Complex analysis, math, logic | Advanced reasoning, long documents | Code, multilingual, efficiency |
| Input price | $0.30/1M | $0.70/1M | $0.20/1M |
| Output price | $1.30/1M | $2.50/1M | $1.50/1M |
| Streaming | ✔ | ✔ | ✔ |
| Files | ✔ | ✔ | ✔ |
When to Use Each Model
GLM-4.7 — Reasoning Model
Best for tasks requiring step-by-step thinking: mathematical proofs, logical analysis, strategic planning, debugging complex code, research synthesis. The model shows its reasoning process before giving the final answer.
GLM-5.2 — Latest Generation
Enhanced capabilities over GLM-4.7 with a larger context window. Choose it when you need the strongest reasoning quality or have to process very long documents or conversations. It costs more per token than the other models.
Qwen3.6-27B — Efficient Coder
Open-weight model optimized for code generation, translation, and multilingual tasks. Lower input cost makes it economical for high-volume applications. Great for chatbots, content generation, and repetitive tasks.
IDE Mode vs Standard Mode
// Standard mode (/v1/chat/completions), for GLM reasoning models:
{
  "message": {
    "content": "391",               // Final answer
    "reasoning": "17 x 23 = 391"    // Thinking trace
  }
}

// IDE mode (/ide/v1/chat/completions), reasoning is moved into content:
{
  "message": {
    "content": "17 x 23 = 391"      // Clean answer only
  }
}
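One way to keep client code mode-agnostic is to pick the path from a small lookup and render whichever fields come back. This is a sketch only: `chat_url` and `display_text` are hypothetical helpers, and the `[thinking]` label is an arbitrary display choice.

```python
BASE_URL = "https://weatmood.ru"
CHAT_PATHS = {
    "standard": "/v1/chat/completions",  # content plus a separate reasoning field
    "ide": "/ide/v1/chat/completions",   # single content string
}


def chat_url(mode: str = "standard") -> str:
    """Return the chat completions URL for the given mode."""
    if mode not in CHAT_PATHS:
        raise ValueError(f"unknown mode: {mode!r}")
    return BASE_URL + CHAT_PATHS[mode]


def display_text(message: dict, show_reasoning: bool = False) -> str:
    """Render a message from either mode; reasoning appears only if present and requested."""
    reasoning = message.get("reasoning")
    if show_reasoning and reasoning:
        return f"[thinking] {reasoning}\n{message['content']}"
    return message["content"]


print(chat_url("ide"))
print(display_text({"content": "391", "reasoning": "17 x 23 = 391"}, show_reasoning=True))
```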
Error Handling
Weatmood returns standard HTTP status codes with JSON error bodies.
Error Response Format
{
"error": {
"message": "Invalid API key",
"type": "authentication_error",
"code": "invalid_api_key"
}
}
HTTP Status Codes
| Code | Meaning | Common Cause |
|---|---|---|
| 200 | OK | Request successful |
| 400 | Bad Request | Invalid JSON, missing required fields |
| 401 | Unauthorized | Invalid or missing API key |
| 429 | Rate Limited | Too many requests per minute |
| 500 | Server Error | Upstream provider issue — retry with backoff |
| 503 | Unavailable | Model temporarily offline |
Retry Strategy
import time

def call_with_retry(client, payload, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**payload)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            wait = 2 ** attempt  # exponential backoff: 1s, then 2s with the default max_retries=3
            time.sleep(wait)
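A variation that retries only when a retry can help, since a 400 or 401 will fail the same way every time. This sketch is SDK-independent: `ApiError` is a hypothetical exception carrying the HTTP status, and treating 429 and 503 as retryable is an assumption (the status table explicitly recommends backoff only for 500).

```python
import time

# Assumption: 429 and 503 are transient like 500; 400 and 401 are not retried.
RETRYABLE_STATUS = {429, 500, 503}


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status code of a failed call."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def call_with_selective_retry(send, max_retries: int = 3, sleep=time.sleep):
    """Call send(); retry with exponential backoff only on retryable statuses."""
    for attempt in range(max_retries):
        try:
            return send()
        except ApiError as err:
            last_attempt = attempt == max_retries - 1
            if err.status not in RETRYABLE_STATUS or last_attempt:
                raise  # permanent error, or no attempts left
            sleep(2 ** attempt)  # waits 1s, then 2s, doubling each time
```

The `sleep` parameter is injectable so the backoff schedule can be exercised in tests without real delays.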
Rate Limits
Rate limits protect the API from abuse and ensure fair access for all users.
Default Limits
| Plan | Requests/Minute | Tokens/Minute |
|---|---|---|
| Default (free) | 60 | 120,000 |
| Enterprise | Custom | Custom |
Headers
Each response includes rate limit information in headers:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1719000060   // Unix timestamp when limit resets
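These headers can drive client-side pacing. A minimal sketch, assuming the headers have been collected into a plain dict with the exact names shown above; `seconds_until_reset` is a hypothetical helper, and missing headers are treated as "no wait needed".

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next request; 0.0 while requests remain."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", 0))  # Unix timestamp
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


exhausted = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1719000060",
}
print(seconds_until_reset(exhausted, now=1719000000))  # 60.0
```

HTTP header names are case-insensitive, so a real client should normalize the keys (or use its HTTP library's header mapping) before the lookup.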