Call the Claude API with Python: Your First Script
Send your first message, handle the response, stream output, and use a system prompt — the building blocks for any Claude-powered application.
The Claude API takes a list of messages and returns a response. Everything else — memory, tools, multi-turn conversations — builds on top of that basic structure. This tutorial gets you to a working script in a few minutes and explains the parts that trip people up.
Install the SDK
pip install anthropic
Your first message
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what a Unix socket is in two sentences."}
    ]
)

print(message.content[0].text)
Set your API key before running:
export ANTHROPIC_API_KEY="sk-ant-..."
Or pass it directly — useful during development, never do this in committed code:
client = anthropic.Anthropic(api_key="sk-ant-...")
The response object
message is a Message object. The parts you will use most:
message.content[0].text # the response text
message.model # which model responded
message.stop_reason # "end_turn", "max_tokens", "stop_sequence"
message.usage.input_tokens # tokens consumed by your input
message.usage.output_tokens # tokens in the response
stop_reason is worth checking. If it is "max_tokens", the response was cut off at your max_tokens limit — the answer is incomplete.
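That check can be wrapped in a tiny guard you call after every request. `is_complete` is a hypothetical helper name for illustration, not part of the SDK:

```python
def is_complete(message):
    """Return True if the model stopped on its own; False if the
    response was truncated by the max_tokens ceiling."""
    return message.stop_reason != "max_tokens"
```

When it returns False, log a warning or retry with a larger max_tokens rather than passing an incomplete answer downstream.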
Add a system prompt
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a senior Linux sysadmin. Answer concisely. Show commands before explanations.",
    messages=[
        {"role": "user", "content": "How do I find what process is using port 8080?"}
    ]
)
The system parameter sets persistent context for the conversation. It is separate from the messages list and applies to every turn.
Multi-turn conversations
The API is stateless — each call is independent. To simulate a conversation, pass the full message history on every request:
messages = []
def chat(user_input):
    messages.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a helpful coding assistant.",
        messages=messages
    )
    assistant_reply = response.content[0].text
    messages.append({"role": "assistant", "content": assistant_reply})
    return assistant_reply
print(chat("What does the walrus operator do in Python?"))
print(chat("Show me an example with a while loop."))
The messages list grows with each turn, and the roles must strictly alternate between user and assistant; the API rejects consecutive messages from the same role. If you need to skip a turn or inject a message, you must still maintain the alternating pattern.
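One way to keep the history valid when your code might produce two user messages in a row is to merge them. `append_user` is an illustrative helper, not an SDK function:

```python
def append_user(messages, text):
    """Append a user turn, merging into the previous message if that
    was also a user turn, so roles keep strictly alternating."""
    if messages and messages[-1]["role"] == "user":
        messages[-1]["content"] += "\n\n" + text
    else:
        messages.append({"role": "user", "content": text})
```

This keeps every piece of input while guaranteeing the history is always a valid alternating sequence.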
Streaming
For long responses, streaming shows output as it is generated instead of waiting for the full response:
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[
        {"role": "user", "content": "Write a Python function to parse a cron expression."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # newline after stream ends
stream.text_stream yields text deltas as they arrive. flush=True ensures each chunk is printed immediately rather than buffered.
Access the final message after streaming by calling get_final_message() inside the with block, after the loop has consumed the stream:
final_message = stream.get_final_message()
print(f"\nTokens used: {final_message.usage.input_tokens} in / {final_message.usage.output_tokens} out")
Error handling
import anthropic
client = anthropic.Anthropic()
try:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)
except anthropic.AuthenticationError:
    print("Invalid API key")
except anthropic.RateLimitError:
    print("Rate limit hit — back off and retry")
except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")
For production use, plan for RateLimitError and transient APIStatusError (5xx) responses. The SDK already retries connection errors, rate limits, and 5xx responses a small number of times by default (tunable via the max_retries option on the client), but for sustained rate limiting you should add your own backoff on top.
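A generic backoff wrapper is one way to handle sustained rate limiting. `call_with_retry` and its parameters are a sketch of the pattern, not part of the SDK:

```python
import random
import time

def call_with_retry(fn, retryable, max_attempts=5, base_delay=1.0):
    """Call fn(); on a retryable exception, sleep with exponential
    backoff plus a little jitter, then try again, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts — surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.25))
```

Usage against the examples above might look like `call_with_retry(lambda: client.messages.create(...), retryable=(anthropic.RateLimitError,))`.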
Choosing a model
# Fast, inexpensive — good for classification, extraction, simple Q&A
model="claude-haiku-4-6"
# Balanced — most tasks, coding, analysis
model="claude-sonnet-4-6"
# Most capable — complex reasoning, long documents, nuanced writing
model="claude-opus-4-6"
Start with Sonnet. Move to Haiku for high-volume tasks where latency and cost matter. Move to Opus when Sonnet's output is not good enough. If in doubt, see our "Claude Haiku, Sonnet, and Opus: Which Model to Use" tutorial for a fuller comparison.
Set max_tokens appropriately
max_tokens is a hard ceiling, not a target length. Claude will stop abruptly if it hits the limit mid-sentence. For open-ended generation, set it generously. For constrained outputs (a classification label, a JSON object), set it tightly to avoid paying for tokens you do not need.
A response cut off by max_tokens returns stop_reason: "max_tokens". Check for this if completeness matters.
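One pattern that follows from this: start with a tight ceiling and double it when a response comes back truncated. `create_with_headroom` is an illustrative sketch, not an SDK feature; `create_fn` stands for any callable that forwards max_tokens to client.messages.create:

```python
def create_with_headroom(create_fn, max_tokens=256, ceiling=8192):
    """Call create_fn(max_tokens=...); if the reply was cut off
    (stop_reason == "max_tokens"), double the limit and retry."""
    while True:
        message = create_fn(max_tokens=max_tokens)
        if message.stop_reason != "max_tokens" or max_tokens >= ceiling:
            return message
        max_tokens = min(max_tokens * 2, ceiling)
```

You pay for the occasional retry, but routine requests stay cheap and nothing silently returns an incomplete answer.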