Call the Claude API with Python: Your First Script
Send your first message, handle the response, stream output, and use a system prompt — the building blocks for any Claude-powered application.
The Claude API takes a list of messages and returns a response. Everything else — memory, tools, multi-turn conversations — builds on top of that basic structure. This tutorial gets you to a working script in a few minutes and explains the parts that trip people up.
Install the SDK
pip install anthropic
Your first message
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what a Unix socket is in two sentences."}
    ]
)

print(message.content[0].text)
Set your API key before running:
export ANTHROPIC_API_KEY="sk-ant-..."
Or pass it directly — useful during development, never do this in committed code:
client = anthropic.Anthropic(api_key="sk-ant-...")
The response object
message is a Message object. The parts you will use most:
message.content[0].text # the response text
message.model # which model responded
message.stop_reason # "end_turn", "max_tokens", "stop_sequence"
message.usage.input_tokens # tokens consumed by your input
message.usage.output_tokens # tokens in the response
stop_reason is worth checking. If it is "max_tokens", the response was cut off at your max_tokens limit — the answer is incomplete.
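That check can be wrapped in a tiny guard you call after every request. `is_complete` is a hypothetical helper name for illustration, not part of the SDK:

```python
def is_complete(message):
    """Return True if the model stopped on its own; False if the
    response was truncated by the max_tokens ceiling."""
    return message.stop_reason != "max_tokens"
```

When it returns False, log a warning or retry with a larger max_tokens rather than passing an incomplete answer downstream.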
Add a system prompt
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a senior Linux sysadmin. Answer concisely. Show commands before explanations.",
    messages=[
        {"role": "user", "content": "How do I find what process is using port 8080?"}
    ]
)
The system parameter sets persistent context for the conversation. It is separate from the messages list and applies to every turn.
Multi-turn conversations
The API is stateless — each call is independent. To simulate a conversation, pass the full message history on every request:
messages = []
def chat(user_input):
    messages.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a helpful coding assistant.",
        messages=messages
    )
    assistant_reply = response.content[0].text
    messages.append({"role": "assistant", "content": assistant_reply})
    return assistant_reply
print(chat("What does the walrus operator do in Python?"))
print(chat("Show me an example with a while loop."))
The messages list grows with each turn, and the roles must strictly alternate between user and assistant; the API rejects consecutive messages from the same role. If you need to skip a turn or inject a message, you must still maintain the alternating pattern.
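One way to keep the history valid when your code might produce two user messages in a row is to merge them. `append_user` is an illustrative helper, not an SDK function:

```python
def append_user(messages, text):
    """Append a user turn, merging into the previous message if that
    was also a user turn, so roles keep strictly alternating."""
    if messages and messages[-1]["role"] == "user":
        messages[-1]["content"] += "\n\n" + text
    else:
        messages.append({"role": "user", "content": text})
```

This keeps every piece of input while guaranteeing the history is always a valid alternating sequence.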
Streaming
For long responses, streaming shows output as it is generated instead of waiting for the full response:
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[
        {"role": "user", "content": "Write a Python function to parse a cron expression."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # newline after stream ends
stream.text_stream yields text deltas as they arrive. flush=True ensures each chunk is printed immediately rather than buffered.
Access the final message after streaming by calling get_final_message() inside the with block, after the loop has consumed the stream:
final_message = stream.get_final_message()
print(f"\nTokens used: {final_message.usage.input_tokens} in / {final_message.usage.output_tokens} out")
Error handling
import anthropic
client = anthropic.Anthropic()
try:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)
except anthropic.AuthenticationError:
    print("Invalid API key")
except anthropic.RateLimitError:
    print("Rate limit hit — back off and retry")
except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")
For production use, plan for RateLimitError and transient APIStatusError (5xx) responses. The SDK already retries connection errors, rate limits, and 5xx responses a small number of times by default (tunable via the max_retries option on the client), but for sustained rate limiting you should add your own backoff on top.
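A generic backoff wrapper is one way to handle sustained rate limiting. `call_with_retry` and its parameters are a sketch of the pattern, not part of the SDK:

```python
import random
import time

def call_with_retry(fn, retryable, max_attempts=5, base_delay=1.0):
    """Call fn(); on a retryable exception, sleep with exponential
    backoff plus a little jitter, then try again, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts — surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.25))
```

Usage against the examples above might look like `call_with_retry(lambda: client.messages.create(...), retryable=(anthropic.RateLimitError,))`.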
Choosing a model
# Fast, inexpensive — good for classification, extraction, simple Q&A
model="claude-haiku-4-6"
# Balanced — most tasks, coding, analysis
model="claude-sonnet-4-6"
# Most capable — complex reasoning, long documents, nuanced writing
model="claude-opus-4-6"
Start with Sonnet. Move to Haiku for high-volume tasks where latency and cost matter. Move to Opus when Sonnet's output is not good enough. If in doubt, see our "Claude Haiku, Sonnet, and Opus: Which Model to Use" tutorial for a fuller comparison.
Set max_tokens appropriately
max_tokens is a hard ceiling, not a target length. Claude will stop abruptly if it hits the limit mid-sentence. For open-ended generation, set it generously. For constrained outputs (a classification label, a JSON object), set it tightly to avoid paying for tokens you do not need.
A response cut off by max_tokens returns stop_reason: "max_tokens". Check for this if completeness matters.
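One pattern that follows from this: start with a tight ceiling and double it when a response comes back truncated. `create_with_headroom` is an illustrative sketch, not an SDK feature; `create_fn` stands for any callable that forwards max_tokens to client.messages.create:

```python
def create_with_headroom(create_fn, max_tokens=256, ceiling=8192):
    """Call create_fn(max_tokens=...); if the reply was cut off
    (stop_reason == "max_tokens"), double the limit and retry."""
    while True:
        message = create_fn(max_tokens=max_tokens)
        if message.stop_reason != "max_tokens" or max_tokens >= ceiling:
            return message
        max_tokens = min(max_tokens * 2, ceiling)
```

You pay for the occasional retry, but routine requests stay cheap and nothing silently returns an incomplete answer.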