Amazon Bedrock Pricing Explained: What You Need to Know
Introduction
For developers creating advanced generative AI applications, AWS Bedrock has established itself as a leading solution. As a fully managed service, it removes the burden of infrastructure management, allowing teams to dedicate their efforts to building AI-driven features while AWS handles automatic scaling seamlessly.
Yet, its pricing model introduces complexities that require careful navigation. Understanding Bedrock’s cost structure is essential for optimizing expenses and ensuring efficient resource allocation. In this guide, we’ll break down the critical aspects of its pricing framework, empowering you to maintain project momentum without compromising your budget.
What is Amazon Bedrock?
Amazon Bedrock is a managed cloud service designed to accelerate the deployment of generative AI applications. It provides unified access to top-tier foundation models (FMs) from industry leaders such as Anthropic, Meta, Mistral AI, Stability AI, AI21 Labs, Cohere, and Amazon itself—all through a single API. Built with enterprise-grade security, privacy safeguards, and responsible AI practices, Bedrock empowers teams to focus on innovation rather than infrastructure.
Core Features & Flexibility
- Managed Scalability: Eliminates the complexity of infrastructure management with serverless, pay-as-you-go access to compute resources.
- Custom Model Development: Tailor FMs to your specific use cases—from text generation and image analysis to training bespoke models—using intuitive tools.
- Advanced Capabilities: Evaluate model performance, integrate knowledge bases, enforce ethical AI guardrails, and manage prompts efficiently.
Beyond basic model deployment, Bedrock supports end-to-end AI workflows. Its agentic capabilities enable dynamic interactions (e.g., chatbots that retrieve real-time data), while built-in tools for prompt engineering ensure consistent, high-quality outputs.
Understanding Amazon Bedrock’s Cost Structure
Amazon Bedrock’s pricing is shaped by four primary components: compute resources, model selection, storage needs, and data transfer volumes. Here’s how each factor influences your expenses:
1. Compute Resources
Costs scale with the processing power required to execute AI workloads. Larger models or complex tasks (e.g., real-time inference) demand more computational capacity, directly impacting your bill.
2. Model Selection
Foundation models (FMs) from providers like Anthropic or Meta have distinct pricing tiers. For instance, advanced models like Claude 3.5 Sonnet may incur higher per-token fees compared to lighter alternatives like Mistral 7B.
3. Storage Requirements
Fees apply for storing custom models, training datasets, and knowledge bases. Costs depend on the volume of data retained and the storage duration.
4. Data Transfer
Moving data into or out of Bedrock (e.g., importing training sets, exporting results) incurs charges based on the amount of data processed and the regions involved.
Amazon Bedrock Pricing Models
Amazon Bedrock offers multiple pricing tiers to align with diverse workload requirements. Below, we break down the pricing models and their use cases:
All examples below are based on the Llama 3.3 Instruct (70B) model in the US West (Oregon, us-west-2) region.
1. On-Demand Pricing
If your usage is variable or you’re still in an experimental phase, on-demand pricing lets you pay as you go, charging only for the tokens you consume. This model is ideal for workloads that see spikes in usage or remain relatively low much of the time.
Structure:
- Pay-as-you-go: Charges apply only for consumed resources (no upfront commitments).
- Basis:
- Text/Embedding Models: Costs determined by the number of input and output tokens (a token is typically a few characters of text; exact tokenization varies by model).
- Image Models: Billed per-image processed.
- Cross-Region Inference: You can leverage AWS’s global infrastructure without extra fees; costs reflect the source region’s rates.
Example Scenario:
Let’s consider the cost of a marketing text generator powered by Llama 3.3 Instruct (70B):
- Input: 15 tokens per query ($0.00072 per 1,000 input tokens)
- Output: 150 tokens per response ($0.00072 per 1,000 output tokens)
- Daily usage: Assuming an average of 1,000 queries per day (which may fluctuate), this equates to around 15,000 input tokens and 150,000 output tokens per day.
Given these parameters, here’s a cost breakdown:
- Input cost: 15,000 tokens / 1,000 × $0.00072 = $0.0108/day
- Output cost: 150,000 tokens / 1,000 × $0.00072 = $0.108/day
- Total daily cost: $0.0108 + $0.108 = $0.1188/day
- Total monthly cost: $0.1188/day × 30 days ≈ $3.56/month
Overall, on-demand pricing with Llama 3.3 Instruct (70B) provides flexibility without any long-term commitments—perfect if your workload varies day to day or if you’re still experimenting with different features and traffic levels.
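The arithmetic above is easy to wrap in a small helper so you can re-run it as your traffic assumptions change. This is a sketch using the rates quoted above (verify them against the current Bedrock pricing page); the function name is illustrative, not part of any AWS SDK.

```python
# Sketch of an on-demand cost estimator for Amazon Bedrock text models.
# Rates are the Llama 3.3 Instruct (70B) on-demand figures quoted above.

INPUT_RATE_PER_1K = 0.00072   # USD per 1,000 input tokens
OUTPUT_RATE_PER_1K = 0.00072  # USD per 1,000 output tokens

def daily_on_demand_cost(queries_per_day: int,
                         input_tokens_per_query: int,
                         output_tokens_per_query: int) -> float:
    """Estimated daily cost in USD for a given traffic profile."""
    input_tokens = queries_per_day * input_tokens_per_query
    output_tokens = queries_per_day * output_tokens_per_query
    return (input_tokens / 1000 * INPUT_RATE_PER_1K
            + output_tokens / 1000 * OUTPUT_RATE_PER_1K)

# The marketing-generator scenario from the text:
daily = daily_on_demand_cost(1000, 15, 150)
monthly = daily * 30
print(f"${daily:.4f}/day, ${monthly:.2f}/month")  # $0.1188/day, $3.56/month
```

Plugging in different query volumes makes it straightforward to see when on-demand stops being the cheapest option.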
2. Provisioned Throughput
If you need guaranteed performance and predictable costs, you can reserve dedicated capacity for your model, billed at a fixed hourly rate per “model unit.” This Provisioned Throughput approach ensures consistent availability and throughput, which is especially valuable for high-traffic applications or those requiring minimal latency.
Commitment Options
You can choose from three main commitment tiers, each with a different hourly rate:
- No Commitment – Maximum flexibility; stop anytime
- 1-Month Commitment – Lower hourly rate for a one-month term
- 6-Month Commitment – Deepest discount if you’re confident you’ll need consistent capacity for half a year
The rate structure for Llama 3.3 Instruct (70B) is as follows:
- No Commitment: $24.00/hour
- 1-Month Commitment: $21.18/hour
- 6-Month Commitment: $13.08/hour
Example Scenario
Assume you run one provisioned model unit continuously (24 hours/day) for a 30-day month. Depending on your commitment, costs would vary approximately as follows:
No Commitment
- Hourly: $24.00
- 30 Days: $24.00 × 24 hours × 30 days = $17,280
1-Month Commitment
- Hourly: $21.18
- 30 Days: $21.18 × 24 hours × 30 days ≈ $15,250
6-Month Commitment
- Hourly: $13.08
- 30 Days: $13.08 × 24 hours × 30 days ≈ $9,420
While these examples demonstrate daily and monthly costs for a single unit, you can scale up by adding more units if your workload demands it.
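A quick way to compare the commitment tiers is to derive each monthly figure from its hourly rate. This is a sketch using the rates quoted above for Llama 3.3 Instruct (70B); confirm current rates before committing.

```python
# Monthly cost per provisioned model unit, from the hourly rates above.
HOURLY_RATES = {
    "no_commitment": 24.00,
    "1_month": 21.18,
    "6_month": 13.08,
}

def monthly_provisioned_cost(hourly_rate: float, units: int = 1,
                             hours_per_day: int = 24, days: int = 30) -> float:
    """Fixed hourly rate x hours x days x model units."""
    return hourly_rate * hours_per_day * days * units

for tier, rate in HOURLY_RATES.items():
    print(f"{tier}: ${monthly_provisioned_cost(rate):,.2f}/month")
```

Scaling the `units` argument models the cost of adding capacity for higher-traffic workloads.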
When to Choose Provisioned Throughput
- Consistent, High-Volume Usage: If your application runs around the clock or handles large numbers of requests, provisioned capacity often proves more cost-effective than on-demand.
- Performance Guarantees: A dedicated model unit ensures minimal cold starts and stable response times.
- Long-Term Predictability: Budgeting is simpler when your hourly rate doesn’t fluctuate with usage.
If your workload is more sporadic or you’re unsure about sustained demand, on-demand pricing might be a safer starting point—until you confirm that you’ll consistently use the allocated capacity.
3. Batch Processing
Sometimes you need to run a large volume of requests at once—think of reprocessing historical data or bulk-analyzing user inputs. That’s where batch processing shines: you can combine multiple prompts into one input file, send it off, and let the system handle it in a single, streamlined job.
Structure
- Bulk Submissions
- Gather your prompts into a file (or multiple files) and submit them as a single request, rather than one at a time.
- Discounted Rates
- Batch processing often costs around half the on-demand rate for select models, offering significant savings if your workload supports this mode.
- Model Support
- Not every foundation model (FM) on the platform supports batch submissions, so be sure to check the official documentation for compatibility.
Example Pricing
Compared to on-demand pricing, these batch rates can make a big difference if you process a high volume of data at once. For instance, suppose you’re working with a model that offers:
- $0.00036 per 1,000 input tokens (batch)
- $0.00036 per 1,000 output tokens (batch)
If you have a bulk job of 150,000 input tokens and generate about 600,000 output tokens in response, you can estimate your costs as follows:
- Input Cost
- 150,000 tokens / 1,000 × $0.00036 = $0.054
- Output Cost
- 600,000 tokens / 1,000 × $0.00036 = $0.216
- Total Batch Cost
- $0.054 + $0.216 = $0.27 for this bulk run
This example shows how batch pricing can be much more cost-effective if you have a sizable dataset to process in a single go.
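The batch savings are easiest to see side by side with on-demand. This sketch uses the per-1K-token figures quoted in the text (input and output happen to share a rate here); verify them against the current pricing page for your model.

```python
# Batch vs. on-demand cost for the bulk job described above.
BATCH_RATE = 0.00036      # USD per 1,000 tokens (input and output)
ON_DEMAND_RATE = 0.00072  # USD per 1,000 tokens (input and output)

def token_cost(input_tokens: int, output_tokens: int, rate: float) -> float:
    """Total cost when input and output share one per-1K-token rate."""
    return (input_tokens + output_tokens) / 1000 * rate

batch = token_cost(150_000, 600_000, BATCH_RATE)          # $0.27
on_demand = token_cost(150_000, 600_000, ON_DEMAND_RATE)  # $0.54
print(f"batch ${batch:.2f} vs on-demand ${on_demand:.2f}, "
      f"saving ${on_demand - batch:.2f}")
```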
When to Choose Batch Processing
- Periodic Large Jobs: Ideal when your team reprocesses historical logs, retrains models on fresh data, or runs big analytics jobs at the end of the day/week/month.
- Tight Budget Constraints: The per-token discount can add up to significant savings if your volume is consistently high.
- Flexible Timelines: If you can queue up requests and run them overnight or during low-traffic windows, batching may be more attractive than real-time on-demand calls.
If your application demands immediate responses—for instance, a live chatbot—on-demand or provisioned throughput might be a better fit. But for batchable workloads, this pricing model can substantially reduce your AI expenses.
4. Model Customization (Fine-Tuning)
Sometimes an off-the-shelf foundation model (FM) just isn’t enough. Model customization lets you fine-tune an FM using your organization’s data, tailoring it to a specific domain or use case—such as hyper-targeted recommendations or specialized customer interactions. The end result is a model that’s effectively “yours,” offering a competitive edge through greater accuracy in areas that matter.
Structure
- Training Costs:
- You’re billed for the number of tokens processed during training (e.g., $0.00799 per 1,000 tokens for a 70B model).
- Monthly Storage:
- Once your custom model is trained, you pay a small recurring fee (e.g., $1.95 per month) to store it.
- Inference (Provisioned Throughput):
- After training, the cost to run inferences typically follows Provisioned Throughput pricing for a “no-commit” scenario (e.g., $24/hour per model unit).
These figures reflect Llama 3.1 Instruct rather than 3.3, as AWS currently publishes customization pricing only for that model.
Example Scenario
Imagine you’re building a specialized product recommendation engine. You gather a training dataset containing 1,000,000 tokens of product descriptions, user reviews, and purchase history, then fine-tune your model with it.
- Training Cost
- 1,000,000 tokens ÷ 1,000 × $0.00799 = $7.99
- Model Storage
- $1.95 per month, as long as you retain the customized version
- Inference
- If you host your custom model on Provisioned Throughput (no-commit), the hourly rate is $24 per model unit. If you run the model 24/7 for a month, that can add up—but for short bursts or smaller concurrency, you can scale down units as needed to reduce costs.
In this example, training is a one-time cost of $7.99 for your dataset, while monthly storage adds a modest $1.95. The inference expense varies based on how many hours and model units you provision.
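Putting the three line items together shows how hosting hours dominate the bill. This is a sketch using the rates quoted above for the Llama 3.1 Instruct customization scenario; the 8-hours/day hosting schedule is an illustrative assumption.

```python
# Rough first-month cost for the fine-tuning scenario above.
TRAINING_RATE_PER_1K = 0.00799  # USD per 1,000 training tokens
STORAGE_PER_MONTH = 1.95        # USD per custom model per month
INFERENCE_HOURLY = 24.00        # USD per model unit (no-commit provisioned)

def first_month_cost(training_tokens: int, inference_hours: float,
                     model_units: int = 1) -> float:
    """One-time training + monthly storage + provisioned hosting hours."""
    training = training_tokens / 1000 * TRAINING_RATE_PER_1K
    inference = inference_hours * INFERENCE_HOURLY * model_units
    return training + STORAGE_PER_MONTH + inference

# 1M training tokens, then 8 hours/day of hosting for 30 days (assumed):
print(f"${first_month_cost(1_000_000, 8 * 30):,.2f}")
```

Note how the $7.99 training cost is dwarfed by hosting: trimming provisioned hours is where the real savings lie.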
When to Choose Model Customization
- Highly Specific Domains: If you need more focused or domain-specific language understanding—say, specialized medical texts or unique industry jargon—fine-tuning can significantly improve performance.
- Competitive Differentiation: Tailor the base model to your brand’s tone, style, or product context.
- Long-Term Investments: Once you’ve fine-tuned a model, you can keep reaping the benefits while only paying the monthly storage fee—then host it as needed.
For teams that need a fully bespoke solution, model customization can be a game-changer. Just be sure you’re ready to handle the upfront training cost, the ongoing storage fee, and the provisioned throughput expense when serving inferences. By calibrating how you train, store, and deploy, you can tailor the model to your use case and effectively manage costs.
5. Model Evaluation
Getting reliable feedback on your model’s performance is crucial—especially if you need to compare multiple models or fine-tuned versions. AWS Bedrock provides automated evaluation metrics at no additional cost, but also offers a human-in-the-loop evaluation option for deeper insights.
Structure
- Inference Costs
- You’re billed for the tokens processed during evaluation—according to whichever model you select. If you choose a model with on-demand pricing, you’ll pay its per-token rate; if you choose a provisioned throughput model, hourly rates may apply.
- Human-Based Evaluation
- For each completed human task (e.g., a manual review or rating of model output), there’s a $0.21 fee. This approach can yield richer, more nuanced feedback, but costs more than algorithmic scoring alone.
Example Scenario
Suppose you want to evaluate 100 test samples using a chosen model. Each sample might consume around 2,000 input tokens and generate 300 output tokens:
Inference Cost
- Billed at the selected model’s token rate. At $0.00072 per 1,000 tokens on-demand, the 230,000 total tokens in this scenario (100 × 2,000 input + 100 × 300 output) would cost about $0.17.
Human-Based Evaluation (Optional)
- If you request human reviewers for each of the 100 samples, you’ll be billed $0.21 per completed task.
- That’s an additional $21 in total (100 tasks × $0.21).
If you rely only on automated scoring, you pay no extra fee besides the token inference cost. Human evaluations can offer higher-fidelity insights but also raise your overall expenses.
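The two cost components can be combined in one estimate. This sketch uses the $0.21 human-task fee and the $0.00072 per-1K on-demand token rate from the scenario above; both should be checked against current pricing.

```python
# Estimated evaluation cost: per-token inference plus optional human review.
TOKEN_RATE_PER_1K = 0.00072  # USD per 1,000 tokens (on-demand, from text)
HUMAN_TASK_FEE = 0.21        # USD per completed human evaluation task

def evaluation_cost(samples: int, input_tokens: int, output_tokens: int,
                    human_review: bool = False) -> float:
    """Token inference cost for all samples, plus human-task fees if used."""
    total_tokens = samples * (input_tokens + output_tokens)
    cost = total_tokens / 1000 * TOKEN_RATE_PER_1K
    if human_review:
        cost += samples * HUMAN_TASK_FEE
    return cost

# 100 samples, 2,000 input + 300 output tokens each:
print(f"automated: ${evaluation_cost(100, 2000, 300):.4f}")
print(f"with human review: ${evaluation_cost(100, 2000, 300, True):.4f}")
```

The gap between the two figures makes it easy to decide whether human review is worth it for a given sample count.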
When to Use Human-Based Evaluation
- Subjective Responses: Tasks requiring nuanced judgment—like naturalness, sentiment accuracy, or style—often benefit from human input.
- Critical Applications: High-stakes environments (medical/legal) may need thorough evaluation of outputs to ensure reliability.
- Comparisons Across Models: If you’re deciding which model or fine-tuned variant performs best, human evaluation can help fill gaps where automated metrics might fall short.
Model Evaluation on AWS Bedrock allows you to combine automated metrics (free) with human-based assessments ($0.21 per task) for deeper insights. Keep in mind, you’re still responsible for the underlying model’s inference costs while evaluating. By balancing budget with accuracy requirements, you can refine your workflows for the best possible outcomes.
Additional Bedrock Tools & Pricing
Beyond the core foundation models and their various pricing tiers, AWS Bedrock provides a suite of value-added services to enhance and customize your generative AI applications. Each service has its own cost model, so be sure to factor these into your overall budget.
1. Guardrails
Amazon Bedrock Guardrails helps you implement customized safeguards and responsible AI policies for your applications. It adds an extra layer of configurable safety features on top of a model’s built-in protections and is compatible with all FMs in Bedrock—including fine-tuned models. Guardrails can also integrate with Bedrock Agents and Knowledge Bases to ensure your AI solutions are consistent with your organization’s policies.
- How It’s Priced
- Guardrails typically charge per 1,000 text units processed, depending on the policy type. For example, content filtering or denied topic checks can cost $0.15 per 1,000 text units, while certain sensitive information filters are free. These fees apply whenever the Guardrails service evaluates inputs or outputs.
2. Knowledge Bases & Data Automation
Amazon Bedrock Knowledge Bases is a fully managed Retrieval-Augmented Generation (RAG) workflow that draws on your own data sources—whether they’re in S3 or third-party systems like Salesforce or SharePoint. It converts unstructured data into vector embeddings and supports structured data retrieval via natural language to SQL.
Amazon Bedrock Data Automation transforms unstructured, multimodal content (documents, videos, images, audio) into structured formats, enabling advanced applications like intelligent document processing or RAG. Standard Output is generated using default blueprints (e.g., for transcriptions or scene descriptions), while Custom Outputs let you define exact data schemas.
- How It’s Priced
- Structured Data Retrieval (SQL Generation): ~$2.00 per 1,000 queries
- Re-rank Models: ~$1.00 per 1,000 queries to improve response relevance
- Data Automation Inference:
- Audio: $0.006/minute
- Documents: $0.010/page
- Images: $0.003/image
- Video: $0.050/minute
- Custom Output: $0.040/page or $0.005/image (plus an incremental charge if your blueprint includes more than 30 defined fields)
Through an integration with Knowledge Bases, Data Automation can parse multimodal content (images and text) for RAG, boosting the accuracy and relevance of your AI-driven responses.
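Because Data Automation bills per unit of each modality, an ingest job's cost is a simple weighted sum. This sketch uses the Standard Output rates listed above; the example job sizes are hypothetical.

```python
# Sketch: estimating a Bedrock Data Automation job from the per-unit
# Standard Output rates listed above (verify current rates before budgeting).
RATES = {
    "audio_minute": 0.006,
    "document_page": 0.010,
    "image": 0.003,
    "video_minute": 0.050,
}

def data_automation_cost(audio_minutes=0, document_pages=0,
                         images=0, video_minutes=0) -> float:
    return (audio_minutes * RATES["audio_minute"]
            + document_pages * RATES["document_page"]
            + images * RATES["image"]
            + video_minutes * RATES["video_minute"])

# Hypothetical RAG ingest job: 500 document pages plus 1,000 images:
print(f"${data_automation_cost(document_pages=500, images=1000):.2f}")
```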
3. Agents
Amazon Bedrock Agents allow you to build autonomous, context-aware assistants for your applications. They can securely connect to various data sources, recall past interactions for seamless user experiences, and even generate code on the fly. Agents accelerate development by letting you easily configure multiple steps or tasks—like retrieving external info or parsing user requests—in a single workflow.
- How It’s Priced
- Bedrock Agents themselves don’t have a separate, published price. However, you’ll still pay for any underlying foundation model usage, data retrieval, or guardrail evaluations they trigger.
4. Flows
Amazon Bedrock Flows is a workflow authoring and execution feature that helps you orchestrate multiple components—like foundation models, prompts, agents, knowledge bases, guardrails, and AWS services—into a coherent pipeline. You can visually design and test workflows, then run them serverlessly without deploying your own infrastructure.
- How It’s Priced
- You’re charged based on node transitions—each time a node in your Flow executes, it counts toward your total. Pricing is $0.035 per 1,000 node transitions, metered daily and billed monthly starting February 1st, 2025.
While AWS Bedrock primarily charges you per token or provisioned capacity for foundation models, these extra services and tools come with their own cost considerations. By combining Guardrails for safety, Knowledge Bases and Data Automation for enhanced data retrieval and transformation, Agents for autonomous interactions, and Flows for orchestration, you can build end-to-end generative AI applications tailored to your exact needs.
Just keep in mind each tool’s usage-based pricing—from per 1,000 queries to per text unit or node transition. Planning ahead and monitoring your usage closely will help you harness Bedrock’s ecosystem without incurring surprise bills.
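Because each add-on meters a different unit, it helps to roll them into one monthly estimate. This sketch uses the rates quoted in the sections above (Guardrails content filtering, structured retrieval, re-ranking, and Flows node transitions); the example volumes are hypothetical.

```python
# Sketch: rolling up the usage-based add-on fees quoted above into one
# monthly estimate (verify current rates before budgeting).
GUARDRAIL_PER_1K_TEXT_UNITS = 0.15
SQL_RETRIEVAL_PER_1K_QUERIES = 2.00
RERANK_PER_1K_QUERIES = 1.00
FLOWS_PER_1K_NODE_TRANSITIONS = 0.035

def addon_monthly_cost(text_units=0, sql_queries=0,
                       rerank_queries=0, node_transitions=0) -> float:
    return (text_units / 1000 * GUARDRAIL_PER_1K_TEXT_UNITS
            + sql_queries / 1000 * SQL_RETRIEVAL_PER_1K_QUERIES
            + rerank_queries / 1000 * RERANK_PER_1K_QUERIES
            + node_transitions / 1000 * FLOWS_PER_1K_NODE_TRANSITIONS)

# Hypothetical month: 100K guardrail text units, 5K SQL queries,
# and 50K Flows node transitions:
print(f"${addon_monthly_cost(100_000, 5_000, 0, 50_000):.2f}")
```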
Practical Approaches to AWS Bedrock Cost Optimization
Managing your AWS Bedrock expenses can be challenging for a few reasons:
Unpredictable Workloads
Traffic can spike or plummet from one day to the next, making costs hard to anticipate.
Multiple Pricing Models
Each foundation model (FM) may have its own billing structure, leading to confusion if you’re juggling several.
Limited Cost Visibility
Without adequate monitoring, pinpointing your biggest cost drivers is difficult.
The good news is that these issues aren’t insurmountable. With a few targeted strategies, you can optimize your spending and keep your budget under control.
Choosing the Right Model
AWS Bedrock presents a wide range of foundation models (FMs) to choose from. Rather than always picking the biggest or the cheapest, focus on the cost-effectiveness of a model for your specific application.
Remember, not all models fit every use case. Some are overly complex for simpler workloads. For instance, if you’re only running a straightforward text classification, there’s little reason to opt for a sophisticated—and potentially costly—model. Instead, select a more moderate FM that delivers acceptable accuracy at a lower price point.
By aligning the complexity of your FM with the actual requirements of your use case, you’ll keep operational costs down without significantly compromising on quality.
Keeping an Eye on Usage
A critical step in controlling your AWS Bedrock costs is monitoring your workloads. Services like AWS CloudWatch provide real-time insights into token usage and model performance. Here are a few handy features to tap into:
- Custom Dashboards: Focus on the metrics that matter most—like input/output tokens or latency—so you can quickly assess your system’s health.
- Alarms: Configure alert thresholds that trigger notifications when usage (and thus costs) begin climbing out of your comfort zone.
You can also leverage AWS CloudTrail to record API calls, giving you an audit trail of who’s doing what and how. By staying on top of your resource consumption, you’ll avoid unpleasant billing surprises.
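Alarm thresholds are easiest to reason about in token terms. The helper below converts a daily dollar budget into an approximate token-count threshold you could use when configuring a CloudWatch alarm on Bedrock's token metrics; the blended rate is this article's Llama 3.3 on-demand figure and is an assumption, not an AWS API value.

```python
# Convert a daily USD budget into an approximate token-count threshold,
# e.g. for a CloudWatch alarm on Bedrock token metrics. The blended rate
# is this article's Llama 3.3 on-demand figure (an assumption).
BLENDED_RATE_PER_1K_TOKENS = 0.00072  # USD

def token_alarm_threshold(daily_budget_usd: float,
                          rate_per_1k: float = BLENDED_RATE_PER_1K_TOKENS) -> int:
    """Token count at which estimated daily spend reaches the budget."""
    return int(daily_budget_usd / rate_per_1k * 1000)

# Alert when estimated spend passes $5/day:
print(token_alarm_threshold(5.0))
```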
Optimize Large-Scale Inference with Batch Mode
For bulk workloads that aren’t time-critical, Batch Mode can be far more cost-effective—often at half the rate of on-demand pricing for certain models. Consider these tips to get the most out of batch processing:
- Combine Your Requests: Group related inference tasks into a single job submission.
- Leverage Off-Peak Windows: Schedule batch jobs during lower traffic periods to free up resources and maintain performance.
- Centralize Your Outputs: Store your completed batch results in Amazon S3, ensuring an efficient way to retrieve and analyze data at scale.
Capitalize on Provisioned Throughput
If your application’s traffic is steady and predictable, Provisioned Throughput can be a cost-efficient option. Here’s how to make the most of it:
- Assess Your Traffic Patterns: Determine whether usage remains stable enough to commit to a set capacity.
- Select the Right Term: Choose between 1-month or 6-month commitments based on your forecasted demand.
- Track and Adjust: Monitor throughput utilization, and scale your provisioned capacity if your workload changes.
Refine Your Data Preprocessing
A carefully designed data preprocessing strategy not only boosts model performance but also keeps operating expenses in check. Here’s what to consider:
- Remove Unnecessary Information: Clean your dataset of irrelevant or duplicate entries.
- Compress Where Possible: Employ data compression techniques to cut down on storage and transmission overhead.
- Standardize Formats: Keep data formats uniform to streamline processing and avoid inconsistencies.
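As a tiny illustration of the first point, every duplicate record removed before fine-tuning is training tokens you don't pay for. A minimal sketch with made-up data:

```python
# Deduplicate training records before fine-tuning: duplicates removed here
# are training tokens you never get billed for.
def dedupe(records: list[str]) -> list[str]:
    seen, unique = set(), []
    for record in records:
        key = " ".join(record.split()).lower()  # normalize whitespace/case
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

data = ["Great product!", "great  product!", "Fast shipping.", "Great product!"]
cleaned = dedupe(data)
print(len(data), "->", len(cleaned))  # 4 -> 2
```

Real pipelines typically add fuzzier matching (e.g. near-duplicate detection), but even exact-match deduplication shrinks the billable token count.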
Conclusion
Amazon Bedrock offers a powerful and flexible platform for building and deploying generative AI applications, but its pricing structure can be complex and requires careful consideration to optimize costs. By understanding the key components of Bedrock’s pricing—such as compute resources, model selection, storage, and data transfer—you can make informed decisions that align with your budget and workload requirements.
The variety of pricing models, including on-demand, provisioned throughput, batch processing, and model customization, provides flexibility to suit different use cases. Whether you’re experimenting with AI features, running high-volume applications, or fine-tuning models for specialized tasks, Bedrock’s pricing options allow you to scale efficiently without overspending.
Additionally, leveraging tools like Guardrails, Knowledge Bases, Agents, and Flows can enhance your AI applications while introducing additional cost considerations. Monitoring usage through services like AWS CloudWatch and optimizing workflows—such as using batch processing for non-time-critical tasks or refining data preprocessing—can further help control expenses.
Ultimately, the key to managing AWS Bedrock costs lies in aligning your usage patterns with the most cost-effective pricing model, continuously monitoring resource consumption, and optimizing your workflows. By doing so, you can harness the full potential of Amazon Bedrock while maintaining control over your budget, ensuring that your AI-driven projects remain both innovative and financially sustainable.