AWS offers multiple cloud-based storage options. Each has a unique combination of performance, durability, availability, cost, and interface, as well as other characteristics such as scalability and elasticity. These additional characteristics are critical for web-scale cloud-based solutions.
Media sharing platforms are a good example of how these cloud-based storage options can be used together. Users of media sharing services have a staggering appetite for placing photos and videos on social networking sites and for sharing their media in custom online photo albums. Here is a diagram that shows an example of a media sharing processing platform that takes advantage of four AWS cloud-based storage options: Amazon Simple Storage Service (Amazon S3), Amazon Simple Queue Service (Amazon SQS), Amazon Relational Database Service (Amazon RDS), and Amazon CloudFront.
Let’s examine each of these AWS storage options in more detail:
Amazon Simple Storage Service (Amazon S3)
Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.
Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.
Amazon S3 is intentionally built with a minimal feature set.
- Write, read, and delete objects containing from 1 byte to 5 terabytes of data each. The number of objects you can store is unlimited.
- Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
- A bucket can be stored in one of several Regions. You can choose a Region to optimize for latency, minimize costs, or address regulatory requirements. Amazon S3 is currently available in the US Standard, US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), South America (Sao Paulo), and GovCloud (US) Regions. The US Standard Region automatically routes requests to facilities in Northern Virginia or the Pacific Northwest using network maps.
- Objects stored in a Region never leave the Region unless you transfer them out. For example, objects stored in the EU (Ireland) Region never leave the EU.
- Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
- Options for secure data upload/download and encryption of data at rest are provided for additional data protection.
- Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
- Built to be flexible so that protocol or functional layers can easily be added. The default download protocol is HTTP. A BitTorrent™ protocol interface is provided to lower costs for high-scale distribution.
- Provides functionality to simplify manageability of data through its lifetime. Includes options for segregating data by buckets, monitoring and controlling spend, and automatically archiving data to even lower cost storage options.
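The write, read, and delete operations in the list above can be sketched in Python as follows. This is a minimal sketch rather than AWS reference code: it assumes a client object that exposes the `put_object`, `get_object`, and `delete_object` calls of the boto3 S3 client (for example, one created with `boto3.client("s3")`). The helper names are hypothetical and chosen only for illustration.

```python
"""Sketch: store, fetch, and delete one S3 object by bucket and key.

Assumption: `s3` behaves like a boto3 S3 client, e.g. boto3.client("s3").
The helper names are illustrative and not part of any AWS SDK.
"""


def store_object(s3, bucket: str, key: str, data: bytes) -> None:
    """Write `data` under the developer-assigned `key` in `bucket`."""
    # A single PUT like this suits small objects; very large objects
    # would need multipart upload, which is not shown here.
    s3.put_object(Bucket=bucket, Key=key, Body=data)


def fetch_object(s3, bucket: str, key: str) -> bytes:
    """Read the object back; the response body is a stream, so read it fully."""
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def remove_object(s3, bucket: str, key: str) -> None:
    """Delete the object identified by `key`."""
    s3.delete_object(Bucket=bucket, Key=key)
```

With a real client, the bucket must already exist in the chosen Region and the caller needs credentials that allow these three operations.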
Amazon Simple Queue Service (Amazon SQS)
Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully managed queue service. SQS makes it simple and cost-effective to decouple the components of a cloud application. You can use SQS to transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available.
With SQS, you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use.
- Developers can create an unlimited number of Amazon SQS queues with an unlimited number of messages.
- A queue can be created in any region.
- The message payload can contain up to 256KB of text in any format. Each 64KB ‘chunk’ of payload is billed as 1 request. For example, a single API call with a 256KB payload will be billed as four requests.
- Messages can be sent, received, or deleted in batches of up to 10 messages or 256KB. Batches cost the same amount as single messages, so SQS can be even more cost-effective for customers that use batching.
- Long polling reduces extraneous polling to help you minimize cost while receiving new messages as quickly as possible. When your queue is empty, long-poll requests wait up to 20 seconds for the next message to arrive. Long poll requests cost the same amount as regular requests.
- Messages can be retained in queues for up to 14 days.
- Messages can be sent and read simultaneously.
- When a message is received, it becomes “locked” while being processed. This keeps other computers from processing the message simultaneously. If the message processing fails, the lock will expire and the message will be available again. In the case where the application needs more time for processing, the “lock” timeout can be changed dynamically via the ChangeMessageVisibility operation.
- Developers can securely share Amazon SQS queues with others. Queues can be shared with other AWS accounts and anonymously. Queue sharing can also be restricted by IP address and time of day.
- When combined with Amazon Simple Notification Service (SNS), developers can ‘fanout’ identical messages to multiple SQS queues in parallel. When developers want to process the messages in multiple passes, fanout helps complete this more quickly, and with fewer delays due to bottlenecks at any one stage. Fanout also makes it easier to record duplicate copies of your messages, for example in different databases.
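Two of the points above lend themselves to a short Python sketch: the 64KB billing rule, and the receive, lock, and delete cycle. The sketch assumes a client object that exposes the `receive_message`, `change_message_visibility`, and `delete_message` calls of the boto3 SQS client. The function names `billed_requests` and `drain_once` are hypothetical.

```python
"""Sketch: SQS request billing arithmetic and one long-poll receive cycle.

Assumption: `sqs` behaves like a boto3 SQS client, e.g. boto3.client("sqs").
"""

CHUNK_BYTES = 64 * 1024         # each 64KB chunk of payload is billed as 1 request
MAX_PAYLOAD_BYTES = 256 * 1024  # maximum payload size quoted above


def billed_requests(payload_bytes: int) -> int:
    """Return how many requests a payload of this size is billed as."""
    if not 1 <= payload_bytes <= MAX_PAYLOAD_BYTES:
        raise ValueError("payload must be between 1 byte and 256KB")
    # Integer ceiling division: a partial chunk still counts as a full request.
    return (payload_bytes + CHUNK_BYTES - 1) // CHUNK_BYTES


def drain_once(sqs, queue_url, handle, extra_seconds=None):
    """Long-poll once, process each message, and delete the ones that succeed."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # batch limit quoted above
        WaitTimeSeconds=20,      # long poll: wait up to 20 seconds if empty
    )
    processed = 0
    for message in response.get("Messages", []):
        receipt = message["ReceiptHandle"]
        if extra_seconds is not None:
            # Change the "lock" timeout when processing needs more time.
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt,
                VisibilityTimeout=extra_seconds,
            )
        try:
            handle(message["Body"])
        except Exception:
            # Not deleted: the message becomes available again
            # once its lock expires.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt)
        processed += 1
    return processed
```

For the example in the list, `billed_requests(256 * 1024)` evaluates to 4, matching the four requests billed for a single 256KB payload.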
Amazon Relational Database Service (RDS)
Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business.
Amazon RDS gives you access to the capabilities of a familiar MySQL, Oracle or Microsoft SQL Server database engine. This means that the code, applications, and tools you already use today with your existing databases can be used with Amazon RDS. Amazon RDS automatically patches the database software and backs up your database, storing the backups for a user-defined retention period and enabling point-in-time recovery. You benefit from the flexibility of being able to scale the compute resources or storage capacity associated with your Database Instance (DB Instance) via a single API call.
Amazon RDS DB Instances can be provisioned with either standard storage or Provisioned IOPS storage. Amazon RDS Provisioned IOPS is a storage option designed to deliver fast, predictable, and consistent I/O performance, and is optimized for I/O-intensive, transactional (OLTP) database workloads.
In addition, Amazon RDS makes it easy to use replication to enhance availability and reliability for production workloads. Using the Multi-AZ deployment option you can run mission critical workloads with high availability and built-in automated fail-over from your primary database to a synchronously replicated secondary database in case of a failure. Amazon RDS for MySQL also enables you to scale out beyond the capacity of a single database deployment for read-heavy database workloads. As with all Amazon Web Services, there are no up-front investments required, and you pay only for the resources you use.
Amazon RDS is designed for developers or businesses who require the full features and capabilities of a relational database, or who wish to migrate existing applications and tools that utilize a relational database. It gives you access to the capabilities of a MySQL, Oracle, or SQL Server database engine running on your own Amazon RDS database instance.
To use Amazon RDS, you simply:
- Use the AWS Management Console or Amazon RDS APIs to launch a Database Instance (DB Instance), selecting the DB Engine (MySQL, Oracle or SQL Server), License Type, DB Instance class and storage capacity that best meet your needs.
- Connect to your DB Instance using your favorite database tool or programming language. Since you have direct access to a native MySQL, Oracle or SQL Server database engine, most tools designed for these engines should work unmodified with Amazon RDS.
- Monitor the compute and storage resource utilization of your DB Instance, for no additional charge, via Amazon CloudWatch metrics available using the AWS Management Console “DB Instances” tab or Amazon CloudWatch APIs. If at any point you need additional capacity, you can scale the compute and storage resources associated with your DB Instance with a few clicks of the console or a simple API call.
- Pay only for the resources you actually consume, based on your DB Instance hours consumed, database storage, backup storage, and data transfer.
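The launch and scale steps above can be sketched through the API as follows. This sketch assumes a client object that exposes the `create_db_instance` and `modify_db_instance` calls of the boto3 RDS client. The function names and any values passed to them are hypothetical, and the valid engine, license, and instance class strings should be taken from the RDS documentation.

```python
"""Sketch: launch a DB Instance and later scale it with one API call.

Assumption: `rds` behaves like a boto3 RDS client, e.g. boto3.client("rds").
"""


def launch_db_instance(rds, identifier, engine, license_model,
                       instance_class, storage_gb, username, password):
    """Launch a DB Instance with the chosen engine, license, class, and storage."""
    return rds.create_db_instance(
        DBInstanceIdentifier=identifier,
        Engine=engine,                  # e.g. a MySQL, Oracle, or SQL Server engine name
        LicenseModel=license_model,
        DBInstanceClass=instance_class,
        AllocatedStorage=storage_gb,    # storage capacity in gigabytes
        MasterUsername=username,
        MasterUserPassword=password,
    )


def scale_db_instance(rds, identifier, instance_class=None, storage_gb=None):
    """Change the compute class, the storage capacity, or both."""
    changes = {}
    if instance_class is not None:
        changes["DBInstanceClass"] = instance_class
    if storage_gb is not None:
        changes["AllocatedStorage"] = storage_gb
    if not changes:
        raise ValueError("specify instance_class, storage_gb, or both")
    # ApplyImmediately=True asks for the change now instead of waiting
    # for the next maintenance window.
    return rds.modify_db_instance(
        DBInstanceIdentifier=identifier,
        ApplyImmediately=True,
        **changes,
    )
```

In a real deployment the master password would come from a secrets store rather than from source code.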
Amazon CloudFront
Amazon CloudFront is a web service for content delivery. It integrates with other Amazon Web Services to give developers and businesses an easy way to distribute content to end users with low latency, high data transfer speeds, and no commitments.
Amazon CloudFront can be used to deliver your entire website, including dynamic, static and streaming content using a global network of edge locations. Requests for your content are automatically routed to the nearest edge location, so content is delivered with the best possible performance. Amazon CloudFront is optimized to work with other Amazon Web Services, like Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Load Balancing, and Amazon Route 53. Amazon CloudFront also works seamlessly with any non-AWS origin server, which stores the original, definitive versions of your files. Like other Amazon Web Services, there are no contracts or monthly commitments for using Amazon CloudFront – you pay only for as much or as little content as you actually deliver through the service.
Amazon CloudFront has a simple, web services interface that lets you get started in minutes. In Amazon CloudFront, your content is organized into distributions. A distribution specifies the location or locations of the original version of your files. A distribution has a unique CloudFront.net domain name (e.g. abc123.cloudfront.net) that you can use to reference your objects through the global network of edge locations. If you wish, you can also map your own domain name (e.g. www.example.com) to your distribution. You can create distributions to either download your content using the HTTP or HTTPS protocols, or stream your content using the RTMP protocol.
To use Amazon CloudFront, you:
- Store the original versions of your files on one or more origin servers. An origin server is the location of the definitive version of an object. Origin servers could be other Amazon Web Services – an Amazon S3 bucket, an Amazon EC2 instance, or an Elastic Load Balancer – or your own origin server.
- Create a distribution to register your origin servers with Amazon CloudFront through a simple API call or the AWS Management Console. When configuring more than one origin server, use URL pattern matches to specify which origin has what content. You can assign one of the origins as the default origin.
- Use your distribution’s domain name in your web pages, media player, or application. When end users request an object using this domain name, they are automatically routed to the nearest edge location for high performance delivery of your content.
- Pay only for the data transfer and requests that you actually use.
- Amazon CloudFront’s availability is backed with the Amazon CloudFront Service Level Agreement.
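To illustrate the steps above, the following Python sketch shows how URL patterns can map request paths to origins, with a default origin as the fallback, and how an object URL is built from a distribution's domain name. It is a local simulation only: the real matching happens inside CloudFront, and the use of `fnmatch`-style wildcards, the function names, and the origin names are assumptions made for illustration.

```python
"""Sketch: choose an origin by URL pattern and build a distribution URL.

This simulates the idea locally; it does not call CloudFront.
"""
from fnmatch import fnmatchcase
from urllib.parse import quote


def pick_origin(path, pattern_origins, default_origin):
    """Return the origin of the first pattern that matches `path`.

    `pattern_origins` is an ordered list of (pattern, origin) pairs;
    `default_origin` is used when no pattern matches.
    """
    for pattern, origin in pattern_origins:
        if fnmatchcase(path, pattern):
            return origin
    return default_origin


def object_url(domain, key, scheme="https"):
    """Build the URL of an object served through a distribution's domain name."""
    # quote() percent-encodes characters such as spaces but keeps "/" as is.
    return f"{scheme}://{domain}/{quote(key.lstrip('/'))}"


if __name__ == "__main__":
    # Hypothetical origins for a media sharing site.
    routes = [("/images/*", "s3-photo-bucket"), ("/api/*", "ec2-app-server")]
    print(pick_origin("/images/cat.jpg", routes, "default-web-origin"))
    print(object_url("abc123.cloudfront.net", "images/cat.jpg"))
```

The same `object_url` helper works with a custom domain name such as www.example.com once that name is mapped to the distribution.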
Here are additional cloud-based storage options offered by AWS:
- Amazon Glacier: an extremely low-cost storage service that provides secure, durable, and flexible storage for data backup and archival. With Amazon Glacier, customers can reliably store their data for as little as $0.007 per gigabyte per month.
- Amazon EBS: persistent block-level storage volumes for use with Amazon EC2 instances in the AWS Cloud. Each Amazon EBS volume is automatically replicated within its Availability Zone to protect you from component failure, offering high availability and durability.
- EC2 Instance Storage: temporary block storage volumes for EC2 instances. They are ideal for information that changes frequently, such as buffers, caches, scratch data, and other temporary content, or for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers.
- AWS Import/Export: a limited beta program for moving large amounts of data into and out of Amazon’s cloud services by using portable storage devices for transport.
- AWS Storage Gateway: a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization’s on-premises IT environment and AWS’s storage infrastructure.
- Amazon DynamoDB: a fast and flexible NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models.
- Amazon ElastiCache: a fully managed caching service. ElastiCache is protocol-compliant with Memcached, an open source, high-performance, distributed memory object caching system for speeding up dynamic web applications by alleviating database load.
- Amazon Redshift: a hosted data warehouse product that is part of the larger Amazon Web Services cloud computing platform. It is built on technology from the massively parallel processing (MPP) data warehouse ParAccel by Actian.
- Amazon SimpleDB: a distributed database written in Erlang by Amazon.com. It is used as a web service in concert with Amazon Elastic Compute Cloud (EC2) and Amazon S3 and is part of Amazon Web Services.
- Database on EC2: a self-managed database on EC2 instances. As an alternative to the managed relational and NoSQL database services, you can operate your own database in the cloud on Amazon EC2 and Amazon EBS.
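As a worked example of the Amazon Glacier price quoted in the list above, the following Python sketch estimates a monthly storage bill. It assumes the $0.007 per gigabyte per month figure applies to the whole volume and ignores request, retrieval, and data transfer charges, which this article does not quantify. The function name is hypothetical.

```python
"""Sketch: estimate a monthly Glacier storage bill from the quoted rate."""
from decimal import Decimal

# Rate quoted in the list above: "as little as" $0.007 per gigabyte per month.
GLACIER_USD_PER_GB_MONTH = Decimal("0.007")


def monthly_storage_cost(gigabytes, rate=GLACIER_USD_PER_GB_MONTH):
    """Return the storage-only cost in US dollars, rounded to cents."""
    if gigabytes < 0:
        raise ValueError("gigabytes must not be negative")
    # Decimal avoids binary floating point rounding in currency arithmetic.
    cost = Decimal(str(gigabytes)) * rate
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 1,000 GB of archived media at the quoted rate.
    print(monthly_storage_cost(1000))  # prints 7.00
```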