Which of the following AWS services allows you to query data directly in Amazon S3?
You may know Timescale as the developer of TimescaleDB, the leading time-series and analytics database built on PostgreSQL. We’re also building Timescale Cloud, a cloud-native PostgreSQL solution for time series, events, and analytics that expands traditional cloud databases' boundaries by combining all the goodness of PostgreSQL with the flexibility of cloud-native architectures. And we’re building it on AWS.
With its extensive catalog of services, AWS is the industry leader in public cloud infrastructure and the default choice for many developers deciding where to build their projects. One of the advantages of building in AWS is that you can mix and match its wide range of services and tools to architect your data infrastructure, avoiding the need to create these services from scratch and speeding up development time.
Timescale Cloud is built to serve developers working with time series, events, and analytics applications in AWS. You can integrate your Timescale Cloud databases seamlessly into your existing AWS infrastructure as your all-in-one datastore for relational and time-series data for your applications.
But don’t just take it from us. See what one of our customers, SquareRoots, says about building on AWS with Timescale Cloud:
"Timescale Cloud integrated seamlessly into our AWS data pipeline with AWS IoT Greengrass, AWS Kinesis, and AWS Lambda to help power our controlled environment agriculture platform."
A downside to AWS’s extensive range of tools is the paradox of choice. To help solve that problem, we’ll tell you about eight AWS services that Timescale customers love using with Timescale Cloud, ranging from tools to ingest data into your database and business intelligence tools to services for low-cost data archiving and tiering. In particular, we’ll cover the pairing of Timescale Cloud with the following services:
Let’s get into it!
1. Amazon VPC
Virtual Private Cloud (VPC) peering is a method of connecting separate private cloud networks. It makes it possible for the virtual machines in the different VPCs to talk to each other without going through the public internet, resembling a traditional network that businesses would previously operate in their own data center but with the benefits of using scalable cloud infrastructure.
Amazon VPC is the service that bridges your Timescale Cloud databases and the rest of your AWS infrastructure. VPC peering enables you to securely access data stored in Timescale Cloud from your existing cloud infrastructure without ever exposing your services to the public internet. More specifically, this service creates a private network “peering” connection between your Amazon VPC(s) and your Timescale Cloud VPC(s), making it possible for both to speak to each other without going through the wider internet.
This is very useful for running a managed database with the utmost privacy. For example, you may be hesitant to use a managed service because you’re concerned about exposing your database to the public internet. VPC peering solves this issue, giving you a private connection between your database and the rest of your AWS infrastructure. With VPC peering, you can enjoy all the benefits of a managed database service in Timescale Cloud without compromising on the isolation you’d get in a self-hosted deployment in AWS.
VPC peering is useful for simple peer-to-peer connections, but it can also be used for more advanced deployments.
For example, you can create multiple Virtual Private Clouds per service, meaning that you could set up separate VPCs for different applications—or your dev, staging, and production environments—each with its own set of security and access control preferences. Finally, it’s worth noting that using VPC peering in Timescale Cloud is very inexpensive—it will only cost you $0.030/hr per connection (which comes out to around $20/month). Want to learn more about Amazon VPC and Timescale Cloud? The following resources will tell you everything you need to know:
2. AWS Lambda
AWS Lambda is a popular serverless compute service that lets you run applications without worrying about provisioning or managing the underlying infrastructure. As a user working with AWS Lambda, you define event-based functions that will run your code in response to triggers.
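To make the idea of an event-based function concrete, here is a minimal sketch of a Lambda handler in Python that writes a batch of sensor readings to a PostgreSQL-compatible database such as Timescale Cloud. The payload shape (`readings`, `ts`, `device_id`, `temperature`), the `sensor_data` table, and the `TIMESCALE_DSN` environment variable are all assumptions made for illustration, and `psycopg2` is assumed to be packaged with the function.

```python
import json
import os
from datetime import datetime, timezone

# Hypothetical table and columns; adjust to your own schema.
INSERT_SQL = (
    "INSERT INTO sensor_data (time, device_id, temperature) "
    "VALUES (%s, %s, %s)"
)


def rows_from_event(event):
    """Turn an assumed batch payload into (time, device_id, temperature) tuples."""
    rows = []
    for reading in event.get("readings", []):
        # 'ts' is assumed to be a Unix timestamp in seconds.
        ts = datetime.fromtimestamp(reading["ts"], tz=timezone.utc)
        rows.append((ts, reading["device_id"], float(reading["temperature"])))
    return rows


def handler(event, context):
    """Lambda entry point: runs once per trigger and inserts the batch."""
    # Imported here so the pure helper above can be used without the driver.
    import psycopg2  # assumed to be bundled with the function or a layer

    rows = rows_from_event(event)
    conn = psycopg2.connect(os.environ["TIMESCALE_DSN"])  # assumed env var
    try:
        # 'with conn' commits on success and rolls back on error.
        with conn, conn.cursor() as cur:
            cur.executemany(INSERT_SQL, rows)
    finally:
        conn.close()
    return {"statusCode": 200, "body": json.dumps({"inserted": len(rows)})}
```

Keeping the payload parsing in a separate pure function makes it easy to check locally before deploying; the database write only happens inside `handler`.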
Being serverless, AWS Lambda is a powerful tool for operating your data pipelines with almost no operational overhead while paying only for what you consume. You can also connect AWS Lambda with Amazon API Gateway to expose your function as an API endpoint, or run the function automatically, on a schedule or in response to messages, using Amazon EventBridge or Amazon SNS/SQS. It supports code written in languages such as Go, Node.js, Java, and Python.
Timescale customers often use AWS Lambda to route time-series data into Timescale Cloud, for example, using AWS Lambda together with edge runtimes for IoT like AWS IoT Greengrass. Furthermore, AWS Lambda can be used for transforming, fetching, and performing other data operations on tables and hypertables in your Timescale Cloud databases. To learn more about AWS Lambda and how to use it in your next project, see the resources below:
3 and 4. AWS IoT Core and AWS IoT Greengrass
IoT is one of the most popular use cases for customers on Timescale Cloud. Here’s how Timescale Cloud can be used with AWS IoT solutions to build stellar IoT applications. We often hear about AWS IoT Core and AWS IoT Greengrass:
AWS IoT Core establishes a secure, bidirectional connection between your edge devices and your AWS infrastructure in a serverless manner. It supports the most common networking protocols (LoRaWAN, MQTT, and HTTPS), helping you manage your IoT fleet, which can get significantly complex once you start having thousands (or even millions) of devices.
Timescale customers often send sensor data from their devices to tools like AWS IoT Core (to help them manage the connection between edge and cloud) and use services like AWS Lambda to store that sensor data in Timescale Cloud.
AWS IoT Greengrass is an edge runtime that helps you configure your IoT devices faster via pre-built modules and functionality. This service can be useful if you have a large fleet of devices performing some form of edge processing (like Lambda functions or machine learning inference), if devices communicate with each other, or if you're operating with disrupted internet connectivity. AWS IoT Greengrass integrates with AWS IoT Core, but it also allows you to directly stream data to services like Amazon Kinesis or Amazon S3.
You can use these tools to build your IoT data architecture, storing your sensor data in Timescale Cloud. Our customers love using Timescale Cloud for IoT because of its performance at scale (think: real-time queries and dashboards over millions of data points), time-series functionality, seamless integration with data visualization tools and end-user systems, and great cost efficiency for high data volumes. If you’re running an IoT use case, make sure you give Timescale Cloud a try (it’s completely free for 30 days) while checking out these resources:
5. Amazon QuickSight
Amazon QuickSight is a managed business intelligence (BI) tool that provides easy-to-use visualizations and dashboards for getting insights from business analytics. It integrates with a wide range of data sources, including Timescale Cloud, via its PostgreSQL driver. It also provides machine learning functionality for pattern and anomaly detection. It’s another popular tool that Timescale customers love to use with Timescale Cloud via VPC peering.
6. Amazon CloudWatch
Timescale Cloud integrates directly with Amazon CloudWatch, so you can monitor your Timescale Cloud database services from there. Amazon CloudWatch provides a reliable, scalable, and flexible monitoring solution that’s easy to spin up in minutes, saving developers the burden of managing their own monitoring systems and infrastructure.
Most of our customers are using Timescale Cloud in production, and mission-critical applications require close monitoring of your service metrics to ensure that your database operates efficiently and without interruption. This is where integrating Timescale Cloud with monitoring tools like Amazon CloudWatch can be extremely helpful, allowing you to set up alerts on your service metrics to get notified every time your memory surpasses a certain threshold or once your storage starts to get full. Here’s a five-minute video that walks you through integrating Timescale Cloud and Amazon CloudWatch in a few simple steps:
7. Amazon Managed Streaming for Apache Kafka
Apache Kafka is a popular real-time event streaming service used for a wide variety of data-intensive applications. You can write your own producers to insert generated data into Kafka topics and subsequently write consumers to subscribe to those topics to receive all newly generated data.
The Kafka Connect framework enables you to easily stream data in and out of Kafka to and from other services and software using pre-written connectors. A popular connector is the JDBC connector, whose sink allows you to ingest data into PostgreSQL from a Kafka topic. Because each Timescale Cloud database is also a PostgreSQL database, you can use this JDBC sink to ingest data into Timescale Cloud.
Deploying and maintaining a Kafka cluster can be a monumental task requiring intimate knowledge of Kafka, ZooKeeper, and various other tools like Kafka Connect. A good alternative is to use Amazon Managed Streaming for Apache Kafka (or MSK for short). A feature of MSK is MSK Connect, which allows you to deploy Kafka connectors at scale.
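As a rough illustration of the JDBC sink approach described above, the sketch below builds a connector configuration as a flat map of string properties, which is the shape Kafka Connect expects. It assumes the Confluent JDBC sink connector plugin is the one deployed; the topic name, host, and credentials are hypothetical placeholders, and the property values are a starting point rather than a recommended setup.

```python
import json


def jdbc_sink_config(topic, host, port, database, user, password):
    """Build a Kafka Connect JDBC sink configuration as a dict of strings.

    Assumes the Confluent JDBC sink connector; every value here is an
    illustrative placeholder.
    """
    return {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        # Kafka topic(s) whose records are written to the database.
        "topics": topic,
        # Standard PostgreSQL JDBC URL; sslmode=require enforces TLS.
        "connection.url": f"jdbc:postgresql://{host}:{port}/{database}?sslmode=require",
        "connection.user": user,
        "connection.password": password,
        # Plain INSERTs into an existing table (no automatic table creation).
        "insert.mode": "insert",
        "auto.create": "false",
    }


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    config = jdbc_sink_config(
        topic="sensor_readings",
        host="example-host.example.com",
        port=5432,
        database="tsdb",
        user="tsdbadmin",
        password="change-me",
    )
    print(json.dumps(config, indent=2))
```

With `auto.create` disabled, the target table (or hypertable) is expected to exist already, which keeps schema decisions in the database rather than in the connector.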
8. Amazon S3
Amazon S3, also known as Amazon Simple Storage Service, is a highly scalable cloud object storage service that stores object data within buckets. It’s built to store and retrieve large volumes of data. Unlike some of the services and tools mentioned above, no integration or dev work is required to use Amazon S3 with Timescale Cloud databases: you can tier data from a Timescale Cloud database to Amazon S3 right from within the database itself!
We recently released a consumption-based, low-cost object storage layer in Timescale Cloud built on Amazon S3. By running one command on your Timescale Cloud database, you can transparently tier data to this Amazon S3 object storage without leaving your database and while retaining access to all your data via standard SQL. In fact, you will keep the abstraction of a single table (a hypertable) that’s now transparently stretched across multiple storage layers (disk and S3), allowing you to scale your time-series data without breaking the bank.
Amazon S3 is an important service for developers building cloud-native applications. It’s an object storage service with excellent durability, high availability, and virtually infinite scalability that allows you to store vast volumes of data at a lower cost than other AWS storage services, like EBS, via its consumption-based pricing. S3 is one of the most popular services in AWS (perhaps the most popular), and it’s widely used for data warehousing and archiving. But building, integrating, and operating a separate data warehouse or data lake for your time-series data means more development work, complexity, and costs.
With Timescale Cloud, moving data from the database to an object store is as simple as running a SQL command. You’ll pay only for what you store, with no extra charge per query and no more paying for an upper bound of storage just in case you need it. 
And the best part is that even when data is tiered, all data remains fully and directly queryable from within your database via standard full SQL—including predicates and filters, JOINs, CTEs, windowing, and everything else you’re used to in PostgreSQL! Note: This feature is still under active development and, thus, not ready for production use. Still, you can test it by requesting access to the private beta. To do so, follow these steps:
Get Started Today
Now it’s your turn! Pick your favorites from the AWS tools and services list and apply them to your next time-series, analytics, or events project.
Do you have feedback or suggestions for more AWS tools and services we should cover next? Let us know in the Timescale Community Forum or on Twitter @TimescaleDB.
What is used to query directly from S3?
Amazon S3 Select and Amazon S3 Glacier Select enable customers to run SQL queries directly on data stored in Amazon S3 and Amazon S3 Glacier. With S3 Select, you store your data in S3 and use SQL statements to filter the contents of S3 objects, retrieving only the data that you need.
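A minimal sketch of what an S3 Select call can look like from Python, assuming a boto3 S3 client is passed in and the object is a CSV file with a header row. The bucket, key, and column names in the usage comment are hypothetical.

```python
def select_rows(s3_client, bucket, key, expression):
    """Run an S3 Select SQL expression against one CSV object.

    `s3_client` is assumed to be a boto3 S3 client, for example
    boto3.client("s3"). Returns the matching rows as CSV text.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Treat the first line as column names so they can be used in SQL.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; only 'Records' events carry row data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join the bytes before decoding, since a row may span two chunks.
    return b"".join(chunks).decode("utf-8")


# Example usage (hypothetical bucket, key, and columns):
#   import boto3
#   csv_text = select_rows(
#       boto3.client("s3"),
#       "my-bucket",
#       "readings/2023-01.csv",
#       "SELECT s.device_id, s.temperature FROM S3Object s "
#       "WHERE CAST(s.temperature AS FLOAT) > 30",
#   )
```

Only the filtered rows travel over the network, which is the point of S3 Select: the filtering happens on the S3 side rather than after downloading the whole object.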
Can we query data in S3?
Athena S3 – Reference Architecture
In these cases, Athena can provide a hassle-free way to query the data. The results of the Athena-S3 SQL queries can then be read by QuickSight or other visualization tools, which will provide BI and dashboarding to end users.
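For illustration, an Athena query over data in S3 can be started and polled from Python roughly as follows. This is a sketch that assumes a boto3 Athena client is passed in; the database name, SQL, and results location in the usage comment are hypothetical.

```python
import time


def run_athena_query(athena_client, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, and return the first page of rows.

    `athena_client` is assumed to be a boto3 Athena client, for example
    boto3.client("athena"). `output_location` is an S3 URI where Athena
    writes the result files.
    """
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page is fetched here; for SELECT queries the first
    # row holds the column names.
    results = athena_client.get_query_results(QueryExecutionId=query_id)
    return [
        [col.get("VarCharValue") for col in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]


# Example usage (hypothetical names):
#   import boto3
#   rows = run_athena_query(
#       boto3.client("athena"),
#       "SELECT device_id, avg(temperature) FROM readings GROUP BY device_id",
#       "iot_lake",
#       "s3://my-athena-results/",
#   )
```

The result files written to the output location are what a tool such as QuickSight would then read for dashboards.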
Which AWS service can be used to load data from Amazon S3?
Alternatively, a web-based interface for accessing and managing Amazon S3 resources is available via the AWS Management Console.
Can Athena query S3 Glacier?
Amazon S3 Glacier storage – Athena does not support querying the data in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes, or in the Archive Access or Deep Archive Access tiers of the S3 Intelligent-Tiering storage class.