Which of the following AWS services allows you to query data directly in Amazon S3?

You may know Timescale as the developer of TimescaleDB, the leading time-series and analytics database built on PostgreSQL. We’re also building Timescale Cloud, a cloud-native PostgreSQL solution for time series, events, and analytics that expands traditional cloud databases' boundaries by combining all the goodness of PostgreSQL with the flexibility of cloud-native architectures. And we’re building it on AWS.

With its extensive catalog of services, AWS is the industry leader in public cloud infrastructure—and the default choice for many developers on where to build their projects. One of the advantages of building in AWS is that you can mix and match its wide range of services and tools to architect your data infrastructure, avoiding the need to create these services from scratch and speeding up development time.

Timescale Cloud is built to serve developers working with time series, events, and analytics applications in AWS. You can integrate your Timescale Cloud databases seamlessly into your existing AWS infrastructure as your all-in-one datastore for relational and time-series data for your applications.

But don’t just take it from us. See what one of our customers, SquareRoots, says about building on AWS with Timescale Cloud:

"Timescale Cloud integrated seamlessly into our AWS data pipeline with AWS IoT Greengrass, AWS Kinesis, and AWS Lambda to help power our controlled environment agriculture platform."

Mark Thompson, Senior Infrastructure Engineer
Square Roots

A downside to AWS’s extensive range of tools is the paradox of choice. To help solve that problem, we’ll tell you about eight AWS services that Timescale customers love using with Timescale Cloud, ranging from tools to ingest data into your database and business intelligence tools to services for low-cost data archiving and tiering.

In particular, we’ll cover the pairing of Timescale Cloud with the following services:

  • Amazon VPC
  • AWS Lambda
  • AWS IoT Tools: IoT Core and IoT Greengrass
  • Amazon QuickSight
  • Amazon CloudWatch
  • Amazon Managed Streaming for Apache Kafka (Amazon MSK)
  • Amazon S3

Let’s get into it!

1. Amazon VPC

Virtual Private Cloud (VPC) peering is a method of connecting separate private cloud networks. It lets virtual machines in different VPCs talk to each other without going through the public internet. The result resembles the traditional network a business would once have operated in its own data center, but with the benefits of scalable cloud infrastructure.

Amazon VPC is the service that bridges your Timescale Cloud databases and the rest of your AWS infrastructure. VPC peering enables you to securely access data stored in Timescale Cloud from your existing cloud infrastructure without ever exposing your services to the public internet. More specifically, this service creates a private network “peering” connection between your Amazon VPC(s) and your Timescale Cloud VPC(s), so traffic between them never has to cross the public internet.

VPC peering using Amazon VPC enables you to establish a private connection between Timescale Cloud and other elements of your AWS infrastructure, giving you maximum security and privacy.

This is very useful for running a managed database with the utmost privacy. For example, you may be hesitant to use a managed service because you’re concerned about exposing your database to the public internet. VPC peering solves this issue, giving you a private connection between your database and the rest of your AWS infrastructure. With VPC peering, you can enjoy all the benefits of a managed database service in Timescale Cloud without compromising on the isolation you’d get in a self-hosted deployment in AWS.

VPC peering is useful for simple peer-to-peer connections, but it can also be used for more advanced deployments. For example, you can create multiple Virtual Private Clouds per service, meaning that you could set up separate VPCs for different applications—or your dev, staging, and production environments—each with its own set of security and access control preferences.

Finally, it’s worth noting that using VPC peering in Timescale Cloud is very inexpensive: it costs only $0.030/hr per connection (which comes out to around $20/month).
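To make the arithmetic behind that monthly figure explicit, here is a small sketch. The hourly rate comes from the paragraph above; the 730 hours per month (8,760 hours in a year divided by 12) is our own assumption, and the function name is purely illustrative.

```python
HOURLY_RATE_USD = 0.030  # per peering connection, as quoted in the text

# Assumption: an "average" month has 8760 / 12 = 730 hours.
AVG_HOURS_PER_MONTH = 730.0


def monthly_cost(hourly_rate: float, hours: float = AVG_HOURS_PER_MONTH) -> float:
    """Estimate the monthly cost of one peering connection, rounded to cents."""
    return round(hourly_rate * hours, 2)


if __name__ == "__main__":
    print(f"Estimated cost per connection: ${monthly_cost(HOURLY_RATE_USD):.2f}/month")
```

With these assumptions the estimate lands a little above $20, which matches the "around $20/month" figure once you allow for rounding and month length.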

Want to learn more about Amazon VPC and Timescale Cloud? The following resources will tell you everything you need to know:

  • [Blog Post] VPC Peering: From Zero to Hero: A comprehensive guide on how VPC peering works and how to set it up in Timescale Cloud. This guide also includes information on how to peer Timescale Cloud with your own EC2 instance, AWS Lambda, and Amazon QuickSight (more info on these later in this post).
  • [Docs] VPC Documentation: Contains step-by-step instructions for setting up VPC peering in Timescale Cloud using Amazon VPC.

2. AWS Lambda

AWS Lambda is a popular serverless compute service that lets you run applications without worrying about provisioning or managing the underlying infrastructure. As a user working with AWS Lambda, you define event-based functions that will run your code in response to triggers.


As mentioned in the Amazon VPC section, AWS Lambda is one of the services in which you can use VPC peering to access and insert data into your Timescale Cloud databases.

Being serverless, AWS Lambda is a powerful tool for operating your data pipelines with almost no operational overhead while paying only for what you consume. You can also connect AWS Lambda with Amazon API Gateway to expose your function as an API endpoint, or run the function on a schedule or in response to events using Amazon EventBridge or Amazon SNS/SQS. Lambda supports several language runtimes, including Go, Node.js, Java, and Python.

Timescale customers often use AWS Lambda to route time-series data into Timescale Cloud—for example, using AWS Lambda together with edge runtimes for IoT like AWS IoT Greengrass.

Furthermore, AWS Lambda can be used for transforming, fetching, and performing other data operations on tables and hypertables in your Timescale Cloud databases.

You can connect AWS Lambda to Timescale Cloud via VPC peering and directly query hypertables in Timescale Cloud using psycopg2, the popular PostgreSQL database adapter for Python.
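As a rough illustration of that pattern, here is a minimal sketch of a Lambda handler that queries a hypertable with psycopg2. The `conditions` table, its columns, and the `TIMESCALE_DSN` environment variable are hypothetical names chosen for this example, and the `connect` parameter exists only so the handler can be exercised without a live database.

```python
import json
import os


def get_connection():
    """Open a database connection from a connection string in the environment.

    TIMESCALE_DSN is a hypothetical variable name. psycopg2 is imported lazily
    and must be bundled with the function (deployment package or layer).
    """
    import psycopg2

    return psycopg2.connect(os.environ["TIMESCALE_DSN"])


def fetch_recent_readings(conn, limit=10):
    """Return the newest rows from the hypothetical `conditions` hypertable."""
    query = (
        "SELECT time, device_id, temperature "
        "FROM conditions ORDER BY time DESC LIMIT %s"
    )
    with conn.cursor() as cur:
        cur.execute(query, (limit,))  # parameterized to avoid SQL injection
        rows = cur.fetchall()
    return [
        {"time": str(t), "device_id": device, "temperature": temp}
        for (t, device, temp) in rows
    ]


def lambda_handler(event, context, connect=get_connection):
    """Lambda entry point: query the hypertable and return a JSON response."""
    conn = connect()
    try:
        readings = fetch_recent_readings(conn, limit=int(event.get("limit", 10)))
    finally:
        conn.close()  # always release the connection, even on errors
    return {"statusCode": 200, "body": json.dumps(readings)}
```

In a real deployment you would likely reuse the connection across invocations rather than opening one per request, but opening and closing it inside the handler keeps the sketch easy to follow.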

To learn more about AWS Lambda and how to use it in your next project, see the resources below:

  • [Tutorials] AWS Lambda Tutorial: Here’s a tutorial that walks you through how to create a data API for Timescale Cloud using AWS Lambda and AWS API Gateway, how to pull data from third-party APIs and ingest it into Timescale Cloud, and how to continuously deploy your Lambda function using GitHub Actions.
  • [Blog Post] AWS Lambda For Beginners: Overcoming the Most Common Challenges: Useful advice on navigating the trickiest parts of working with AWS Lambda, like adding external dependencies, overcoming the 250 MB package limit by using containers, or setting up continuous deployment.
  • [Blog Post] How to Peer Timescale Cloud With AWS Lambda: Navigate to the “Peering Timescale Cloud…” section, where you’ll find detailed instructions on establishing a successful connection between AWS Lambda and Timescale Cloud.

3 and 4. AWS IoT Tools: IoT Core and IoT Greengrass

IoT is one of the most popular use cases for customers on Timescale Cloud. Here’s how Timescale Cloud can be used with AWS IoT solutions to build stellar IoT applications. The two services we hear about most often are AWS IoT Core and AWS IoT Greengrass:

AWS IoT Core establishes a secure, bidirectional connection between your edge devices and your AWS infrastructure in a serverless manner. It supports the most common networking protocols (LoRaWAN, MQTT, and HTTPS), helping you manage your IoT fleet, which can get significantly complex once you start having thousands (or even millions) of devices.

Timescale customers often send sensor data from their devices to tools like AWS IoT Core (to help them manage the connection between edge and cloud) and use services like AWS Lambda to store that sensor data in Timescale Cloud.
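The storage step of that pipeline could look something like the sketch below: a function that turns device messages into rows and inserts them through a psycopg2-style connection. The message fields (`ts`, `device_id`, `temperature`, `humidity`) and the `sensor_data` table are assumptions made for this example; your devices and schema will differ.

```python
from datetime import datetime, timezone

# Hypothetical hypertable and columns, used here only for illustration.
INSERT_SQL = (
    "INSERT INTO sensor_data (time, device_id, temperature, humidity) "
    "VALUES (%s, %s, %s, %s)"
)


def to_row(message: dict) -> tuple:
    """Convert one device message into a row tuple.

    Assumes `ts` is a Unix timestamp in seconds; missing measurements
    become NULLs in the database.
    """
    ts = datetime.fromtimestamp(message["ts"], tz=timezone.utc)
    return (
        ts,
        message["device_id"],
        message.get("temperature"),
        message.get("humidity"),
    )


def store_messages(conn, messages) -> int:
    """Insert a batch of messages in one transaction and return the row count."""
    rows = [to_row(m) for m in messages]
    with conn.cursor() as cur:
        cur.executemany(INSERT_SQL, rows)
    conn.commit()
    return len(rows)
```

A Lambda function triggered by incoming device messages could call `store_messages` with the messages it receives. Batching several messages per insert generally reduces round trips to the database.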

AWS IoT Greengrass is an edge runtime that helps you configure your IoT devices faster via pre-built modules and functionality. This service can be useful if you have a large fleet of devices performing some form of edge processing (like Lambdas or machine learning inference), if devices communicate with each other, or if you’re operating with intermittent internet connectivity. AWS IoT Greengrass integrates with AWS IoT Core, but it also allows you to directly stream data to services like Amazon Kinesis or Amazon S3.

You can use these tools to build your IoT data architecture, storing your sensor data in Timescale Cloud. Our customers love using Timescale Cloud for IoT because of its performance at scale (think: real-time queries and dashboards over millions of data points), time-series functionality, seamless integration with data visualization tools and end-user systems, and great cost efficiency for high data volumes.

If you’re running an IoT use case, make sure you give Timescale Cloud a try (it’s completely free for 30 days) while checking out these resources:

  • [Customer Story] How Edeva Uses IoT to Build Smarter Cities: A software architect at Edeva shares how his team collects huge amounts of data from IoT devices to help build safer, smarter cities and how they leverage Timescale’s continuous aggregates for lightning-fast dashboards.
  • [Tutorial] Visualize Geospatial Data Using Timescale Cloud and Grafana: IoT use cases often involve both temporal and geospatial analysis. Grafana includes a WorldMap visualization that helps you see geospatial data overlaid atop a map of the world. This tutorial offers step-by-step instructions on building dashboards for time-series and geospatial data, which are common in smart logistics and fleet management use cases.
  • [Customer Story] How Everactive Powers a Dense Sensor Network: Everactive engineers share how they’re bringing analytics and real-time device monitoring to scenarios and places never before possible. Learn how they’ve set up their data stack, their database evaluation criteria, their advice for fellow developers, and more.

5. Amazon QuickSight

Amazon QuickSight is a managed business intelligence (BI) tool that provides easy-to-use visualizations and dashboards for getting insights from business analytics. It integrates with a wide range of data sources, including Timescale Cloud, via its PostgreSQL driver. It also provides machine learning functionality for pattern and anomaly detection.

It’s another popular tool that Timescale customers love to use with Timescale Cloud via VPC peering.

Amazon QuickSight is a powerful BI tool that is very popular among Timescale Cloud customers (Source: aws.amazon.com).

How to get started

  • [AWS resources] Get Started With Amazon QuickSight. If you’re looking for tutorials to help you navigate QuickSight for the first time, you can start with AWS’s collection of demo videos and getting started guides.
  • [Blog post] How to Peer Timescale Cloud With Amazon QuickSight: Navigate to the “Peering Timescale Cloud…” section to get tips on how to properly peer Amazon QuickSight with Timescale Cloud.

6. Amazon CloudWatch

Timescale Cloud integrates directly with Amazon CloudWatch, so you can monitor your Timescale Cloud database services from there. Amazon CloudWatch provides a reliable, scalable, and flexible monitoring solution that’s easy to spin up in minutes, saving developers the burden of managing their own monitoring systems and infrastructure.

Most of our customers are using Timescale Cloud in production, and mission-critical applications require close monitoring of service metrics to ensure that the database operates efficiently and without interruption. This is where integrating Timescale Cloud with monitoring tools like Amazon CloudWatch can be extremely helpful: you can set up alerts on your service metrics to get notified whenever memory usage surpasses a certain threshold or your storage starts to fill up.
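To give a feel for what such an alert might look like, here is a hedged sketch that builds the parameters for CloudWatch’s `put_metric_alarm` call in boto3. The namespace, metric name, and dimension below are placeholders we made up for illustration; check which names your exported Timescale Cloud metrics actually use in CloudWatch before relying on them.

```python
def memory_alarm_params(service_id: str, threshold_percent: float, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    Namespace, MetricName, and Dimensions are hypothetical placeholders,
    not the names Timescale Cloud necessarily exports.
    """
    return {
        "AlarmName": f"timescale-{service_id}-memory-high",
        "Namespace": "Timescale/Cloud",          # hypothetical
        "MetricName": "memory_used_percent",     # hypothetical
        "Dimensions": [{"Name": "service_id", "Value": service_id}],  # hypothetical
        "Statistic": "Average",
        "Period": 300,                # evaluate 5-minute averages
        "EvaluationPeriods": 3,       # require 3 consecutive breaches
        "Threshold": float(threshold_percent),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify this SNS topic
    }


def create_alarm(params: dict, client=None) -> None:
    """Create or update the alarm; builds a boto3 client if none is given."""
    if client is None:
        import boto3  # imported lazily so the sketch loads without boto3

        client = boto3.client("cloudwatch")
    client.put_metric_alarm(**params)
```

Keeping the parameter construction separate from the API call makes it easy to review thresholds in code and to reuse the same shape for a storage alarm.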

Here’s a five-minute video that walks you through integrating Timescale Cloud and Amazon CloudWatch in a few simple steps:


To learn more, check out the following resources:

  • [Blog post] Monitoring Your Timescale Cloud Services With Amazon CloudWatch: In this blog post, we walk you through integrating Timescale Cloud and Amazon CloudWatch to monitor your memory, CPU, and storage metrics.
  • [Documentation] Export Telemetry Data to Amazon CloudWatch: Step-by-step instructions on how to export your Timescale Cloud database metrics to Amazon CloudWatch.

7. Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Apache Kafka is a popular real-time event streaming service used for a wide variety of data-intensive applications. You can write your own producers to insert generated data into Kafka topics and subsequently write consumers to subscribe to those topics to receive all newly generated data.

The Kafka Connect framework enables you to easily stream data in and out of Kafka to and from other services and software using pre-written connectors. A popular connector is the JDBC sink connector, which allows you to ingest data into PostgreSQL from a Kafka topic. Because each Timescale Cloud database is also a PostgreSQL database, you can use this JDBC sink to ingest data into Timescale Cloud.
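For a sense of what configuring that sink involves, here is a sketch that assembles a connector definition in the shape accepted by the Kafka Connect REST API, using property names from the Confluent JDBC sink connector. The connector name, host, credentials, and topic are placeholders, and the function itself is only an illustrative helper.

```python
import json


def jdbc_sink_config(host, port, dbname, user, password, topics, name="timescale-sink"):
    """Build a Kafka Connect definition for a JDBC sink into PostgreSQL.

    All argument values are placeholders supplied by the caller; the default
    connector name is an arbitrary choice for this example.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "tasks.max": "1",
            "topics": ",".join(topics),
            # Standard PostgreSQL JDBC URL; sslmode=require enforces TLS.
            "connection.url": f"jdbc:postgresql://{host}:{port}/{dbname}?sslmode=require",
            "connection.user": user,
            "connection.password": password,
            "insert.mode": "insert",
            # Assumes the target table already exists as a hypertable.
            "auto.create": "false",
        },
    }


if __name__ == "__main__":
    cfg = jdbc_sink_config(
        "example-host", 5432, "tsdb", "tsdbadmin", "change-me", ["sensor-readings"]
    )
    print(json.dumps(cfg, indent=2))
```

Setting `auto.create` to false reflects a design choice in this sketch: creating the hypertable yourself first keeps control over its schema and partitioning rather than letting the connector create a plain table.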

Deploying and maintaining a Kafka cluster can be a monumental task requiring intimate knowledge of Kafka, ZooKeeper, and various other tools like Kafka Connect. A good alternative is to use Amazon Managed Streaming for Apache Kafka (Amazon MSK for short). One feature of MSK is MSK Connect, which allows you to deploy Kafka connectors at scale.

  • [Blog post] Build a Data Pipeline With Apache Kafka and Timescale: Learn how to set up a Timescale Cloud database as a Kafka consumer (via the JDBC sink connector) with the Confluent Platform for streaming data.
  • [Blog Post] Ingesting Data from Apache Kafka to Timescale: A Timescale community member shares his advice for ingesting data from Apache Kafka into TimescaleDB.
  • [Docs] JDBC Connector (Source and Sink): Documentation covering the JDBC source and sink connectors, which enable you to exchange data between relational databases and Kafka.
  • [Blog post] Getting Started Using Amazon MSK: Step-by-step instructions on creating an MSK cluster, producing and consuming data, and monitoring your cluster's health.

8. Amazon S3

Amazon S3, also known as Amazon Simple Storage Service, is a highly scalable cloud object storage service that stores data as objects within buckets. It’s built to store and retrieve large volumes of data.

Unlike some of the services and tools mentioned above, no integration or dev work is required to use Amazon S3 with Timescale Cloud: you can tier data to Amazon S3 right from within the database itself!

We recently released a consumption-based, low-cost object storage layer in Timescale Cloud built on Amazon S3. By running one command on your Timescale Cloud database, you can transparently tier data to this Amazon S3 object storage without leaving your database and while retaining access to all your data via standard SQL. In fact, you will keep the abstraction of a single table (a hypertable) that’s now transparently stretched across multiple storage layers (disk and S3), allowing you to scale your time-series data without breaking the bank.

Amazon S3 is an important service for developers building cloud-native applications. It’s an object storage service with excellent durability, high availability, and virtually infinite scalability that allows you to store vast volumes of data at a lower cost than other AWS storage services, like EBS, via its consumption-based pricing. S3 is one of the most popular services in AWS (perhaps the most popular), and it’s widely used for data warehousing and archiving.

But building, integrating, and operating a separate data warehouse or data lake for your time-series data means more development work, complexity, and costs. With Timescale Cloud, moving data from the database to an object store is as simple as running a SQL command. You’ll pay only for what you store—no extra charge per query and no more paying for an upper bound of storage just in case you need it.

And the best part is that even when data is tiered, all data remains fully and directly queryable from within your database via standard full SQL—including predicates and filters, JOINs, CTEs, windowing, and everything else you’re used to in PostgreSQL!

Note: This feature is still under active development and, thus, not ready for production use. Still, you can test it by requesting access to the private beta. To do so, follow these steps:

  • Log in to Timescale Cloud.
  • In your Service screen, navigate to Operations > Data Tiering. Click on the “Request Access” button, and we’ll be in touch soon.

You can request access to Timescale’s bottomless, low-cost object storage on Amazon S3 private beta via the Timescale Cloud UI.

Get Started Today

Now it’s your turn! Pick your favorites from the AWS tools and services list and apply them to your next time-series, analytics, or events project.

  • Sign up for Timescale Cloud. The first 30 days are completely free (no credit card required).

Do you have feedback or suggestions for more AWS tools and services we should cover next? Let us know in the Timescale Community Forum or on Twitter @TimescaleDB.


What is used to query directly from S3?

Amazon S3 Select and Amazon S3 Glacier Select enable customers to run structured query language (SQL) queries directly on data stored in S3 and Amazon S3 Glacier. With S3 Select, you simply store your data on S3 and query it using SQL statements that filter the contents of S3 objects, retrieving only the data that you need.
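A minimal sketch of that idea, using boto3’s `select_object_content` call on a CSV object, might look like this. The bucket, key, and column names in the usage note are made up for illustration, and the client is passed in so the function can be tried against a stub.

```python
def select_rows(client, bucket: str, key: str, expression: str) -> str:
    """Run an S3 Select SQL expression against a CSV object and return CSV text.

    `client` is expected to behave like boto3.client("s3").
    """
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Treat the first line as a header so columns can be used by name.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; only "Records" events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")
```

For example, with a hypothetical object `readings.csv` that has `device_id` and `temperature` columns, you could pass an expression such as `SELECT s.device_id FROM S3Object s WHERE CAST(s.temperature AS FLOAT) > 30` to retrieve only the matching rows instead of downloading the whole file.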

Can we query data in S3?

Yes. Amazon Athena can provide a hassle-free way to query data in S3 using SQL. The results of the Athena queries can then be read by QuickSight or other visualization tools, which provide BI and dashboarding to end users.

Which AWS service can be used to load data from Amazon S3?

Alternatively, a web-based interface for accessing and managing Amazon S3 resources is available via the AWS Management Console.

Can Athena query S3 glacier?

Athena does not support querying data in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes, or in the Archive Access or Deep Archive Access tiers of the S3 Intelligent-Tiering storage class.
