Retrieving the Uptime for EC2 Instances Using the AWS Config Service API

Andy Rea
6 min read · Nov 9, 2023

Managing AWS EC2 instances involves keeping a close watch on their state transitions and uptime. This is crucial for maintaining service availability and is useful for cost management and auditing. The AWS Config Service API provides a detailed view of your resource configurations and changes, which can be used to track the uptime of EC2 instances. Retrieving this information is nuanced, however, and requires careful handling of the data that AWS Config Service returns.

Traditional Uptime Retrieval via SSH

Traditionally, to get the uptime of an EC2 instance, an administrator might SSH into each instance and run the uptime command. This method, while straightforward, has several drawbacks:

  • Access Management: SSH requires proper key management, and in organizations with high security standards, gaining access might involve a complex approval process.
  • Partial Access: In large-scale deployments, administrators may only have access to a subset of instances due to segmented networks or differing permission levels.
  • Network Configuration: Instances within private subnets or behind VPNs and firewalls may not be directly accessible via SSH without additional networking configurations.

Using AWS Config Service API for Uptime Retrieval

AWS Config Service offers a more efficient way to retrieve uptime data, provided it is enabled and recording EC2 instances in the account and region being queried. It tracks resources, their configuration changes, and the relationships between AWS resources. Its benefits include:

Automated Compliance and Auditing: With AWS Config Service, you can continuously monitor and record your EC2 instances’ configurations, allowing for automated compliance checks and easier auditing processes.

Programmatic Access: Config Service API provides programmatic access to the instance’s configuration history, enabling you to automate uptime checks without manual SSH access.

Historical Data: AWS Config Service maintains historical data, giving you the ability to analyze past configurations and state transitions over time.

Scalability: The same API calls work for a handful of instances or for thousands, which suits large-scale deployments, although request rates are subject to API throttling.

Security and Accessibility: Config Service API can be accessed using AWS IAM roles and policies, which provide fine-grained access control and don’t require direct access to the instance or managing SSH keys.

Time Zone Consistency: The API ensures consistency in time zone reporting, as all timestamps are in UTC, avoiding confusion from various system time zones across instances.
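Because capture times are reported in UTC, elapsed time reduces to plain epoch arithmetic. The following is a minimal sketch, assuming GNU date (the -d option) and an ISO 8601 timestamp of the kind the AWS CLI prints; the function name hours_since and the sample timestamps are made up for illustration.

```shell
#!/usr/bin/env bash

# hours_since <utc_timestamp> [now_epoch]
# Prints the whole hours elapsed between an ISO 8601 UTC timestamp and "now".
# Assumes GNU date; BSD/macOS date uses different flags.
hours_since() {
    local event_time=$1
    local now=${2:-$(date +%s)}
    local event_epoch
    event_epoch=$(date -u -d "$event_time" +%s) || return 1
    echo $(( (now - event_epoch) / 3600 ))
}

# Hypothetical capture time, compared against a fixed "now" for repeatability.
now=$(date -u -d '2023-11-09T13:30:00Z' +%s)
hours_since '2023-11-09T10:00:00.000Z' "$now"
```

The integer division truncates, so three and a half hours is reported as 3.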

The Challenge with Sequential Events

One would assume that fetching the most recent ‘running’ state of an instance would suffice to calculate uptime. However, AWS Config Service records a new configuration item whenever a tracked attribute of the instance changes, so several consecutive items can all show the ‘running’ state. The capture time of the most recent ‘running’ item may therefore mark an unrelated change rather than the moment the instance started, and using it alone could understate the uptime. A robust solution needs to review the sequence of states to find the last transition from a non-running state to a running state.
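To make the problem concrete, the sketch below walks a hand-written history, newest item first (the default order of the Config history API), and stops at the first item that is not ‘running’. The function name running_since and the sample data are hypothetical; real input would come from the configuration history.

```shell
#!/usr/bin/env bash

# running_since: reads "capture_time state" lines, newest first, on stdin and
# prints the capture time of the oldest item in the current unbroken run of
# 'running' states (or 'unknown' if the newest item is not 'running').
running_since() {
    local started='unknown' event_time state
    while read -r event_time state; do
        # The first non-running item marks the end of the current run.
        if [[ "$state" != "running" ]]; then
            break
        fi
        started=$event_time
    done
    echo "$started"
}

# Hypothetical history: two 'running' items recorded since the last start
# (the newer one caused by some other attribute change), then an earlier stop.
running_since <<'EOF'
2023-11-09T12:00:00.000Z running
2023-11-08T09:00:00.000Z running
2023-11-07T20:00:00.000Z stopped
2023-11-01T06:00:00.000Z running
EOF
```

Here the newest item alone would suggest the instance started at 12:00 on 9 November, while the current run actually began at 09:00 the day before.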

Example 1: A Sequential Approach

The first example demonstrates a sequential approach using Bash and the AWS CLI:

#!/usr/bin/env bash

set -e

current_time=$(date +%s)

# List the IDs of all running instances, one per line.
aws ec2 describe-instances --filters 'Name=instance-state-name,Values=running' \
    --query 'Reservations[*].Instances[*].[InstanceId]' \
    --output text |
while read -r instance_id; do
    # Fetch only the newest configuration item and keep it if its state is 'running'.
    aws configservice get-resource-config-history --resource-id "$instance_id" \
        --resource-type AWS::EC2::Instance --limit 1 |
    jq -r '.configurationItems[] | select((.configuration | fromjson | .state.name) == "running") | .configurationItemCaptureTime' |
    while read -r event_time; do
        # date -d is GNU date syntax.
        event_time_seconds=$(date -d "$event_time" +%s)
        hours_diff=$(( (current_time - event_time_seconds) / 3600 ))
        echo "$instance_id,$hours_diff"
    done
done

This script checks each running instance in turn, but with --limit 1 it reads only the most recent configuration item. It does not account for multiple sequential ‘running’ items and assumes that the first ‘running’ item it finds marks the start of the instance’s uptime. If anything else about the instance changed after it started, the reported uptime is too short.

Example 2: Improving Accuracy

The second example rectifies the oversight of the first by looking at the last 100 changes, assuming that the transition from a non-running to a running state occurred within this window:

#!/usr/bin/env bash

set -e

current_time=$(date +%s)

aws ec2 describe-instances --filters 'Name=instance-state-name,Values=running' \
    --query 'Reservations[*].Instances[*].[InstanceId]' \
    --output text |
while read -r instance_id; do
    uptime='unknown'
    # Items arrive newest first (the API default), so walk back through the
    # current run of 'running' items and stop at the first other state.
    # The last 'running' item seen is the one recorded at the transition.
    while read -r event_time state; do
        if [[ "$state" != "running" ]]; then
            break
        fi
        event_time_seconds=$(date -d "$event_time" +%s)   # GNU date syntax
        uptime=$(( (current_time - event_time_seconds) / 3600 ))
    done < <(aws configservice get-resource-config-history --resource-id "$instance_id" \
                 --resource-type AWS::EC2::Instance --limit 100 |
             jq -r '.configurationItems[] | [.configurationItemCaptureTime, (.configuration | fromjson | .state.name)] | @tsv')
    echo "$instance_id,$uptime"
done

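This approach reduces the chance of missing the correct transition event. A window of 100 items is ample for most use cases, and 100 is also the most the API returns in a single call. In environments with more frequent configuration changes, the transition may fall outside this window, in which case further pages of the history have to be requested.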

Example 3: Enhancing Performance with GNU Parallel

The third example introduces GNU Parallel to the mix, greatly enhancing the execution speed by processing multiple instances concurrently:

#!/usr/bin/env bash

set -e

# Print "instance_id,uptime_hours" for a single instance.
function get_uptime() {
    local instance_id=$1
    local current_time=$2
    local uptime='unknown'
    local event_time state event_time_seconds
    # Items arrive newest first (the API default), so walk back through the
    # current run of 'running' items and stop at the first other state.
    while read -r event_time state; do
        if [[ "$state" != "running" ]]; then
            break
        fi
        event_time_seconds=$(date -d "$event_time" +%s)   # GNU date syntax
        uptime=$(( (current_time - event_time_seconds) / 3600 ))
    done < <(aws configservice get-resource-config-history --resource-id "$instance_id" \
                 --resource-type AWS::EC2::Instance --limit 100 |
             jq -r '.configurationItems[] | [.configurationItemCaptureTime, (.configuration | fromjson | .state.name)] | @tsv')
    echo "$instance_id,$uptime"
}

# Export the function so the shells started by GNU Parallel can call it.
export -f get_uptime

current_time=$(date +%s)

aws ec2 describe-instances --filters 'Name=instance-state-name,Values=running' \
    --query 'Reservations[*].Instances[*].[InstanceId]' \
    --output text |
parallel --will-cite --jobs 5 get_uptime {} "$current_time"

This parallelized solution mitigates the performance issue when dealing with a high number of instances. However, it introduces the potential for ThrottlingExceptions from the AWS API, which can be managed by adjusting the number of concurrent jobs or increasing the AWS CLI's retry count.
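One way to manage throttling is sketched below, under stated assumptions. The AWS CLI reads the AWS_RETRY_MODE and AWS_MAX_ATTEMPTS environment variables, and a small wrapper can add its own backoff on top. The with_backoff helper, the chosen values, and the flaky stand-in command are hypothetical and not part of the original scripts.

```shell
#!/usr/bin/env bash

# Let the AWS CLI retry throttled calls more patiently (values are examples).
export AWS_RETRY_MODE=standard
export AWS_MAX_ATTEMPTS=10

# with_backoff <max_attempts> <initial_delay_seconds> <command...>
# Re-runs the command until it succeeds, doubling the delay after each failure.
with_backoff() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if (( attempt >= max_attempts )); then
            return 1
        fi
        sleep "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

# Stand-in for a throttled call: fails twice, then succeeds.
calls=0
flaky() {
    calls=$(( calls + 1 ))
    (( calls >= 3 ))
}

with_backoff 5 0 flaky && echo "succeeded after $calls calls"
```

In the parallel script, the aws configservice call could be wrapped in the same way, with a non-zero initial delay. The helper would then also need export -f with_backoff so that the shells started by GNU Parallel can see it.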

Key Points to Consider

  • The initial query targets only running instances, which streamlines the process by focusing on currently active resources.
  • The use of jq to parse JSON outputs is integral to all examples, demonstrating its flexibility in handling complex data structures.
  • The third example is particularly efficient for larger environments but requires careful tuning of the --jobs parameter to balance speed with API rate limits.

Use Cases for AWS Config Service Uptime Retrieval

1. Cost Optimization: By tracking the uptime of instances, organizations can identify underutilized resources. Instances that are frequently stopped or have low utilization metrics can be candidates for downsizing or termination, leading to cost savings.

2. Performance Monitoring: Understanding uptime patterns can help with performance tuning. Instances that are continuously running may need scaling or load balancing to optimize performance.

3. Compliance Reporting: For industries with strict compliance requirements, such as finance or healthcare, maintaining accurate records of instance states is crucial. AWS Config Service provides an audit trail that can be essential for regulatory compliance.

4. Disaster Recovery Planning: In a disaster recovery scenario, knowing the exact state transitions of your instances can help in understanding the impact of the disaster and in planning recovery strategies.

5. Automated Alerting: With AWS Config Service, you can set up rules to trigger alerts when instances transition into particular states, allowing for real-time incident response.
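As a small illustration of the cost optimization case, the instance_id,hours lines that the scripts print can be filtered for long-running instances. This is a sketch that assumes that CSV layout; the function name running_longer_than, the 720-hour threshold, and the instance IDs are invented for the example.

```shell
#!/usr/bin/env bash

# running_longer_than <hours>
# Reads "instance_id,hours" lines on stdin and prints those whose uptime
# exceeds the threshold, longest first. Rows with a non-numeric uptime
# (such as 'unknown') are skipped.
running_longer_than() {
    local threshold=$1
    awk -F',' -v min="$threshold" '$2 ~ /^[0-9]+$/ && $2 + 0 > min + 0' |
        sort -t',' -k2,2nr
}

# Hypothetical output from one of the uptime scripts.
running_longer_than 720 <<'EOF'
i-0aaa1111bbbb2222c,1500
i-0ddd3333eeee4444f,12
i-0abc5555def66667a,unknown
i-0123456789abcdef0,900
EOF
```

With this sample input, only the instances at 1500 and 900 hours are listed, in that order.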

Conclusion

AWS Config Service API offers an advanced solution for tracking the operational status and history of EC2 instances. Calculating the uptime of EC2 instances is a multifaceted task that requires more than just retrieving the last ‘running’ state. By considering the sequence of state changes and using efficient data processing techniques like GNU Parallel, we can achieve accurate and timely insights into our instances’ uptime.

Whether you’re managing a handful of instances or thousands, AWS Config Service provides the tools you need to maintain visibility and control over your cloud environment.

These examples serve as a foundation that can be customized to fit the specific needs of any AWS environment, ensuring that uptime calculations are both precise and efficient.

Written by Andy Rea

Experimenting with Medium to share my AWS CLI queries in combination with other shell utilities and also help from ChatGPT for post and image content