So the focus for this week was looking at cost-effective development for Alexa - where are costs likely to be incurred inside the AWS landscape, and how can they be mitigated? As part of the investigation, I’ve also looked at scaling options, and tools for investigating cost.
(Please bear with my phrasing here, lovely people. There’s no clear reason why, but this week has really tanked my creativity, so pulling words out has been like blood from a stone. I appreciate your patience!)
Let’s start with the AWS services most likely used as part of an Alexa skill.
Costing Across Services
- AWS Lambda - this is where the logic and processing of a skill are implemented. Each Alexa skill has one principal Lambda function, and potentially more for housekeeping tasks.
- DynamoDB - this is where the memory of a skill resides - its state information and content, and the user’s relationship with the skill. Using a database to drive a skill’s content allows updates to be made frequently without the overhead of re-certification.
- CloudWatch - this is where the logs from user interaction live. Both understanding user behaviour through logged messages and troubleshooting rely on information stored in CloudWatch.
- S3 - this is where static assets are most easily retrieved from. Before beginning experiments with dynamic audio, this is where project prototypes initially stored audio files to be served as part of an Alexa skill.
This is absolutely not an exhaustive list of the possible services that could be used for development, but these are the most likely tools to be used, and the most straightforward. I haven’t explored self-hosting of skill implementations or other resources here, partly because the options for alternatives are so broad, and partly because staying inside AWS infrastructure allows these things to be discussed in concrete terms.
It’s worth noting as well that, to spoil the end of this post, this investigation turned out to be a little preemptive. The user load supported by free tier functionality across most services is very high (we’ll go into this in a second) and Amazon also currently offer a $100/month credit for incurred AWS infrastructure costs, which would mitigate the expenses created by a high number of users.
Let’s look at AWS Lambda as an example, using some calculations tested with the
AWS Lambda provides 1M requests and 400,000 GB-seconds of processing in its free tier. Assuming a memory configuration of 128 MB for a Python 3.4+ Lambda function (which seems more than generous as a minimum configurable) that’s 3,200,000 seconds of processing, or about 3.2 seconds for each of the 1M free requests, so processing time is unlikely to be a constraint.
Assuming (and this is where things get more tentative) that a typical session of play with an interactive skill involves 200 interactions (200 requests to Alexa), and that a typical user interacts with a skill once every three days, that’s 10 sessions, or 2,000 requests, per user per month. The 1M free requests would therefore accommodate 500 users. More succinctly:
400,000 GB-seconds @ 128 MB = 3,200,000 seconds processing time.<br/>
1M requests / (10 x 200 requests per user per month) = 500 users supported each month.
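For anyone wanting to plug in their own numbers, here is a minimal sketch of that arithmetic in Python. The free-tier figures are the ones quoted above; the session length and visit frequency are my tentative assumptions, and the function names are purely illustrative.

```python
# Free-tier figures quoted in the text above.
FREE_REQUESTS_PER_MONTH = 1_000_000
FREE_GB_SECONDS_PER_MONTH = 400_000

# Assumptions from the text: a 128 MB function, 200 requests per session,
# and one session every three days (10 sessions in a 30-day month).
MEMORY_MB = 128
REQUESTS_PER_SESSION = 200
SESSIONS_PER_MONTH = 30 // 3


def free_compute_seconds(memory_mb):
    """Seconds of execution the GB-second allowance covers at this memory size."""
    return FREE_GB_SECONDS_PER_MONTH / (memory_mb / 1024)


def users_supported(requests_per_session, sessions_per_month):
    """Whole users that fit inside the free request allowance."""
    requests_per_user = requests_per_session * sessions_per_month
    return FREE_REQUESTS_PER_MONTH // requests_per_user


def seconds_per_request(memory_mb):
    """Average execution time each free request could use before compute runs out."""
    return free_compute_seconds(memory_mb) / FREE_REQUESTS_PER_MONTH


if __name__ == "__main__":
    print("Free compute seconds:", free_compute_seconds(MEMORY_MB))
    print("Users supported:", users_supported(REQUESTS_PER_SESSION, SESSIONS_PER_MONTH))
    print("Seconds available per request:", seconds_per_request(MEMORY_MB))
```

The request allowance is the binding limit here: each request would need to run for more than three seconds on average before the compute allowance ran out first.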
So across either multiple Alexa skills or an individual skill, by the time that free tier capacity is exceeded, the user base for the skill should be generating sufficient revenue to more than offset infrastructure costs, especially once the Amazon $100 offset is applied. For CloudWatch, the free tier supports 5GB ingested and archived per month, for up to 1M requests. By the time CloudWatch’s free levels are exceeded, it’s the kind of problem you want to have.
There is one caveat here - the volume of data queried is also capped at 5GB per month. In order to grow a skill and understand its userbase, it’s likely that log files would be queried and investigated extensively. This is where third-party tools like Datadog may become more cost-effective. However, with a cost of $5 per function monitored along with ingestion costs, it’s not a simple tradeoff.
DynamoDB’s free-tier capabilities are a little harder to parse on the product page, but much easier to comprehend on AWS’s billing report:
- 25 GB of storage
- 25 provisioned write capacity units
- 25 provisioned read capacity units
(listed as enough to handle 200M requests per month)
It gets a little more complex determining the cost of individual operations, but essentially it’s a combination of volume of data and required transaction integrity. Happily for DynamoDB (and the majority of the AWS services used), due to the nature of Alexa’s interaction (a more slow-paced, conversational model), transaction integrity requirements are low to non-existent, and skills are more tolerant of some additional latency. It’s also worth noting that DynamoDB indicates no cost for data transfer inside a region, but I don’t have enough boots-on-the-ground experience to confirm this.
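To get a feel for what 25 capacity units means in practice, here is a rough sketch in Python. It assumes DynamoDB's documented unit sizes (one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one write capacity unit covers one write per second of an item up to 1 KB), a 30-day month, and perfectly even load. That the 200M figure is derived this way is my guess rather than something the billing report states.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

# Free-tier provisioned capacity listed above.
FREE_READ_CAPACITY_UNITS = 25
FREE_WRITE_CAPACITY_UNITS = 25


def monthly_reads(read_units, eventually_consistent=True):
    """Reads per month for items up to 4 KB, if capacity were used evenly."""
    # Eventually consistent reads cost half a unit, so each unit serves two per second.
    reads_per_second = read_units * (2 if eventually_consistent else 1)
    return reads_per_second * SECONDS_PER_MONTH


def monthly_writes(write_units):
    """Writes per month for items up to 1 KB, if capacity were used evenly."""
    return write_units * SECONDS_PER_MONTH


if __name__ == "__main__":
    reads = monthly_reads(FREE_READ_CAPACITY_UNITS)
    writes = monthly_writes(FREE_WRITE_CAPACITY_UNITS)
    print("Eventually consistent reads per month:", reads)
    print("Writes per month:", writes)
    print("Total requests per month:", reads + writes)
```

Under those assumptions the total comes out a little under 200M, and since a conversational skill can live with eventually consistent reads, it would sit at the cheaper end of that range.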
For S3, the costing is so low for expected usage it doesn’t bear detailed discussion here.
What can be tweaked?
Before looking at how to find where money is going, it’s worth looking next at where there’s potential for reducing costs.
- AWS Lambda - the number of requests to AWS Lambda can’t really be lowered. There is a 1:1 relationship between requests to Alexa and AWS Lambda invocations. It is possible to optimise execution time by profiling Lambda code (in my case, Python code) and breaking down time spent in other AWS services via AWS X-Ray.
- DynamoDB - by trading off memory use in Lambda code, it may be possible to reduce read requests to DynamoDB. The more likely tuneable area here would be the volume of data being used. As I’m not currently using DynamoDB, though, this conversation is more academic.
- CloudWatch - over time the amount of information logged by a skill could be reduced, and as errors are detected and removed, error logging would also be reduced. In understanding user behaviour, there is a potential tradeoff here between CloudWatch and DynamoDB usage.
- S3 - image and audio assets could have their size optimised, or for heavily used assets even cached in memory. Optimisations here would need to be done carefully though, as both image and audio are the aesthetic layer for any Alexa experience.
How To Find Cost Culprits
- AWS Billing Report - happily, the most detailed tool for understanding service spend is the monthly billing report provided by Amazon. While the bill sent by AWS is a static PDF, an online report is available breaking down the nature of costs incurred for each service on a region-by-region basis.
- AWS Cost Explorer - AWS also provides an online dashboard allowing high-level investigation of cost breakdowns. The most useful axes here appear to be by service, by usage time and by API operation. Once a particular service has been identified as a problem spot, it’s then necessary to dig down into metrics in an individual service’s dashboard to find culprits.
- AWS X-Ray - the more moving parts a Lambda function has, the more useful X-Ray is in determining where time is being spent in servicing a request.
- Python code profiling - using either native Python code profiling tools with testing suites like LambdaUnit, or commercial profiling and reporting tools like Datadog or StackImpact, allows heavily used code to be profiled first in developer performance testing, and then, most importantly, inside the cloud environment it’s ultimately deployed to.
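As a starting point for the native Python route, here is a minimal sketch using the standard library's `cProfile` and `pstats` modules to profile a handler locally. The `handler` function is a hypothetical stand-in for a skill's real Lambda handler, and `profile_handler` is just an illustrative wrapper, not part of any AWS tooling.

```python
import cProfile
import io
import pstats


def handler(event, context):
    """Hypothetical stand-in for a skill's Lambda handler."""
    total = sum(i * i for i in range(10_000))
    return {"speech": f"The answer is {total}"}


def profile_handler(event, top=10):
    """Run the handler once under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(handler, event, None)

    # Write the report to a string so it can be printed or logged.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_handler({"request": {"type": "LaunchRequest"}})
    print(report)
```

Local numbers like these only show where the Python code itself spends time; time spent waiting on other AWS services is where X-Ray or a commercial tool would have to take over.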
Scaling Up & Closing Out
I had originally planned on exploring scaling up in more depth here, but I’ll leave this as more of a footnote. The happy outcome of this investigation is that cost management is very much a problem on the other side of more difficult problems like user acquisition. This neatly avoids premature optimisation as a potential time sink. Investigating potential future scaling options, the following services appeared worth future investigation:
- Lambda@Edge - reducing network latency between Alexa and Lambda functions by deploying Lambda functions automatically to the same region
- Reserving capacity - AWS provides significant cost savings for most services by allowing developers to reserve capacity rather than using on-demand capacity. Once a skill is more mature and user patterns better understood, this is a solid option for keeping costs inside the $100/month limit again.
- Elastic Beanstalk - for Flask applications like the dynamic audio prototype audio engine, AWS supports deployment to Elastic Beanstalk (AWS’s automatic EC2 scaling support) once demand reaches extreme levels.
- Global tables - once demand reaches extreme levels and databases are co-located with Lambda@Edge regions, then global synchronised tables within DynamoDB would need to be re-explored.