Getting to the meat and potatoes of serverless recipe parsing with Amazon Bedrock
14 May, 2024
If you're like me, you've probably visited a few recipe websites in your time and had the unfortunate experience of having to scroll through the author's life story before getting to the actual recipe. I recently stumbled across an interesting website which is able to take a URL and extract just the recipe without all the surrounding fluff. This got me thinking... could I build something similar on AWS with serverless technologies and generative AI?
Lately I have been experimenting with various large language models on Amazon Bedrock. I have been particularly interested in the latest offering from Anthropic with its Claude 3 family of models. My idea was to see if I could use Claude to extract a recipe from a web page and return it as a structured JSON object.
Crafting the prompt
I hypothesised that Anthropic's Haiku model would make light work of finding a recipe in a body of text assuming it is given a well-crafted prompt. Luckily, Anthropic's prompt engineering documentation provides some excellent guidance on how best to craft a prompt to work with their family of LLMs.
I began with some low-fi manual testing directly in the Amazon Bedrock Chat Playground. I figured my eventual solution would scrape HTML from a target website and inject it into the prompt, but in the meantime I'd need to do this manually. This was as simple as navigating to a recipe in my browser, viewing the page source and copying the raw HTML, which I then pasted into my prompt with some surrounding instructional text.
After some experimentation in the Bedrock console I was able to get Claude to consistently parse a recipe from a body of text and return it as structured JSON. Pretty cool!
I ended up settling on a prompt template shaped roughly like the following (the exact wording and JSON field names shown here are illustrative):
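```
Read the HTML document contained in the <document> tags below and
extract the recipe from it.

<document>
<%- document %>
</document>

Respond only with JSON, structured like the example in the <example> tags:

<example>
{
  "title": "Pancakes",
  "description": "Fluffy breakfast pancakes",
  "ingredients": ["1 cup plain flour", "1 egg"],
  "steps": ["Whisk the ingredients together", "Fry in a hot pan"]
}
</example>

If you cannot find a recipe in the document, respond with:

{"error": "No recipe found"}
```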
Take note of the following:

- I am using XML tags to delineate parts of the prompt, as recommended in Anthropic's documentation - Use XML tags
- The `<document>` tags are where you put the HTML you want Claude to read. The `<%- document %>` contained within is an ejs placeholder. More on this shortly.
- I am using `<example>` tags to show how I want the JSON to be structured. I'm also telling Claude what to do if it is unable to find a recipe in the document.
- I am explicitly asking for JSON output.
Building a solution with the AWS CDK
Now that I had a prompt which I was satisfied worked, I set out to automate it. I began by creating a new AWS CDK project using TypeScript.
The bulk of the solution will be driven by one of my favourite serverless AWS services, Step Functions! Using the CDK I create a Step Functions state machine with its type set to Express. We need to use the Express type as we will eventually expose the state machine via an API.
The state machine workflow performs these main tasks:
- Scrape (and sanitise) the HTML from a web page
- Generate the prompt by injecting the HTML into our prompt template
- Invoke Amazon Bedrock with our finalised prompt
- Handle the case where Claude was not able to find a recipe in the document
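Wired up in the CDK, the skeleton looks something like this sketch, assuming we are inside the Stack constructor (the `Pass` states are placeholders for the real tasks defined in the sections that follow):

```typescript
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';

// Placeholder states standing in for the real tasks described below
const scrape = new sfn.Pass(this, 'ScrapeWebPage');
const generatePrompt = new sfn.Pass(this, 'GeneratePrompt');
const invokeModel = new sfn.Pass(this, 'InvokeBedrock');

// The Express type is required as the state machine will be
// invoked synchronously via API Gateway
const stateMachine = new sfn.StateMachine(this, 'RecipeStateMachine', {
  stateMachineType: sfn.StateMachineType.EXPRESS,
  definitionBody: sfn.DefinitionBody.fromChainable(
    scrape.next(generatePrompt).next(invokeModel),
  ),
});
```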
Scraping the web page
The first step of our state machine scrapes the contents of a web page containing our recipe.
To do this I opted to create a simple Node.js-based Lambda function using TypeScript which takes a `url` parameter and uses `fetch` to... ahem... fetch the target web page.
At this point I could simply return the raw HTML to advance the state machine, but that would be extremely wasteful as there is a lot of redundant markup which serves little value, not to mention that the longer the prompt, the more it costs. We can do better!
I opted to use cheerio, a nifty HTML parsing library, to clean up the HTML before returning it from the function. Using cheerio I:

- Extract the content of the `<body>` tag
- Delete elements which have little semantic meaning to Claude such as `img`, `video`, `svg` etc.
- Delete all attributes from elements
- Delete all `<!-- -->` comments
I also do some final manipulation of the HTML string to remove excessive whitespace before returning the result.
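Putting the whole function together, it looks something like this sketch (the selectors and event shape here are illustrative rather than the exact implementation):

```typescript
import * as cheerio from 'cheerio';

interface Event {
  url: string;
}

// Sketch of the scraping Lambda. `fetch` is available globally in the
// Node.js 18+ Lambda runtime.
export const handler = async (event: Event) => {
  const response = await fetch(event.url);
  const rawHtml = await response.text();

  const $ = cheerio.load(rawHtml);

  // Delete elements with little semantic meaning to Claude
  $('img, video, svg, script, style, noscript, iframe').remove();

  // Delete all attributes from the remaining elements
  $('body *').each((_, element) => {
    for (const name of Object.keys(element.attribs ?? {})) {
      $(element).removeAttr(name);
    }
  });

  // Extract the <body> content, delete comments and collapse
  // excessive whitespace
  const body = $('body').html() ?? '';
  const html = body
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/\s+/g, ' ')
    .trim();

  return { html };
};
```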
Astute readers may note this is a far from ideal way of scraping a web page as it fails to consider dynamic content loaded after page load. My assumption is most recipe websites will deliver the complete recipe in the initial HTML sent from the server to aid SEO. That said, a headless Chromium + Puppeteer setup would likely be a better choice in a production environment.
Constructing the prompt
For the second step of our state machine I wrote a short Lambda function.
It takes the sanitised HTML output of the previous step and combines it with our prompt template.
To do this I use ejs to replace the `<%- document %>` placeholder in the prompt template with our sanitised HTML.
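In essence it looks something like this - the template import and event shape are assumptions:

```typescript
import * as ejs from 'ejs';

// Hypothetical module exporting the prompt template shown earlier
import { promptTemplate } from './prompt-template';

interface Event {
  html: string;
}

// Render the template, substituting the <%- document %> placeholder
// with the sanitised HTML from the previous step
export const handler = async (event: Event) => {
  const output = ejs.render(promptTemplate, { document: event.html });
  return { output };
};
```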
Invoking Amazon Bedrock
Now that we have our finalised prompt returned from the previous step, we can invoke the model with it.
What's nice is AWS Step Functions has a direct integration with Bedrock, which means we can invoke the model without first having to write another Lambda function.
Here is what the `BedrockInvokeModel` task looks like as defined in the CDK app.
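In essence it is something like this - the exact property values and paths shown are representative rather than verbatim:

```typescript
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import * as bedrock from 'aws-cdk-lib/aws-bedrock';

const invokeModel = new tasks.BedrockInvokeModel(this, 'InvokeBedrock', {
  model: bedrock.FoundationModel.fromFoundationModelId(
    this,
    'Model',
    new bedrock.FoundationModelIdentifier(
      'anthropic.claude-3-haiku-20240307-v1:0',
    ),
  ),
  body: sfn.TaskInput.fromObject({
    anthropic_version: 'bedrock-2023-05-31',
    max_tokens: 2048,
    messages: [
      {
        role: 'user',
        // Inject the prompt generated by the previous step
        content: [
          { type: 'text', text: sfn.JsonPath.stringAt('$.Payload.output') },
        ],
      },
      {
        // Pre-fill the start of Claude's response so it answers
        // with JSON immediately
        role: 'assistant',
        content: [{ type: 'text', text: '{' }],
      },
    ],
  }),
  resultSelector: {
    // Prepend the opening brace consumed by the pre-fill, then
    // parse the completed string into a JSON object
    recipe: sfn.JsonPath.stringToJson(
      sfn.JsonPath.format(
        '\\{{}',
        sfn.JsonPath.stringAt('$.Body.content[0].text'),
      ),
    ),
  },
});
```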
This takes the prompt output from the previous step (`$.Payload.output`) as its input and invokes the Anthropic Claude 3 Haiku model.
You might be wondering what that second 'assistant' message is. That is a message pre-fill, which gives Claude a starting point on how to respond to the 'user' input. As instructed in our prompt template, we always want Claude to respond with JSON. This, combined with the prompt entered in the 'user' message, helps guide Claude on how to respond. You can read more about how this works on Anthropic's website - Control output format (JSON mode).
An interesting quirk of pre-filling the opening JSON brace is that it is not included in the message response we receive from Bedrock.
You'll notice in the `resultSelector` I am using `sfn.JsonPath.format`, which is one of Step Functions' intrinsic functions, to prepend the missing opening brace. The resulting string is then converted to JSON with the `sfn.JsonPath.stringToJson` function.
Exposing via API Gateway
The last piece of the puzzle is to expose our state machine via Amazon API Gateway.
Within the same CDK app I instantiate a new `RestApi` which proxies requests directly to my state machine.
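A sketch of that wiring, again assuming we are inside the Stack constructor (resource names are illustrative):

```typescript
import * as apigateway from 'aws-cdk-lib/aws-apigateway';

// Proxy every request straight through to a synchronous
// execution of the Express state machine
const api = new apigateway.RestApi(this, 'RecipeApi');
api.root.addProxy({
  defaultIntegration:
    apigateway.StepFunctionsIntegration.startExecution(stateMachine),
  anyMethod: true,
});
```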
Once deployed, the CDK CLI will output the URL of the newly created REST API. Copy the URL so we can test it in our web browser.
Test it!
At this point all that's left to do is test it out on some web pages. To do this, all you need to do is navigate to a website containing a recipe and paste your API URL in front of the URL in your browser's address bar.
For example, my REST API is available at `https://c7jdzx7r36.execute-api.ap-southeast-2.amazonaws.com/prod/`. If I want to extract the recipe from `https://www.recipetineats.com/caramel-slice/` I simply append the latter to the former, e.g.
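```
https://c7jdzx7r36.execute-api.ap-southeast-2.amazonaws.com/prod/https://www.recipetineats.com/caramel-slice/
```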
After a few seconds, voila!... the recipe extracted from the page as JSON!
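Trimmed down to its shape (using the illustrative field names from the template sketch earlier), the response looks something like:

```json
{
  "title": "Caramel Slice",
  "description": "...",
  "ingredients": ["..."],
  "steps": ["..."]
}
```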
Here is an example of what is output when you provide a URL to something which doesn't contain a recipe.
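Given the fallback instruction from the prompt template sketched earlier, it is simply:

```json
{ "error": "No recipe found" }
```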
Closing thoughts
I had a lot of fun building this and hopefully this post highlights how combining serverless technologies with generative AI can be used for novel outcomes. This is very much a toy/experimental project and is not intended for serious production use. To make this safer for use in production a number of additional features would need to be added, such as improved error handling, authentication at the API layer and caching, to name a few.
If you'd like to try it out yourself you can find the complete CDK app here: https://github.com/willdady/recipe-extractor-cdk