Distributed Persistent Rendering (DPR) #549

cassidoo · 2021-04-14T15:03:00Z

As the Jamstack ecosystem has matured, tools and services have emerged that automate deploys and provide productive workflows. The practical ceiling for Jamstack sites, however, is often build time: site generation becomes impractically slow when the number of pages created in each build grows very large.

Various strategies have emerged in an attempt to satisfy this need, such as incremental builds, and incremental static regeneration (ISR) combined with the stale-while-revalidate (SWR) pattern.

Each of these approaches is either difficult for developers to implement and reason about, or falls short of upholding core principles of the Jamstack thus compromising some of the key benefits of its architecture.

Goal

What are we trying to achieve?

- Provide a means of generating and serving sites which have very large numbers of pages, without build times becoming impractically long, or compromising the following characteristics enabled by Jamstack:
  - Atomic deploys: Where all of the code, assets and configuration of a site are updated at once so that a website cannot be accidentally served in a partially updated state.
  - Immutable deploys: Once created, an immutable deploy of a website becomes an artifact which will not change. Instead, deploys result in new versions or instances of the site, and traffic is routed to them accordingly.
- Retain the mental model enabled by the Jamstack architecture, where running a build results in a set of assets deployed (or deployable) to a content delivery network (CDN) or simplified hosting infrastructure which, since it is immutable and atomic, can be rolled back to a previously deployed version with confidence in its resultant state.
- Retain the performance, security, and stability profile of the Jamstack architecture by honoring the logical approach of pre-rendering as much content as possible, and serving as much of the site’s content pages as pre-rendered assets from a CDN as possible.
- Avoid making it difficult to reason about the logical state of the site, which makes it harder to rationalize the expected state of a site at any given time in its deploy history.

Proposal

Distributed Persistent Rendering (DPR)

With DPR, the generation, or rendering of assets is distributed between build-time and request-time.

With DPR, the responsibility of rendering assets is distributed between the build infrastructure, and serverless functions.

With DPR, the cache of assets which form the site’s deploy can grow progressively and persists over time as requests are made to more URLs which were not previously rendered.

Logically, DPR creates the same result as a Jamstack build - it renders assets and populates them into a CDN or hosting infrastructure. Rather than the rendering of every asset taking place at build time, the rendering of some assets can be deferred from build time and instead take place on demand when each is first requested. These assets then join those previously rendered during the build process or by other on demand requests, in the CDN, logically contained within the same atomic deploy.

In this way DPR would provide the means of rendering some assets at build time, and others, later via serverless functions, as a result of the first request to their URL.

Like the assets generated during a build, those rendered by DPR at request time would remain in the CDN cache until invalidated by the successful completion of a new deploy. This would allow developers to consider the assets rendered during a deploy, and those rendered on demand from requests to DPR functions contained in that deploy as all belonging to the same logical atomic deploy.
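
The logical model described above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Deploy` class, its `serve` method and the `onDemandRenders` counter are hypothetical names, not part of any platform API, and a real implementation would persist assets in a CDN rather than in a `Map`.

```typescript
// Hypothetical in-memory model of one atomic deploy with DPR.
type Render = (path: string) => string;

class Deploy {
  readonly id: string;
  onDemandRenders = 0;
  // Assets persisted for this deploy: build output plus on-demand renders.
  private readonly assets = new Map<string, string>();
  private readonly render: Render;

  constructor(id: string, prebuilt: Record<string, string>, render: Render) {
    this.id = id;
    this.render = render;
    for (const [path, html] of Object.entries(prebuilt)) {
      this.assets.set(path, html);
    }
  }

  // Serve a persisted asset if one exists; otherwise render once and persist it.
  serve(path: string): string {
    const persisted = this.assets.get(path);
    if (persisted !== undefined) return persisted;
    const html = this.render(path);
    this.onDemandRenders += 1;
    this.assets.set(path, html);
    return html;
  }
}

const deployA = new Deploy(
  "deploy-a",
  { "/": "<h1>Home</h1>" },
  (path) => `<h1>Archive page ${path}</h1>`,
);
deployA.serve("/archive/2009/story"); // rendered on demand, then persisted
deployA.serve("/archive/2009/story"); // served from the persisted assets
console.log(deployA.onDemandRenders); // prints 1
```

A new deploy would be modelled as a new `Deploy` instance, which starts with only its own build output. That corresponds to on-demand assets being invalidated by the successful completion of a new deploy.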

Image: A logical overview of the areas of responsibility for a given deploy with DPR

Usage scenarios

Example scenarios utilizing DPR:

Critical and archived pages

Consider a news or publication site which may have hundreds of thousands or millions of unique content pages on unique URLs. Typically, over time, the frequency of updates to specific news story pages diminishes, as does the frequency of requests to them.

DPR would allow the site developers to focus on core content pages in their build, regularly generating those pages as a result of content updates, and as a result of feature and design iterations. The long tail of historic pages might be omitted from the core build such that the build times remain fast and manageable. Historic pages would then get added to the latest deploy output only if they are requested via their public URL. Once rendered on demand, they would remain available until invalidated by a subsequent deploy. Thereafter they would be repopulated by their next first request.
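
One way to picture the split between core pages and the long tail is a build plan that decides, per page, whether to render at build time or defer. The following is a minimal sketch that assumes recency of the last update is the only criterion; the `Page` shape, `planBuild` and `maxAgeDays` are hypothetical names for illustration, and a real site might use traffic data instead.

```typescript
// Hypothetical page record; the field names are assumptions for illustration.
interface Page {
  path: string;
  lastUpdated: Date;
}

interface BuildPlan {
  buildTime: string[]; // rendered during the build
  deferred: string[]; // left for on-demand rendering at first request
}

// Pages updated within `maxAgeDays` of `now` are built up front; the rest are deferred.
function planBuild(pages: Page[], now: Date, maxAgeDays: number): BuildPlan {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  const plan: BuildPlan = { buildTime: [], deferred: [] };
  for (const page of pages) {
    const bucket = page.lastUpdated.getTime() >= cutoff ? plan.buildTime : plan.deferred;
    bucket.push(page.path);
  }
  return plan;
}
```

With a 30-day window, a story updated last week would land in `buildTime`, while a story from 2015 would land in `deferred` and only be rendered if someone requests it.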

User generated content pages

DPR could also be used to populate pages created as a result of content contributed by users. Consider a site which invites users to submit content via a form which would later be presented on a unique page per contribution.

Instead of regenerating the entire site upon each user submission, only the new page which should exist as a result of the user’s contribution would need to be generated. This rendering could take place on-demand when the URL for the new page is first requested and added to the overall deploy cache for this version of the site.

Updates to the site’s design or functionality would trigger a new deploy as usual, but the build times would no longer be coupled to the volume of user generated content as the rendering of these pages would be deferred until they were first requested.

Points of difference

Key differences between DPR and other strategies for achieving these goals.

Incremental Builds build only certain parts of your site when there are changes. DPR would instead rebuild the whole site, with the exception of the content you want to be rendered on demand.

Incremental static regeneration (which is based on stale while revalidate), similar to DPR, generates only the pages defined, and then renders the new page when a user navigates to that page. That being said, SWR relies on users seeing stale content first. Whether it is a fallback page or a previous version of the page, it is not a consistent experience for each user. The first user to a new page will see stale content, and the second user (and beyond) will see the newest content.
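
The difference in first-request behaviour can be made concrete with a small sketch. It is deliberately simplified and rests on assumptions: `serveSwr` and `serveDpr` are hypothetical functions, the SWR version revalidates on every request and ignores any revalidation interval, and the refresh that would normally happen in the background is done inline.

```typescript
type Renderer = () => string;

// Stale-while-revalidate: answer with whatever is stored (or a fallback),
// then refresh the stored copy for later visitors.
function serveSwr(
  cache: Map<string, string>,
  path: string,
  render: Renderer,
  fallback: string,
): string {
  const stored = cache.get(path);
  cache.set(path, render()); // a real system would do this in the background
  return stored ?? fallback;
}

// DPR: the first visitor waits for the render; every visitor gets the same persisted result.
function serveDpr(cache: Map<string, string>, path: string, render: Renderer): string {
  const stored = cache.get(path);
  if (stored !== undefined) return stored;
  const fresh = render();
  cache.set(path, fresh);
  return fresh;
}
```

In this model the first `serveSwr` call for a new path returns the fallback and only the second returns the rendered page, whereas `serveDpr` returns the rendered page on both calls and renders it once.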

Exclusions and omissions

This RFC does not include descriptions of internal CDN caching strategies or other implementation details. This is a deliberate omission so that we can focus this discussion on the logical model for DPR and the mental model for building sites and applications with it.

Deeper CDN implementation details are out of scope for this discussion and should be able to remain opaque to developers implementing sites with DPR, just as they are for existing Jamstack sites.

swyxio · 2021-04-14T17:22:08Z

interesting proposal and addresses a key pain point!

I feel like we need more concrete examples here as I have trouble understanding how it works. i think i see how this works with Nextjs as that is a core design goal. but... would DPR work with Hugo or Jekyll or Gatsby as-is today, or would it require an adapter per SSG, or would it require internal modifications to be made by each SSG maintainer?

edit: ah hang on.. there is a related thing for On-demand Builders that may add context. Question still stands - i see assertions that this process will work with any framework, but I'd love some more detail on how, for given popular SSGs that dont have any concept of on demand builders, or understanding of when to stop SSGing.

7 replies

zachleat Apr 14, 2021
Maintainer

As a small demo without any additional detail 😅, y’all can peek at https://fns-demo-cloud--11ty.netlify.app/authors/ all of the author pages there are On-demand Builders, while the rest of the site is Build-time rendered

nhoizey Apr 14, 2021

@zachleat now I wonder how in Eleventy you would limit the number of pages generated by the build, to let others be built later on demand. Would it be with a dynamic permalink: false depending on an environment variable?

swyxio Apr 14, 2021

very nice; ya i can see how it might work for Eleventy and hopefully the modifications for the Hugo and Jekylls of the world will be equally easy.

The Big Boss™ for this approach is the SSGs that then hydrate into clientside routing - i imagine breaking routes up like that is a lot more tangly for Gatsby and perhaps vuepress/docusaurus too.

what can i say except... good luck team :) important problem you are working on here!

jaredcwhite Apr 15, 2021

hopefully the modifications for the Hugo and Jekylls of the world will be equally easy

Unless something changes, Netlify Functions only support Go and JavaScript. So unless your SSG is written in Go or JavaScript and can easily be called to render a single page, nope.

jaredcwhite Apr 15, 2021

FWIW, we're working on something like this for Bridgetown: bridgetownrb/bridgetown#276
(similar basic architecture to Jekyll, and like Jekyll, requires Ruby)

kamsar · 2021-04-14T17:53:56Z

If I understand this correctly it's essentially ISR but without the stale - so it would have definite first hit performance implications. Probably not an issue for the intended sort of content (archival/stale data), but important to understand to use it correctly.

It would be interesting to have some sort of control over how the on-demand builder interacts with the cache; perhaps some content is revalidate-once-then-cache-until-next-deploy, but others should use every-n-minutes-SWR (data that updates frequently between deploys, like maybe a news home page, could become almost semi-dynamic).
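
To make the idea concrete, a per-path policy might look like the following minimal sketch. The `CachePolicy` type and `isFresh` function are hypothetical and are not part of any existing on-demand builder API; they only show the two behaviours side by side.

```typescript
// Hypothetical per-path cache policies for on-demand rendered assets.
type CachePolicy =
  | { kind: "until-next-deploy" }
  | { kind: "revalidate-after"; minutes: number };

interface CachedAsset {
  deployId: string;
  renderedAt: number; // epoch milliseconds
}

// Decide whether a persisted asset can still be served as-is.
function isFresh(
  asset: CachedAsset,
  policy: CachePolicy,
  currentDeployId: string,
  now: number,
): boolean {
  // A new deploy invalidates everything, whatever the policy.
  if (asset.deployId !== currentDeployId) return false;
  if (policy.kind === "until-next-deploy") return true;
  return now - asset.renderedAt < policy.minutes * 60 * 1000;
}
```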

There might also be issues with dependency in ODB content (i.e. a news article might be built by an ODB but it might be missing from lists or navigation elsewhere that should also be rebuilt). I wonder if perhaps the ODB might be able to ask the CDN to drop other ODB-driven paths from its cache to handle this?

As to the atomic deployment: because ODBs are by their nature dynamic, they break the idea of a fully atomic deployment because they can build using data sources that are newer than the last deployment, so the site is not a full snapshot. Broken datasource APIs during on demand builds could also break stuff at any time, so the points of failure also increase.

I like this, it makes a lot of sense.

1 reply

philhawksworth Apr 17, 2021
Maintainer

Yeah, with this model you are making a trade-off. Rather than the first user to request a URL seeing a stale or fallback view, that very first view by the very first visitor would instead take a slight performance hit. I'd expect that to be minimal and typically imperceptible to the user (big flashing caveat of "it's possible to make any render slow no matter the tech!").

I like this trade because it means that we can have confidence in what is being served as part of each deploy (or from the long tail of requests to ODB functions which are part of any given deploy).

I'm wary of serving stale views as an initial default. Especially if they might include their own caching strategies in the browser which the developer then also needs to understand and manage.

Your suggestion of having some control over the TTL or other caching characteristics of the views being rendered by an ODB function is interesting. My personal feeling is that this is an area where caution is needed, especially for the default behaviours. This RFC tries to express the model for DPR as being an extension to the rendering and persisting model which deploys already encapsulate. Currently, this is easy to reason about: my views are valid and consistent and will be updated when I do another deploy or publish a previous deploy. I'm in favour of them aligning with that as much as possible so as to not muddy the waters for developers when trying to reason about the state of the site. Indeed there are other ways to use serverless functions already instead of, or in addition to, this new mechanic.

All good food for thought!

matijagrcic · 2021-04-14T20:51:34Z

First hit would have a performance impact and subsequent ones won't? If so, this approach doesn't seem to improve the user's experience, as it runs against what the industry is striving to prevent/fake by using skeleton screens, service workers, app shell design with AMP + PWA etc. Seems ok for news sites, certainly not for e-commerce where you want the response as fast as possible.

Another approach could be to have a Lambda/Function etc. that hits the content that wasn't included in the build as soon as production deploy happens so the user isn't penalized and the DX/build time aren't impacted.

4 replies

jlengstorf Apr 14, 2021
Maintainer

This is a good point — a lot of the discussion internally has been around using this for the "long tail" pages that make builds slow. For example, if you're building tens of thousands of pages and your build takes >10 minutes, could you build the 1,000 pages that receive the most traffic in ~1 minute and let the long tail traffic get generated via DPR?

Another strategy would be to send an async call as part of your post-build scripts (you could use a Netlify Build Plugin for this or do it through some other process) to hit all the pages that use DPR — that way your build script is the first hit in the vast majority of cases and users would always hit the cache. This approach is really interesting to me because it's a kind of "build what's important, deploy, then build the rest" approach. I haven't seen anyone try this yet but I'm really interested in how it will work.

Another approach could be to have a Lambda/Function etc. that hits the content that wasn't included

This is a great idea! I use this on the Learn With Jason API since the API results mirror my page content. You can see it in action here: https://www.learnwithjason.dev/api/episode/lets-learn-esbuild (this is using an early prototype that relies on CDN nodes, so depending on where you load this from you might see the first build perf hit, but repeat hits will use the cache)

matijagrcic Apr 14, 2021

Yeah, Lambda/Functions would be similar to what we have previously implemented to achieve better SEO a couple years back.

https://twitter.com/matijagrcic/status/1039855036942172161

We read the sitemap each day and then fired off WebDrivers (later we used Puppeteer) to get the HTML snapshot and stored that on a CDN

https://developers.google.com/search/docs/ajax-crawling/docs/specification#serving-the-html-snapshot-corresponding-to-the-dynamic-page

This solution was used on several major e-commerce sites and allowed us to concentrate on using latest tools while SEO was intact.

Wix uses a similar approach with Puppeteer for testing
https://twitter.com/matijagrcic/status/1285340983832596488

ianand Apr 16, 2021

Another strategy would be to send an async call as part of your post-build scripts (you could use a Netlify Build Plugin for this or do it through some other process) to hit all the pages that use DPR — that way your build script is the first hit in the vast majority of cases and users would always hit the cache.

We have something like this built into our Jamstack platform https://docs.layer0.co/guides/static_prerendering and talked about it at NextjsConf and JamstackConf https://youtu.be/cVJxZEyShs4?t=530

philhawksworth Apr 17, 2021
Maintainer

Just to emphasise, the performance cost we are talking about is for the very first request to a URL, and this is not per user. So the penalty here should be incredibly low and be outweighed by the confidence that the content being returned for requests to any given URL will be consistent and predictable.

Good discussion points in this part of the thread, and all very valuable.

Interesting to see other people's efforts in this same area, and in particular with pre-rendering specific paths or resources. There is nuance under the waterline about how those rendered views are propagated and persisted around the edge of a CDN which this RFC deliberately omits (so we can focus on the function and logical model it enables).

smoya · 2021-04-14T22:39:08Z

Great proposal and well explained!
I believe it's important to consider cache warm-up mechanisms as well in order to reduce load times on the user's side.

A valid warm-up mechanism may include what @matijagrcic mentioned above: lambda functions that request content not built yet and that run right after the build finishes.

1 reply

philhawksworth Apr 17, 2021
Maintainer

Good point @smoya! The RFC deliberately omits discussion of the complexity behind the scenes of how the rendered views are persisted around the CDN, but there is nothing to stop us from exploring ways to additionally prime things post-deploy.

Personally I'd want to look for data on how well (or poorly) those very first requests perform before going down that route. The potential marginal gains for only the very first request for a view might not justify the additional complexity.

Certainly, as you and @matijagrcic note, there are some potential strategies to mitigate this if it was deemed necessary.

reegodev · 2021-04-14T23:16:02Z

Very interesting proposal.

However, I'm still a bit unconvinced about the lack of customization of the cache duration. The points are clear and make sense, but to me it seems that the Jamstack philosophy is applied a bit too strictly on a feature with a bigger potential.
The biggest issue i see when keeping the cache until the next deployment is content that changes frequently: while it's fine for blog pages, it's not really great for a product page if you track stock availability (I'm sure there are also other examples of content that changes often and could benefit from a shorter cache).

IMHO allowing developers to customize the cache could enable a broader range of use cases.

8 replies

reegodev May 7, 2021

I'm interested to see how the conversation about having control over cache develops, @reegodev. It certainly seems to be an area where lots of folks have thoughts (and by cache I'll assume we're talking about persisting an asset as the current, fresh resource in the CDN rather than controlling caching behaviour in the browser, which is already possible).

Of course this is always about the CDN persistency.

I do see value in being able to easily invalidate and therefore generate fresh renders of resources as required. Each deploy would do that and we may need to introduce another mechanism. But I have reservations about granular cache control. Perhaps some sensible defaults might help here. Certainly something for thought and more discussion.

I'm particularly interested in this "another mechanism" you mention. Any ideas already in mind?
If managing caching granularity goes against atomic deploys, i can totally support another method of creating new deploys which does not trigger a build of the website.
I'm aware that you can already do something similar with either build hooks or API calls, but these methods always trigger a full build. I'm also aware now most frameworks can cache some of the build work, ie: they can skip the webpack build; but they still need to run the generators which could still take some time.
What i'm thinking is a method of copying all build artifacts to a new deploy hash, without the artifacts created through DPR / ODB.
This way when my content changes i have two choices:

- Trigger a normal build which regenerates my static pages and invalidates DPR artifacts. This can take some time and can also be expensive if done very frequently.
- Trigger a "soft" build that copies over my static pages and just invalidates DPR artifacts. This can be immediate and also quite cheap i guess
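
The "soft" build idea can be sketched as follows. This is purely a hypothetical model of what such a mechanism might do, not something any platform is known to offer under these names: `DeployState` and `softDeploy` are invented for illustration.

```typescript
// Hypothetical model of a deploy: static build output plus on-demand renders.
interface DeployState {
  id: string;
  staticAssets: ReadonlyMap<string, string>; // output of the last full build
  onDemandAssets: Map<string, string>; // pages rendered at request time
}

// A "soft" deploy reuses the static output as-is and starts with an empty
// on-demand cache, so DPR pages are re-rendered on their next first request.
function softDeploy(previous: DeployState, newId: string): DeployState {
  return {
    id: newId,
    staticAssets: previous.staticAssets,
    onDemandAssets: new Map(),
  };
}
```

Because the previous deploy object is left untouched, it could still be rolled back to, which keeps each deploy immutable in this model.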

If this is something akin to what you can currently do in Netlify, where you can access the intra-build cache during your builds, then yes, it could be very beneficial. I've used this mechanism quite a bit to support incremental builds where the structure of the site lends itself well to the technique.

Any doc reference or link to share? Would love to experiment on this

ianand May 13, 2021

Some thoughts @reegodev:

I recently heard that Vercel is evaluating whether to allow builds to bring over cached pages from past deploys; do you think it is something worth discussing in this RFC?

At the risk of sounding like I'm just tooting our own horn, you can do that today on Layer0. We had a lot of debate internally about the feature because it does break the atomicity of standard Jamstack and makes things harder to reason about, but the feedback from our customers (who are very large sites trying to make the Jamstack work) is what finally drove us to support it. Though it remains an "opt-in" only feature.

Of course this is always about the CDN persistency.

This is a very insightful statement. IMHO the CDN is really the lynchpin that makes the Jamstack work and the future of the architecture will be about redefining how developers use the CDN.

ianand May 13, 2021

However, I'm still a bit unconvinced about the lack of customization of the cache duration...while it's fine for blog pages, it's not really great for a product page if you track stock availability (I'm sure there are also other examples of content that changes often and could benefit from a shorter cache).

I agree. We run into this a lot. IMHO on a large site the right solution is to allow specifying the cache duration and supporting targeted cache clears (both of which are things we support on Layer0). Otherwise you're rebuilding too much. I realize this is against "classic Jamstack" philosophically but actually think it strikes the right balance practically.

reegodev May 15, 2021

@ianand it's great to see that our issues are shared by others.
Mind if I ask how your platform handles rollbacks when pages with different cache durations are involved? Do you just invalidate on-the-fly SSR cache and let the next request regenerate the page?

ianand May 17, 2021

Mind if I ask how your platform handles rollbacks when pages with different cache durations are involved? Do you just invalidate on-the-fly SSR cache and let the next request regenerate the page?

@reegodev Basically yes. Developers can specify the cache duration and can explicitly invalidate specific paths if need be in response to backend changes. If there's a full rollback, the entire cache is cleared (unless you enable a special option to turn that off), which can then be rebuilt by the platform if desired using https://docs.layer0.co/guides/static_prerendering .

dochoaj-meli · 2021-04-15T19:56:11Z

It's an interesting proposal!

Does it handle the use case of a page that displays a product price or product stock (which can change)? How does it impact FCP? As I understand it (I could be missing something), when a user hits a page that is not critical, we need to build it, cache it, and only then deliver it to the browser, all at request time.

2 replies

toddmorey Apr 16, 2021

It's true. There would be the cost of a delay for the first user arriving at a DPR page that has yet to be rendered. But that feels like the right tradeoff to make & seems preferable to serving either a fallback page or a stale asset.

However, as mentioned elsewhere, there's also the possibility to essentially run through a queue of site URLs to trigger the DPR renders automatically. Doing this second async render pass would greatly reduce the likelihood of any page having to be rendered as a user waits in a browser.

What's great about this model is the power developers will have to decide when and how pages are rendered, balancing upfront build times with these considerations.

dochoaj-meli Apr 16, 2021

Thanks for replying! So, if I understood correctly I...

- would have control when and how to 'rebuild' some cached page
- would have control to queue async renderings after the critical paths of my website are rendered at build time

If that's the scenario, this is going to be awesome :)

kylejrp · 2021-04-15T21:28:16Z

Love the proposal, this rocks! 🔥🤘

Is there any thought towards having the option to run the DPR functions as post-build background tasks? Then new deploys are still speedy, but you get the benefit of (eventually) having every page built instead of generating them on the fly.

For example, my personal blog site probably wouldn't care if the first load is generated with a DPR function for the first visit after a deploy.

However, an eCommerce site might not want that initial extra load, so they choose to have the pages rendered as a post-build action. In the worst case, if a user visits a page right after a deploy but before the DPR functions run as a post-build action, they would force the page to be rendered on the fly through DPR.

(Am I making sense? This makes this more of a spectrum of how static you want your site to be)

2 replies

toddmorey Apr 16, 2021

Yes absolutely and this is exactly the model I think this could open up.

- Critical pages are prerendered as part of the build
- DPR functions are then called so that additional pages are rendered asynchronously
- If a user arrives at a DPR page ahead of being rendered, the page is generated on demand. This happens at the cost of a slight delay, but without concerns around fallback experiences or stale cache.

It will take a little coordination with site generators to walk the site map and fire off the DPR functions for each additional page, but triggering async building in this way helps control and narrow the window between the initial build and the time when all pages are finally rendered into persistent assets. This optional enhancement could address the concerns about builds continuing to be atomic when the data returned from APIs / content systems change over time.
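
The post-build pass over deferred URLs could be sketched like this. It is a minimal illustration under assumptions: `warmUp` and the `Fetcher` type are hypothetical names, the list of URLs is assumed to come from a sitemap or similar, and the function that performs the request is injected so the sketch does not depend on any particular HTTP client.

```typescript
// Resolves to an HTTP status code for the requested URL.
type Fetcher = (url: string) => Promise<number>;

// Request each deferred URL once so that it is rendered and persisted before a
// real visitor arrives. At most `concurrency` requests are in flight at a time.
async function warmUp(
  urls: string[],
  fetchStatus: Fetcher,
  concurrency: number,
): Promise<Map<string, number>> {
  const results = new Map<string, number>();
  let next = 0;

  async function worker(): Promise<void> {
    while (next < urls.length) {
      const url = urls[next++];
      try {
        results.set(url, await fetchStatus(url));
      } catch {
        results.set(url, 0); // 0 marks a request that failed outright
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In a runtime that provides the standard `fetch` function, the injected fetcher could be as small as `(url) => fetch(url).then((res) => res.status)`. A visitor who arrives before the pass reaches their page would still trigger the on-demand render, as described in the list above.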

ianand May 13, 2021

@toddmorey @kylejrp

Is there any thought towards having the option to run the DPR functions as post-build background tasks? Then new deploys are still speedy, but you get the benefit of (eventually) having every page built instead of generating them on the fly.

At Layer0 we call this feature "Parallel Static Rendering" and it's a great approach in that it lets you decouple the build time overhead from the deployment. We spoke about it at NextJsConf and JamstackConf https://youtu.be/cVJxZEyShs4?t=530

seancdavis · 2021-04-16T15:08:59Z

This is super intriguing. I like where it is headed! In particular, I really like that you're keeping the original Jamstack benefits top of mind when considering how to expand and evolve the approach. I felt we were heading down a path where any site could fall under the purview of Jamstack simply because you can achieve it with Netlify, Vercel, etc. This reins those ideas in by bringing us back to why the Jamstack was so revolutionary in the first place.

The main concern I have when thinking about how I might adapt to a change like this is in considering build processes that require the context of the full built output to achieve their maximum benefit. Two examples:

1. Purging unused styles: To end up with one CSS file to serve the site that is as small as possible, I have to look through the selectors being used throughout all the pages that could eventually be rendered to know what is safe to remove.
2. Catching broken links: If I want to scan through internal links and make sure they are valid, I'd have to know every page on the site. And if I wanted to include support for jumping down to a valid id, I'd also have to know the markup for each page, too, not just the path of the page.

These types of benefits are trivial to achieve with build plugins today on a completely static site. How do we ensure we don't way overcomplicate working with these supporting tools as a result of this approach?
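
For the link-checking case, one conceivable approach is to validate links against the built pages plus a manifest of deferred routes. The sketch below assumes such a manifest exists as a list of path prefixes; `findBrokenLinks` and `deferredPrefixes` are hypothetical names. Note the limits: a prefix match only says a route may be rendered on demand, not that content exists for it, and it cannot validate ids inside deferred pages.

```typescript
// Returns internal links that match neither a built page nor a deferred route prefix.
function findBrokenLinks(
  links: string[],
  builtPaths: Set<string>,
  deferredPrefixes: string[],
): string[] {
  return links.filter(
    (link) =>
      !builtPaths.has(link) &&
      !deferredPrefixes.some((prefix) => link.startsWith(prefix)),
  );
}
```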

1 reply

jgarplind Apr 16, 2021

Thought-provoking points! Thinking about 1., would it be cumbersome to separate such tasks from the main build pipeline (critical path)? I imagine you could maintain a slower script which does a lot of this type of groundwork without slowing deployments.

I suppose this applies to your second point as well, but in my mind letting through a redundant style (and thus an unnecessarily large build output) is not as critical as maintaining the validity of links.

Apologies if this whole reply is tangential, it just seemed intuitive to me that since a major goal of DPR is to optimize build speed, we should keep (what I would consider) extravaganza outside of the critical path.

slorber · 2021-04-19T09:28:31Z

How would on-demand builders work with Jamstack SPA sites?

I'm the maintainer of Docusaurus (quite similar to Gatsby, using Webpack and React for SSG) and wonder how easy it can be to generate that lambda function to generate pages lazily. That seems easier with non-SPA jamstack tools or Next.js and SSR-based tools that already generate lambdas.

Has anyone made a POC with Gatsby or a similar tool?

0 replies

betabong · 2021-04-21T08:35:09Z

I do like (if not love ☺️) the proposal per se, because it solves an important issue with many static websites. Clients have a hard time understanding why changing content in a headless cms does not have an immediate effect ... But I don't think I like the current beta implementation https://docs.netlify.com/configure-builds/on-demand-builders/ (or maybe I don't understand it to its full extent).

Rendering non-critical pages on request is fine, but we need to have access to the result of the initial build process. Let's say in the build process we gather data from different sources. Now to render a page we want to have access to this data. In my understanding we would have to gather that data on every page request, right?

If on the other hand we have access to that data, serving could be quite fast. 🚀 If not, I don't see much difference to a conventional server rendered approach really. 👴

Also I'd love to see (in the future) a way to validate those non-critical pages automatically after the build process is done. We'd have to generate a list of URLs. As soon as another (partial) build succeeds, the validation queue would restart. This of course only would make sense if the result persists centrally (and not in a single edge node).

1 reply

philhawksworth Apr 27, 2021
Maintainer

Thanks @betabong. Some helpful thoughts in here!

we need to have access to the result of the initial build process. Let's say in the build process we gather data from different sources. Now to render a page we want to have access to this data. In my understanding we would have to gather that data on every page request, right?

I made a small proof of concept to explore this using the early work on Netlify's implementation (ODB). This PoC combines a number of different techniques to perform different types of render depending on what is required. I was particularly interested to see how well different serverless functions could be combined to provide a solution to a variety of functional requirements in one site, without losing the ability to reason about the state of the site after any given deploy.

Hopefully of interest will be this part of the ODB function which requires the data previously fetched from an external API by a build plugin ready for use during a deploy. I think that satisfies what you were describing.

... I don't see much difference to a conventional server rendered approach really. 👴

The key difference is in the way that the result of the first request to a URL served by an ODB function would be persisted in the CDN. This is different to the conventional server (or serverless 🥴) rendered response which would not get this persisting behaviour. The intent is to persist the first response in the CDN in the same way that you expect the results of each deploy to persist. And so let you reason about the state of your site, and consider the versioning, development workflow and publishing of deploys the same as you always could with a totally regenerated site.

nathankitchen · 2021-04-21T12:30:18Z

nathankitchen
Apr 21, 2021

Neat idea. Couple of questions though:

Is this approach particularly vulnerable to DoS with bad actors making a ton of requests to new URLs? Depending on build time, generating “on first request” would potentially need a queue/state so that subsequent requests don’t kick off parallel builds of the same resource. Impact probably depends on startup and page build time, which is probably dependent on the site/implementation so hard to control from a platform perspective.
Assuming the proposal was delivered, wouldn’t it be equally possible to address the build time problem with the same technology solution by massively parallelising the build process? If you can build one or more resources from a function, wouldn’t it be as easy to orchestrate a few hundred/thousand of those (batched for optimisation depending on startup time) and still end up with a single atomic deploy unit without the long build time?
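
For illustration, the "queue/state" raised in the first question could be as small as a map of in-flight renders, so that concurrent first requests to the same URL share one render. This is a sketch under that assumption only; `SingleFlight` and `request` are hypothetical names and the proposal does not prescribe any such mechanism.

```typescript
// Hypothetical sketch of request coalescing ("single flight")
// for on-demand renders of the same path.
class SingleFlight {
  private inFlight = new Map<string, Promise<string>>();

  constructor(private render: (path: string) => Promise<string>) {}

  request(path: string): Promise<string> {
    // If a render for this path is already running, reuse it.
    const existing = this.inFlight.get(path);
    if (existing) return existing;

    // Otherwise start one, and forget it once it settles so a
    // failed render can be retried by a later request.
    const started = this.render(path).then(
      (html) => {
        this.inFlight.delete(path);
        return html;
      },
      (err) => {
        this.inFlight.delete(path);
        throw err;
      }
    );
    this.inFlight.set(path, started);
    return started;
  }
}
```

This only deduplicates work within one process; whether it is needed at all depends on how long a single render takes.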

3 replies

philhawksworth Apr 26, 2021
Maintainer

Is this approach particularly vulnerable to DoS with bad actors making a ton of requests to new URLs?

Since this model persists the first request to a URL and serves subsequent requests directly from what exists in the CDN, I'd say less so than any traditional rendering model where all views were rendered on demand on a server for every request.

I think that your question might have been motivated by a desire to mitigate lots of "parallel builds". I think it's worth noting that this model is not suggesting that each first request would initiate an entire build or indeed anything more than is required to return the view for that single resource as per its URL. As you note, implementations will play a part, but I'd expect that each of these requests should be very fast and focussed. I'm personally not picturing these as initiating new builds so much as returning a single page view.

In my proof of concept implementations, the functions have been completing in between 25 and 47ms (for simple views, but ones which still fetched data from a remote API). I'd not anticipate the need to build queuing or other logic to manage that. Especially since, in order to abuse it, one would need to coordinate requests to very large numbers of unique URLs which are serviced by serverless functions, which are provided by infrastructure designed to automatically and elastically scale.

wouldn’t it be equally possible to address the build time problem with the same technology solution by massively parallelising the build process?

That's an interesting thought. It would require each site generator and framework to be designed and developed to work in this way. This proposal is trying to avoid tightly coupling the SSGs to the infrastructure, or requiring a complete re-architecture of the existing SSGs and frameworks. I wonder if any maintainers have explored such a compute model for their site generators.

nathankitchen May 3, 2021

I think that your question might have been motivated by a desire to mitigate lots of "parallel builds".

Yes, exactly that. If the build time can be kept as low as suggested then that would seem reasonable, though that strikes me as a difficult assumption/requirement to deliver or enforce from a developer experience perspective. 25ms from cold? As in, generator and site not already loaded and in memory?

philhawksworth May 4, 2021
Maintainer

25ms from cold? As in, generator and site not already loaded and in memory?

From cold, yes. But not, in this instance, using a generator. Using a WiP of Netlify's ODB functions (which would be Netlify's implementation of the DPR architectural model) you can use an additive approach to enrich the output of an SSG with any tools you like to render a page. You don't have to use the same SSG (although there will be some nice DX advantages to doing so depending on the use case).

In this early PoC the on-demand views (the content pages in this section) are returned by a function which requires the data which was gathered during the most recent build, and a template provided by a JavaScript string literal. Populating that view takes around 2ms. Another example, which I'll share soon when it is better documented, does the same thing except it fetches the data from a remote data API, and yes, that does so from cold in around 24 - 45ms in my testing so far. Here's a gist of that function so you can .. get the gist :)
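
The linked gist is not reproduced in this thread. As a stand-in, here is a minimal sketch of the general shape described above: data gathered at build time plus a template literal, with no site generator involved. The `Article` shape, the sample data, and the function names are assumptions made for illustration.

```typescript
// Hypothetical shape for data gathered during the most recent build.
interface Article {
  slug: string;
  title: string;
  body: string;
}

// Stand-in for a data file written by a build plugin.
const buildData: Article[] = [
  { slug: "hello", title: "Hello", body: "First post." },
  { slug: "dpr", title: "DPR <draft>", body: "Render once & persist." },
];

// Escape the characters that matter inside HTML text content.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Populate the view from build data using a plain template literal.
// Returns undefined when no article matches, so the caller can 404.
function renderArticle(slug: string): string | undefined {
  const match = buildData.filter((a) => a.slug === slug)[0];
  if (!match) return undefined;
  return `<article><h1>${escapeHtml(match.title)}</h1><p>${escapeHtml(match.body)}</p></article>`;
}
```

An on-demand function would call something like `renderArticle` with the slug taken from the request path and return the HTML as its response body.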

talves · 2021-05-03T17:48:04Z

talves
May 3, 2021

Concerns:

Today without DPR, there is no tie in to Netlify as your only deploy solution. DPR would tie you into using Netlify. This may be good or bad. An SSG would then have to make the decision to exclude this feature (opt out) and deal with slow initial builds. SSG's should be handling the issues of slow builds and making it a priority already. Of course, I'm making some assumptions about architecture, so this is more for discussion.
Using DPR for improving a build time, seems to just offset the build time. Are we just trading one slower build for another to give a perceived faster build (rhetorical)? One of our users is going to take the hit on that first DPR. Now the SSG has to figure out how to improve the DPR build time to improve the UX to get a real improvement.
Are we moving toward a more complicated setup in our SSG's for larger sites? Probably. What percentage of sites are larger? I'm going to guess these are the more important sites. Enterprise sites make up and pay for the economics of our free plans. Sustainability for maintaining open source SSG's and a more complicated setup gets increasingly problematic. These are all trade-offs worth exploring, but the tangibles are harder to grasp without the journey.

I'm not trying to throw shade on the idea of DPR. I'm trying to get my head around where the real issues are now that we have Jamstack defined. Slow builds are an issue in almost every SSG with lots of content. I'm just apprehensive when I keep perceiving the issue going back to the SSG itself.

2 replies

jlengstorf May 4, 2021
Maintainer

these are all good questions!

Today without DPR, there is no tie in to Netlify as your only deploy solution. DPR would tie you into using Netlify.

the design here is specifically set up to degrade gracefully. the SSG can use any serverless system to render on demand, but if their serverless provider implements DPR support they'd be able to persist that response

SSGs could also support server environments, etc.

the general idea is that each SSG would only need to implement a "build one page" API, which many already have (though most currently have this as an internal and not a public API). the implementation wouldn't add any Netlify-specific code, but supporting "build one page in a serverless function" would allow devs using that SSG with Netlify (or any platform that supports it) to take advantage of DPR
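
a rough sketch of what a "build one page" API plus a platform-neutral handler could look like. this is hypothetical: `SiteGenerator`, `buildOnePage`, and `createOnDemandHandler` are made-up names, and no SSG mentioned in this thread is confirmed to expose exactly this interface.

```typescript
// Hypothetical public "build one page" API an SSG might expose.
interface SiteGenerator {
  // Returns the rendered HTML, or null if the path is not a known page.
  buildOnePage(path: string): string | null;
}

interface HandlerResponse {
  statusCode: number;
  body: string;
}

// Generic on-demand handler with no platform-specific code.
// A platform with DPR support could persist the 200 response;
// any other serverless platform would simply serve it each time.
function createOnDemandHandler(
  ssg: SiteGenerator
): (path: string) => HandlerResponse {
  return (path) => {
    const html = ssg.buildOnePage(path);
    if (html === null) return { statusCode: 404, body: "Not found" };
    return { statusCode: 200, body: html };
  };
}
```

the point of the split is graceful degradation: the SSG side only knows how to build a single page, and persistence is left entirely to whatever hosts the handler.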

SSG's should be handling the issues of slow builds and making it a priority already.

definitely! our hope is that this encourages a pattern of "build the most important stuff ahead of time and defer the rest", but the real dream is that tech like Rust and esbuild get us to a point where it takes a HUGE site to make this necessary

Using DPR for improving a build time, seems to just offset the build time. Are we just trading one slower build for another to give a perceived faster build (rhetorical)? One of our users is going to take the hit on that first DPR.

a pattern that I'm really excited about is "async builds" where the most important content gets built before deployment, and the rest of the content is immediately rendered by a post-deploy hook so that no one ever feels the slowdown unless they visit a page IMMEDIATELY after it's deployed

the async part of the build would also be parallelized since it's all serverless, so it should be MUCH faster than building all of that content synchronously
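
a minimal sketch of the "async builds" idea, assuming a hypothetical post-deploy hook that is handed the list of deferred URLs. `toBatches`, `warmDeferredPages`, and the `requestPage` callback are illustrative names, not part of any platform API.

```typescript
// Split a list into batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical post-deploy hook: request every deferred URL so it is
// rendered (and persisted) before a visitor asks for it. URLs within
// a batch are requested in parallel; batches run one after another.
function warmDeferredPages(
  urls: string[],
  batchSize: number,
  requestPage: (url: string) => Promise<void>
): Promise<number> {
  let chain: Promise<void> = Promise.resolve();
  toBatches(urls, batchSize).forEach((batch) => {
    chain = chain.then(() =>
      Promise.all(batch.map((url) => requestPage(url))).then(() => undefined)
    );
  });
  // Resolves with the number of URLs that were requested.
  return chain.then(() => urls.length);
}
```

the batch size is the knob here: larger batches lean harder on the serverless platform's ability to scale, smaller ones take longer to warm everything.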

Are we moving toward a more complicated setup in our SSG's for larger sites?

like I mentioned, in a perfect world the SSG tech will get fast enough that it's rare to need DPR, but currently build times are enough of a bottleneck that teams are feeling the need to go to server rendered sites to get around it. we feel strongly that this will cause more pain for teams in the long run, so DPR is intended to solve for today's limitations without tying itself to a given framework or tool

does that all make sense?

talves May 4, 2021

Yeah, it all makes sense. I think about all these patterns a ton, of course. I think this is a great discussion to be having right now. It's definitely time.

a pattern that I'm really excited about is "async builds" where the most important content gets built before deployment

I like the idea here for sure. There's plenty of room for this type of setup and I could see it being really advantageous if we tackle it correctly in the community.

the real dream is that tech like Rust and esbuild get us to a point where it takes a HUGE site to make this necessary

I'm a huge proponent of the faster tooling camp. I've been pretty excited to see how we go further here too, which is where I was heading with these concerns.

Thanks for the feedback also Mr Lengstorf. 😉

denghongcai · 2021-05-28T09:49:21Z

denghongcai
May 28, 2021

ehhhh, is it just like standard CDN-Origin architecture?

0 replies

wardpeet · 2021-05-28T10:23:29Z

wardpeet
May 28, 2021

Hello, fellow jammers!

I like the overall flow of the spec. Part of the site can be generated at build time, the rest on demand. As mentioned above, every strategy has its trade-offs.

Writing specs is difficult because you want to be specific enough but keep the implementation flexible enough so frameworks can choose how to implement it (We all want to have a feature page with checkboxes to all the buzz words).

The thing I would like to change in the proposal is:

Like the assets generated during a build, those rendered by DPR at request time would remain in the CDN cache until invalidated by the successful completion of a new deploy. This would allow developers to consider the assets rendered during a deploy, and those rendered on-demand from requests to DPR functions contained in that deploy as all belonging to the same logical atomic deploy

Incremental Builds build only certain parts of your site when there are changes. DPR would instead rebuild the whole site, with the exception of the content you want to be rendered on demand.

To me, DPR doesn't need to be opinionated about if the new file is part of the "atomic deploy". DPR & incremental builds are pretty intertwined. The difference is that DPR is built on-demand, and incremental builds ahead of time. If I change an on-demand build page and my code doesn't touch anything else, why would I want to rebuild all other pages?

I'm curious why the atomic deployment is so crucial in the proposal?

2 replies

jlengstorf Jun 27, 2021
Maintainer

hey, @wardpeet! just realized we left you hanging here

the main challenge if we allow arbitrary changes to deployments is that it muddies the mental model and creates a lot of traps

for example: caching is easy if you have one layer of it — take some assets, put them on the CDN, replace them if there's a new site build. that's Jamstack! it's a clear workflow that you can explain without a whiteboard

however, if you add multiple layers of caching, it gets really hard to track. every large team I've worked with has at least one horror story of misconfigured caches breaking the production site. because of this, we want to be really wary of introducing a secondary caching layer into the standard architecture. that's why DPR is set up to build once, then go to the same caching layer as all other prebuilt assets until the next build — otherwise we have the potential for caches to depend on each other and get out of sync, and that's a recipe for a bad time

I do think there are ways we can design thoughtfully around partially regenerating sites, but we want to be really cautious about how we implement it — otherwise we'll hit all the original problems of e.g. Varnish and other caching layers that are so great... until something goes wrong and you spend your weekend trying to figure out which layer of the cache is out of sync 🙃

thanks!

wardpeet Jun 28, 2021

Thanks for getting back to me 😄. It makes sense. Caching is great until it doesn't work as expected. I'm all on board with clearing the cache of all items. To me, an "atomic deploy" refers to uploading all files over again. Many providers won't upload these new files if they haven't changed.

This is more of an implementation detail, but that's how I read the spec: every file needs to be re-deployed.

joelvarty · 2021-06-02T13:24:56Z

joelvarty
Jun 2, 2021

I would love to see DPR operate such that we can create new versions of any page using an incremental approach (such as with Next.js Incremental Static Regeneration) and have that actually create a delta build, similar to a commit in git.

That way as the data or content for pages change between code builds, we can still account for that and maintain immutability.

4 replies

mraerino Jun 28, 2021

how would you handle data changes that affect more than one page? e.g. you change the title of a post and a bunch of your category overviews or taxonomy pages need to change?

joelvarty Jun 28, 2021

Good question - Gatsby's incremental build engine handles that by maintaining a dependency tree and updating all of the pages that are affected by a content change. Tools like Next.js ISR handle this by using a cache timeout, which doesn't work for DPR immutability...
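
To illustrate the dependency-tracking idea in general terms, here is a small sketch of an index from content items to the pages that render them. It is not Gatsby's actual data structure; `DependencyIndex`, `record`, and `affectedPages` are hypothetical names.

```typescript
// Hypothetical dependency index: content ID -> pages that render it.
class DependencyIndex {
  private pagesByContent = new Map<string, Set<string>>();

  // Record, at build time, which content items a page was rendered from.
  record(page: string, contentIds: string[]): void {
    contentIds.forEach((id) => {
      let pages = this.pagesByContent.get(id);
      if (!pages) {
        pages = new Set<string>();
        this.pagesByContent.set(id, pages);
      }
      pages.add(page);
    });
  }

  // Every page that needs rebuilding when one content item changes,
  // returned in sorted order for stable output.
  affectedPages(changedContentId: string): string[] {
    const result: string[] = [];
    const pages = this.pagesByContent.get(changedContentId);
    if (pages) pages.forEach((p) => result.push(p));
    return result.sort();
  }
}
```

With a post recorded against both its own page and a tag overview, changing the post's title reports both pages as affected, which is the multi-page case raised in the question above.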

stewarthsoj Aug 28, 2021

Notwithstanding the complexities, I think this is a limitation. We provide a way for content creators to generate their own landing pages, which means we don't control the content that is created. For the most part, this created content is static and doesn't change very often. The current DPR proposal would require us to redeploy our website or wait for the cache to be invalidated.

ascorbic Sep 19, 2021

I think this relates to @wardpeet's comment: whether or not a file is re-uploaded is an implementation detail. If a CDN is able to determine that a file is unchanged then it should be considered safe and still not a violation of the DPR rules to not upload it again. The only way it would violate the expectations would be if there are pages that would change if rebuilt, but are not uploaded. Whether it's through the framework knowing that a page's data is unchanged, or a CDN knowing that the file hash is unchanged, I think it's reasonable to only upload deltas and for it to still be considered an atomic deploy.
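
A minimal sketch of that delta idea, assuming each deploy is described by a manifest of path to content hash. The `Manifest` type and `filesToUpload` function are hypothetical and not taken from any CDN's API.

```typescript
// Hypothetical deploy manifest: file path -> content hash.
type Manifest = Map<string, string>;

// Files that are new or whose hash differs from the previous deploy.
// Unchanged files are skipped, but the new deploy is still described
// by the complete `next` manifest, so it remains one atomic unit.
function filesToUpload(previous: Manifest, next: Manifest): string[] {
  const changed: string[] = [];
  next.forEach((hash, path) => {
    if (previous.get(path) !== hash) changed.push(path);
  });
  return changed.sort();
}
```

The expectation described above is only violated if a page that would change on rebuild keeps its old hash in the manifest, which is a correctness concern for whatever produces the hashes rather than for the upload step.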
