Build your data application with Snowpark Container Services

Fredrik Göransson
8 min read · Jan 29, 2024


Get your containers in front of the data and serve your applications!

With Snowpark Container Services developers can now build, deploy and run full end-to-end applications directly on Snowflake. This makes it exceptionally easy to build and offer applications that are driven by data (and honestly, most are in one way or another).

Snowpark Container Services was perhaps originally seen as a way to bridge the gap to working with LLMs and other applications that require dedicated compute infrastructure, like GPUs, connected directly to the data, and it has very quickly delivered on exactly that. But it has also made it possible to build and run just about anything directly on top of the data, which creates some very exciting opportunities.

In our latest Quickstart we show how you can take a common application, built with a Node.js backend API and a React-based frontend, and run it directly on Snowflake. Some time back we published a quickstart for building a Data Application that connects to the data, but you had to host the application yourself (on a Kubernetes cluster, a serverless runtime, a VM, or whichever option fits you best).

Let’s look at some of the key points for how you now can run (just about) any application directly on Snowflake!

Overview

Building data-driven applications connected directly to Snowflake has long been an option for any data that lives in Snowflake. With the introduction of Snowpark, developers could do even more for their applications directly on the platform: run Python, Java and Scala code right next to the data, with a direct and secure connection and no movement or shifting of the data. With the introduction of Snowpark Container Services it gets even better: now developers can take just about any application, end-to-end, and deploy and serve it directly from the platform, with no movement of data, secure access to it, and no additional runtimes required.

By containerizing an environment, we can take any code, regardless of language and dependencies, and deploy and serve it from a Service in Snowflake. While it may not be the one-and-only choice for all service solutions, it does offer some clear benefits, like:

  • Fully managed environment — no maintaining infrastructure and effort spent on creating clusters for service applications
  • Integrated security — applications running on Snowpark Container Services are integrated into Snowflake's security model, so if you are already securing and governing your data on Snowflake, the applications you serve benefit from the same
  • No data movement — the services running on Snowpark Container Services are running directly “on top of” the data, no movement of data is required, services connect seamlessly and securely to the data
  • Choose any language — while this is true for any environment where you can run a containerized application, it offers existing Snowflake developers virtually unlimited options to now build using the languages and tools that best suit their needs

Hosting the service on Snowflake

Snowpark Container Services offers all the tools we need to run the services. For developers it is a managed experience where Snowflake takes care of underlying infrastructure and maintenance operations, leaving the developers free to focus on the application building.

Image repositories

Container images are stored in IMAGE REPOSITORIES, which hold OCI-compliant container images. Developers can build images locally and push them to a repository in the Snowflake account.
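As a minimal sketch (the repository name here is illustrative), a repository is created like any other Snowflake object, after which you can `docker tag` and `docker push` a locally built image to the registry URL it reports:

```
-- Create a repository in the current database and schema
CREATE IMAGE REPOSITORY simple_app_repository;

-- The repository_url column in the output is the registry URL to push images to
SHOW IMAGE REPOSITORIES;
```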

Compute pools

COMPUTE POOLS provide the pool of compute nodes that services run on. For developers familiar with Kubernetes, the role of the compute pools is easy to understand. The big difference is that Snowflake manages scaling, maintenance and governance of the nodes, so developers are not required to do anything except set the type of nodes that are available in a pool and how the resources are allowed to scale. There is a wide range of options for the nodes in a pool, from very small CPU nodes to large high-memory nodes to GPU-enabled nodes. The latter are typically used in advanced ML & AI applications, like LLM training.

  • CPU tier — Small to large instances with 2–32 vCPU nodes available. For most types of applications, either single or multiple applications running on the same pool
  • High-Memory CPU tier — For hosting memory-intensive applications, ranging from 8–128 vCPU nodes with 64–1024 GiB memory
  • GPU tier — From the smallest NVIDIA GPUs to very large NVIDIA A100 GPUs with large amounts of memory (2048 GiB for the largest)
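As a sketch (the pool name is illustrative, and the available INSTANCE_FAMILY values vary by region), creating a small CPU pool that can scale to two nodes could look like:

```
CREATE COMPUTE POOL simple_app_compute_pool
  MIN_NODES = 1
  MAX_NODES = 2
  INSTANCE_FAMILY = CPU_X64_XS;
```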

Services

Once you have Image Repositories and Compute Pools available, SERVICES can be created. Each service can be made up of one or more containers, each container based on an image in an Image Repository. A Service is a new object type in Snowflake; users can CREATE, DROP and SHOW Services, much like other types of objects.

Creating a Service involves providing a specification that outlines the containers, endpoints and variables (and additional details as needed). It also indicates which Compute Pool the Service runs on, and how the Service is allowed to scale out the number of instances it uses from the pool.

CREATE SERVICE simple_service
IN COMPUTE POOL simple_app_compute_pool
FROM SPECIFICATION $$
spec:
  containers:
  - name: service
    image: /SIMPLE_APP_DB/APP/SIMPLE_APP_REPOSITORY/simple_service_image:tutorial
  endpoints:
  - name: serviceendpoint
    port: 8888
    public: true
$$
MIN_INSTANCES=1
MAX_INSTANCES=1
;

Users will also need to look at the status and logs of a service, and examine the logs from the individual containers running inside it. There are a number of functions provided to help with that.

DESCRIBE SERVICE simple_service;
SHOW ENDPOINTS IN SERVICE simple_service;
SELECT SYSTEM$GET_SERVICE_STATUS('simple_service');
CALL SYSTEM$GET_SERVICE_LOGS('simple_service', '0', 'service', 50);

Connecting to the services

Services can expose public or private endpoints (private meaning reachable only from inside Snowflake). Service endpoints create the necessary DNS entries to connect to the service. This means that we can build and serve just about any type of application with a web interface, like a rich HTML+JavaScript application (in the quickstart, a React.js-based frontend) or an HTTP-based API (in the quickstart, a backend based on Node.js Express).

In the quickstart we show how you can connect to services both internally and externally: the frontend service publishes a public endpoint that serves the React.js application, and the frontend service then connects to the backend service.

In the quickstart guide, we are creating a few more endpoints to enable users to connect to the frontend, and for the frontend to talk to the backend.

For the public endpoints, Snowflake generates a URL for every endpoint declared as public. Each of these endpoints is a unique subdomain under .snowflakecomputing.app. When accessing it, users are prompted to log in with a user that has been given permission to use the service. This means that security is taken care of by Snowflake (user management, password handling and so on), and user authorization for the application can be managed by simply updating the RBAC for the service. So when a user visits that URL, they are prompted to log in, and Snowflake checks whether the USER has been granted a ROLE with permissions to access that specific service. With that, user authentication and authorization are taken care of by the platform.
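As a sketch of that RBAC model (the role name here is illustrative, and the exact privilege may differ in your account's preview version), granting a role access to the service could look like:

```
-- Users holding APP_USER_ROLE can log in to the service's public endpoints
GRANT USAGE ON SERVICE simple_service TO ROLE app_user_role;
```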

When communicating between two services internally, i.e. two services that are both running on Snowpark Container Services, they can instead use internal endpoints to communicate, and we don't have to expose public endpoints. Services are given internal DNS names made up of the database, schema and service name, and other services can use these to connect to them.

In the quickstart guide we actually introduce an NGINX server that routes calls between the frontend service's internal container and the backend service. The reason is that the frontend is a client-side rendered React.js application, meaning the calls come from the user's browser, not from the frontend service, and due to Cross-Origin Resource Sharing (CORS) policies the backend won't accept them. So how do we solve that? A simple routing server that rewrites and proxies calls coming in to the frontend service.
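A hypothetical nginx configuration fragment for this (the service, database, schema and port names here are ours, not from the quickstart; the internal DNS name follows the service-schema-database pattern described above) could look like:

```
server {
  listen 8000;

  # Serve the built React.js bundle directly
  location / {
    root /usr/share/nginx/html;
  }

  # Proxy browser calls to /api/ to the backend service's internal DNS name,
  # so frontend and backend share a single origin and CORS never comes into play
  location /api/ {
    proxy_pass http://backend-service.app.simple-app-db.snowflakecomputing.internal:3000/;
  }
}
```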

User management

When an external user accesses an endpoint, they log in with the user name and password of a Snowflake User. However, these are not credentials that can later be used to access Snowflake objects from the service. The user who logs in is captured, though, and can be used later on: the user name is added to every request coming into the service as a unique header, `Sf-Context-Current-User`. This can then be used to filter data or decide which access and features are available in the application, all based on the user that is logged in.

The HTTP request header is a secure way of passing the user name, since all other custom headers are stripped from incoming requests, so there is no way for a client to manually set this header or try to modify the authenticated user.
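In a Node.js backend this is just a header read. A minimal sketch (the helper name and route are ours, not from the quickstart):

```javascript
// Snowflake's ingress adds the authenticated Snowflake user name to every
// incoming request as the Sf-Context-Current-User header.
function currentUser(req) {
  // Node.js normalizes incoming header names to lower case.
  return req.headers['sf-context-current-user'] || null;
}

// e.g. in an Express route handler (sketch):
// app.get('/api/orders', (req, res) => {
//   const user = currentUser(req); // filter data by the logged-in user
//   res.json({ user });
// });
```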

Accessing Snowflake objects

A service running on Snowpark Container Services can connect directly to Snowflake through one of the connectors or drivers; there is one for most languages (Python, Go, JDBC, .NET, Node.js, ODBC, PHP). When a container runs in a Service on Snowpark Container Services, the environment provides it with some pre-populated information: a few environment variables, and a file added to the container's file system. The file at /snowflake/session/token contains an OAuth token that can be used to authenticate with Snowflake. This makes it very easy to connect to the data in Snowflake, and the developer does not have to manage credentials for the connection. Think of the model as service-account-style access to Snowflake: it is not the user logging in that is authenticated to Snowflake directly, but a dedicated account instead. This is a common model for many data applications and should fit in well with well-established architectures.

Try it out

With all of these pieces, you can put together and run just about any application (well, any type of application that runs in a web browser and likes to work with data).

Snowpark Container Services is now (January 2024) in public preview and available in a range of AWS regions, with the list quickly expanding. With an account in one of these regions, you can test it out by following the quickstart guide to build and deploy your first Snowpark Container Services powered application.

