How to use Cloud Clone for datasets

Copy a dataset inside Sanity's infrastructure using either the CLI or HTTP API.

Enterprise Feature

This feature is part of our Advanced Dataset Management offering on the enterprise plan. Contact us if you need this feature and want to discuss our enterprise plan.

Cloud Clone provides a more efficient way of duplicating datasets and is ideal for situations when:

  • you want to run tests against real production data in a CI flow
  • you regularly copy datasets from production for developing new features

Instead of exporting and importing a dataset with the CLI, you can have that process happen inside Sanity's infrastructure, which is both faster and more reliable.

There are two methods of initiating and monitoring the cloning of datasets in the cloud: through the Sanity CLI or with the HTTP API.

Copying a dataset with the CLI

The quickest way to begin developing with a freshly copied dataset is to use the CLI.

Gotcha

As with other project-specific CLI commands, this command will only work from within a configured Sanity project.

By default, the CLI command runs the copy synchronously and streams its progress. If you don't want to wait for the process to complete, you can use the --detach flag to skip the progress output. The command will log a job ID that you can use to re-attach to the progress later with the --attach <jobId> flag.

# Syntax:
# sanity dataset copy
# sanity dataset copy <source-dataset>
# sanity dataset copy <source-dataset> <target-dataset>

# This command will prompt for which dataset to copy and what to call the new dataset
sanity dataset copy

# This command will copy the production dataset and request a name for the new dataset
sanity dataset copy production

# This command will copy the production dataset into a new dataset named newFeature
sanity dataset copy production newFeature

# This command will initiate the copy between production and newFeature
# It returns immediately with a job ID instead of displaying progress
sanity dataset copy production newFeature --detach

# This command re-attaches to a detached copy job and displays its progress
sanity dataset copy --attach <jobId>

# This command will initiate the copy between production and newFeature
# It skips document history, which speeds up the copy
# at the expense of history retention
sanity dataset copy production newFeature --skip-history

Gotcha

This process creates a new dataset with the specified name. If a dataset with that name already exists, the command will throw an error.

We encourage using this feature instead of exporting and importing your data to another dataset, as it is faster in most cases. Note that for large datasets, or datasets with many or very large assets, the process will still take some time to complete.

Copying a dataset with the HTTP API

PUT /v1/projects/:projectId/datasets/:datasetName/copy

If you'd rather integrate with the HTTP API instead of going through the CLI, there's an API endpoint for the copy functionality, as well as an endpoint for monitoring copy completion.

Initiating a copy

To start a copy, send a PUT request to the specific dataset's /copy endpoint.

https://api.sanity.io/v1/projects/<project-id>/datasets/<dataset-name>/copy

The request must be authorized with a Bearer token, which can be generated from the project's dashboard.

The body of the request should be an object with a targetDataset property whose string value names the new dataset.

{
    "targetDataset": "production-copy"
}

The full request

curl --location --request PUT 'https://api.sanity.io/v1/projects/<project-id>/datasets/<dataset-name>/copy' \
  -H 'Authorization: Bearer <token-here>' \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "targetDataset": "production-copy"
  }'

The JSON response

{
    "datasetName": "production",
    "message": "Starting copying dataset production to production-copy...",
    "aclMode": "public",
    "jobId": "jobIdString"
}
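In a script, you'll typically want to capture the jobId from this response for the status calls that follow. A minimal shell sketch, assuming jq is installed and that <project-id>, <dataset-name>, and the SANITY_TOKEN environment variable are placeholders you supply:

```shell
# Hypothetical helper: pull the jobId out of the copy response JSON.
# Assumes jq is available; all endpoint values below are placeholders.
extract_job_id() {
  jq -r '.jobId'
}

# Start the copy and keep the job ID for later status checks (not run here):
# JOB_ID=$(curl -s -X PUT \
#   "https://api.sanity.io/v1/projects/<project-id>/datasets/<dataset-name>/copy" \
#   -H "Authorization: Bearer $SANITY_TOKEN" \
#   -H 'Content-Type: application/json' \
#   --data-raw '{"targetDataset": "production-copy"}' | extract_job_id)
```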

Getting the current status of a copy

GET /v1/jobs/:jobId

When you run a copy via the HTTP API, you'll receive a Job ID. This ID can be used to query the status of the clone job.

The Full Request

curl --location --request GET 'https://api.sanity.io/v1/jobs/<jobid>' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <token here>'

The JSON response

// Running
{
    "id": "jacsfsmnxp",
    "state": "running",
    "authors": [
        "authorId"
    ],
    "created_at": "2020-11-09T17:34:28.071123Z",
    "updated_at": "2020-11-09T17:34:28.144826Z"
}

// Completed
{
    "id": "jarrwsdptf",
    "state": "completed",
    "authors": [
        "authorId"
    ],
    "created_at": "2020-11-09T17:07:41.304227Z",
    "updated_at": "2020-11-09T17:08:30.457692Z"
}
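The status endpoint lends itself to a simple polling loop: fetch the job, check its state, and retry while it is still running. A rough shell sketch, assuming jq is installed, SANITY_TOKEN holds a valid Bearer token, and that any state other than "running" can be treated as terminal (the exact set of terminal states is an assumption here):

```shell
# Hypothetical helper: read the state field from a job status response.
job_state() {
  jq -r '.state'
}

# Poll a copy job until it leaves the "running" state, then print that state.
# Assumptions: jq is installed and $SANITY_TOKEN holds a valid Bearer token.
poll_job() {
  job_id="$1"
  while :; do
    state=$(curl -s "https://api.sanity.io/v1/jobs/$job_id" \
      -H "Authorization: Bearer $SANITY_TOKEN" | job_state)
    if [ "$state" != "running" ]; then
      echo "$state"
      return
    fi
    sleep 5
  done
}

# Usage (not run here):
# poll_job "$JOB_ID"
```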

Listening for copy status

GET /v1/jobs/:jobId/listen

Each job has a /listen endpoint to allow you to monitor its status programmatically. Much like the static status endpoint, this endpoint accepts the Job ID that is returned by starting a copy action.


The full request

curl --location --request GET 'https://api.sanity.io/v1/jobs/<jobid>/listen' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <token here>'

The response

While listening, event data is sent back at intervals with updates on the status of your copy. Each event consists of an event name and a JSON object describing the current status of the copy.

event: welcome
data: {"listener_id": "ladaicdbdo"}

event: job
data: {"job_id":"jacsfsmnxp","state":"running","progress":60}

event: job
data: {"job_id":"jacsfsmnxp","state":"running","progress":80}

event: job
data: {"job_id":"jacsfsmnxp","state":"completed","progress":100}
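Each data: line above is a JSON payload, so a consumer only needs to strip the data: prefix before parsing it. A hedged sketch of a small filter that turns the job events into "state progress" pairs, assuming jq is installed; the welcome event carries no state or progress, so it is filtered out:

```shell
# Hypothetical filter: convert the listener's "data:" lines into
# "state progress" pairs. Assumes jq; events without a state field
# (such as the welcome event) are dropped.
job_events() {
  sed -n 's/^data: //p' | jq -r 'select(.state != null) | "\(.state) \(.progress)"'
}

# Usage (not run here):
# curl -sN "https://api.sanity.io/v1/jobs/$JOB_ID/listen" \
#   -H "Authorization: Bearer $SANITY_TOKEN" | job_events
```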
