# Data Access

## Google Cloud Platform

{% hint style="success" %}
Halo catalogues and z = 0 snapshots are now available via Google Cloud.
{% endhint %}

Storing the quantity of data produced by the Caterpillar pipeline is not without its headaches. Whilst the core data is stored on MIT computing infrastructure, the high-value assets have been made available on Google Cloud, which provides robust access with minimal overhead. The only requirement for accessing the Caterpillar data in the bucket is Google's Cloud Storage command line tool, [gsutil](https://cloud.google.com/storage/docs/gsutil).

Please [install gsutil](https://cloud.google.com/storage/docs/gsutil_install) for your system.

Once installed, to list what is available in our bucket, you simply type:

```
$ gsutil ls -l gs://caterpillarproject/halos/
```
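To work with the listing from Python, one option is to capture the output of `gsutil ls` (here without `-l`, so each line is just a URL) and extract the suite folder names. This is a minimal sketch: `list_suites` and `parse_suite_ids` are hypothetical helper names, and it assumes the suite folders sit directly under `halos/` as in the command above.

```python
import re
import subprocess

HALOS_PREFIX = "gs://caterpillarproject/halos/"
# Matches suite folders such as gs://caterpillarproject/halos/H1387186/
SUITE_URL = re.compile(r"^gs://caterpillarproject/halos/(H\d+)/$")

def parse_suite_ids(ls_output):
    """Return suite IDs (e.g. 'H1387186') from `gsutil ls` output, in listed order."""
    ids = []
    for line in ls_output.splitlines():
        match = SUITE_URL.match(line.strip())
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids

def list_suites():
    """Run `gsutil ls` on the halos/ prefix (requires gsutil on your PATH)."""
    output = subprocess.check_output(["gsutil", "ls", HALOS_PREFIX]).decode("utf-8")
    return parse_suite_ids(output)
```

Calling `list_suites()` returns the suite IDs as plain strings; lines that are not suite folders (e.g. summary lines) are ignored.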

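For more information on basic `gsutil` usage, please see [Google's quickstart documentation](https://cloud.google.com/storage/docs/quickstart-gsutil). To make downloads easier, we have written the following helper function. It finds the catalogue and/or particle folders for the requested resolution levels (`lxs`) and snapshots across all halo suites, and copies them while keeping the bucket's directory structure: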

```python
import subprocess

def download_caterpillar(output_dir="./", lxs=(14,), snapshots=(319,), want_halos=True, want_particles=False):
    """Copy Rockstar catalogues and/or Gadget snapshots from the Caterpillar bucket,
    mirroring the bucket's directory layout under output_dir."""
    patterns = []
    for lx in lxs:
        for snapshot in snapshots:
            run_dirs = "gs://caterpillarproject/halos/H*/H*LX%s*" % lx
            if want_halos:
                # catalogue folders are not zero-padded, e.g. halos_bound/halos_95/
                patterns.append("%s/halos_bound/halos_%d/" % (run_dirs, snapshot))
            if want_particles:
                # snapshot folders are zero-padded to 3 digits, e.g. outputs/snapdir_095/
                patterns.append("%s/outputs/snapdir_%03d/" % (run_dirs, snapshot))
    cmd_list = []
    for pattern in patterns:
        # gsutil expands the wildcards itself, so no shell is needed
        # (check_output raises CalledProcessError if nothing matches)
        stdout = subprocess.check_output(["gsutil", "ls", "-d", pattern]).decode("utf-8")
        for src in stdout.split():
            # drop "gs://caterpillarproject/" so the local copy starts at halos/H.../
            dest = "%s/%s" % (output_dir.rstrip("/"), "/".join(src.split("/")[3:]))
            cmd_list.append(["gsutil", "cp", "-r", src, dest])
    for cmd in cmd_list:
        subprocess.call(cmd)
```

See the Snapshot-Redshift Key below to work out which snapshots you need. For example, once the function above is defined, running the following in your `ipython` console starts downloading the LX14 halo catalogues for snapshot 319 (z = 0) for every halo suite:

```python
download_caterpillar(output_dir="./",lxs=[14],snapshots=[319],want_halos=True,want_particles=False)
```

Please talk to one of the team at [Caterpillar's Slack channel](https://caterpillarproject.slack.com/) should you have any problems.

## Conventions & Structures

A halo suite in the bucket is identified by its ID, e.g. `H1387186`. These IDs are the Rockstar halo IDs from the parent simulation, prefixed with `H`.

### Halo Key

To link the Caterpillar names used in papers (Cat-1 to Cat-24) to their parent-simulation IDs (PID), use the following reference:

| Name  | PID     | Name   | PID     | Name   | PID     | Name   | PID     |
| ----- | ------- | ------ | ------- | ------ | ------- | ------ | ------- |
| Cat-1 | 1631506 | Cat-7  | 94687   | Cat-13 | 1725272 | Cat-19 | 1292085 |
| Cat-2 | 264569  | Cat-8  | 1130025 | Cat-14 | 1195448 | Cat-20 | 95289   |
| Cat-3 | 1725139 | Cat-9  | 1387186 | Cat-15 | 1599988 | Cat-21 | 1232164 |
| Cat-4 | 447649  | Cat-10 | 581180  | Cat-16 | 796175  | Cat-22 | 1422331 |
| Cat-5 | 5320    | Cat-11 | 1725372 | Cat-17 | 388476  | Cat-23 | 196589  |
| Cat-6 | 581141  | Cat-12 | 1354437 | Cat-18 | 1079897 | Cat-24 | 1268839 |
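For scripting, the same table can be held as a Python dictionary. The sketch below assumes, as the example ID above suggests (Cat-9 has PID 1387186 and lives in `H1387186`), that each suite folder is named `H` followed by the PID; `CAT_PIDS`, `suite_id` and `suite_prefix` are illustrative names, not part of any Caterpillar package.

```python
# Caterpillar paper number -> parent-simulation Rockstar ID (copied from the table above)
CAT_PIDS = {
    1: 1631506, 2: 264569, 3: 1725139, 4: 447649, 5: 5320, 6: 581141,
    7: 94687, 8: 1130025, 9: 1387186, 10: 581180, 11: 1725372, 12: 1354437,
    13: 1725272, 14: 1195448, 15: 1599988, 16: 796175, 17: 388476, 18: 1079897,
    19: 1292085, 20: 95289, 21: 1232164, 22: 1422331, 23: 196589, 24: 1268839,
}

def suite_id(cat_name):
    """Map a paper name such as 'Cat-9' to its bucket folder name, e.g. 'H1387186'."""
    number = int(cat_name.split("-")[1])
    return "H%d" % CAT_PIDS[number]

def suite_prefix(cat_name):
    """gs:// prefix that holds every resolution level of one halo suite (assumed layout)."""
    return "gs://caterpillarproject/halos/%s/" % suite_id(cat_name)
```

For example, `suite_prefix("Cat-9")` gives `gs://caterpillarproject/halos/H1387186/`, which can be passed straight to `gsutil ls`.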

Use the following legend to determine the parameters of the run:

```
H1387186_EB_Z127_P7_LN7_LX12_O4_NV4
  H1387186    # halo rockstar id from parent simulation
  EB          # initial conditions type ('B' for box, etc.)
  Z127        # starting redshift (z = 127)
  P7          # padding parameter (2^7)^3
  LN7         # level_min used in MUSIC (2^7)^3
  LX12        # level_max used in MUSIC (2^12)^3
  O4          # overlap parameter (2^4)^3
  NV4         # number of virial radii enclosed when defining the Lagrangian volume
```
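The legend can be turned into a small parser. This is a sketch assuming every run name has exactly the eight underscore-separated fields shown above, in that order; `parse_run_name` and the field names in the returned dictionary are hypothetical choices, not an official schema.

```python
import re

# One named group per field in the legend above
RUN_NAME = re.compile(
    r"^H(?P<halo_id>\d+)_(?P<ic_type>[A-Z]+)_Z(?P<z_start>\d+)_P(?P<padding>\d+)"
    r"_LN(?P<level_min>\d+)_LX(?P<level_max>\d+)_O(?P<overlap>\d+)_NV(?P<nvir>\d+)$"
)

def parse_run_name(name):
    """Split a run folder name into its parameters; numeric fields become ints."""
    match = RUN_NAME.match(name)
    if match is None:
        raise ValueError("not a Caterpillar run name: %r" % name)
    return {key: value if key == "ic_type" else int(value)
            for key, value in match.groupdict().items()}

params = parse_run_name("H1387186_EB_Z127_P7_LN7_LX12_O4_NV4")
# level_max is an exponent: the legend's (2^12)^3 grid
print(params["level_max"], (2 ** params["level_max"]) ** 3)  # 12 68719476736
```

Keeping the numeric fields as exponents mirrors the legend; the corresponding grid sizes follow as `(2 ** value) ** 3` where the legend gives that form.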

For example, all the available assets for Cat-9 (`H1387186`) at LX14 are stored under:

```
halos/H1387186/H1387186_EB_Z127_P7_LN7_LX14_O4_NV4/
```

All halos were run at the LX11, LX12, LX13 and LX14 resolution levels, and one halo was additionally run at LX15 down to z = 1. For more details on these parameters, see the contamination suite information.

### Folder Structure

A given directory may have the following components:

```
H1387186_EB_Z127_P7_LN7_LX14_O4_NV4
 -> halos_bound/  # rockstar and merger tree catalogues
  -> halos_0/     # each folder contains the catalogue for each snapshot
  -> halos_1/
  ...
  -> halos_319/
  -> outputs/
  -> trees/
    -> forests.list
    -> locations.dat
    -> tree_0_0_0.dat
    -> tree_0_0_1.dat
    ...
    -> tree_1_1_1.dat
    -> tree.bin
    -> treeindex.csv
 -> outputs/      # gadget raw snapshot output (particle data)
  -> snapdir_000/ # each folder contains the particle data for each snapshot
  -> snapdir_001/
  ...
  -> snapdir_319/
  -> groups_319/  # the subfind catalogues are also stored (mostly for the last snapshot)
  -> hsmldir_319/ # the smoothing lengths for the corresponding particle data
 -> analysis/     # post-processed output files (halo profiles, mass functions, minihalos etc.)
```

As shown above, each run directory contains both **halo catalogues** (e.g. Rockstar out.list files),

```
halos_bound/halos_[snapshot]/ # rockstar out.list files
```

and **particle snapshots** (e.g. HDF5 files).

```
outputs/snapdir_[snapshot]/ # Gadget HDF5 files
```
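Note the naming difference visible in the tree above: catalogue folders use the bare snapshot number (`halos_0`, `halos_319`), while particle folders are zero-padded to three digits (`snapdir_000`). A minimal sketch for building both paths from a run directory; `catalogue_dir` and `snapshot_dir` are hypothetical helper names.

```python
def catalogue_dir(run_dir, snapshot):
    """Rockstar catalogue folder; the snapshot number is not zero-padded (halos_95)."""
    return "%s/halos_bound/halos_%d/" % (run_dir.rstrip("/"), snapshot)

def snapshot_dir(run_dir, snapshot):
    """Gadget particle folder; the snapshot number is zero-padded to 3 digits (snapdir_095)."""
    return "%s/outputs/snapdir_%03d/" % (run_dir.rstrip("/"), snapshot)

run = "gs://caterpillarproject/halos/H1387186/H1387186_EB_Z127_P7_LN7_LX14_O4_NV4"
print(catalogue_dir(run, 95))  # .../H1387186_EB_Z127_P7_LN7_LX14_O4_NV4/halos_bound/halos_95/
print(snapshot_dir(run, 95))   # .../H1387186_EB_Z127_P7_LN7_LX14_O4_NV4/outputs/snapdir_095/
```

The same helpers work for a local copy, since the downloads on this page mirror the bucket layout.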

### Snapshot-Redshift Key

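Lastly, here is a key to **link redshift and snapshot** (for those available). These snapshot numbers refer to the LX14 runs, which have 320 snapshots (0 to 319); the lower-resolution runs use the 256-snapshot expansion factor list below.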

| Snapshot | Approximate Redshift |
| :------: | :------------------: |
|    319   |         0.000        |
|    232   |         0.501        |
|    189   |         0.996        |
|    145   |         2.011        |
|    124   |         2.975        |
|    111   |         3.959        |
|    102   |         4.984        |
|    95    |         6.002        |
|    81    |         7.006        |
|    70    |         8.023        |
|    62    |         8.941        |
|    54    |        10.066        |
|    43    |        12.108        |
|    32    |        15.073        |
|    21    |        19.771        |
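To pick the tabulated snapshot closest to a target redshift, a sketch like the following works. It uses only the values in the table above, which (since the table includes snapshot 319) is assumed to apply to the LX14 runs; `SNAPSHOT_REDSHIFT` and `nearest_snapshot` are illustrative names. For other redshifts or resolutions, use the expansion factor lists below instead.

```python
# Snapshot -> approximate redshift, copied from the table above (assumed LX14 numbering)
SNAPSHOT_REDSHIFT = {
    319: 0.000, 232: 0.501, 189: 0.996, 145: 2.011, 124: 2.975,
    111: 3.959, 102: 4.984, 95: 6.002, 81: 7.006, 70: 8.023,
    62: 8.941, 54: 10.066, 43: 12.108, 32: 15.073, 21: 19.771,
}

def nearest_snapshot(z_target):
    """Return the tabulated snapshot whose redshift is closest to z_target."""
    return min(SNAPSHOT_REDSHIFT, key=lambda snap: abs(SNAPSHOT_REDSHIFT[snap] - z_target))
```

For instance, `nearest_snapshot(2.0)` returns 145, which can then be passed to `download_caterpillar(snapshots=[145], ...)`.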

The full expansion factor lists can be downloaded from the following two links:

{% file src="/files/-MCjFLFtKRhUNh5COe38" %}
Expansion Factor List (LX14)
{% endfile %}

{% file src="/files/-MCjFP6SXIQebQ08MmCR" %}
Expansion Factor List (\<LX14), 256 snapshot runs
{% endfile %}

As a reminder, the expansion factor is a = 1/(1 + z). The row index in the files above is the snapshot number, counting from 0: the first row is the expansion factor of snapshot\_000, so snapshot 0 has redshift z = 1/0.021276596 - 1 ≈ 46.
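Converting a snapshot number to a redshift is then a one-line calculation. This sketch assumes each expansion factor file holds one value per line (taking the first column if a line has more); `load_expansion_factors` and `snapshot_redshift` are hypothetical names, and the file path is wherever you saved the download.

```python
def load_expansion_factors(path):
    """Read an expansion factor list, assumed one value per line (row i = snapshot i)."""
    with open(path) as handle:
        return [float(line.split()[0]) for line in handle if line.strip()]

def snapshot_redshift(expansion_factors, snapshot):
    """Redshift of a snapshot from its expansion factor: z = 1/a - 1."""
    return 1.0 / expansion_factors[snapshot] - 1.0

# Worked example from the text: snapshot 0 has a = 0.021276596
print(round(snapshot_redshift([0.021276596], 0), 3))  # 46.0
```

With a real file, `snapshot_redshift(load_expansion_factors(path), 0)` gives the same value.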

## Obtaining Rockstar Halos

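Once you have `gsutil` installed, you can copy the `ROCKSTAR` catalogues of every Caterpillar run with a single command (the `-r` flag is required to copy folders):

```
$ gsutil cp -r gs://caterpillarproject/halos/H*/H*/halos_bound/ ./
```

Be aware that `gsutil` names each copied folder after the last component of its source path, so every match is written into the same local `halos_bound/` folder and files with the same name overwrite one another. To keep each run's directory structure, use the `download_caterpillar` function instead.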

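Alternatively, to fetch the catalogues for chosen resolution levels and snapshots while keeping each run's directory structure, use the `download_caterpillar` function defined above (note that it downloads every halo suite matching those settings):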

```python
download_caterpillar(output_dir="./",lxs=[14],snapshots=[319],want_halos=True,want_particles=False)
```
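`download_caterpillar` matches every suite at the requested LX level. To fetch a single suite only (for example Cat-9, `H1387186`), one option is to narrow the wildcard to that suite's folder, as sketched below. `halo_catalogue_pattern` and `download_halo_catalogue` are hypothetical helpers, and the copy step relies on `gsutil cp -r` placing a source folder inside an existing local destination directory.

```python
import os
import subprocess

def halo_catalogue_pattern(halo_id, lx, snapshot):
    """gsutil wildcard for one suite's Rockstar catalogue at one LX level and snapshot."""
    return "gs://caterpillarproject/halos/%s/%s*LX%d*/halos_bound/halos_%d/" % (
        halo_id, halo_id, lx, snapshot)

def download_halo_catalogue(halo_id, lx=14, snapshot=319, output_dir="."):
    """Copy one suite's catalogue folder(s), mirroring the bucket layout under output_dir."""
    pattern = halo_catalogue_pattern(halo_id, lx, snapshot)
    # gsutil expands the wildcard; check_output raises CalledProcessError if nothing matches
    listing = subprocess.check_output(["gsutil", "ls", "-d", pattern]).decode("utf-8")
    for src in listing.split():
        # e.g. halos/H1387186/<run>/halos_bound/halos_319 -> local parent .../halos_bound
        rel = "/".join(src.rstrip("/").split("/")[3:])
        parent = os.path.join(output_dir, os.path.dirname(rel))
        os.makedirs(parent, exist_ok=True)
        # parent exists, so gsutil copies the folder into parent/halos_<snapshot>/
        subprocess.check_call(["gsutil", "cp", "-r", src, parent])
```

Usage: `download_halo_catalogue("H1387186", lx=14, snapshot=319)` fetches only Cat-9's z = 0 catalogue.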

## Snapshot Particle Data (z = 0)

Similarly, `download_caterpillar` can fetch the Gadget HDF5 particle snapshots at z = 0 (snapshot 319) for every LX14 halo suite by requesting particles instead of halos:

```python
download_caterpillar(output_dir="./",lxs=[14],snapshots=[319],want_halos=False,want_particles=True)
```


