Analyzing Data

Analysis Modules

Two helpful libraries are our modules and analysis tools.
1
git clone [email protected]:caterpillarproject/modules.git # Python 2.7+
2
git clone [email protected]:caterpillarproject/analysis.git # Python 2.7+
Copied!
Add these to your PYTHONPATH environment variable, e.g. for .cshrc add:
1
setenv PYTHONPATH /path/to/modules:$PYTHONPATH
2
setenv PYTHONPATH /path/to/analysis:$PYTHONPATH
Copied!
Lastly you will need to install asciitable, h5py and pandas.
These tools aren't critical, but they may make your life a lot easier.

Halo Catalogs

Once you have the modules ready, uou can load a ROCKSTAR catalogue for a given snapshot simply as:
1
2
rscat = htils.load_rscat(hpath,319,verbose=True) # snapshot = 319 (z = 0)
Copied!
Once you do this however, you will have access to the following methods:
1
def __init__(self, dir, snap_num, version=2, sort_by='mvir', base='halos_', digits=2, AllParticles=False):
2
def get_particles_from_halo(self, haloID):
3
# @param haloID: id number of halo. Not its row position in matrix
4
# @return: a list of particle IDs in the Halo
5
def get_subhalos_from_halo(self,haloID):
6
#Retrieve subhalos only one level deep.
7
#Does not get sub-sub halos, etc.
8
def get_subhalos_from_halos(self,haloIDs):
9
#Returns an array of pandas data frames of subhalos. one data frame
10
#for each host halo. returns only first level of subhalos.
11
def get_subhalos_from_halos_flat(self,haloIDs):
12
#Returns a flattened pandas data frame of all subhalos within
13
#the hosts given by haloIDs. Returns only first level of subhalos.
14
def get_hosts(self):
15
# Get host halo frame only
16
def get_subs(self):
17
# Get subhalo frame only
18
def get_all_subs_recurse(self,haloID):
19
# Retrieve all subhalos: sub and sub-sub, etc.
20
# just need mask of all subhalos, then return data frame subset
21
def get_all_subhalos_from_halo(self,haloID):
22
# Retrieve all subhalos: sub and sub-sub, etc.
23
# return pandas data frame of subhalos
24
def get_all_sub_particles_from_halo(self,haloID):
25
#returns int array of particle IDs belonging to all substructure
26
#within host of haloID
27
def get_all_particles_from_halo(self,haloID):
28
#returns int array of all particles belonging to haloID
29
def get_all_num_particles_from_halo(self,haloID):
30
# Get the actual number of particles 'total_npart' from halo as opposed to 'npart'.
31
# mainly for versions less than 7
32
def get_block_from_halo(self, snapshot_dir, haloID, blockname, allparticles=True):
33
# quick load a block (hdf5 block) of particles belong to halo.
34
# e.g. you want particle positions for haloid = 10 (use blockname="pos")
35
# this works fastest on snapshots ordered by id and requires import readsnapHDF5_greg
36
def H(self):
37
#returns hubble parameter for rockstar run
38
def get_most_gravbound_particles_from_halo(self,snapshot_dir, haloID):
39
# Gets most bound particles just based on potential energy for specific halo ID
40
def get_most_bound_particles_from_halo(self, snapshot_dir, haloID):
41
# Gets most bound particles for halo based on pot. energy and kin. energy
42
# if potential block does not exist, it is calculate assuming a spherical halo
43
def getversion(self):
44
# returns the version of rockstar the run was done within
45
# this will include versions made by Alex Ji, Greg Dooley & Brendan Griffen
46
Copied!
One workflow might look as follows:
1
import haloutils as htils
2
3
# load the first 24 halos
4
# (just change 14 to 11 for the lower resolution halos)
5
6
hpaths = htils.get_paper_paths_lx(14)
7
hpath = hpaths[0] # select first Caterpillar halo
8
9
# strip down the path to just the halo id
10
parentid = htils.get_parent_hid(hpath)
11
12
# get the central host id of the zoom-in halo
13
# parentid is named from the parent simulation
14
# the zooms have different ids
15
zoomid = htils.load_zoomid(hpath)
16
17
# Return the pandas data frame with all the halos,
18
# return mvir for example
19
lastsnap = htils.get_lastsnap(hpath)
20
halos = htils.load_rscat(hpath,lastsnap,verbose=True)
21
mvir_host = halos.ix[zoomid]['mvir'] # units of Msol/h
22
23
# get one specific parameter of the host at
24
# z = 0 (quick version of above)
25
mvir_host = htils.get_quant_zoom(hpath,'mvir') # units of Msol/h
26
27
# return all halo virial masses
28
all_mvir = halos['mvir'] # units of Msol/h
29
30
# get the merger tree of the host and all its subs
31
cat = htils.load_mtc(hpath,haloids=[zoomid])
32
33
# read in every halo's merger tree
34
all_trees = htils.load_mtc(hpath,indexbyrsid=True)
35
tree = all_trees[zoomid]
36
37
# if you feed more than one id to the above
38
# it would be cat[0],cat[1] etc.
39
40
# you can now access the main branch > try tree. then tab complete to see other functions
41
mainbranch = tree.getMainBranch()
42
43
# we also have a short hand version:
44
# mainbranch = htils.get_mainbranch(hpath)
45
# which is very fast and skips the reading of the entire progenitor tree if you don't need dit
46
# see further down this page for more options
47
# output the main branch virial mass
48
49
print mainbranch['mvir']
Copied!
Here is an example of some of these in action:
1
# load required modules
2
import haloutils as htils
3
import numpy as np
4
5
# select Cat-1 halo
6
hpaths = htils.get_paper_paths_lx(14)[0]
7
8
# select the last snapshot (z = 0)
9
snapshot = htils.get_lastsnap(hpath)
10
11
# load rockstar id of the host halo
12
zoomid = htils.load_zoomid(hpath)
13
14
#Load Halo Catalogue
15
halos = htils.load_rscat(hpath,snapshot)
16
17
#Select host halos
18
hosts = halos.get_hosts()
19
20
#Select subhalos
21
subs = halos.get_subs()
22
23
# Get positions of subs and hosts
24
print hosts[['posX','posY','posZ']]
25
print subs[['posX','posY','posZ']]
26
27
#Get particle ids from halo of interest (here it is the host)
28
print halos.get_particles_from_halo(zoomid)
29
30
#Get virial radius of a specific halo id (in this case the host)
31
print halos.ix[zoomid]['rvir'] # units of kpc/h
Copied!

Merger Trees

Similarly the merger tree catalogues (once loaded) have a number of its own functions.
1
def getMainBranch(self, row=0):
2
"""
3
@param row: row of the halo you want the main branch for. Defaults to row 0
4
Uses getSubTree, then finds the smallest dfid that has no progenitors
5
@return: all halos that are in the main branch of the halo specified by row (in a np structured array)
6
"""
7
def getMMP(self, row):
8
"""
9
@param row: row number (int) of halo considered
10
@return: row number of the most massive parent, or None if no parent
11
"""
12
def getNonMMPprogenitors(self,row):
13
"""
14
return row index of all progenitors that are not the most massive
15
These are all the subhalos that were destroyed in the interval
16
"""
Copied!
These can be used in the following example:
1
# load required modules
2
3
import haloutils as htils
4
import numpy as np
5
6
# select the caterpillar halo of interest
7
# based on halo id and resolution level
8
9
hid = 1387186
10
lx = 14
11
12
hpath = htils.hid_hpath_lx(hid,lx)
13
14
# select the last snapshot (z = 0)
15
snapshot = htils.get_lastsnap(hpath)
16
17
# load rockstar id of the host halo
18
zoomid = htils.load_zoomid(hpath)
19
20
# Load every tree (VERY slow)
21
trees = htils.load_mtc(hpath)
22
23
# Just look at the first tree
24
# tree = cat[0]
25
# You can access all trees via: cat[0], cat[1] etc.
26
27
# If you want to load the tree for a particular ID from the rockstar catalogue
28
trees = htils.load_mtc(haloids=[zoomid]) # in this case the host (quite slow)
29
tree = cat[0] # tree will contain all progenitors, including subhalos
30
31
# Just say you want to index the merger tree by the z = 0 root rockstar id
32
# (i.e. the base of the tree). This is quite powerful because you might select
33
# halos of interest in the rockstar catalogue then want to know, just for those
34
# what their merger tree is (e.g. say you just want dwarf systems of a particular size)
35
36
trees = htils.load_mtc(haloids=[zoomid],indexbyrsid=True)
37
tree = trees[zoomid]
38
39
# You can just loop through rockstar ids (if you gave it more than one id above)
40
# and get out the accretion histories for a small sample of trees quite quickly
41
42
# to get the main branch
43
44
main_branch = tree.getMainBranch()
45
46
# print mass evolution
47
for mass,scale in zip(main_branch['mvir'],main_branch['scale']):
48
print "%3.2f: %3.2e" % (scale,mass)
49
50
# output
51
1.00: 2.86e+14
52
0.99: 2.89e+14
53
0.98: 2.90e+14
54
0.97: 2.90e+14
55
0.95: 2.88e+14
56
0.94: 2.86e+14
57
0.93: 2.83e+14
58
0.92: 2.80e+14
59
0.91: 2.76e+14
60
0.90: 2.73e+14
61
0.89: 2.64e+14
62
0.88: 2.47e+14 ...
63
Copied!
A function to find the descendant branch of any halo in merger tree catalogue. You should use it as follows:
1
# get a tree of interest
2
mtc = haloutils.load_zoom_mtc(hpath)
3
host = mtc.Trees[0]
4
5
# make a dictionary that maps ids to rows
6
desc_map = host.get_desc_map()
7
8
# get the descendent branch.
9
desc_branch = host.getDescBranch(row, desc_map)
10
11
# similarly for main branches, you can use a dictionary that speeds up the
12
# getMainBranch call substantially.
13
mmp_map = host.get_mmp_map()
14
main_branch = host.getMainBranch(row, mmp_map)
15
16
# you can get the branches without making the map as below, but they will be much slower.
17
# only faster if you use ~ < 10 calls to the function.
18
desc_branch = host.getDescBranch(row)
19
main_branch = host.getMainBranch(row)
Copied!

Particle Data

If you want the Gadget header:
1
# get the first halo in the catalogue
2
3
hpaths = htils.get_paper_paths_lx(14)[0]
4
5
# read its Gadget header
6
header = htils.get_halo_header(hpath)
7
8
# header contains the typical Gadget info
9
header.boxsize header.massarr header.omegaL
10
header.cooling header.metals header.redshift
11
header.double header.nall header.sfr
12
header.feedback header.nall_highword header.stellar_age
13
header.filenum header.npart header.time
14
header.hubble header.omega0
Copied!
Be sure to divide the relevant quantities (pos, rvir etc.) by header.hubble. See the Gadget section in the sidebar for more information on the header and block types available.
If you wanted to get the postions of all the particles for a specific halo (or any block).
1
pos = htils.load_partblock(hpath,zoomid,"POS ") # units Mpc/h
2
# "VEL ", "ID ", "MASS" etc. also work (notice the space)
3
# check readsnapshots/readsnapHDF5.py for the other block names
4
# you can call in the caterpillar modules
5
6
print pos*1000. # kpc/h
7
8
# output
9
[ 32085.45117188 57312.79296875 44314.35546875]
10
[ 27002.18554688 10062.73242188 9899.70019531]
11
[ 26711.08789062 9560.22460938 10165.18847656]
12
[ 49757.3515625 21470.00195312 6461.90917969]
13
...
Copied!
If you want to read in the entire block, use the following:
1
import haloutils as htils
2
3
hid = 1387186
4
lx = 14
5
hpath = htils.hid_hpath_lx(hid,lx)
6
7
pos = htils.load_partblock(hpath,319,"POS ")
8
mass = htils.load_partblock(hpath,319,"MASS")
Copied!
Note that the mass block will have different values depending on how many layers of refinement there are for that zoom in simulation. If you use this code on a parent simulation it will be an array of length N all of the same value because there is only one particle type.
If you wanted just the ids for a selection of particle ids:
1
pos = htils.load_partblock(hpath,zoomid,"POS ",partids=[listofids]) # units Mpc/h
Copied!
Last modified 2yr ago