GBasf2 tutorial

Michel Villanueva, University of Mississippi

Based on the tutorials given by T. Hara, S. Cunliffe, J. Bennett, K. Huang.

How much data will we handle?

  • The Belle II experiment will produce about 50 times the integrated luminosity reached by the previous B factories.

  • Belle II is expected to produce tens of petabytes of data per year.

  • To achieve the physics goals of the collaboration, data has to be distributed, reprocessed, and analyzed.

  • In this situation, no single site can be expected to provide all the computing resources for the collaboration.

  • O(10) TB per day. Where do you store that?

  • How do you distribute the information among roughly 900 collaborators?

  • We need a sophisticated, customizable solution.

The Belle II distributed computing system

  • The Belle II collaboration has adopted a distributed computing model to perform such tasks (also known as grid computing).
  • It is a form of computing in which a "super virtual computer" is composed of many loosely coupled, networked computers.
  • The Belle II grid is composed of central services (servers, databases, file catalogs) plus 60 sites providing resources to the collaboration:
    • Large facilities
    • Small sites
    • Cloud resources
    • etc.
  • The environment is heterogeneous, and running Basf2 jobs in several sites is a challenging task.

This is how a computing site usually looks:

This is how the grid "looks":

The Belle II computing model

"Wait... How do I login on the grid?"

  • Actually, you don't. There is no such thing as "logging in to the grid".

 

"Then, how do I submit jobs?"

What is gBasf2?

  • gBasf2 is an extension of Basf2, from your desktop to the grid.
  • The same steering files used with Basf2 in your local environment can be used with gBasf2 on the grid.
  • The usual workflow is:

    • Develop a Basf2 steering file.
    • Test it locally.
    • Then submit the jobs to the grid with the same steering file.
  • A command line client, gbasf2, is used for submitting grid-based Basf2 jobs.

  • The gb2_ tools control and monitor the jobs and the files on the grid.

We will briefly review their usage in this tutorial.

GBasf2 and DIRAC

  • DIRAC uses X509 digital certificates to authenticate its users.
    • You will need a certificate to submit jobs to the grid.
  • Since the DIRAC user interface relies on some middleware components, this limits the operating environments in which it can function.
  • And the same happens with gBasf2. At this moment, only SL6 is supported.

Before starting, some wise words

  • The grid is NOT a local computing system like KEKCC.
  • Once you submit jobs, your jobs will be assigned to the computing systems around the world.
  • If your job is problematic, it will be distributed around the world and all sites will be affected.
  • Therefore, you must check your jobs carefully on a local computing system before you submit them to the grid.

Prerequisites

Are you ready to work on the grid? Take a look at the following prerequisites.

  • A system with SL6 (or SLC6).
    • This requirement will change in a future gbasf2 update.
  • You need to go through Computing GettingStarted. In short, you need:

    • A valid grid certificate issued within a year and installed in ~/.globus and on the web browser.

    • Belle Virtual Organization (VO) membership registered or renewed within a year at the VOMS server.

    • Registration in DIRAC.

A system with SL6 (or SLC6)

  • Don't have one? If Singularity is available at the site where you are working, it is easy to get an SL6 environment:
singularity shell --cleanenv --bind /cvmfs:/cvmfs docker://sl:6

A valid grid certificate issued within a year and installed in ~/.globus and on the web browser

In [1]:
ls -l ~/.globus
total 16
-rw-r--r-- 1 michmx michmx 1273 Jan 31 08:35 617ff41b.0
-rw-r--r-- 1 michmx michmx 2765 Jan 31 08:35 cert.p12
-rw-r--r-- 1 michmx michmx 1635 Jan 31 08:35 usercert.pem
-r-------- 1 michmx michmx 1743 Jan 31 08:35 userkey.pem
In [2]:
openssl x509 -in ~/.globus/usercert.pem -noout -subject -dates
subject= /C=JP/O=KEK/OU=CRC/CN=HERNANDEZ Villanueva Michel Enrique
notBefore=Oct  2 20:38:32 2018 GMT
notAfter=Nov  6 20:38:32 2019 GMT
  • Be sure that your user key is readable only by you!     (chmod 400 userkey.pem)
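
The permission requirement can also be checked programmatically. A minimal sketch of what `chmod 400` does, using Python's standard library with a temporary file standing in for userkey.pem (the real file lives in ~/.globus):

```python
import os
import stat
import tempfile

# Create a stand-in for ~/.globus/userkey.pem (illustration only)
with tempfile.NamedTemporaryFile(delete=False) as f:
    key_path = f.name

# Equivalent of `chmod 400 userkey.pem`: readable only by the owner
os.chmod(key_path, stat.S_IRUSR)

# Verify that only the owner-read bit is set
mode = stat.S_IMODE(os.stat(key_path).st_mode)
print(oct(mode))  # 0o400

os.remove(key_path)
```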

Belle VO membership registered at the VOMS server


And registration in DIRAC

Do you have everything?

Good, let's start.

Part I: Submitting your first job to the grid

0. Confirm that your script works properly with a Basf2 release

  • Once again: before submitting jobs to the grid, be sure that your script works well on a local computer!

  • For this tutorial, we will use an example in the tutorials under /cvmfs called B2A101-Y4SEventGeneration.py, which generates e+e- -> Y(4S) -> BBbar events.

  • For convenience, we will store the location in a bash variable:

In [5]:
basf2TutorialsDir='/cvmfs/belle.cern.ch/sl6/releases/release-03-02-03/analysis/examples/tutorials'

Take a look inside B2A101-Y4SEventGeneration.py. It is a normal Basf2 steering file.

In [6]:
head -n 50 $basf2TutorialsDir/B2A101-Y4SEventGeneration.py && echo etc...
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

######################################################
#
# Stuck? Ask for help at questions.belle2.org
#
# Y(4S) -> BBbar event generation
#
# This tutorial demonstrates how to generate
#
# e+e- -> Y(4S) -> BBbar
#
# events with EvtGen in BASF2, where the decay of Y(4S)
# is specified by the given .dec file.
#
# The generated events are saved to the output ROOT file.
# In each event the generated particles (MCParticle objects)
# are stored in the StoreArray<MCParticle>.
#
# Contributors: A. Zupanc (June 2014), I.Komarov (Sept. 2018)
#
######################################################

import basf2 as b2
import generators as ge
import modularAnalysis as ma

# generation of 100 events according to the specified DECAY table
# Y(4S) -> Btag- Bsig+
# Btag- -> D0 pi-; D0 -> K- pi+
# Bsig+ -> mu+ nu_mu
#

# Defining custom path
my_path = b2.create_path()

# Setting up number of events to generate
ma.setupEventInfo(noEvents=100, path=my_path)

# Adding genberator
ge.add_evtgen_generator(path=my_path,
                        finalstate='signal',
                        signaldecfile=b2.find_file(
                            'analysis/examples/tutorials/B2A101-Y4SEventGeneration.dec'))

# If the simulation and reconstruction is not performed in the sam job,
# then the Gearbox needs to be loaded with the loadGearbox() function.
ma.loadGearbox(path=my_path)

etc...

You must run it in a Basf2 environment to confirm that it works properly.

  • On the grid, only the most recent libraries installed under /cvmfs/belle.cern.ch are available.

  • Let's use release-03-02-03.

In [9]:
# Setting Basf2 environment in the Notebook
export BELLE2_NO_TOOLS_CHECK=1
source /cvmfs/belle.cern.ch/sl6/tools/b2setup release-03-02-03

basf2 -n 10 -l WARNING $basf2TutorialsDir/B2A101-Y4SEventGeneration.py
Warning: Changing existing PYTHONPATH from /cvmfs/belle.cern.ch/sl6/externals/v01-07-01/Linux_x86_64/opt/root/lib:/cvmfs/belle.cern.ch/sl6/releases/release-03-02-02/lib/Linux_x86_64/opt:/cvmfs/belle.cern.ch/tools to /cvmfs/belle.cern.ch/tools
Belle II software tools set up at: /cvmfs/belle.cern.ch/tools
Environment setup for release: release-03-02-03
Central release directory    : /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03
[INFO] Steering file: /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03/analysis/examples/tutorials/B2A101-Y4SEventGeneration.py
=================================================================================
Name                  |      Calls | Memory(MB) |    Time(s) |     Time(ms)/Call
=================================================================================
EventInfoSetter       |         10 |          0 |       0.00 |    0.03 +-   0.01
EvtGenInput           |         10 |          8 |      10.51 | 1050.61 +-3147.52
Gearbox               |         10 |          0 |       0.00 |    0.02 +-   0.00
RootOutput            |         10 |          0 |       0.01 |    1.04 +-   2.81
=================================================================================
Total                 |         10 |          8 |      10.52 | 1051.85 +-3150.37
=================================================================================

1. Set up your gBasf2 environment

Running gBasf2 requires some initial configuration:

  • A gBasf2 installation. Long story short:
    cd <path where you want your installation>
    wget http://belle2.kek.jp/~dirac/dirac-install.py
    python dirac-install.py -V Belle-KEK
    source bashrc
    dirac-proxy-init -x  
    dirac-configure defaults-Belle-KEK.cfg
    
  • A more complete description of the procedure is here.
  • The gBasf2 environment set:
    source ~/gbasf2/BelleDIRAC/gbasf2/tools/setup
    gb2_proxy_init -g belle
    
  • After the gBasf2 installation, you will only need these two commands each time you start a session.
  • Note that the Basf2 environment is incompatible with the gBasf2 environment.

To confirm everything is properly set, we can simply run gb2_proxy_info.

In [3]:
gb2_proxy_info
subject      : /C=JP/O=KEK/OU=CRC/CN=HERNANDEZ Villanueva Michel Enrique/CN=2880491055/CN=1481862272
issuer       : /C=JP/O=KEK/OU=CRC/CN=HERNANDEZ Villanueva Michel Enrique/CN=2880491055
identity     : /C=JP/O=KEK/OU=CRC/CN=HERNANDEZ Villanueva Michel Enrique
timeleft     : 116:41:14
DIRAC group  : belle
rfc          : True
path         : /tmp/x509up_u1001
username     : michmx
properties   : NormalUser
VOMS         : True
VOMS fqan    : ['/belle']
Succeed with return value:
0
  • It shows information about your proxy, such as the identity, the time left, the DIRAC group (belle), etc.

  • By default, your proxy will be valid for 24 hours.
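
The timeleft field is reported as HH:MM:SS. A small sketch of converting it into hours, e.g. to decide whether it is time to renew the proxy (the helper name and the renewal threshold are my own choices for illustration, not part of the gb2_ tools):

```python
def proxy_hours_left(timeleft: str) -> float:
    """Convert the HH:MM:SS 'timeleft' string from gb2_proxy_info to hours."""
    h, m, s = (int(x) for x in timeleft.split(":"))
    return h + m / 60 + s / 3600

hours = proxy_hours_left("116:41:14")
print(round(hours, 2))  # 116.69

# Hypothetical renewal check
if hours < 1:
    print("Proxy about to expire; run gb2_proxy_init -g belle again")
```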

2. Submit your first job to the Grid.

We will use gbasf2 to submit jobs to the grid. The basic usage is

gbasf2 your_script.py -p project_name -s available_basf2_release

where project_name is a name assigned by you, and available_basf2_release is the Basf2 software version to use, as available on the grid.

  • Please do not use special characters in project names ($, #, %, /, etc.), as they can create problems with file names at some sites and in the databases.
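
The restriction on special characters can be enforced before submitting. A minimal sketch; the helper name and the exact allowed character set are my own choices, not part of gbasf2:

```python
import re

# Hypothetical rule: accept only letters, digits, underscores, and hyphens
SAFE_PROJECT_NAME = re.compile(r"^[A-Za-z0-9_-]+$")

def is_safe_project_name(name: str) -> bool:
    """Return True if the project name avoids problematic characters."""
    return bool(SAFE_PROJECT_NAME.match(name))

print(is_safe_project_name("gb2Tutorial_bbarGeneration"))  # True
print(is_safe_project_name("my$project/2019"))             # False
```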

You can always use the flags -h and --usage to see the full list of available options and a list of examples.

In [7]:
gbasf2 --usage
Submit basf2 job to GRID sites.

All command line options can be embedded to steering file.
See --help-steering for available options.

Project name and basf2 software version are mandatory options,
and specified by --project and --setuprel, respectively.

If no destination site is specified by --site, DIRAC chooses best site.
However the sites specified by --banned_site are never used.
etc...
  • Remember that the syntax requires a project name and the release.

  • We are using release-03-02-03.

If you want to test your syntax before actually submitting the job, you can use the flag --dryrun:

In [10]:
gbasf2 -p gb2Tutorial_bbbarGeneration -s release-03-02-03 $basf2TutorialsDir/B2A101-Y4SEventGeneration.py --dryrun
Cannot obtain event number from steering file
************************************************
*************** Project summary ****************
** Project name: gb2Tutorial_bbbarGeneration
** Dataset path: /belle/user/michmx/gb2Tutorial_bbbarGeneration
** Steering file: /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03/analysis/examples/tutorials/B2A101-Y4SEventGeneration.py
** Job owner: michmx @ belle (116:37:22)
** Preferred site / SE: None / None
** Input files for first job: None
** Processed events: 0 events
** Estimated CPU time per job: 0 min
************************************************

Notebooks with a bash kernel do not support interactive input, so I will use --force to skip the confirmation.

This is fine for a tutorial; however, in your daily work (on the terminal) I strongly suggest not skipping the confirmation.

(Or at least, test your syntax with --dryrun first, as we did here).

Everything looks fine. No typos. Let's submit the job.

In [37]:
gbasf2 -p gb2Tutorial_bbarGeneration -s release-03-02-03 $basf2TutorialsDir/B2A101-Y4SEventGeneration.py --force
Cannot obtain event number from steering file
************************************************
*************** Project summary ****************
** Project name: gb2Tutorial_bbarGeneration
** Dataset path: /belle/user/michmx/gb2Tutorial_bbarGeneration
** Steering file: /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03/analysis/examples/tutorials/B2A101-Y4SEventGeneration.py
** Job owner: michmx @ belle (90:08:35)
** Preferred site / SE: None / None
** Input files for first job: None
** Processed events: 0 events
** Estimated CPU time per job: 0 min
************************************************
Initialize metadata for the project:
No attribute. Initialize Dataset...
Dataset initialization: OK
Dataset metadata attributes already exist (30): OK
Successfully finished.
<=====v4r6p3=====>
JobID = 117280956

You have submitted your first job to the grid. Congratulations!

3. Monitoring your jobs

How do you check the status of your jobs? There are two ways: the command line and the web portal.

On the command line, you can use gb2_project_summary and gb2_job_status (the flag -p specifies the project name).

In [38]:
gb2_project_summary -p gb2Tutorial_bbarGeneration
         Project             Owner    Status    Done   Fail   Run   Wait   Submission Time(UTC)   Duration 
==========================================================================================================
gb2Tutorial_bbarGeneration   michmx   Waiting   0      0      0     1      2019-07-24 21:58:03    00:00:00 
In [40]:
gb2_job_status -p gb2Tutorial_bbarGeneration
1 jobs are selected.

 Job id     Status    MinorStatus   ApplicationStatus       Site      
=====================================================================
117280956   Running   Application   Running             LCG.Napoli.it 

--- Summary of Selected Jobs ---
Completed:0  Deleted:0  Done:0  Failed:0  Killed:0  Running:1  Stalled:0  Waiting:0  

The second way is looking at the job monitor in the DIRAC web portal.

  • Open the portal, click on the logo at the bottom-left and go to Applications/Job Monitor.

  • You will have to click on 'Submit' to display the information.

You should see something like this:

4. Retrieving the output

Once the job has finished, you can list the output using gb2_ds_list.

The output files will be located under your user space (/belle/user/<username>/<project_name>).

In [11]:
gb2_ds_list|grep gb2Tutorial_bbarGeneration
/belle/user/michmx/gb2Tutorial_bbarGeneration
In [12]:
gb2_ds_list /belle/user/michmx/gb2Tutorial_bbarGeneration
/belle/user/michmx/gb2Tutorial_bbarGeneration/B2A101-Y4SEventGeneration-evtgen.root

To download the files, use gb2_ds_get.

In [17]:
# Let's create a directory to get the files of the tutorial under home
mkdir -p ~/gbasf2Tutorial && cd ~/gbasf2Tutorial

# Now, let's download the file
gb2_ds_get /belle/user/michmx/gb2Tutorial_bbarGeneration --force
Download 1 files from SE
Trying to download srm://dcblsrm.sdcc.bnl.gov:8443/srm/managerv2?SFN=/pnfs/sdcc.bnl.gov/data/bellediskdata/TMP/belle/user/michmx/bbar_generation_example/B2A101-Y4SEventGeneration-evtgen.root to /belle2/u/michmx/gbasf2Tutorial/bbar_generation_example/B2A101-Y4SEventGeneration-evtgen.root

Successfully downloaded files:
/belle/user/michmx/bbar_generation_example/B2A101-Y4SEventGeneration-evtgen.root in /belle2/u/michmx/gbasf2Tutorial/bbar_generation_example


Failed files:

(Again, with the flag to skip the confirmation since we are in a notebook)

You can now confirm that the file is located in your local home, inside a directory named after the project.

In [18]:
ls -l ~/gbasf2Tutorial/gb2Tutorial_bbarGeneration
total 88
-rwxr-xr-x 1 michmx belle2 84449 Jul 23 17:03 B2A101-Y4SEventGeneration-evtgen.root

Keep in mind: as long as you have a gBasf2 installation, you can submit jobs or download files from any local machine.

Part II: A more realistic example

Input files on the grid

  • The most common task as a user of the grid is the submission of jobs with input files.

  • The files on the grid are distributed along the available resources.
  • Fortunately, as a user you don't have to worry about the physical location. A file catalog keeps a record of where the files are located.

Let's take a look at how sets of data are handled on the grid.

Datasets and Datablocks on grid

  • On the grid, files are classified into datasets.
  • Each dataset is located using a logical path name (LPN), which is a virtual path used to handle files distributed along the grid sites.
  • The first part of the LPN locates the dataset, starting always with /belle.

Examples of datasets are

  • /belle/MC/release-02-00-01/DB00000411/MC11/prod00005678/s00/e0000/4S/r00000/mixed/mdst

  • /belle/MC/release-03-00-00/DB00000487/SKIM10x1/prod00006915/e0000/4S/r00000/taupair/18570600/udst

  • /belle/Data/release-03-02-02/DB00000654/proc9/prod00008522/e0007/4S/r04119/mdst

Each dataset is subdivided into datablocks:

  • By design, each datablock contains a maximum of 1000 files.
  • If a dataset contains more than 1000 files, it will be subdivided into at least two datablocks.
  • The datablocks are labeled subXX, where XX is an incremental number. For example:
In [21]:
gb2_ds_list /belle/MC/release-02-00-01/DB00000411/MC11/prod00005678/s00/e0000/4S/r00000/mixed/mdst
/belle/MC/release-02-00-01/DB00000411/MC11/prod00005678/s00/e0000/4S/r00000/mixed/mdst/sub00
/belle/MC/release-02-00-01/DB00000411/MC11/prod00005678/s00/e0000/4S/r00000/mixed/mdst/sub01
/belle/MC/release-02-00-01/DB00000411/MC11/prod00005678/s00/e0000/4S/r00000/mixed/mdst/sub02
/belle/MC/release-02-00-01/DB00000411/MC11/prod00005678/s00/e0000/4S/r00000/mixed/mdst/sub03

In gBasf2, the data handling unit is the datablock.

  • A gBasf2 project can be submitted per datablock, NOT per file.
  • Inside the project, gBasf2 will produce file-by-file jobs.
  • The number of output files in the project will be the number of files in the input datablock.
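
The 1000-file limit translates into simple arithmetic. A sketch; the file counts passed in below are hypothetical, chosen only to match the sub00..sub03 pattern of the MC11 example above:

```python
import math

FILES_PER_DATABLOCK = 1000  # maximum number of files per datablock, by design

def n_datablocks(n_files: int) -> int:
    """Minimum number of datablocks (sub00, sub01, ...) needed for a dataset."""
    return max(1, math.ceil(n_files / FILES_PER_DATABLOCK))

print(n_datablocks(800))   # 1 -> sub00 only
print(n_datablocks(3500))  # 4 -> sub00..sub03
```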

Find official MC/data samples

The information about produced MC, reprocessed data and skims is located at Confluence, under the Data Production WebHome.

  • Remember: the grid handles datasets and datablocks using the LPN.
  • Locating the files desired for analysis means getting the LPN of the datablock(s).
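
Since the LPN encodes the sample properties, its components can be read off by splitting the path. A sketch; the field names are my own reading of the MC examples below, not an official gbasf2 API:

```python
def parse_mc_lpn(lpn: str) -> dict:
    """Split an MC dataset LPN into named components (interpretation is mine)."""
    parts = lpn.strip("/").split("/")
    keys = ["vo", "type", "release", "globaltag", "campaign", "production",
            "stream", "experiment", "energy", "run", "event_type", "data_level"]
    return dict(zip(keys, parts))

lpn = "/belle/MC/release-02-00-01/DB00000411/MC11/prod00005678/s00/e0000/4S/r00000/mixed/mdst"
info = parse_mc_lpn(lpn)
print(info["campaign"], info["event_type"], info["data_level"])  # MC11 mixed mdst
```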

For example, for the MC12 campaign, the table of early phase 3 geometry samples contains:

If we want to use

  • MC12
  • Early phase 3 geometry
  • Generic mixed sample
  • Without beam background (BGx0)

the LPN for the datablock desired is

/belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00

Do you need additional info? We can use gb2_ds_query_dataset to retrieve the info stored in the metadata catalog.

In [1]:
gb2_ds_query_dataset -l /belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst
mdst
	dataset: /belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst
	creationDate: 2019-04-14 02:43:30
	lastUpdate: 2019-04-19 21:31:07
	nFiles: 54
	size: 114109310381
	status: good
	productionId: 7393
	transformationId: 26010
	owner: g:belle_mcprod
	mc: MC12b
	stream: 0
	dataType: mc
	dataLevel: mdst
	beamEnergy: 4S
	mcEventType: mixed
	generalSkimName: 
	skimDecayMode: 
	release: release-03-01-00
	dbGlobalTag: DB00000547
	sourceCode: 
	sourceCodeRevision: 
	steeringFile: MC/MC12b/release-03-01-00/DB00000547/4S/signal/mixed_eph3_BGx0.py
	steeringFileRevision: 
	experimentLow: 1003
	experimentHigh: 1003
	runLow: 0
	runHigh: 0
	logLfn: 
	parentDatasets: 
	description: MC12b production for phase e3 Y(4S) mixed mixed (BGx0)
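
The metadata above is enough to estimate transfer sizes before downloading anything. A quick sketch using the nFiles and size fields from the output:

```python
# Values taken from the gb2_ds_query_dataset output above
n_files = 54
size_bytes = 114109310381

total_gb = size_bytes / 1e9          # dataset size in GB
avg_gb_per_file = total_gb / n_files  # average file size in GB

print(f"{total_gb:.1f} GB total, {avg_gb_per_file:.2f} GB per file on average")
# 114.1 GB total, 2.11 GB per file on average
```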


Steering file and input datablock

We will use another example stored in the tutorials at analysis/examples/tutorials, called B2A602-BestCandidateSelection.py.

It takes as input BBbar "mixed" samples.

In [16]:
cat $basf2TutorialsDir/B2A602-BestCandidateSelection.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

#######################################################
#
# Stuck? Ask for help at questions.belle2.org
#
# This tutorial exemplifies how a best-candidate selection
# can be performed using rankByLowest()/rankByHighest() for
# different variables.
# The decay channel D0 -> K- pi+ (+ c.c.) is reconstructed,
# a vertex fit performed and variables dM and chiProb are then
# used to rank the candidates and saved via the CustomFloats
# ntuple tool.
#
# To look at the results, one might use:
# ntuple->Scan("D0_dM:D0_chiProb:D0_dM_rank:D0_chiProb_rank:D0_mcErrors")
#
#
# based on B2A403-KFit-VertexFit.py
#
# Contributors: C. Pulvermacher
#               I. Komarov (Demeber 2017)
#               I. Komarov (September 2018)
#
################################################################################

import basf2 as b2
import modularAnalysis as ma
import variables.collections as vc
import variables.utils as vu
import vertex as vx
import stdCharged as stdc
import variables as va
from stdPi0s import stdPi0s

# create path
my_path = b2.create_path()

# load input ROOT file
ma.inputMdst(environmentType='default',
             filename=b2.find_file('B2pi0D_D2hh_D2hhh_B2munu.root', 'examples', False),
             path=my_path)

# use standard final state particle lists
#
# creates "pi+:all" ParticleList (and c.c.)
stdc.stdPi('all', path=my_path)
# rank all pions of the event by momentum magnitude
# variable stored to extraInfo as pi_p_rank
ma.rankByLowest(particleList='pi+:all',
                variable='p',
                outputVariable='pi_p_rank',
                path=my_path)

va.variables.addAlias('pi_p_rank', 'extraInfo(pi_p_rank)')

# creates "K+:loose" ParticleList (and c.c.)
stdc.stdK(listtype='loose', path=my_path)

# keep only candidates with 1.8 < M(Kpi) < 1.9 GeV
ma.reconstructDecay(decayString='D0 -> K-:loose pi+:all',
                    cut='1.8 < M < 1.9',
                    path=my_path)

# perform D0 vertex fit
# keep candidates only passing C.L. value of the fit > 0.0 (no cut)
vx.vertexTree(list_name='D0',
              conf_level=-1,  # keep all cadidates, 0:keep only fit survivors, optimise this cut for your need
              ipConstraint=True,
              # pins the B0 PRODUCTION vertex to the IP (increases SIG and BKG rejection) use for better vertex resolution
              updateAllDaughters=True,  # update momenta off ALL particles
              path=my_path
              )

# smaller |M_rec - M| is better, add here a different output variable name, due to parentheses
ma.rankByLowest(particleList='D0',
                variable='abs(dM)',
                outputVariable='abs_dM_rank',
                path=my_path)

# maybe not the best idea, but might cut away candidates with failed fits
ma.rankByHighest(particleList='D0',
                 variable='chiProb',
                 path=my_path)

# Now let's do mixed ranking:
# First, we want to rank D candiadtes by the momentum of the pions
# Second, we want to rank those D candidates that were built with the highest-p by the vertex Chi2
# This doesn't have any sense, but shows how to work with consequetive rankings
#
# Let's add alias for the momentum rank of pions in D
va.variables.addAlias('D1_pi_p_rank', 'daughter(1,pi_p_rank)')
# Ranking D candidates by this variable.
# Candidates built with the same pion get the same rank (allowMultiRank=True).
ma.rankByHighest(particleList='D0',
                 variable='D1_pi_p_rank',
                 allowMultiRank=True,
                 outputVariable="first_D_rank",
                 path=my_path)
va.variables.addAlias('first_D_rank', 'extraInfo(first_D_rank)')
# Now let's rank by chiPrhob only those candiadtes that are built with the highest momentum pi
# Other canidadites will get this rank equal to -1
ma.rankByHighest(particleList="D0",
                 variable="chiProb",
                 cut="first_D_rank == 1",
                 outputVariable="second_D_rank",
                 path=my_path)
va.variables.addAlias('second_D_rank', 'extraInfo(second_D_rank)')


# add rank variable aliases for easier use
va.variables.addAlias('dM_rank', 'extraInfo(abs_dM_rank)')
va.variables.addAlias('chiProb_rank', 'extraInfo(chiProb_rank)')

# perform MC matching (MC truth asociation)
ma.matchMCTruth(list_name='D0', path=my_path)


# Select variables that we want to store to ntuple
fs_hadron_vars = vu.create_aliases_for_selected(list_of_variables=vc.mc_truth, decay_string='D0 -> ^K- ^pi+')

d0_vars = vc.vertex + \
    vc.mc_vertex + \
    vc.mc_truth + \
    fs_hadron_vars + \
    ['dM', 'chiProb', 'dM_rank', 'chiProb_rank', 'D1_pi_p_rank', 'first_D_rank', 'second_D_rank']


# Saving variables to ntuple
output_file = 'B2A602-BestCandidateSelection.root'
ma.variablesToNtuple(decayString='D0',
                     variables=d0_vars,
                     filename=output_file,
                     treename='D0',
                     path=my_path)


# Process the events
b2.process(my_path)

# print out the summary
print(b2.statistics)

We will use the BGx0 mixed sample mentioned before.

  • Before submitting the job, it is good practice to confirm that the LPN of the datablock is correct.

Let's take a look to confirm that the files are there:

In [18]:
gb2_ds_list /belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00 |head -n 5
/belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00/mdst_000001_prod00007393_task10020000001.root
/belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00/mdst_000002_prod00007393_task10020000002.root
/belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00/mdst_000003_prod00007393_task10020000003.root
/belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00/mdst_000004_prod00007393_task10020000004.root
/belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00/mdst_000005_prod00007393_task10020000005.root

Submit a job with MC as input

Time to submit the job. The input datablock should be specified with the flag -i.

  • Don't forget the mandatory flags to set the project name and the release.

Again, I will use the flag --force to skip confirmation.

  • And again, I strongly recommend not skipping the confirmation unless there is a good reason.

  • (A Jupyter notebook in a tutorial which cannot handle interactive input is a good reason, right?).

Let's use --dryrun to test the syntax before actually submitting:

In [16]:
gbasf2 -p gb2Tutorial_BestCandidate -s release-03-02-03 \
       -i /belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00 \
       $basf2TutorialsDir/B2A602-BestCandidateSelection.py --dryrun
************************************************
*************** Project summary ****************
** Project name: gb2Tutorial_BestCandidate
** Dataset path: /belle/user/michmx/gb2Tutorial_BestCandidate
** Steering file: /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03/analysis/examples/tutorials/B2A602-BestCandidateSelection.py
** Job owner: michmx @ belle (94:03:14)
** Preferred site / SE: None / None
** Input files for first job: LFN:/belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00/mdst_000001_prod00007393_task10020000001.root
** Number of data sets: 1
** Number of input files: 54
** Number of jobs: 54
** Processed data (MB): 108823
** Processed events: 10693783 events
** Estimated CPU time per job: 3301 min
************************************************

Everything seems good. Let's submit the jobs.

In [17]:
gbasf2 -p gb2Tutorial_BestCandidate -s release-03-02-03 \
       -i /belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00 \
       $basf2TutorialsDir/B2A602-BestCandidateSelection.py --force
************************************************
*************** Project summary ****************
** Project name: gb2Tutorial_BestCandidate
** Dataset path: /belle/user/michmx/gb2Tutorial_BestCandidate
** Steering file: /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03/analysis/examples/tutorials/B2A602-BestCandidateSelection.py
** Job owner: michmx @ belle (94:02:30)
** Preferred site / SE: None / None
** Input files for first job: LFN:/belle/MC/release-03-01-00/DB00000547/MC12b/prod00007393/s00/e1003/4S/r00000/mixed/mdst/sub00/mdst_000001_prod00007393_task10020000001.root
** Number of data sets: 1
** Number of input files: 54
** Number of jobs: 54
** Processed data (MB): 108823
** Processed events: 10693783 events
** Estimated CPU time per job: 3301 min
************************************************
Initialize metadata for the project:
No attribute. Initialize Dataset...
Dataset initialization: OK
Dataset metadata attributes already exist (30): OK
Successfully finished.
<=====v4r6p3=====>
JobID = 117274608 ... 117274661 (54 jobs)

Congratulations! You are now running jobs on the grid with official MC samples as input.

Let's take a look at the jobs in the project.

  • Remember, one job per file contained in the datablock will be running.

  • You can monitor your jobs either on the command line or in the DIRAC web portal.

In [14]:
gb2_job_status -p gb2Tutorial_BestCandidate
54 jobs are selected.

 Job id     Status         MinorStatus         ApplicationStatus     Site   
===========================================================================
117274608   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274609   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274610   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274611   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274612   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274613   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274614   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274615   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274616   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274617   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274618   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274619   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274620   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274621   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274622   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274623   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274624   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274625   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274626   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274627   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274628   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274629   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274630   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274631   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274632   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274633   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274634   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274635   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274636   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274637   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274638   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274639   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274640   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274641   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274642   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274643   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274644   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274645   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274646   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274647   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274648   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274649   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274650   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274651   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274652   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274653   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274654   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274655   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274656   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274657   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274658   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274659   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274660   Waiting   Pilot Agent Submission   Unknown             Multiple 
117274661   Waiting   Pilot Agent Submission   Unknown             Multiple 

--- Summary of Selected Jobs ---
Completed:0  Deleted:0  Done:0  Failed:0  Killed:0  Running:0  Stalled:0  Waiting:54  

Once the jobs finish, you can download the output using gb2_ds_get, as we did in the first project.
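If you monitor many projects, it can be handy to post-process the gb2_job_status listing in a script. A minimal sketch, assuming only the whitespace-separated column layout shown above (the job IDs, statuses, and site names in the sample are illustrative, not real output):

```python
from collections import Counter

# Illustrative lines in the format printed by gb2_job_status
# (JobID, Status, MinorStatus, ApplicationStatus, Site):
listing = """\
117274612   Waiting   Pilot Agent Submission   Unknown   Multiple
117274613   Failed    Exception During Execution   Unknown   LCG.KEK.jp
117274614   Done      Execution Complete   Done   LCG.Napoli.it
"""

def count_statuses(text):
    """Count job statuses; Status is the second whitespace-separated column."""
    return Counter(line.split()[1] for line in text.splitlines() if line.strip())

print(count_statuses(listing))  # Counter({'Waiting': 1, 'Failed': 1, 'Done': 1})
```

This gives the same kind of tally as the "Summary of Selected Jobs" line, but in a form you can feed into your own monitoring scripts.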

Rescheduling jobs

Sometimes, things do not go well. A few jobs may fail for a variety of reasons, such as:

  • A timeout in the transfer of a file between sites.
  • A central service not available for a short period of time.
  • An issue at the site hosting the job.
  • etc.

If you find that some of your jobs failed, you need to reschedule them yourself.

You can use gb2_job_reschedule -p <project name>:

In [13]:
gb2_job_reschedule --usage | tail -n 13
Resubmit failed jobs or projects.
Only jobs which have fatal status (Failed, Killed, Stalled) are affected.
Exact same sandbox and parameters are reused. Thus you may need to submit different job if they are wrong.

By default, select only your jobs in current group.
Please switch group and user name by options.
All user's jobs are specified by '-u all'.

Examples:

% gb2_job_reschedule -j 723428,723429
% gb2_job_reschedule -p project1 -u user
    

Or you can use the job monitor in the DIRAC web portal, selecting the failed jobs and clicking the 'Reschedule' button:

What if all my jobs failed?

  • If ALL your jobs failed, most probably something is wrong with the steering file or the gBasf2 arguments.

    (Did you test your steering file locally before submitting the jobs?)

  • A useful way to track down the problem is (if possible) downloading the output sandbox. It contains the logs related to your job.
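Once a sandbox is downloaded, a quick way to skim its logs for problems is a small script like the following. This is only an illustrative sketch: the keyword list is an assumption, and it simply scans the *.log and std.out files found in a sandbox directory.

```python
import tempfile
from pathlib import Path

def find_problems(log_dir, keywords=("ERROR", "FATAL")):
    """Collect (file name, line) pairs that mention one of the keywords."""
    hits = []
    for path in sorted(Path(log_dir).rglob("*")):
        if path.suffix == ".log" or path.name == "std.out":
            for line in path.read_text(errors="replace").splitlines():
                if any(k in line for k in keywords):
                    hits.append((path.name, line.strip()))
    return hits

# Demo with a fake sandbox (real ones are downloaded below ./log/PROJECT/JOBID):
with tempfile.TemporaryDirectory() as d:
    Path(d, "std.out").write_text("INFO: running\nERROR: disk full\n")
    print(find_problems(d))  # [('std.out', 'ERROR: disk full')]
```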

It is also possible to retrieve the log files directly from the command line using gb2_job_output:

In [15]:
gb2_job_output -p gb2Tutorial_bbarGeneration
download output sandbox below ./log/PROJECT/JOBID
1 project are selected.
Please wait...
                                 downloaded project: gb2Tutorial_bbarGeneration                                  
================================================================================================================
Downloaded: "Job output sandbox retrieved in /home/michmx/gb2_tutorial/log/gb2Tutorial_bbarGeneration/117280956" 

In [16]:
ls -l /home/michmx/gb2_tutorial/log/gb2Tutorial_bbarGeneration/117280956/
total 20
-rw-r--r-- 1 michmx michmx  221 Jul 24 16:59 B2A101-Y4SEventGeneration-evtgen.metadata
-rw-r--r-- 1 michmx michmx 2412 Jul 24 16:58 job.info
-rw-r--r-- 1 michmx michmx 6229 Jul 24 17:00 Script1_basf2helper.py.log
-rw-r--r-- 1 michmx michmx  930 Jul 24 17:00 std.out
In [17]:
cat /home/michmx/gb2_tutorial/log/gb2Tutorial_bbarGeneration/117280956/Script1_basf2helper.py.log
<<<<<<<<<< basf2helper.py Standard Output >>>>>>>>>>
Unpack gb2Tutorial_bbarGeneration-inputsandbox.tar.bz2
2019-07-24 21:58:55 UTC Unknown INFO: initialization
2019-07-24 21:59:02 UTC Unknown INFO: basf2 running
2019-07-24 21:59:02 UTC Unknown INFO: ApplicationStatus = Running
Belle II software tools set up at: /cvmfs/belle.cern.ch/tools
Environment setup for release: release-03-02-03
Central release directory    : /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03
Warning: The release release-03-02-03 is not supported any more. Please update to release-03-02-02
Environment setup for build option: opt
basf2 B2A101-Y4SEventGeneration-gb2Tutorial_bbarGeneration.py
[INFO] Steering file: B2A101-Y4SEventGeneration-gb2Tutorial_bbarGeneration.py
[INFO] Starting event processing, random seed is set to '89120803bb4ad58c43596234f0a4a613f1f9e4206cb6d14466d3a601396df8be'
=================================================================================
Name                  |      Calls | Memory(MB) |    Time(s) |     Time(ms)/Call
=================================================================================
EventInfoSetter       |        101 |          0 |       0.00 |    0.02 +-   0.00
EvtGenInput           |        100 |          8 |       1.53 |   15.31 +- 147.27
Gearbox               |        100 |          0 |       0.00 |    0.01 +-   0.00
RootOutput            |        100 |          0 |       0.00 |    0.05 +-   0.05
=================================================================================
Total                 |        101 |          8 |       1.55 |   15.32 +- 146.60
=================================================================================
Found following root files to be uploaded
 B2A101-Y4SEventGeneration-evtgen.root
2019-07-24 21:59:14 UTC Unknown INFO: metadata extration
bash basf2exec.sh 'python3 ./metadata_converter.py ./B2A101-Y4SEventGeneration-evtgen.root getDataDescription' 'release-03-02-03' 'N/A'
dataDescription= {}
bash basf2exec.sh 'python3 ./metadata_converter.py ./B2A101-Y4SEventGeneration-evtgen.root /belle/user/michmx/gb2Tutorial_bbarGeneration/B2A101-Y4SEventGeneration-evtgen.root' 'release-03-02-03' 'N/A'
Belle II software tools set up at: /cvmfs/belle.cern.ch/tools
Environment setup for release: release-03-02-03
Central release directory    : /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03
Warning: The release release-03-02-03 is not supported any more. Please update to release-03-02-02
Environment setup for build option: opt
python3 ./metadata_converter.py ./B2A101-Y4SEventGeneration-evtgen.root /belle/user/michmx/gb2Tutorial_bbarGeneration/B2A101-Y4SEventGeneration-evtgen.root
rel= /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03
b2file-metadata-add -l /belle/user/michmx/gb2Tutorial_bbarGeneration/B2A101-Y4SEventGeneration-evtgen.root ./B2A101-Y4SEventGeneration-evtgen.root
Metadata converter: Successfully update basf2 metadata
2019-07-24 21:59:19 UTC Unknown INFO: output verification
bash basf2exec.sh 'python3 ./output_verifier.py ./B2A101-Y4SEventGeneration-evtgen.root None 0' 'release-03-02-03' 'N/A'
Belle II software tools set up at: /cvmfs/belle.cern.ch/tools
Environment setup for release: release-03-02-03
Central release directory    : /cvmfs/belle.cern.ch/sl6/releases/release-03-02-03
Warning: The release release-03-02-03 is not supported any more. Please update to release-03-02-02
Environment setup for build option: opt
python3 ./output_verifier.py ./B2A101-Y4SEventGeneration-evtgen.root None 0
Output verifier: planned = 0, generated = 100, stored = 100
Output verifier: Skip to check number of events
Output verifier: Successfully verified output file ./B2A101-Y4SEventGeneration-evtgen.root
2019-07-24 21:59:25 UTC Unknown INFO: data uploading
2019-07-24 21:59:25 UTC Unknown INFO: ApplicationStatus = Preparing to upload
Metadata path /belle/user/michmx/gb2Tutorial_bbarGeneration: OK
2019-07-24 21:59:26 UTC Unknown INFO: Destination SEs=Napoli-TMP-SE
trying to upload/register /belle/user/michmx/gb2Tutorial_bbarGeneration/B2A101-Y4SEventGeneration-evtgen.root 
0 records found in /belle/user/michmx/gb2Tutorial_bbarGeneration
no AMGA metadata for /belle/user/michmx/gb2Tutorial_bbarGeneration/B2A101-Y4SEventGeneration-evtgen.root
No Replica found:/belle/user/michmx/gb2Tutorial_bbarGeneration/B2A101-Y4SEventGeneration-evtgen.root
2019-07-24 21:59:31 UTC Unknown INFO: GUIDs not found from POOL XML Catalogue (and were generated) for: ./B2A101-Y4SEventGeneration-evtgen.root
2019-07-24 21:59:31 UTC Unknown INFO: File Info: Size=85057 Checksum=62820f57 GUID=365D45A6-2843-E8CA-5CC8-63DCFBABEBCE
Upload: ./B2A101-Y4SEventGeneration-evtgen.root
2019-07-24 21:59:33 UTC Unknown INFO: Parameter DestinationSE = Napoli-TMP-SE
2019-07-24 21:59:33 UTC Unknown INFO: Parameter InitialDestinationSE = Napoli-TMP-SE
2019-07-24 21:59:33 UTC Unknown INFO: ApplicationStatus = Uploading
2019-07-24 21:59:39 UTC Unknown INFO: SE Napoli-TMP-SE: ready to transfer: 2 < 500
2019-07-24 21:59:39 UTC Unknown/FailoverTransfer INFO: Attempting dm.putAndRegister('/belle/user/michmx/gb2Tutorial_bbarGeneration/B2A101-Y4SEventGeneration-evtgen.root','/home/prdbelle010/home_cream_753543009/CREAM753543009/DIRAC_6WGfuGpilot/117280956/./B2A101-Y4SEventGeneration-evtgen.root','Napoli-TMP-SE',guid='365D45A6-2843-E8CA-5CC8-63DCFBABEBCE',catalog='None', checksum = '62820f57')
2019-07-24 21:59:53 UTC Unknown/FailoverTransfer INFO: dm.putAndRegister successfully uploaded and registered ./B2A101-Y4SEventGeneration-evtgen.root to Napoli-TMP-SE
2019-07-24 21:59:55 UTC Unknown INFO: Checksum confirmed: 62820f57
Upload: OK
file size: 85057
catalog size: 85057
file size: OK
Data upload: OK
Register metadata of ./B2A101-Y4SEventGeneration-evtgen.root
2019-07-24 21:59:59 UTC Unknown INFO: ApplicationStatus = Registering
Reading metadata...
Prepare dataset
Dataset attributes already exist (17): OK
bulkInsert: OK
Metadata registration: OK
2019-07-24 22:00:07 UTC Unknown INFO: Successfully handled /belle/user/michmx/gb2Tutorial_bbarGeneration/B2A101-Y4SEventGeneration-evtgen.root
Successfully upload/register all outputs.
2019-07-24 22:00:07 UTC Unknown INFO: ApplicationStatus = Done
2019-07-24 22:00:07 UTC Unknown INFO: Done
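As an aside, the 8-hex-digit checksum confirmed in the log above (62820f57) is an Adler-32 value, the checksum type commonly used by DIRAC for transfer verification. You can compute the same kind of checksum locally with Python's standard zlib module, for example to confirm a downloaded file is intact:

```python
import zlib

def adler32_of_file(path, chunk_size=64 * 1024):
    """Compute the Adler-32 checksum of a file, reading it in chunks."""
    checksum = 1  # Adler-32 is seeded with 1
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")

# Quick sanity check on an in-memory payload:
print(format(zlib.adler32(b"hello") & 0xFFFFFFFF, "08x"))  # 062c0215
```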

 Where to go for help?

Some pages at Confluence are prepared with additional information:

Take a look at the previous gBasf2 tutorials (they contain some advanced topics not covered here).

You are strongly encouraged to join the comp users forum, where you can ask for help and receive announcements about releases and system issues.

You can also ask at questions.belle2.org. You can even answer questions from other users!

And, we need your help!

  • Have you ever experienced that
    • Your window has gone somewhere?
    • Your keyboard is frozen?
    • Your wifi connection is down?
  • Have you seen this before?

  • Computers are not so smart. Sometimes, they fail.
  • "Sometimes" x Huge Resources = "Often"
  • The computing system needs 24 hour x 7 day care.
  • You will learn a lot about the computing system, and it is a very important service to the collaboration.

Some final remarks

  • gBasf2 is still under development. There are surely many points you are not yet satisfied with.

  • To improve gBasf2 and make it more user-friendly, we need your feedback and help.

  • At the same time, the number of gbasf2 developers is currently limited.

  • It is written in Python. If you are interested in coding, please contact

    comp-dirac-devel@belle2.org

    and consider contributing to the improvement of gbasf2   :)

 Thank you

Backup

Basf2 releases available on the grid

  • On the grid, only the most recent libraries installed under /cvmfs/belle.cern.ch are available.

    • To explicitly see the available Basf2 releases, use the command gb2_check_release:
In [4]:
gb2_check_release
****************************************
* Local BelleDIRAC release : v4r6p3
* Master timestamp: 2019-07-25 12:41:07 
* Local timestamp: 2019-07-04 07:30:00 
****************************************
Your installation is up-to-date: v4r6p3

Available basf2 releases:
release-03-02-03
release-03-02-02
release-03-02-00
release-03-01-02
release-03-01-00
release-03-00-03
release-02-01-00
(It will also confirm that your gBasf2 installation is up-to-date; otherwise, an update will be suggested.)

If you already have a proxy generated (i.e. you ran gb2_proxy_init recently), returning to the gBasf2 environment only requires:

In [2]:
source ~/gbasf2/BelleDIRAC/gbasf2/tools/setup
/home/michmx/gbasf2

To install Jupyter in the gBasf2 environment, together with a bash kernel for Jupyter notebooks:

  pip install jupyter bash_kernel --trusted-host pypi.python.org --trusted-host pypi.org --trusted-host files.pythonhosted.org
  python -m bash_kernel.install

(this will be included in future gBasf2 releases).

Once this is done, you can run the Jupyter notebook server.

  jupyter notebook

If you want to run at a remote site, you need to follow the instructions for running Jupyter notebooks at remote sites here.
