Thursday, September 24, 2020 Liaison Meeting Minutes
Facility News
- Change in BNL status: transition to Phase 2, with additional staff returning to work
- change in presentation format: a 1-slide summary from each group
Network & Facility Operations
- CSI tape library now assembled; fiber and copper cabling being provisioned
- new HPSS core server pair scheduled for 10/5/2020
- NSLS-II HPC cluster: 2 racks ([33-4], [33-5]) in QCDOC; purchase and cabling in progress
- NSLS-II Lustre: 2 racks ([33-2], [33-3]) in QCDOC; purchase in progress
- sPHENIX central storage (5.4 PB): rack [46-8]; purchase in progress
- migration of RHIC HPSS movers from row 16 to row 44S by mid-Oct. 2020
- BCF HPSS Silo #2 relocation to CDCE Silo #8 expected in mid-Nov. 2020
- B515 <=> B725 basement fiber conduit expected sometime in Oct. 2020
- B515 SciCore cut-in intervention tentatively scheduled for Dec. 1, 2020, 06:00-18:00
Storage
- ATLAS: SRR and QoS enabled after upgrade; adding 2 new storage hardware units with ZFS
- Belle-II dCache: SRR enabled after upgrade; calibration user added
- STAR XRootD: working on adding user authentication
- PHENIX dCache: retiring old write pools; enabling qgp006 as write pools
- HPSS: creating two file families (FF) for PHENIX; relocations to CDCE; new CSI silo; new HPSS core
- Belle-II RUCIO migration work continues
- Belle-II Calibration farm setup ongoing
- PHENIX/sPHENIX: question about adding new storage so that sPHENIX upgrades won’t be hindered by the legacy setup
Fabric
- NSLS-II HPC cluster PO dispatched to Supermicro:
  - 2 racks, 30 nodes (12 with 2 x V100S GPUs)
  - 2 x Intel Xeon Gold 6252 Cascade Lake CPUs @ 2.10 GHz (96 logical cores)
  - 768 GB memory (12 x 64 GB DDR4-2933 DIMMs)
  - 1 x EDR InfiniBand
  - 2 x 10 Gbps NICs (initially using 1 port)
- NSLS-II AD accounts/authentication requirement: working on it with ITD/Centrify
- HTC/HPC: Singularity upgrade to the 3.6 release resolves the ‘singularity shell’ issue
  - 3.6.x under test on rplay53, updated to 3.6.3 to resolve a security issue
  - Also testing on ATLAS T1; planning to upgrade the entire farm within 2 weeks
- Password-changing interface went live
  - https://web.sdcc.bnl.gov/apps/passwd
- closing 85 ATLAS T1 nodes (2015 PO); moving them to the shared pool shortly
- creating a local Docker registry with pull-through caching
- discussing sPHENIX VOMS alternatives with SDCC ISSOs & federated auth admins
Tools Services
- BNLBox: newly updated BNL usage policy placed in each user's home directory
- installed and testing antivirus (ClamAV) on the BNLBox testbed
  - AV scanning has been requested as a component of our cybersecurity profile
- ELK: monitoring BNLBox usage (file transfer stats, number of users, etc.); recently extended to Globus usage
- Digital Repositories: EIC/Zenodo application port (443) is open to the world
  - EIC digital repo access is restricted to BNL people who are part of InCommon/COmanage, using the SDCC/BNL InCommon IdPs
  - Cybersecurity policy enabled on the login page
- Discussion of the EIC Zenodo digital repository community manager (curator)
  - Currently Zenodo allows only one curator per community
- Discussion of moving the sPHENIX digital repository from a CSI-based custom Invenio app to InvenioRDM
- VOMS: coordinating with OSG to provide a VOMS client solution for sPHENIX
  - The service will be configured to allow sPHENIX jobs to run at remote sites via PanDA
General Services
- planning to retire rssh & atlasgw ssh gateways soon
  - rftpexp gateways are not affected for now; work on them is still to be done
- 4 ssh.sdcc.bnl.gov ssh gateways now in production (more can be added if load requires)
- NX testing status: more testers and feedback needed
  - Close to production (a better OTP setup system is needed first)
  - Warning: still in testing, not in production yet; more changes may still occur
  - questions about a “Log out” button or feature that lets users explicitly log out
- The topical discussion on the status of password changes will conclude the meeting
Topical Discussion: Compute Resource Allocation Policies and Procedures
- As of 9/24: 358 of 1782 total “active” accounts (~20% in 10 days)
  - 1782 = 1323 active + 459 active with an expired principal (password never set since the IPA migration 2 years ago)
- There is a chance CYBER will require that active accounts with an expired principal be deactivated (no ssh)
- please spread the word to maximize conversion during this transition
- if a user doesn’t know their current password, they should submit a ticket to RT useraccts to get a temporary password
  - RT-RACF-UserAccounts@bnl.gov
- Download slides for more detailed stats and by-group stats
- Reminders will go out with increasing frequency as the deadline approaches (10/12/2020)