Batch processing with (embarrassingly) parallel workloads affords numerous possible architectures, each with its own tradeoffs and challenges. We will discuss the various architectures we've employed at the RACF/SDCC to tackle those challenges, and our experiences with them.
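To make the term concrete, a minimal sketch of an embarrassingly parallel workload follows; the per-file task process_one and the file names are invented for illustration and are not drawn from the talk. Each input is processed independently, so the work fans out cleanly to local worker pools or, at facility scale, to batch-system job slots, with no inter-task communication.

```python
# Hypothetical illustration of an embarrassingly parallel batch workload:
# every input file is processed independently of all the others.
from multiprocessing import Pool

def process_one(event_file: str) -> str:
    """Stand-in for a per-file analysis task; name and logic are invented."""
    # ... detector-data processing would happen here ...
    return f"{event_file}: done"

if __name__ == "__main__":
    inputs = [f"run042_chunk{i:04d}.dat" for i in range(100)]
    # Fan the independent tasks out across 8 workers; results arrive
    # in whatever order they finish, since no task depends on another.
    with Pool(processes=8) as pool:
        for result in pool.imap_unordered(process_one, inputs):
            print(result)
```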
Historically, scientific computing infrastructure at the Diamond Light Source has supported the initial stages of data acquisition, from immediate sample validation to the first stages of data processing. For many users, once the visit period was complete there was little if any continued interaction with the data stored at Diamond.
As data volumes increase it...
To better support the scientists using STFC facilities (including the Diamond Light Source, the ISIS neutron source, and the CLF laser facility), STFC's Scientific Computing Department has been developing the Data Analysis as a Service (DAaaS) platform.
DAaaS brings together the facilities' data already archived by SCD with a flexible, extensible platform to deploy scientific workflows on STFC's...
We present a general overview of the challenges and opportunities in marshaling computational intensity around bursts of data generated in a (mostly) scheduled manner. Where computation “fits” in the data analytic pipeline (between detector and actionable knowledge) is an important architectural concern for advanced instruments with bursty data. Design boundary conditions include...
LCLS and SLAC have done extensive analysis to determine the facility's computing needs over the next decade. Based on the set of experiments currently planned and on today’s understanding of the computing and data requirements, we estimated the computing demand from LCLS-II. This presentation describes the methodology adopted for deriving the computing rates, throughput, and storage for...
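As a rough illustration of the kind of arithmetic such a methodology rests on, the sketch below derives peak throughput and yearly raw-data volume from a handful of assumed instrument parameters; every number is an invented placeholder, not an LCLS-II figure.

```python
# Back-of-the-envelope capacity estimate; all inputs are assumptions.
detector_rate_hz = 5_000          # assumed detector frame rate
frame_size_bytes = 4 * 1024**2    # assumed 4 MiB per frame
duty_cycle = 0.3                  # assumed fraction of time data is taken
run_hours_per_year = 4_000        # assumed annual operating hours

# Peak network throughput needed to move frames off the detector.
throughput_gbps = detector_rate_hz * frame_size_bytes * 8 / 1e9

# Raw storage accumulated over a year of operation.
raw_per_year_pb = (detector_rate_hz * frame_size_bytes * duty_cycle
                   * run_hours_per_year * 3600) / 1e15

print(f"peak throughput : {throughput_gbps:.1f} Gb/s")
print(f"raw data / year : {raw_per_year_pb:.1f} PB")
```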
This presentation will give a short overview of the current setup and operation of the DAQ and offline analysis. The second half will focus on current activities and improvements in the storage and online data analysis services.
A short update on the status of NSLS-II computing, its challenges, and its future, including data retention and intentions, remote access requirements, remote control now and in the future, real-time cluster needs, experimental support, and post-experiment analysis.
Some discussion of the things that work well and some things that don't.
This talk gives a summary of the current structure of the network systems deployed in the B515-based RACF/SDCC facility, emphasizing the architecture of the Science Zone and its expected evolution in the 2019-2023 period, during which the new B725-based datacenter, to be constructed under the umbrella of the BNL Computing Facility Revitalization (CFR) project, is expected to become fully operational...
ESnet is the DOE’s High Performance Network (HPN). It is viewed as a scientific user facility and, in many ways, an instrument to accelerate research and discovery, with a history of alignment with national laboratory enterprise network organizations to accomplish facility missions and objectives beyond the campus perimeter, reaching collaborations that scale both nationally and globally. A discussion of...
Authentication and authorization are an integral part of any organization's ability to maintain integrity and control over shared resources. Historically, RACF/SDCC projects were responsible for maintaining their own user accounts and the IT resources needed to perform the experiments for which they received funding. We will discuss the authentication and authorization architecture currently in use at the...
Inter-facility workflows by their nature cross facility boundaries, thereby demanding attention to how users are authenticated at each facility and how their workflow steps are authorized. A variety of approaches can be used to make these boundary crossings less intensive in terms of human effort. We suggest fine-grained authorizations as a means to automation by forming a minimal set...
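One way to picture the "minimal set" idea is a capability-style check in which a workflow step runs only if every permission it needs was explicitly granted; the Capability type and the scope strings below are invented for illustration and do not describe any facility's actual scheme.

```python
# Sketch of fine-grained, capability-style authorization for a workflow step.
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    action: str      # e.g. "read", "submit", "write" (invented scopes)
    resource: str    # e.g. "storage:/expdata/run042", "compute:batch"

def authorized(granted: set[Capability], needed: set[Capability]) -> bool:
    """A step proceeds only if every capability it needs was granted --
    the minimal-set idea: grant exactly what the step requires, no more."""
    return needed <= granted

# A token carrying only the capabilities this workflow was issued.
token = {Capability("read", "storage:/expdata/run042"),
         Capability("submit", "compute:batch")}
step_needs = {Capability("read", "storage:/expdata/run042")}

print(authorized(token, step_needs))   # True: no human intervention required
```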
As the precision, energy, and output of scientific instruments such as particle colliders increase, so does the volume of data generated by science experiments. With data volumes increasing rapidly, there is a serious need to keep the data in storage that is reliable and cost effective. Disk storage is fast, ideal for frequently accessed data, but it is often very costly and is not...
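As a rough sketch of the disk-versus-tape tradeoff the abstract describes, the toy policy below flags files on a disk pool whose last access time exceeds an assumed 30-day threshold as candidates for migration to a cheaper tape tier; the pool path and the threshold are hypothetical.

```python
# Toy age-based tiering policy: keep hot data on disk, migrate cold data
# to a cheaper tier such as tape. Paths and threshold are invented.
import os
import time

DISK_POOL = "/mnt/disk_pool"        # hypothetical disk-pool mount point
COLD_AGE_SECONDS = 30 * 24 * 3600   # assumed "cold" threshold: 30 days

def cold_files(root: str):
    """Yield files whose last access time exceeds the cold threshold."""
    now = time.time()
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_atime > COLD_AGE_SECONDS:
                yield path

for path in cold_files(DISK_POOL):
    # A real system would hand these to an HSM / tape-migration service;
    # here we only report the migration candidates.
    print("migrate to tape:", path)
```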
An update on the latest developments towards production use of BNL Box.
Advances in electronics have resulted in an explosion of scientific instruments capable of generating "Big Data". Traditionally, there were few big-data experiments in operation at any given time. These experiments were typically large endeavors with the financial, infrastructure, computing, and personnel resources to manage the data volumes they generated. With the...