Contents
Introduction to Scan
The Scan AI Ecosystem
Education and Guided Proof-of-Concept
The NVIDIA Deep Learning Institute
Guided Proof-of-Concept
Deep Learning Hardware
Intel AI Solutions
NVIDIA DGX-1 – Innovate Faster
NVIDIA DGX-2 – The Fastest GPU Ever
DGX Station Desktop
DGX Software and Frameworks
IBM POWER9 Solutions
IBM POWER9 Platform
Optimised Storage
NetApp EF570 All-Flash Array
NetApp AFF A-series All-Flash Array
Pure Storage FlashBlade
Introducing AIRI – Artificial Intelligence Ready Infrastructure
Introduction to Scan
Although a high-performance hardware-accelerated server is at the centre of deep learning and AI system performance, efficiency and time
to results are affected by several other factors too. The output from a GPU or FPGA may be limited by the storage array
feeding the dataset, and by the connectivity between the server and the storage appliance. Even if you have optimised hardware, deep learning
training results need to be correctly visualised, interpreted and understood – further acceleration of this phase can be delivered by deploying
the right software in the training stage. Lastly, the key part of building the most appropriate hardware and software solution relies on in-depth knowledge
of the AI environment and understanding what to deploy and when – skilled data scientists and Linux engineers complete the resource required
to deliver fast and insightful deep learning. This brochure introduces each of these areas in greater depth to provide an understanding of how the
various parts of our AI ecosystem make up the whole solution.
The knowledge gained on these courses will help drive interaction during a proof-of-concept trial and help inform decisions required in the set-up
of your own optimised deep learning or AI ecosystem.
Guided Proof-of-Concept
Deep learning solutions such as the DGX range of deep learning appliances unlock new possibilities thanks to their unparalleled processing density.
We want you to be sure these solutions are right for you, so we provide the ability to try your own data on one of our deep learning servers as a
proof of concept. Hosted in a secure datacentre, we will provide you with remote access to the DGX system of your choice so you can evaluate
and benchmark it.
This service is fully supported by our team of data scientists and Linux engineers to ensure you get the best out of your trial, including access to
our software ecosystem embedded on our cloud platform.
Every deep learning and AI project is different, and here at Scan we recognise that one size of hardware doesn’t fit all: requirements vary
depending on the stage of development you’re at, the applications being used and the time frames results are required in. With this in mind we
offer solutions to address every requirement.
The second generation of the DGX-1 is powered by eight NVIDIA Tesla V100 GPU accelerators, which are based on the new Volta architecture.
These cutting-edge GPUs combine CUDA cores (5,120) and the latest Tensor Cores (640) plus 32 GB of RAM, and are specifically designed for
deep learning, delivering a massive 5x speed-up compared to the first-generation Pascal-based DGX-1.

System Memory: 512 GB DDR4 LRDIMM
CPUs: 2x 20-core Intel® Xeon® E5-2698 v4, 2.2 GHz
Streaming Cache: 4x 1.92 TB SSDs in RAID 0
Power: 4x 1600 W PSUs (3,500 W maximum)
Cooling: Efficient front-to-back airflow
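
Before starting a long training run on a system like this, it is worth confirming that all eight V100s are visible to your framework. The following minimal sketch (our own illustration, not NVIDIA-supplied code) uses PyTorch’s CUDA API to enumerate the installed GPUs and their memory:

import torch

# List every CUDA device the framework can see; on a DGX-1 this should
# report eight Tesla V100 accelerators with 16 or 32 GB each.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA devices visible")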
Deep learning frameworks are systems for the training and deployment of deep learning networks which provide the flexibility to design and
adapt the network to your specific task. Frameworks allow you to hit the ground running, prototyping and testing ideas and
applications without the considerable overhead of writing significant amounts of your own code. All the major frameworks use cuDNN so you can
rely on optimised code, and each one has a community of developers and users who can help you not only get the most from the framework you
choose but also guide you in modifying the framework to provide new features you may require for your application.
Software
The DGX family is much more than a range of GPU appliances - they
are deep learning solutions comprising a finely tuned combination of
hardware and software. Running a GPU-optimised version of Ubuntu
Server Linux, the software stack comprises drivers, the NVDocker
container tool, the deep learning SDK, the NVIDIA Cloud Management Service,
plus NVIDIA DIGITS, which is used to run deep learning frameworks such
as Caffe, Torch, TensorFlow and many more.
The deep learning frameworks provided with the system are especially
optimised to take advantage of the NVLink communication links, among
other enhancements, in order to optimise multi-GPU communication in
the system.
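
Because each framework ships as a pre-built NGC container, a typical workflow is simply to pull and run an image rather than compile anything yourself. As a rough sketch (the image tag shown is illustrative, not a specific supported release), launching a GPU-accelerated TensorFlow container from Python might look like this:

import subprocess

# Run a TensorFlow container from the NGC registry via the NVDocker wrapper.
# The tag "18.01-py3" is a hypothetical example; use a tag available in
# your own NGC account.
subprocess.run([
    "nvidia-docker", "run", "--rm",
    "nvcr.io/nvidia/tensorflow:18.01-py3",
    "python", "-c", "import tensorflow as tf; print(tf.__version__)",
], check=True)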
Caffe (Berkeley Vision and Learning Centre)
Caffe provides an easy and accessible way to define and train deep neural networks via a high-level scripting language (Google’s Protocol Buffers) describing the network. Extending your application beyond the pre-defined Caffe modules is fairly straightforward but will require programming your own Caffe modules. Caffe is a powerful command-line-driven framework.

DIGITS (NVIDIA)
DIGITS isn’t really a framework in its own right but rather provides a powerful graphical front end to both Caffe and Torch, simplifying interaction and setup, and providing useful visualisations of what’s going on in your deep neural network. DIGITS also helps optimise the use of multiple NVIDIA GPUs.

Torch (Facebook)
In contrast to Caffe’s high-level script, Torch throws you in at a deeper level, requiring you to program your deep learning model while providing a powerful and useful level of description. This can make Torch appear more flexible than Caffe as it is clearer how the entire learning process operates, since you have had to define each step. Torch is also a command-line and programming-driven framework.

TensorFlow (Google)
TensorFlow was developed as part of the Google Brain project as a framework to work with multi-dimensional arrays (tensors), utilising data flow graphs to solve machine learning and deep learning problems. TensorFlow aims to provide a more professional tool for developing and managing deep learning; it also provides its own graphical interface, simplifying interaction and setup, and providing useful visualisations of what’s going on in your deep neural network.

CNTK (Microsoft)
CNTK (Cognitive Toolkit), according to Microsoft, is “a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph”. It is an alternative to the other deep learning frameworks mentioned; however, a key distinguishing factor is that the framework supports parallelisation across both multiple machines and multiple GPUs (regardless of where the GPUs are located). The goal of this framework is to provide efficiency, performance (fast training and productisation) and flexibility (application to speech, vision and text).
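
To make the data-flow-graph idea concrete, here is a minimal sketch (our own example, using the TensorFlow 1.x API that was current for these systems): operations are declared as nodes on tensors, and nothing is computed until the graph is executed in a session.

import tensorflow as tf

a = tf.placeholder(tf.float32, shape=(None, 3), name="a")  # input tensor
w = tf.Variable(tf.random_normal((3, 1)), name="w")        # learned weights
y = tf.matmul(a, w)                                        # a graph node, not a value

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The matrix multiply only runs when the graph is evaluated here.
    print(sess.run(y, feed_dict={a: [[1.0, 2.0, 3.0]]}))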
Accelerated Computing
5x Higher Energy Efficiency

AC922 key specifications:
System Packaging: 2U
Processor Sockets: 2
Number of Cores: Up to 44
Max Memory: 1 TB
HDD/SSD: Two SFF 2.5” SATA drives, max 4 TB HDD, max 7.68 TB SSD
Through extensive testing and benchmarking within our own proof-of-concept environment we have optimised a number of flash-based data
management solutions that work alongside our AI server platforms to provide the required datasets at a fast enough pace to deliver training results,
addressing every budget and performance requirement.
FlashBlade capacity with 15 blades: 267 TB, 525 TB or 1,607 TB usable, depending on blade configuration.
You have the high-performance server and optimised all-flash storage array – but how can you be sure you create an infrastructure that doesn’t
introduce any bottlenecks when connecting the components together? Once again our experienced team has tested various configurations, so
we’re able to advise the best options for your needs.
Switch-IB, the seventh generation of high-performance switching from Mellanox, is renowned for industry-leading bandwidth, low latency
and scalability. The device supports up to 36 EDR 100Gb/s InfiniBand ports, and all ports comply with the InfiniBand specification for auto-
negotiation from SDR to EDR.
Every deep learning and AI project is different, and here at Scan we recognise that one size of hardware doesn’t fit all: requirements vary
depending on the stage of development you’re at, the applications being used and the time frames results are required in. With this in mind we
offer solutions to address each phase of AI development.
Driverless AI aims to achieve the highest predictive accuracy, comparable to expert data scientists, but in a much shorter time thanks to end-to-end
automation. Driverless AI also offers automatic visualisations and machine learning interpretability. Especially in regulated industries, model
transparency and explanation are just as important as predictive performance.
The community version of Driverless AI is a fully functional version with a 30-day trial on DGX systems and supported NGC platforms. During
your proof-of-concept trial you can achieve up to 40x speed-ups on GPU-accelerated algorithms versus CPUs. Driverless AI provides speed,
accuracy and interpretability, empowering you to perform automatic feature engineering and to interpret and debug models with reason codes in
plain English.
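
Driverless AI’s feature engineering itself is proprietary, but the underlying idea can be sketched in a few lines (a hand-rolled illustration using scikit-learn, not H2O code): generate candidate features automatically and keep only those that improve a cross-validated score.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def cv_score(features):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, features, y, cv=3).mean()

baseline = cv_score(X)
# Candidate engineered features: pairwise products of the first three columns.
for i in range(3):
    for j in range(i + 1, 3):
        candidate = np.column_stack([X, X[:, i] * X[:, j]])
        score = cv_score(candidate)
        if score > baseline:  # keep a feature only if it helps
            X, baseline = candidate, score
            print(f"kept x{i}*x{j}, new CV score {score:.4f}")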
MapD is a GPU-accelerated platform with an open-source SQL engine called MapD Core and an integrated visualisation system called MapD
Immerse. Open source MapD Core is now containerised on DGX systems and in the NGC. Customers registering for a Scan proof-of-concept trial
on the DGX platforms can use the software as part of their experience, to understand how MapD could accelerate their insights. Data scientists
using MapD experience unparalleled analytic speed, constant innovation from the open source community, and interactive visual exploration of
the data used to build machine learning models.
MapD Immerse
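
As a flavour of what a trial session looks like, the open-source pymapd client lets you run SQL against MapD Core directly from Python. This is a minimal sketch; the connection details and the flights table are placeholders for whatever dataset you load during your trial, not Scan’s actual PoC configuration.

from pymapd import connect

# Connect to a local MapD Core instance (placeholder credentials).
con = connect(user="mapd", password="HyperInteractive",
              host="localhost", dbname="mapd")
cur = con.cursor()
# GPU-accelerated SQL over a hypothetical flights dataset.
cur.execute("SELECT origin, AVG(arrdelay) FROM flights GROUP BY origin LIMIT 10")
for row in cur:
    print(row)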
Text Recognition
Archiving documents and recorded calls is one thing, but the real power comes in being able to effectively and
efficiently search those archives and extract the key information required. ArgusSearch software from
Planet AI allows ‘search-engine-like’ display of information found in historical and complex handwritten
or hand-printed documents. Unlike conventional ICR, which provides either a transcribed word or a ‘not read’
result, ArgusSearch provides ‘quasi’ matches for a particular search term based on the equivalence of the
search term and a particular instance of the term within a document.
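
ArgusSearch’s matching technology is proprietary, but the notion of a ‘quasi’ match can be illustrated with a simple edit-distance measure (our own sketch, not Planet AI code): transcribed words close enough to the search term still count as hits, even when the ICR output is imperfect.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def quasi_matches(term, words, max_ratio=0.34):
    """Words whose distance to the term is small relative to its length."""
    return [(w, d) for w in words
            if (d := edit_distance(term.lower(), w.lower())) <= max_ratio * len(term)]

# Noisy ICR output from a handwritten document: 'lnvoice' and 'irwoice'
# still match the search term 'invoice'.
print(quasi_matches("invoice", ["lnvoice", "irwoice", "receipt", "ledger"]))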
Speech Recognition
Understanding speech content and making it searchable, rather than just having a huge repository of digital files,
is key. ArgusSpeech software from Planet AI can detect phrases, written or spoken, in millions of stored
audio and video recordings, returning results wherever the phrase is spoken in the recordings. No
more manual listening to huge numbers of voice recordings to find specific keywords, then listening only to
those recordings where the keywords exist. Alternatively, let the software do the hard work of continuously
trawling through recordings to search a big list of keywords.
Clusterone can speed up training times on clustered hardware by using data parallelism – each machine trains on a separate set of
training data, leading to significant increases in training speed. The most common approach is to use one machine to store the model parameters.
This parameter server sends the model parameters to multiple worker machines, each of which runs the training procedure on a small batch of data
and then returns its parameter updates to the parameter server, as sketched below.
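
A toy sketch of that idea (pure NumPy, not Clusterone’s implementation): four simulated workers each compute a gradient on their own shard of a linear-regression dataset, and the ‘parameter server’ averages those gradients to update the shared weights.

import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
X = rng.normal(size=(1000, 5))
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)                                # parameters held by the server
shards = np.array_split(np.arange(1000), 4)    # four workers, disjoint shards
lr = 0.1

for step in range(100):
    # In a real cluster each gradient is computed on a separate machine.
    grads = [2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx) for idx in shards]
    w -= lr * np.mean(grads, axis=0)           # server applies the combined update

print(np.round(w, 2))                          # recovers approximately true_w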
We’ve worked hard to understand what delivery of a complete AI project requires – and often the key element is the expertise to get all
component parts working seamlessly together. System power is nothing without fast data ingress; a short time to results means little if a lack
of visualisation offers no insight. Our team of data scientists, engineers and architects is there to help smooth all these technology interfaces to
guarantee the best experience – whatever the scale.
In addition to the infrastructure, we have a full team of expert consultants including data scientists, Linux engineers and hardware architects to
support your AI development, whether it be a proof-of-concept trial, project planning or full deployment. These specialist teams are available in
half-day increments to hand-hold your deep learning projects, or simply to complement your in-house experts as and when you require.
Machine learning is at the core of a new wave of artificial intelligence applications limited only by our imagination. New algorithmic approaches,
recent jumps in processing power and large training data sets generated by internet users mean that, for the first time, machines can learn to solve
useful problems without explicit programming.
Like Scan’s own proof-of-concept platform, Cambridge Consultants’ platform runs on high-performance computing based around NVIDIA’s DGX-1 Deep Learning
Supercomputer and other GPU- and FPGA-accelerated servers, providing petaflop-scale compute on-site. This links to petabyte-scale local storage,
project-specific clouds and continuous integration systems. When the organically grown deep learning models are ready, Cambridge Consultants can
export them easily into customers’ own compute facilities or the cloud.