Tuesday, 9 August 2016, 10:20 HKT/SGT
Source: Fujitsu Ltd
Using "AlexNet," 64 GPUs in parallel achieve 27 times the speed of a single GPU for world's fastest processing
KAWASAKI, Japan, Aug 9, 2016 - (JCN Newswire) - Fujitsu Laboratories Ltd. today announced the development of software technology that applies supercomputer parallelization techniques across multiple GPUs to enable high-speed deep learning.
Figure 1: Scheduling technology for data sharing
Figure 2: Differences in processing when the data to be shared is small (top) and large (bottom)
A conventional method to accelerate deep learning is to network multiple GPU-equipped computers and run them in parallel. The problem with this approach is that the gains from parallelization become progressively harder to obtain once more than about ten computers are used, because the time spent sharing data between them grows. Fujitsu Laboratories has newly developed parallelization technology that shares data between machines efficiently, and applied it to Caffe, an open-source deep learning framework widely used around the world. To confirm its effectiveness across a range of deep learning workloads, Fujitsu Laboratories evaluated the technology on AlexNet(1), where it achieved learning speeds with 16 and 64 GPUs that are 14.7 and 27 times faster, respectively, than a single GPU. These are the world's fastest processing speeds(2), representing improvements in learning speed of 46% for 16 GPUs and 71% for 64 GPUs. With this technology, machine learning that would have taken about a month on one computer can be completed in about a day on 64 GPUs running in parallel, shortening research and development periods that use deep learning and enabling the development of higher-quality learning models.

Fujitsu Laboratories aims to commercialize this technology as part of Fujitsu Limited's AI technology, Human Centric AI Zinrai, as it works together with customers to put AI to use. Details of this technology were announced at SWoPP 2016 (Summer United Workshops on Parallel, Distributed and Cooperative Processing), held from August 8 to 10 in Matsumoto, Nagano Prefecture, Japan.

Development Background
In recent years, research into the AI method called deep learning has advanced, producing rates of image, character, and sound recognition that exceed those of humans. Deep learning greatly improves recognition accuracy over previous technologies, but to do so it must repeatedly learn from huge volumes of data, so GPUs, which are better suited than CPUs to this kind of high-speed computation, have come into wide use. Even so, learning from large volumes of data still takes an enormous amount of time, and deep learning software that operates multiple GPUs in parallel has therefore begun to be developed.

Issues
Because there is an upper limit to the number of GPUs that can be installed in one computer, using many GPUs requires interconnecting multiple computers through a high-speed network so that they can share data while carrying out learning processing. Data sharing in parallel deep learning is complex, however: the sizes of the shared data and the computation times vary from layer to layer, and operations must be performed in order, each using the results of the previous one. As a result, waiting time accumulates in the communication between computers, making it difficult to achieve higher speeds even as the number of computers is increased.

About the New Technology
Fujitsu Laboratories has accelerated learning processing by developing and applying two new technologies. The first is a supercomputer software technique that executes communications and operations simultaneously, in parallel. The second changes the processing method according to the size of the shared data and the sequence of deep learning processing. Together, these two technologies limit the growth in waiting time between processing batches, even when the shared data varies widely in size.

1. Scheduling technology for data sharing
This technology automatically controls the priority order of data transmission so that, across multiple successive operations, the data needed at the start of the next learning iteration is shared among the computers in advance (Figure 1). With existing technology (Figure 1, left), the data-sharing processing for the first layer, which is needed to begin the next iteration, is carried out last, so the delay before the next iteration is long. With the newly developed technology (Figure 1, right), the data sharing for the first layer is carried out during the data sharing for the second layer, shortening the wait until the next learning iteration can start.

2. Processing technology that optimizes operations for data size
For processing in which operation results must be shared with all computers, when the original data volume is small, each computer receives the data and carries out the same operation itself, eliminating the time needed to transmit the results. When the data volume is large, the processing is distributed, and each computer's partial results are shared with the others for use in the following operations. By automatically selecting the optimal method based on the amount of data, this technology minimizes the total operation time (Figure 2).

Effects
This newly developed technology was implemented in the Caffe deep learning framework, and in a test measuring learning time with AlexNet on 64 GPU-equipped computers it achieved a learning speed 27 times faster than a single GPU. Compared with the same framework before the technology was applied, learning speed improved by 46% on 16 GPUs and 71% on 64 GPUs (according to internal comparisons). With this technology, the time required for deep learning R&D can be shortened, whether for developing unique neural network models for the autonomous control of robots and automobiles, or for healthcare and finance applications such as pathology classification and stock price forecasting, enabling the development of higher-quality models.

Future Plans
Fujitsu Laboratories aims to commercialize this newly developed technology during fiscal 2016 as part of Fujitsu's AI technology, Human Centric AI Zinrai. It is also working to improve the technology further, in pursuit of even greater learning speed.
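The release does not publish the scheduler itself, but the idea behind technology 1 can be illustrated with a toy timeline. In the sketch below, everything (the function name, the timing constants, and the serialized-link model) is an illustrative assumption, not Fujitsu's implementation: back-propagation produces gradients from the last layer back to the first, each gradient then occupies a shared link for a fixed time, and the scheduler chooses which buffered gradient to send next. Sending first-layer gradients first shortens the wait before the next forward pass can begin.

```python
import heapq

def next_forward_start(num_layers, bp_step=1.0, tx_step=1.0, prioritized=True):
    """Return the time at which layer 1's gradient has been fully shared,
    i.e. when the next forward pass can begin.

    Toy model: back-propagation runs back-to-front, so layer k's gradient
    becomes ready at (num_layers - k + 1) * bp_step; each gradient then
    occupies a serialized link for tx_step. With prioritized=True the
    lowest-numbered (earliest) layer is sent first; otherwise transfers
    go first-come-first-served.
    """
    ready = sorted(((num_layers - k + 1) * bp_step, k)
                   for k in range(1, num_layers + 1))
    heap, finished, t, i = [], {}, 0.0, 0
    while len(finished) < num_layers:
        # Buffer every gradient that has become ready by time t.
        while i < len(ready) and ready[i][0] <= t:
            key = ready[i][1] if prioritized else i  # layer index vs. arrival order
            heapq.heappush(heap, (key, ready[i][1]))
            i += 1
        if not heap:             # link idle: jump ahead to the next ready gradient
            t = ready[i][0]
            continue
        _, layer = heapq.heappop(heap)
        t += tx_step             # link busy transferring this layer's gradient
        finished[layer] = t
    return finished[1]
```

With four layers, back-propagation steps of 1.0 and transfers of 2.0 time units, the prioritized schedule makes layer 1 available at t = 7 instead of t = 9 under first-come-first-served, the same effect shown in Figure 1, where first-layer data sharing overtakes that of later layers.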
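Technology 2's choice between redundant and distributed computation can likewise be sketched with a toy cost model. The constants, the two cost formulas, and the function name below are illustrative assumptions rather than Fujitsu's published method; the sketch only shows why small shared data favors one collective plus redundant local computation, while large data favors splitting the work and sharing the results.

```python
def choose_sharing(data_bytes, num_nodes, latency=5e-6, bandwidth=1e9,
                   cost_per_byte=1e-9):
    """Pick how to share an operation's result across all nodes.

    - 'replicate': one collective distributes the raw data everywhere and
      every node redundantly performs the full operation itself, so no
      second phase is needed to transmit results.
    - 'partition': each node processes a 1/num_nodes slice, then a second
      collective shares the partial results.
    Replicate pays one latency but moves and processes everything on every
    node; partition pays two latencies but moves and processes far less.
    """
    n, g = num_nodes, data_bytes
    replicate = latency + n * g / bandwidth + n * g * cost_per_byte
    partition = 2 * latency + 2 * g / bandwidth + g * cost_per_byte
    return "replicate" if replicate <= partition else "partition"
```

With the default constants, a few hundred bytes of shared data selects `replicate`, while a 100 MB block selects `partition`, which is the crossover that the automatic selection illustrated in Figure 2 exploits.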
(1) AlexNet: A multi-layered neural network for image recognition, used here as a representative sample because it combines a number of commonly used network structures. Having taken top honors in a 2012 image recognition competition, it forms the basis of today's image recognition neural networks.
(2) World's fastest processing: As of August 5, 2016 (according to Fujitsu's own research).
About Fujitsu Laboratories
Founded in 1968 as a wholly owned subsidiary of Fujitsu Limited, Fujitsu Laboratories Ltd. is one of the premier research centers in the world. With a global network of laboratories in Japan, China, the United States and Europe, the organization conducts a wide range of basic and applied research in the areas of Next-generation Services, Computer Servers, Networks, Electronic Devices and Advanced Materials. For more information, please see: http://www.fujitsu.com/jp/group/labs/en/.
Contact:
Fujitsu Limited
Public and Investor Relations
Tel: +81-3-3215-5259
URL: www.fujitsu.com/global/news/contacts/
Fujitsu Laboratories Ltd.
Computer Systems Laboratory
E-mail: ngcs-ai-press@ml.labs.fujitsu.com